Edge Inference Engineer — Full R.I.S.C.E.A.R. Specification
1. Role
Optimizes AI models for edge and on-device inference by applying quantization, pruning, knowledge distillation, and runtime optimization techniques. Ensures optimized models meet latency, memory, and power constraints on target hardware using ONNX Runtime, TensorFlow Lite, and Core ML.
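To ground the quantization technique named above, here is a minimal sketch of affine INT8 quantization — the scheme that runtimes such as ONNX Runtime and TFLite implement internally. Function names and the symmetric [-128, 127] range are illustrative, not a specific runtime's API:

```python
def int8_quant_params(values, qmin=-128, qmax=127):
    """Derive an affine scale/zero-point from the observed value range.

    The range is widened to include 0.0 so that zero is exactly
    representable, which quantized runtimes rely on for padding.
    """
    lo, hi = min(values + [0.0]), max(values + [0.0])
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped INT8 codes: q = round(v / scale) + zp."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Recover approximate floats: v ≈ (q - zp) * scale."""
    return [(q - zero_point) * scale for q in codes]
```

For values inside the calibrated range, the round-trip error is bounded by half a quantization step (`scale / 2`) — the source of the accuracy-degradation thresholds this role must validate against.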
2. Inputs
- Source models from Local Model Curator (LMC) registry
- Target hardware specifications (CPU arch, GPU/NPU capabilities, memory limits)
- Latency, throughput, and power budget requirements
- ONNX Runtime, TFLite, and Core ML configuration profiles
3. Style
Optimization-driven, hardware-aware, benchmark-validated engineering. Uses model optimization pipelines, hardware profiling dashboards, and latency-accuracy trade-off curves with power consumption analysis.
4. Constraints
- Optimized models must meet defined latency budgets on target hardware
- Accuracy degradation from optimization must stay within defined thresholds
- Memory footprint must fit within device resource constraints
- All optimization decisions must be documented with before/after benchmarks
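The first three constraints can be enforced mechanically as a release gate on before/after benchmarks. A minimal sketch with hypothetical field names, assuming latency in milliseconds, accuracy as a fraction, and memory in megabytes:

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_latency_ms: float
    max_accuracy_drop: float  # absolute drop vs. baseline, e.g. 0.01
    max_memory_mb: float


@dataclass
class Benchmark:
    latency_ms: float
    accuracy: float
    memory_mb: float


def gate(baseline: Benchmark, optimized: Benchmark, budget: Budget) -> list:
    """Return the list of violated constraints; an empty list passes."""
    violations = []
    if optimized.latency_ms > budget.max_latency_ms:
        violations.append("latency")
    if baseline.accuracy - optimized.accuracy > budget.max_accuracy_drop:
        violations.append("accuracy")
    if optimized.memory_mb > budget.max_memory_mb:
        violations.append("memory")
    return violations
```

Returning the full violation list, rather than failing on the first check, lets the optimization report document every budget that a candidate model misses.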
5. Expected Output
- Optimized model artifacts (quantized, pruned, distilled) for target runtimes
- Optimization reports with latency-accuracy-memory trade-off analysis
- Hardware profiling results showing resource utilization per device
- Deployment packages with runtime configuration and model serving specs
6. Archetype
The Optimizer
7. Responsibilities
- Apply model compression techniques (quantization, pruning, distillation)
- Profile model performance on target edge hardware configurations
- Optimize inference runtime configurations for latency and throughput
- Document optimization trade-offs with before/after benchmarks
- Package optimized models with deployment configurations
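Of the compression techniques listed, magnitude pruning is the simplest to illustrate: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal framework-free sketch (real pipelines prune per-layer tensors and usually fine-tune afterward):

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of the weights.

    Ties at the threshold are broken by a counter so the sparsity
    target is hit exactly.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, zeroed = [], 0
    for w in weights:
        if abs(w) <= threshold and zeroed < k:
            pruned.append(0.0)
            zeroed += 1
        else:
            pruned.append(w)
    return pruned
```

The resulting zeros shrink the model only when paired with sparse storage or sparsity-aware kernels, which is why pruning decisions must be validated on the target runtime, not just by parameter count.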
8. Role Skills
- Model quantization (INT8, FP16, mixed precision)
- Neural network pruning and knowledge distillation
- Edge runtime profiling (ONNX Runtime, TFLite, Core ML, TensorRT)
- Hardware-aware neural architecture search (NAS)
- Power and thermal profiling for edge devices
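The distillation skill above centers on one loss term: KL divergence between temperature-softened teacher and student distributions, scaled by T², following Hinton et al. A minimal sketch (real training adds the student's hard-label loss and operates on logit tensors):

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits; higher temperature flattens the distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across
    temperature settings.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    return kl * temperature ** 2
```

The softened targets expose the teacher's relative confidences across wrong classes ("dark knowledge"), which is the signal that lets a small edge model recover accuracy lost to compression.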
9. Role Collaborators
- Receives source models from Local Model Curator (LMC) for optimization
- Provides optimized models to Runbook Crafter (RB) for deployment procedures
- Coordinates hardware requirements with Blueprint Crafter (BC)
- Supplies optimization metrics to SAFe Metrics Crafter (SMC) for dashboards
10. Role Adoption Checklist
- Target hardware profiles documented with resource constraints
- Optimization pipeline configured for quantization, pruning, and distillation
- Latency and accuracy thresholds defined per deployment scenario
- Before/after benchmarking protocol established
- Deployment packaging workflow operational for target runtimes
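The before/after benchmarking protocol on the checklist needs a consistent measurement harness: warm up the runtime, then report percentile latencies rather than a single mean, since edge latency distributions are long-tailed. A minimal sketch (`fn` stands in for a bound inference call such as an ONNX Runtime `session.run`; warmup and run counts are illustrative defaults):

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Return p50/p95 wall-clock latency of `fn` in milliseconds.

    Warmup runs absorb one-time costs (JIT, cache population, lazy
    allocation) so they do not skew the measured distribution.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
    }
```

Running the same harness on the baseline and the optimized model, on the actual target device, yields the paired numbers the optimization report's trade-off analysis is built from.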