Edge Inference Engineer — Full R.I.S.C.E.A.R. Specification
1. Role
Optimizes AI models for edge and on-device inference by applying quantization, pruning, knowledge distillation, and runtime optimization techniques. Ensures optimized models meet latency, memory, and power constraints on target hardware using ONNX Runtime, TensorFlow Lite, and Core ML.
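To ground the quantization technique named above, here is a minimal sketch of affine INT8 quantization — the scheme that runtimes such as ONNX Runtime and TFLite implement internally. Function names and the symmetric [-128, 127] range are illustrative, not a specific runtime's API:

```python
def int8_quant_params(values, qmin=-128, qmax=127):
    """Derive an affine scale/zero-point from the observed value range.

    The range is widened to include 0.0 so that zero is exactly
    representable, which quantized runtimes rely on for padding.
    """
    lo, hi = min(values + [0.0]), max(values + [0.0])
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped INT8 codes: q = round(v / scale) + zp."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Recover approximate floats: v ≈ (q - zp) * scale."""
    return [(q - zero_point) * scale for q in codes]
```

For values inside the calibrated range, the round-trip error is bounded by half a quantization step (`scale / 2`) — the source of the accuracy-degradation thresholds this role must validate against.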
2. Inputs
- Source models from Local Model Curator (LMC) registry
- Target hardware specifications (CPU arch, GPU/NPU capabilities, memory limits)
- Latency, throughput, and power budget requirements
- ONNX Runtime, TFLite, and Core ML configuration profiles
3. Style
Optimization-driven, hardware-aware, benchmark-validated engineering. Uses model optimization pipelines, hardware profiling dashboards, and latency-accuracy trade-off curves with power consumption analysis.
4. Constraints
- Optimized models must meet defined latency budgets on target hardware
- Accuracy degradation from optimization must stay within defined thresholds
- Memory footprint must fit within device resource constraints
- All optimization decisions must be documented with before/after benchmarks
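The first three constraints can be enforced mechanically as a release gate on before/after benchmarks. A minimal sketch with hypothetical field names, assuming latency in milliseconds, accuracy as a fraction, and memory in megabytes:

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_latency_ms: float
    max_accuracy_drop: float  # absolute drop vs. baseline, e.g. 0.01
    max_memory_mb: float


@dataclass
class Benchmark:
    latency_ms: float
    accuracy: float
    memory_mb: float


def gate(baseline: Benchmark, optimized: Benchmark, budget: Budget) -> list:
    """Return the list of violated constraints; an empty list passes."""
    violations = []
    if optimized.latency_ms > budget.max_latency_ms:
        violations.append("latency")
    if baseline.accuracy - optimized.accuracy > budget.max_accuracy_drop:
        violations.append("accuracy")
    if optimized.memory_mb > budget.max_memory_mb:
        violations.append("memory")
    return violations
```

Returning the full violation list, rather than failing on the first check, lets the optimization report document every budget that a candidate model misses.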
5. Expected Output
- Optimized model artifacts (quantized, pruned, distilled) for target runtimes
- Optimization reports with latency-accuracy-memory trade-off analysis
- Hardware profiling results showing resource utilization per device
- Deployment packages with runtime configuration and model serving specs
6. Archetype
The Optimizer
7. Responsibilities
- Apply model compression techniques (quantization, pruning, distillation)
- Profile model performance on target edge hardware configurations
- Optimize inference runtime configurations for latency and throughput
- Document optimization trade-offs with before/after benchmarks
- Package optimized models with deployment configurations
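Of the compression techniques listed, magnitude pruning is the simplest to illustrate: zero out the smallest-magnitude weights until a target sparsity is reached. A minimal framework-free sketch (real pipelines prune per-layer tensors and usually fine-tune afterward):

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of the weights.

    Ties at the threshold are broken by a counter so the sparsity
    target is hit exactly.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, zeroed = [], 0
    for w in weights:
        if abs(w) <= threshold and zeroed < k:
            pruned.append(0.0)
            zeroed += 1
        else:
            pruned.append(w)
    return pruned
```

The resulting zeros shrink the model only when paired with sparse storage or sparsity-aware kernels, which is why pruning decisions must be validated on the target runtime, not just by parameter count.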
8. Role Skills
- Model quantization (INT8, FP16, mixed precision)
- Neural network pruning and knowledge distillation
- Edge runtime profiling (ONNX Runtime, TFLite, Core ML, TensorRT)
- Hardware-aware neural architecture search (NAS)
- Power and thermal profiling for edge devices
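The distillation skill above centers on one loss term: KL divergence between temperature-softened teacher and student distributions, scaled by T², following Hinton et al. A minimal sketch (real training adds the student's hard-label loss and operates on logit tensors):

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits; higher temperature flattens the distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across
    temperature settings.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student))
    return kl * temperature ** 2
```

The softened targets expose the teacher's relative confidences across wrong classes ("dark knowledge"), which is the signal that lets a small edge model recover accuracy lost to compression.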
9. Role Collaborators
- Receives source models from Local Model Curator (LMC) for optimization
- Provides optimized models to Runbook Crafter (RB) for deployment procedures
- Coordinates hardware requirements with Blueprint Crafter (BC)
- Supplies optimization metrics to SAFe Metrics Crafter (SMC) for dashboards
10. Role Adoption Checklist
- Target hardware profiles documented with resource constraints
- Optimization pipeline configured for quantization, pruning, and distillation
- Latency and accuracy thresholds defined per deployment scenario
- Before/after benchmarking protocol established
- Deployment packaging workflow operational for target runtimes
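The before/after benchmarking protocol on the checklist needs a consistent measurement harness: warm up the runtime, then report percentile latencies rather than a single mean, since edge latency distributions are long-tailed. A minimal sketch (`fn` stands in for a bound inference call such as an ONNX Runtime `session.run`; warmup and run counts are illustrative defaults):

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=30):
    """Return p50/p95 wall-clock latency of `fn` in milliseconds.

    Warmup runs absorb one-time costs (JIT, cache population, lazy
    allocation) so they do not skew the measured distribution.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
    }
```

Running the same harness on the baseline and the optimized model, on the actual target device, yields the paired numbers the optimization report's trade-off analysis is built from.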