Model Compression, Quantization, Mixture of Experts, Edge AI, Energy Efficiency, On-device Inference, ARM Inference, Model Surgery, Embedded Systems, Autonomous Agents, Fine-tuning, MLOps