---
license: mit
base_model: microsoft/phi-4
tags:
- network-engineering
- cisco
- grpo
- orpo
- phi-4
pipeline_tag: text-generation
---

# Phi-4 Network Architect v2

Fine-tuned [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) (14B) for enterprise network engineering: OSPF/BGP troubleshooting, ACL design, Cisco IOS configuration, and CCDE/CCIE-level reasoning.

## Training Pipeline

Three-stage pipeline on AWS EC2 g5.2xlarge (NVIDIA A10G 24GB) using [Unsloth](https://github.com/unslothai/unsloth) + TRL 0.24.

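
A rough back-of-the-envelope check shows why 4-bit loading (the NF4 setting listed for Stage 1) matters on a 24 GB card. The sketch below counts weight storage only and assumes the 14B parameter count quoted above. It ignores quantization constants, activations, LoRA adapters, optimizer state, and the KV cache, so real usage is higher. The function name is illustrative.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 14e9        # "14B" as quoted in this card
GPU_MEMORY_GIB = 24    # A10G capacity quoted in this card (treated as GiB here)

for label, bits in [("bfloat16", 16), ("4-bit NF4", 4)]:
    gib = weight_memory_gib(N_PARAMS, bits)
    verdict = "fits" if gib < GPU_MEMORY_GIB else "does not fit"
    print(f"{label:>10}: {gib:5.1f} GiB of weights, {verdict} in {GPU_MEMORY_GIB} GiB")
```

Under these assumptions the 16-bit weights alone exceed the card's memory, while the 4-bit weights leave headroom for adapters and activations.
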
### Stage 1 - SFT (Supervised Fine-Tuning)

Teaches the model what to say: protocol knowledge, IOS syntax, and troubleshooting patterns.

| Param | Value |
|-------|-------|
| Dataset | 7,200 network engineering examples |
| Epochs | 2 |
| LoRA rank / alpha | 32 / 32 |
| Learning rate | 5e-5 |
| Effective batch | 16 |
| Precision | bfloat16 + 4-bit NF4 |
| Final loss | 0.2308 |

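
The Stage 1 values imply a few derived quantities. The sketch below works them out, assuming each of the 7,200 examples is one training sequence (no packing, no held-out split) and the standard LoRA scaling of alpha / rank rather than the rank-stabilized variant. The split into per-device batch and gradient-accumulation steps is hypothetical; the card states only their product.

```python
import math

DATASET_SIZE = 7_200
EPOCHS = 2
EFFECTIVE_BATCH = 16
LORA_RANK, LORA_ALPHA = 32, 32

# Hypothetical split: only the product (16) comes from the table.
per_device_batch, grad_accum_steps, num_gpus = 2, 8, 1
assert per_device_batch * grad_accum_steps * num_gpus == EFFECTIVE_BATCH

# One optimizer step consumes one effective batch.
steps_per_epoch = math.ceil(DATASET_SIZE / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS

# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps:     {total_steps}")
print(f"LoRA update scale:         {lora_scale}")
```

With these assumptions, Stage 1 runs 450 optimizer steps per epoch (900 in total), and the LoRA update is applied at a scale of 1.0.
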
### Stage 2 - GRPO (Group Relative Policy Optimization)

Inspired by DeepSeek-R1. Teaches the model how to reason: for each prompt it generates 4 rollouts, scores them with reward functions (factual accuracy, exact value matching, format compliance), and learns to prefer the higher-scoring answers.

| Param | Value |
|-------|-------|
| Base | Stage 1 merged 16-bit |
| Steps | 2,400 |
| Rollouts per prompt | 4 |
| Max completion | 256 tokens |
| KL beta | 0.1 |
| Final loss | 0.001955 |

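
The core of the group-relative step can be sketched in a few lines: score each rollout, then normalize the scores within the group so that above-average rollouts receive a positive advantage. This is a minimal illustration, not the training code. Both reward functions and the `<answer>` tag format are hypothetical stand-ins for the reward functions named above, and the population standard deviation with a small epsilon is just one common normalization choice.

```python
import re
import statistics


def format_reward(completion: str) -> float:
    """Hypothetical format check: 1.0 if the answer is wrapped in <answer> tags."""
    return 1.0 if re.search(r"<answer>.+?</answer>", completion, re.DOTALL) else 0.0


def exact_value_reward(completion: str, expected: str) -> float:
    """Hypothetical exact-match check on the tagged answer."""
    match = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == expected else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt: "Default administrative distance of OSPF on Cisco IOS?"
expected = "110"
rollouts = [
    "<answer>110</answer>",   # correct value, correct format
    "<answer>120</answer>",   # wrong value, correct format
    "The distance is 110.",   # right value but missing the required tags
    "<answer>90</answer>",    # wrong value, correct format
]

rewards = [format_reward(c) + exact_value_reward(c, expected) for c in rollouts]
advantages = group_relative_advantages(rewards)

for completion, reward, adv in zip(rollouts, rewards, advantages):
    print(f"reward={reward:.1f}  advantage={adv:+.2f}  {completion}")
```

Only the first rollout ends up with a positive advantage, so the policy update pushes probability toward it and away from the untagged answer, which scores lowest in the group.
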
### Stage 3 - ORPO (Odds Ratio Preference Optimization)

Teaches the model what not to say. Trains on (prompt, chosen, rejected) triples in which the rejected responses are model-generated hallucinations. The odds-ratio loss penalizes rejected answers without a separate reference model, so training fits on a single GPU.

| Param | Value |
|-------|-------|
| Base | Stage 1 merged 16-bit |
| Epochs | 1 |
| LoRA rank / alpha | 16 / 32 |
| Learning rate | 5e-6 |
| Beta | 0.1 |

Suppresses fabricated IOS commands, wrong subnet math, and nonexistent BGP attributes.

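
For reference, the ORPO objective adds an odds-ratio penalty to the usual negative log-likelihood on the chosen response. The sketch below computes it for a single triple from length-averaged log-probabilities. It assumes the `Beta` value in the table is the weight on the odds-ratio term, as in the ORPO formulation; the function names and example log-probabilities are illustrative and not taken from the training run.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) where p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.1) -> float:
    """NLL on the chosen response plus a beta-weighted odds-ratio penalty."""
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    nll = -chosen_avg_logp
    return nll + beta * odds_ratio_term


# Illustrative average per-token log-probabilities under the current policy.
good = orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
bad = orpo_loss(chosen_avg_logp=-2.0, rejected_avg_logp=-0.5)
print(f"policy prefers chosen:   loss={good:.4f}")
print(f"policy prefers rejected: loss={bad:.4f}")
```

The loss is lower when the policy already assigns higher odds to the chosen response. Because both terms are computed from the policy's own log-probabilities, no frozen reference model has to be held in memory.
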
## Intended Uses

- Network fault diagnosis and root cause analysis
- Cisco IOS/IOS-XE configuration generation
- BGP/OSPF/EIGRP design recommendations
- ACL and security policy review
- CCDE/CCIE-level architecture Q&A
- Agentic NetOps pipelines (ACP/A2A/MCP protocols)

## Limitations

- Optimized for Cisco IOS/IOS-XE; other vendors have limited coverage
- Verify configurations against current vendor documentation before production deployment
- Not a substitute for lab testing

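
As one example of an independent check, the subnet arithmetic in a generated ACL can be verified with Python's standard `ipaddress` module before any lab or production testing. The helper name and the sample prefix are illustrative. This is a sketch of a quick sanity check, not a replacement for the verification steps listed above.

```python
import ipaddress


def acl_wildcard(prefix: str) -> tuple[str, str]:
    """Return (network address, wildcard mask) for a CIDR prefix.

    strict=True raises ValueError when the prefix has host bits set,
    which catches a common class of subnet-math mistakes.
    """
    net = ipaddress.ip_network(prefix, strict=True)
    return str(net.network_address), str(net.hostmask)


network, wildcard = acl_wildcard("10.1.4.0/22")
print(f"permit ip {network} {wildcard} any")

try:
    acl_wildcard("10.1.5.0/22")  # not on a /22 boundary
except ValueError as err:
    print(f"rejected: {err}")
```

Comparing the computed wildcard mask against the one in the model's output flags mismatched masks and misaligned network addresses.
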