eduard76
/

phi4-network-architect

+---
+license: mit
+base_model: microsoft/phi-4
+tags:
+- network-engineering
+- cisco
+- grpo
+- orpo
+- phi-4
+pipeline_tag: text-generation
+---
+# Phi-4 Network Architect v2
+Fine-tuned [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) (14B) for enterprise network engineering: OSPF/BGP troubleshooting, ACL design, Cisco IOS configuration, and CCDE/CCIE-level reasoning.
+## Training Pipeline
+Three-stage pipeline on AWS EC2 g5.2xlarge (NVIDIA A10G 24GB) using [Unsloth](https://github.com/unslothai/unsloth) + TRL 0.24.
+### Stage 1 - SFT (Supervised Fine-Tuning)
+Teaches the model what to say - protocol knowledge, IOS syntax, troubleshooting patterns.
+| Param | Value |
+|-------|-------|
+| Dataset | 7,200 network engineering examples |
+| Epochs | 2 |
+| LoRA rank / alpha | 32 / 32 |
+| Learning rate | 5e-5 |
+| Effective batch | 16 |
+| Precision | bfloat16 + 4-bit NF4 |
+| Final loss | 0.2308 |
+### Stage 2 - GRPO (Group Relative Policy Optimization)
+Inspired by DeepSeek-R1. Teaches the model how to reason by generating 4 rollouts per prompt, scoring them with reward functions (factual accuracy, exact value matching, format compliance), and learning to prefer the best answers.
+| Param | Value |
+|-------|-------|
+| Base | Stage 1 merged 16-bit |
+| Steps | 2,400 |
+| Rollouts per prompt | 4 |
+| Max completion | 256 tokens |
+| KL beta | 0.1 |
+| Final loss | 0.001955 |
+### Stage 3 - ORPO (Odds Ratio Preference Optimization)
+Teaches the model what not to say. Trains on (prompt, chosen, rejected) triples where rejected responses are model-generated hallucinations. Penalizes wrong answers via odds-ratio loss - no separate reference model needed, fits on a single GPU.
+| Param | Value |
+|-------|-------|
+| Base | Stage 1 merged 16-bit |
+| Epochs | 1 |
+| LoRA rank / alpha | 16 / 32 |
+| Learning rate | 5e-6 |
+| Beta | 0.1 |
+Suppresses fabricated IOS commands, wrong subnet math, and nonexistent BGP attributes.
+## Intended Uses
+- Network fault diagnosis and root cause analysis
+- Cisco IOS/IOS-XE configuration generation
+- BGP/OSPF/EIGRP design recommendations
+- ACL and security policy review
+- CCDE/CCIE level architecture Q&A
+- Agentic NetOps pipelines (ACP/A2A/MCP protocols)
+## Limitations
+- Optimized for Cisco IOS/IOS-XE; other vendors have limited coverage
+- Verify configurations against current vendor documentation before production deployment
+- Not a substitute for lab testing