eduard76 committed on
Commit
8b3ba8b
·
verified ·
1 Parent(s): b48b197

Add model card with training pipeline description

  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
+ ---
+ license: mit
+ base_model: microsoft/phi-4
+ tags:
+ - network-engineering
+ - cisco
+ - grpo
+ - orpo
+ - phi-4
+ pipeline_tag: text-generation
+ ---
+
+ # Phi-4 Network Architect v2
+
+ Fine-tuned [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) (14B) for enterprise network engineering: OSPF/BGP troubleshooting, ACL design, Cisco IOS configuration, and CCDE/CCIE-level reasoning.
+
+ ## Training Pipeline
+
+ Three-stage pipeline run on an AWS EC2 g5.2xlarge instance (one NVIDIA A10G, 24 GB) using [Unsloth](https://github.com/unslothai/unsloth) and TRL 0.24.
+
+ ### Stage 1 - SFT (Supervised Fine-Tuning)
+
+ Teaches the model what to say: protocol knowledge, IOS syntax, and troubleshooting patterns.
+
+ | Param | Value |
+ |-------|-------|
+ | Dataset | 7,200 network engineering examples |
+ | Epochs | 2 |
+ | LoRA rank / alpha | 32 / 32 |
+ | Learning rate | 5e-5 |
+ | Effective batch size | 16 |
+ | Precision | bfloat16 + 4-bit NF4 |
+ | Final loss | 0.2308 |
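
As a rough sanity check on the Stage 1 table, the sketch below derives the number of optimizer updates and the LoRA scaling factor from the listed values. It assumes no sequence packing, that a partial final batch is kept, and that the standard `alpha / r` scaling is used (not rank-stabilized LoRA); none of these details are stated in the card, and the function names are illustrative only.

```python
import math

# Values copied from the Stage 1 table.
NUM_EXAMPLES = 7_200
EPOCHS = 2
EFFECTIVE_BATCH = 16
LORA_RANK = 32
LORA_ALPHA = 32


def optimizer_steps(num_examples: int, epochs: int, effective_batch: int) -> int:
    """Optimizer updates, assuming no packing and a kept partial last batch."""
    return math.ceil(num_examples / effective_batch) * epochs


def lora_scale(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / rank


steps = optimizer_steps(NUM_EXAMPLES, EPOCHS, EFFECTIVE_BATCH)
scale = lora_scale(LORA_ALPHA, LORA_RANK)
print(f"optimizer steps: {steps}, LoRA scale: {scale}")
```

Under the same assumption, the Stage 3 setting of rank 16 with alpha 32 gives a scale of 2.0, so its adapter updates are weighted twice as heavily per unit of rank as in Stage 1.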
+
+ ### Stage 2 - GRPO (Group Relative Policy Optimization)
+
+ Inspired by DeepSeek-R1. Teaches the model how to reason: it generates 4 rollouts per prompt, scores them with reward functions (factual accuracy, exact value matching, format compliance), and updates the policy toward the higher-scoring rollouts in each group.
+
+ | Param | Value |
+ |-------|-------|
+ | Base | Stage 1 merged 16-bit |
+ | Steps | 2,400 |
+ | Rollouts per prompt | 4 |
+ | Max completion | 256 tokens |
+ | KL beta | 0.1 |
+ | Final loss | 0.001955 |
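
The group-relative scoring in Stage 2 can be illustrated with a small standard-library sketch. The two reward functions and the `Answer:` output convention are hypothetical stand-ins, since the card does not publish its actual reward code; only the idea of normalizing rewards within a group of 4 rollouts comes from GRPO itself.

```python
import re
from statistics import mean, pstdev


def format_reward(completion: str) -> float:
    """Hypothetical format check: 1.0 if some line starts with 'Answer:'."""
    found = re.search(r"^Answer:\s*\S+", completion, flags=re.MULTILINE)
    return 1.0 if found else 0.0


def exact_value_reward(completion: str, expected: str) -> float:
    """Hypothetical exact-match check on the value after 'Answer:'."""
    match = re.search(r"^Answer:\s*(\S+)", completion, flags=re.MULTILINE)
    return 1.0 if match and match.group(1) == expected else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt asking for the wildcard mask of a /26.
rollouts = [
    "The wildcard mask inverts the netmask.\nAnswer: 0.0.0.63",
    "Answer: 0.0.0.255",
    "It is 0.0.0.63",
    "Answer: 0.0.0.63",
]
rewards = [format_reward(c) + exact_value_reward(c, "0.0.0.63") for c in rollouts]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to decide which completions to reinforce.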
+
+ ### Stage 3 - ORPO (Odds Ratio Preference Optimization)
+
+ Teaches the model what not to say. Trains on (prompt, chosen, rejected) triples in which the rejected responses are model-generated hallucinations. Wrong answers are penalized through an odds-ratio loss; because ORPO needs no separate reference model, training fits on a single GPU.
+
+ | Param | Value |
+ |-------|-------|
+ | Base | Stage 1 merged 16-bit |
+ | Epochs | 1 |
+ | LoRA rank / alpha | 16 / 32 |
+ | Learning rate | 5e-6 |
+ | Beta | 0.1 |
+
+ Stage 3 suppresses fabricated IOS commands, wrong subnet math, and nonexistent BGP attributes.
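
For readers unfamiliar with the odds-ratio loss, here is a minimal per-example sketch of the ORPO objective as it is usually written: a negative log-likelihood term on the chosen response plus `beta` times an odds-ratio penalty. The inputs are length-normalized (average per-token) log-probabilities, and the numeric values are made up for illustration; this is not the training code behind this model.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)), with p = exp(avg_logp) and avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_penalty(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """-log(sigmoid(log-odds ratio)); shrinks as chosen becomes more likely than rejected."""
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
    return math.log1p(math.exp(-ratio))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.1) -> float:
    """NLL on the chosen response plus the beta-weighted odds-ratio penalty."""
    nll = -chosen_avg_logp
    return nll + beta * odds_ratio_penalty(chosen_avg_logp, rejected_avg_logp)


# Illustrative values: the policy already prefers the chosen response.
print(orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0, beta=0.1))
```

Both terms depend only on the policy being trained, which is why no frozen reference model has to be held in memory alongside it.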
+
+ ## Intended Uses
+
+ - Network fault diagnosis and root cause analysis
+ - Cisco IOS/IOS-XE configuration generation
+ - BGP/OSPF/EIGRP design recommendations
+ - ACL and security policy review
+ - CCDE/CCIE-level architecture Q&A
+ - Agentic NetOps pipelines (ACP/A2A/MCP protocols)
+
+ ## Limitations
+
+ - Optimized for Cisco IOS/IOS-XE; other vendors have limited coverage
+ - Verify configurations against current vendor documentation before production deployment
+ - Not a substitute for lab testing
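
One cheap way to act on the verification advice is to recompute any subnet math the model produces before trusting it. The sketch below uses Python's standard `ipaddress` module; the `subnet_facts` and `check_claim` helpers and the claimed values are hypothetical, and the usable-host count uses the classic "minus two" rule, which does not apply to /31 point-to-point links.

```python
import ipaddress


def subnet_facts(cidr: str) -> dict:
    """Compute reference values for an IPv4 prefix (raises if host bits are set)."""
    net = ipaddress.ip_network(cidr, strict=True)
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),  # the inverse mask used in IOS ACLs
        "broadcast": str(net.broadcast_address),
        "usable_hosts": max(net.num_addresses - 2, 0),  # classic rule, not /31
    }


def check_claim(cidr: str, claimed: dict) -> dict:
    """Return {field: (claimed, actual)} for every field that disagrees."""
    actual = subnet_facts(cidr)
    return {k: (v, actual[k]) for k, v in claimed.items() if actual.get(k) != v}


# Hypothetical values extracted from a model answer about 10.1.2.64/26.
mismatches = check_claim("10.1.2.64/26", {"wildcard": "0.0.0.63", "usable_hosts": 64})
print(mismatches)
```

An empty result means the claimed values agree with the computed ones; it says nothing about whether the surrounding configuration is correct, so lab testing still applies.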