Commit 7b5059a (verified) by Rick-AdaptKey · 1 parent: 9a2f8f5

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +258 -0
README.md ADDED
@@ -0,0 +1,258 @@
1
+ # AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2
2
+
3
+ ## Overview
4
+
5
+ telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.
6
+
7
+ This model achieved a **79.3% benchmark score**, a 10-point absolute improvement over baseline, while using conservative anti-forgetting training strategies to preserve general capabilities.
8
+
9
+ ## What We Did
10
+
11
+ - **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
12
+ - **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
13
+ - **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
14
+ - **Base model**: NVIDIA Nemotron-3-Nano-30B (Megatron format)
15
+
16
+ ## Training Data
17
+
18
+ ### Dataset Composition (~1.31M examples)
19
+
20
+ | Split | Examples |
21
+ |---|---|
22
+ | Train | 1,303,277 |
23
+ | Validation | 5,000 |
24
+ | Test | 5,000 |
25
+ | **Total** | **1,313,277** |
26
+
27
+ ### Domain Coverage
28
+
29
+ The dataset includes comprehensive coverage of:
30
+
31
+ - **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
32
+ - **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
33
+ - **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration
34
+ - **3GPP Standards Q&A**: Core network procedures, RAN protocols, signaling
35
+ - **Network Forecasting**: Trend analysis, traffic prediction
36
+ - **Troubleshooting**: Root cause analysis, diagnostic procedures
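The network slicing bullet mentions S-NSSAI configuration and the eMBB, URLLC, and mMTC slice types. As a minimal sketch of what those identifiers look like, the snippet below encodes an S-NSSAI from a slice type and an optional Slice Differentiator. The SST numbers are the standardized values from 3GPP TS 23.501 (mMTC traffic maps to the "MIoT" slice type); the helper name `encode_snssai` and the hex-string output form are illustrative assumptions, not part of the dataset or the model.

```python
from typing import Optional

# Standardized Slice/Service Type (SST) values from 3GPP TS 23.501.
# mMTC corresponds to the standardized "MIoT" slice type.
STANDARD_SST = {"eMBB": 1, "URLLC": 2, "mMTC": 3}


def encode_snssai(slice_type: str, sd: Optional[int] = None) -> str:
    """Hypothetical helper: hex form of an S-NSSAI.

    The SST occupies 8 bits (2 hex digits). The optional Slice
    Differentiator (SD) occupies 24 bits (6 hex digits).
    """
    sst = STANDARD_SST[slice_type]
    if sd is None:
        return f"{sst:02x}"
    if not 0 <= sd <= 0xFFFFFF:
        raise ValueError("SD must fit in 24 bits")
    return f"{sst:02x}{sd:06x}"


print(encode_snssai("eMBB"))          # SST only
print(encode_snssai("URLLC", 0xA1))   # SST plus SD
```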
37
+
38
+ ### Data Format
39
+
40
+ Each example follows the input/output format:
41
+ ```json
+ {
+   "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
+   "output": "[detailed answer with reasoning]"
+ }
+ ```
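To make the record layout concrete, here is a small sketch that builds and parses one JSONL line in the input/output format shown above. The helper names (`build_input`, `to_jsonl_line`, `split_input`) are hypothetical and are not taken from the repository's `teleyaml.py` data processor; the only thing assumed from the source is the `System: ...\nUser: ...` layout of the `input` field.

```python
import json


def build_input(system: str, user: str) -> str:
    """Join a system prompt and a user question in the documented layout."""
    return f"System: {system}\nUser: {user}"


def to_jsonl_line(system: str, user: str, answer: str) -> str:
    """Serialize one training example as a single JSONL line."""
    return json.dumps({"input": build_input(system, user), "output": answer})


def split_input(text: str) -> tuple[str, str]:
    """Recover (system, user) by splitting on the first '\\nUser: ' marker."""
    system_part, _, user_part = text.partition("\nUser: ")
    return system_part.removeprefix("System: "), user_part


line = to_jsonl_line(
    "You are an expert telecommunications engineer.",
    "What does the AMF do in a 5G core?",
    "[detailed answer with reasoning]",
)
record = json.loads(line)
print(split_input(record["input"]))
```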
47
+
48
+ ## Training Details
49
+
50
+ ### LoRA Hyperparameters
51
+
52
+ | Parameter | Value | Notes |
53
+ |---|---|---|
54
+ | LoRA dim | 64 | Adapter capacity |
55
+ | LoRA alpha | 128 | alpha/dim ratio of 2:1 (adapter output scaled by 2) |
56
+ | LoRA dropout | 0.1 | Regularization to prevent overfitting |
57
+ | Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Attention, MLP, and Mamba projections |
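The table values can be related to each other with two standard LoRA formulas: the adapter update is scaled by alpha/dim, and a rank-`dim` adapter on a `d_in x d_out` linear layer adds `dim * (d_in + d_out)` trainable parameters. The sketch below applies them; the 4096 x 4096 layer shape is a hypothetical example chosen for illustration and is not a dimension taken from Nemotron-3-Nano-30B.

```python
LORA_DIM = 64      # "LoRA dim" from the table
LORA_ALPHA = 128   # "LoRA alpha" from the table


def lora_scale(alpha: int = LORA_ALPHA, dim: int = LORA_DIM) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / dim


def adapter_param_count(d_in: int, d_out: int, dim: int = LORA_DIM) -> int:
    """Trainable parameters: A is (dim x d_in), B is (d_out x dim)."""
    return dim * (d_in + d_out)


# Hypothetical layer shape, for illustration only.
D_IN, D_OUT = 4096, 4096
full = D_IN * D_OUT
added = adapter_param_count(D_IN, D_OUT)

print(f"scale = {lora_scale():.1f}")
print(f"adapter params = {added:,} ({added / full:.2%} of {full:,} frozen weights)")
```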
58
+
59
+ ### Training Configuration
60
+
61
+ | Parameter | Value | Notes |
62
+ |---|---|---|
63
+ | Base model | Nemotron-3-Nano-30B (Megatron) | |
64
+ | Training iterations | 10,500 | ~1.03 epochs |
65
+ | Learning rate | 5e-5 | Conservative to prevent forgetting |
66
+ | LR warmup | 525 steps | 5% of total iterations |
67
+ | LR decay | Cosine to 10,500 | |
68
+ | Global batch size | 128 | |
69
+ | Micro batch size | 4 | Per GPU |
70
+ | Gradient accumulation | 8 steps | |
71
+ | Max sequence length | 2,048 | |
72
+ | Precision | bf16 | |
73
+ | Checkpoint interval | 1,000 steps | |
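The numbers in this table are mutually consistent, which the following sketch checks: gradient accumulation follows from the global batch, micro batch, and data-parallel size, and the epoch count follows from iterations times global batch over the training-set size. It also sketches the learning-rate curve. Two things are assumptions here rather than facts from the source: the warmup is taken to be linear from zero, and the cosine decay is taken to end at a minimum learning rate of 0. The GPU count and parallel sizes come from the Parallelism section.

```python
import math

TRAIN_EXAMPLES = 1_303_277
ITERS = 10_500
GLOBAL_BATCH = 128
MICRO_BATCH = 4
NUM_GPUS, TP, PP = 4, 1, 1      # from the Parallelism section
PEAK_LR = 5e-5
WARMUP_STEPS = 525
MIN_LR = 0.0                    # assumption: not stated in the README

# Data-parallel ranks are what is left after tensor and pipeline parallelism.
data_parallel = NUM_GPUS // (TP * PP)
grad_accum = GLOBAL_BATCH // (MICRO_BATCH * data_parallel)
epochs = ITERS * GLOBAL_BATCH / TRAIN_EXAMPLES


def lr_at(step: int) -> float:
    """Linear warmup (assumed), then cosine decay to MIN_LR at ITERS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (ITERS - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


print(f"grad accumulation = {grad_accum}, epochs = {epochs:.2f}")
print(f"warmup fraction = {WARMUP_STEPS / ITERS:.0%}")
for step in (0, 525, 5_000, 10_500):
    print(step, f"{lr_at(step):.2e}")
```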
74
+
75
+ ### Parallelism (4x H100 NVL)
76
+
77
+ | Parameter | Value |
78
+ |---|---|
79
+ | Expert parallel | 4 |
80
+ | Tensor parallel | 1 |
81
+ | Pipeline parallel | 1 |
82
+ | MoE token dispatcher | alltoall |
83
+
84
+ ### Infrastructure
85
+
86
+ - **Hardware**: 4x NVIDIA H100 NVL 94GB (NVLink connected)
87
+ - **Framework**: NeMo/Megatron-Bridge with custom LoRA wrapper
88
+ - **Container**: `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano`
89
+ - **Training time**: ~3.5 days (~84 hours)
90
+ - **Shared memory**: 256GB
91
+
92
+ ## Training Progress
93
+
94
+ | Checkpoint | Train Loss | Val Loss | Val PPL |
95
+ |---|---|---|---|
96
+ | iter 500 | 0.402 | 0.242 | 1.274 |
97
+ | iter 1000 | 0.367 | 0.145 | 1.156 |
98
+ | iter 1500 | 0.381 | 0.118 | 1.125 |
99
+ | iter 2000 | 0.432 | 0.130 | 1.139 |
100
+ | iter 2500 | 0.377 | 0.139 | 1.149 |
101
+ | iter 3000 | 0.391 | 0.108 | 1.114 |
102
+ | **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |
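The Val PPL column is the exponential of the Val Loss column, assuming the loss is a mean per-token cross-entropy in nats (the reported values agree with that reading). A short check that reproduces the column and finds the checkpoint with the lowest validation loss among those listed:

```python
import math

# (validation loss, reported perplexity) per checkpoint, copied from the table.
checkpoints = {
    500: (0.242, 1.274),
    1000: (0.145, 1.156),
    1500: (0.118, 1.125),
    2000: (0.130, 1.139),
    2500: (0.139, 1.149),
    3000: (0.108, 1.114),
    10500: (0.150, 1.162),
}

for iteration, (val_loss, reported_ppl) in checkpoints.items():
    ppl = math.exp(val_loss)  # perplexity = exp(cross-entropy)
    print(f"iter {iteration:>5}: exp({val_loss}) = {ppl:.3f} (reported {reported_ppl})")

best_iter = min(checkpoints, key=lambda it: checkpoints[it][0])
print("lowest listed validation loss at iter", best_iter)
```

Among the listed checkpoints, the lowest validation loss is at iter 3000 rather than at the final iteration.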
103
+
104
+ ## Comparison to Previous Versions
105
+
106
+ | Version | Dataset Size | Val Loss | Val PPL | Benchmark |
107
+ |---|---|---|---|---|
108
+ | telecom-1.27M | 1,240,185 | 0.379 | 1.46 | 69.3% |
109
+ | **telecom-1.35M-v2** | **1,303,277** | **0.150** | **1.162** | **79.3%** |
110
+
111
+ ### Key Improvements in v2
112
+
113
+ - Augmented network slicing examples to address weak performance
114
+ - Enhanced network function configuration coverage
115
+ - Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
116
+ - 10-point absolute improvement on the benchmark (from 69.3% to 79.3%)
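Because "10%" can be read as either an absolute or a relative gain, this short calculation separates the two using the benchmark scores from the comparison table.

```python
previous_score = 69.3   # telecom-1.27M benchmark (%)
new_score = 79.3        # telecom-1.35M-v2 benchmark (%)

absolute_points = new_score - previous_score
relative_percent = absolute_points / previous_score * 100

print(f"absolute gain: {absolute_points:.1f} percentage points")
print(f"relative gain: {relative_percent:.1f}% over the previous score")
```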
117
+
118
+ ## Post-Training Pipeline
119
+
120
+ 1. **LoRA Merge**: Combined adapter weights with base model
121
+ 2. **HuggingFace Export**: Converted Megatron checkpoint to HF format
122
+ 3. **vLLM Deployment**: Served via vLLM with tensor parallelism
123
+
124
+ ```bash
+ # Merge LoRA weights
+ torchrun --nproc-per-node=4 \
+   /opt/Megatron-Bridge/examples/peft/merge_lora.py \
+   --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
+   --hf-model-path /models/nemotron-30b \
+   --output /models/telecom-1.35M-v2-merged
+
+ # Export to HuggingFace format
+ python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
+   --hf-model /models/nemotron-30b \
+   --megatron-path /models/telecom-1.35M-v2-merged \
+   --hf-path /models/telecom-1.35M-v2-hf-export
+ ```
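After the export step it can be useful to confirm that the output directory looks like a loadable Hugging Face checkpoint before pointing vLLM at it. The sketch below is one way to do that with the standard library only. It assumes the export writes a `config.json` and at least one `*.safetensors` weight file, which is the usual Hugging Face layout but is not confirmed by the README; `missing_export_files` is a hypothetical helper name.

```python
from pathlib import Path


def missing_export_files(export_dir: str) -> list[str]:
    """Return the expected items that are absent from an HF export directory.

    Assumption: a usable export contains config.json and at least one
    *.safetensors weight shard.
    """
    root = Path(export_dir)
    missing = []
    if not (root / "config.json").is_file():
        missing.append("config.json")
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing


# Path taken from the export command above; on a machine without the
# export this simply reports both items as missing.
print(missing_export_files("/models/telecom-1.35M-v2-hf-export"))
```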
138
+
139
+ ## Repository Structure
140
+
141
+ ```
+ ├── models/telecom-1.35M-v2-hf-export/   # HF model weights
+ ├── training_data/
+ │   ├── train.jsonl                      # 1,303,277 training examples
+ │   ├── validation.jsonl                 # 5,000 validation examples
+ │   └── test.jsonl                       # 5,000 test examples
+ ├── configs/
+ │   ├── telecom-1.35M-v2.yaml            # Training configuration
+ │   ├── train_telecom-1.35M-v2.sh        # Launch script
+ │   ├── finetune_teleyaml.py             # Custom training script
+ │   └── teleyaml.py                      # Data processor
+ └── README.md
+ ```
154
+
155
+ ## Usage
156
+
157
+ ### With Transformers
158
+
159
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
+     trust_remote_code=True,
+     torch_dtype="bfloat16",
+ )
+ tokenizer = AutoTokenizer.from_pretrained(
+     "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
+     trust_remote_code=True,
+ )
+
+ prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).
+
+ User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
180
+
181
+ ### With vLLM
182
+
183
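+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(
+     model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
+     trust_remote_code=True,
+     tensor_parallel_size=1,
+     gpu_memory_utilization=0.90,
+ )
+
+ # Same "System: ... User: ..." prompt format as in the Transformers example
+ prompt = (
+     "System: You are an expert telecommunications engineer.\n"
+     "User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."
+ )
+
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
+ outputs = llm.generate([prompt], sampling_params)
+ print(outputs[0].outputs[0].text)  # text of the first completion
+ ```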
196
+
197
+ ### Docker Compose (vLLM Server)
198
+
199
+ ```yaml
+ services:
+   vllm-telecom:
+     image: vllm/vllm-openai:latest
+     container_name: vllm-telecom-1.35M-v2
+     runtime: nvidia
+     environment:
+       - NVIDIA_VISIBLE_DEVICES=0
+     ports:
+       - "8090:8000"
+     volumes:
+       - /opt/models:/models:ro
+     command: >
+       --model /models/telecom-1.35M-v2-hf-export
+       --trust-remote-code
+       --max-model-len 8196
+       --gpu-memory-utilization 0.90
+       --tensor-parallel-size 1
+     restart: unless-stopped
+ ```
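Once the container is running, the server exposes vLLM's OpenAI-compatible HTTP API on host port 8090 (mapped from 8000 in the compose file). The sketch below builds a request for the `/v1/completions` endpoint using only the standard library. It assumes the server is reachable on `localhost` and that no `--served-model-name` was set, in which case the model name is the `--model` path; `build_payload` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8090/v1"          # assumption: server runs locally
MODEL = "/models/telecom-1.35M-v2-hf-export"   # the --model value in the compose file


def build_payload(system: str, user: str,
                  max_tokens: int = 512, temperature: float = 0.7) -> dict:
    """Build a completions request using the README's System/User prompt layout."""
    return {
        "model": MODEL,
        "prompt": f"System: {system}\nUser: {user}",
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(payload: dict, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to /completions and return the first completion's text."""
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


payload = build_payload(
    "You are an expert telecommunications engineer.",
    "Explain the difference between eMBB, URLLC, and mMTC slice types.",
)
print(json.dumps(payload, indent=2))
```

Calling `complete(payload)` sends the request; it is left out of the snippet so the example runs without a live server.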
219
+
220
+ ## Evaluation
221
+
222
+ Benchmarked with an internal evaluation system across the following telecom domain tasks:
223
+
224
+ - **Standards Q&A**: 3GPP, IETF protocol knowledge
225
+ - **Network Traces**: Anomaly detection, KPI analysis, trend identification
226
+ - **Configuration**: YAML generation, network function setup
227
+ - **Troubleshooting**: Root cause analysis, diagnostic procedures
228
+
229
+ **Overall Score: 79.3%**
230
+
231
+ ## Lessons Learned
232
+
233
+ 1. **Anti-forgetting strategy works**: Conservative LoRA parameters (dim 64, alpha 128, dropout 0.1) with a 5e-5 learning rate preserved general capabilities
234
+ 2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
235
+ 3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
236
+ 4. **Mixed datasets**: Combining diverse telecom subcategories in training prevents narrow specialization
237
+
238
+ ## Future Work
239
+
240
+ - **Full SFT**: Bake domain knowledge permanently into base weights
241
+ - **Task-specific LoRA adapters**: Specialized adapters for YAML generation, anomaly detection, etc.
242
+ - **DPO refinement**: Preference optimization for response quality
243
+
244
+ ## License
245
+
246
+ See NVIDIA Nemotron-3-Nano-30B license terms.
247
+
248
+ ## Citation
249
+
250
+ ```bibtex
+ @misc{telecom-1.35M-v2,
+   title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
+   author={AdaptKey},
+   year={2026},
+   publisher={HuggingFace},
+   url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
+ }
+ ```