Steve-adpatkey committed on
Commit aef7b02 · verified · 1 Parent(s): 755f80d

Update README.md

Files changed (1)
  1. README.md +184 -77
README.md CHANGED
@@ -1,17 +1,163 @@
- # AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2

  ## Overview

- telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

- This model achieved a **79.3% benchmark score** — a 10% improvement over baseline — while using conservative anti-forgetting training strategies to preserve general capabilities.

  ## What We Did

  - **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
  - **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
  - **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- - **Base model**: NVIDIA Nemotron-3-Nano-30B (Megatron format)

  ## Training Data

@@ -26,8 +172,6 @@ This model achieved a **79.3% benchmark score** — a 10% improvement over basel

  ### Domain Coverage

- The dataset includes comprehensive coverage of:
-
  - **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
  - **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
  - **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration
@@ -37,7 +181,6 @@ The dataset includes comprehensive coverage of:

  ### Data Format

- Each example follows the input/output format:
  ```json
  {
    "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
@@ -51,7 +194,7 @@ Each example follows the input/output format:

  | Parameter | Value | Notes |
  |---|---|---|
- | LoRA dim | 64 | Adapter capacity |
  | LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
  | LoRA dropout | 0.1 | Regularization to prevent overfitting |
  | Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |
@@ -60,7 +203,7 @@ Each example follows the input/output format:

  | Parameter | Value | Notes |
  |---|---|---|
- | Base model | Nemotron-3-Nano-30B (Megatron) | |
  | Training iterations | 10,500 | ~1.03 epochs |
  | Learning rate | 5e-5 | Conservative to prevent forgetting |
  | LR warmup | 525 steps | 5% of total iterations |
@@ -69,10 +212,19 @@ Each example follows the input/output format:

  | Micro batch size | 4 | Per GPU |
  | Gradient accumulation | 8 steps | |
  | Max sequence length | 2,048 | |
- | Precision | bf16 | |
  | Checkpoint interval | 1,000 steps | |

- ### Parallelism (4x H100 NVL)

  | Parameter | Value |
  |---|---|
@@ -81,14 +233,6 @@ Each example follows the input/output format:

  | Pipeline parallel | 1 |
  | MoE token dispatcher | alltoall |

- ### Infrastructure
-
- - **Hardware**: 4x NVIDIA H100 NVL 94GB (NVLink connected)
- - **Framework**: NeMo/Megatron-Bridge with custom LoRA wrapper
- - **Container**: `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano`
- - **Training time**: ~3.5 days (~84 hours)
- - **Shared memory**: 256GB
-
  ## Training Progress

@@ -101,55 +245,34 @@ Each example follows the input/output format:

  | iter 3000 | 0.391 | 0.108 | 1.114 |
  | **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

- ## Comparison to Previous Versions

  | Version | Dataset Size | Val Loss | Val PPL | Benchmark |
  |---|---|---|---|---|
- | telecom-1.27M | 1,240,185 | 0.379 | 1.46 | 69.3% |
- | **telecom-1.35M-v2** | **1,303,277** | **0.150** | **1.162** | **79.3%** |

- ### Key Improvements in v2

- - Augmented network slicing examples to address weak performance
  - Enhanced network function configuration coverage
  - Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- - 10% absolute improvement on benchmark

  ## Post-Training Pipeline

- 1. **LoRA Merge**: Combined adapter weights with base model
- 2. **HuggingFace Export**: Converted Megatron checkpoint to HF format
- 3. **vLLM Deployment**: Served via vLLM with tensor parallelism
-
  ```bash
  # Merge LoRA weights
  torchrun --nproc-per-node=4 \
    /opt/Megatron-Bridge/examples/peft/merge_lora.py \
-   --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
    --hf-model-path /models/nemotron-30b \
-   --output /models/telecom-1.35M-v2-merged

  # Export to HuggingFace format
  python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
    --hf-model /models/nemotron-30b \
-   --megatron-path /models/telecom-1.35M-v2-merged \
-   --hf-path /models/telecom-1.35M-v2-hf-export
- ```
-
- ## Repository Structure
-
- ```
- ├── models/telecom-1.35M-v2-hf-export/   # HF model weights
- ├── training_data/
- │   ├── train.jsonl                      # 1,303,277 training examples
- │   ├── validation.jsonl                 # 5,000 validation examples
- │   └── test.jsonl                       # 5,000 test examples
- ├── configs/
- │   ├── telecom-1.35M-v2.yaml            # Training configuration
- │   ├── train_telecom-1.35M-v2.sh        # Launch script
- │   ├── finetune_teleyaml.py             # Custom training script
- │   └── teleyaml.py                      # Data processor
- └── README.md
  ```

  ## Usage
@@ -160,12 +283,12 @@ python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained(
-     "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
      trust_remote_code=True,
      torch_dtype="bfloat16",
  )
  tokenizer = AutoTokenizer.from_pretrained(
-     "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
      trust_remote_code=True,
  )

@@ -184,7 +307,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

  from vllm import LLM, SamplingParams

  llm = LLM(
-     model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
      trust_remote_code=True,
      tensor_parallel_size=1,
      gpu_memory_utilization=0.90,
@@ -198,9 +321,9 @@ outputs = llm.generate([prompt], sampling_params)

  ```yaml
  services:
-   vllm-telecom:
      image: vllm/vllm-openai:latest
-     container_name: vllm-telecom-1.35M-v2
      runtime: nvidia
      environment:
        - NVIDIA_VISIBLE_DEVICES=0
@@ -209,7 +332,7 @@ services:

      volumes:
        - /opt/models:/models:ro
      command: >
-       --model /models/telecom-1.35M-v2-hf-export
        --trust-remote-code
        --max-model-len 8196
        --gpu-memory-utilization 0.90
@@ -217,42 +340,26 @@ services:

      restart: unless-stopped
  ```

- ## Evaluation
-
- Benchmarked via internal evaluation system across telecom domain tasks:
-
- - **Standards Q&A**: 3GPP, IETF protocol knowledge
- - **Network Traces**: Anomaly detection, KPI analysis, trend identification
- - **Configuration**: YAML generation, network function setup
- - **Troubleshooting**: Root cause analysis, diagnostic procedures
-
- **Overall Score: 79.3%**
-
  ## Lessons Learned

  1. **Anti-forgetting strategy works**: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
  2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
  3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
- 4. **Mixed datasets**: Combining diverse telecom subcategories in training prevents narrow specialization
-
- ## Future Work

- - **Full SFT**: Bake domain knowledge permanently into base weights
- - **Task-specific LoRA adapters**: Specialized adapters for YAML generation, anomaly detection, etc.
- - **DPO refinement**: Preference optimization for response quality

  ## License

- See NVIDIA Nemotron-3-Nano-30B license terms.

  ## Citation

  ```bibtex
- @misc{telecom-1.35M-v2,
-   title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
    author={AdaptKey},
    year={2026},
    publisher={HuggingFace},
-   url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
  }
- ```
 
+ ---
+ language:
+ - en
+ license: other
+ license_name: nvidia-open-model-license
+ license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
+ base_model: nvidia/Nemotron-3-Nano-30B-A3B
+ tags:
+ - telecommunications
+ - 3gpp
+ - o-ran
+ - ietf
+ - telecom
+ - peft
+ - lora
+ - nemotron
+ - mixture-of-experts
+ - gsma
+ - network-slicing
+ - anomaly-detection
+ - srsran
+ pipeline_tag: text-generation
+ library_name: transformers
+ model-index:
+ - name: AdaptKey-Nemotron-30b
+   results:
+   - task:
+       type: text-generation
+       name: Telecom Domain Benchmark
+     metrics:
+     - type: accuracy
+       value: 596
+       name: GSMA Open-Telco Composite Score (vs Baseline 538)
+ ---
+
+ # AdaptKey/AdaptKey-Nemotron-30b

  ## Overview

+ **AdaptKey-Nemotron-30b** is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

+ This model achieved a **composite benchmark score of 596** — a **+58 point improvement (+10.8%)** over the NVIDIA Nemotron-3-Nano-30B-A3B baseline of 538 — while using conservative anti-forgetting training strategies to preserve general capabilities.
+
+ ## Benchmark Results
+
+ Evaluated via the **TeleFlow** evaluation system on 2/9/2026. See [Evaluation Methodology](#evaluation-methodology) below for full details on scoring.
+
+ | Model | TeLogs | TeleMath | TeleQnA | 3GPPTSG | TeleYaml | TeleTables | srsRAN | ORAN | **Total** |
+ |---|---|---|---|---|---|---|---|---|---|
+ | **Baseline** — NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 48.8 | 66.4 | 86.1 | 44 | 62.5 | 61 | 85 | 84.1 | **538** |
+ | **AdaptKey-Nemotron-30b** (this model) | **61.6** | **74** | **88.2** | **48** | **79.3** | **72.8** | **86** | **86.4** | **596** |
+ | **Δ improvement** | +12.8 | +7.6 | +2.1 | +4.0 | +16.8 | +11.8 | +1.0 | +2.3 | **+58** |
+
+ ### Strongest Gains
+ - **TeleYaml** +16.8 pts (+26.9%) — structured YAML generation for network configs
+ - **TeLogs** +12.8 pts (+26.2%) — network log analysis and fault diagnosis
+ - **TeleTables** +11.8 pts (+19.3%) — tabular reasoning over network parameters
+
+ ---
+
+ ## Evaluation Methodology
+
+ ### Overview
+
+ AdaptKey uses a two-tier scoring system designed to minimize judge cost while maximizing evaluation accuracy:
+
+ 1. **Deterministic scoring** — applied first whenever the answer is objectively verifiable (exact-match multiple choice, numeric answers). Scores are 10 (correct) or 0 (incorrect). The LLM judge is skipped entirely for these cases, eliminating variance and cost.
+ 2. **LLM-as-a-Judge** — invoked for all remaining responses where deterministic checking cannot conclusively score quality.
+
+ ### Judge Model
+
+ | Property | Value |
+ |---|---|
+ | Model | `openai/gpt-oss-120b` |
+ | Temperature | 0.1 (near-deterministic for consistency) |
+ | Max output tokens | 300 |
+ | Output format | Structured JSON `{"score": <int>, "reasoning": "<str>"}` |
+
+ ### Scoring Rubrics
+
+ Two rubrics are applied depending on benchmark type:
+
+ #### Rubric A — Free-Text Technical Answers
+ *Applied to: TeleQnA, TeleMath, TeleLogs, TSG-3GPP*
+
+ The judge evaluates three criteria simultaneously:
+ - **Factual Accuracy** — Are the key technical facts correct?
+ - **Completeness** — Does the response cover the main points from the reference answer?
+ - **Correctness** — Are there any incorrect statements that would mislead an engineer?
+
+ | Score | Interpretation |
+ |---|---|
+ | 10 | All key facts present and correct |
+ | 7–9 | Mostly correct, minor omissions or imprecisions |
+ | 4–6 | Partially correct, some important errors or omissions |
+ | 1–3 | Mostly incorrect or very incomplete |
+ | 0 | Completely wrong, off-topic, or empty |
+
+ #### Rubric B — Structured Configuration Answers
+ *Applied to: TeleYaml, TeleTables*
+
+ The judge evaluates two weighted axes:
+ - **Structural Validity (40%)** — Is the output a valid configuration with correct syntax?
+ - **Content Accuracy (60%)** — Do field names and values match the expected configuration? Partial credit is awarded proportionally, based on the ratio of correct fields to total fields.
+
+ | Score | Interpretation |
+ |---|---|
+ | 10 | Perfect match — all fields correct |
+ | 8–9 | Valid structure, 1–2 minor value differences |
+ | 5–7 | Valid structure, several wrong values or missing fields |
+ | 1–4 | Invalid structure or mostly wrong |
+ | 0 | Empty, completely wrong, or unparseable |
+
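Rubric B's weighting can be computed mechanically. The sketch below is an assumption-laden illustration, not the judge's actual implementation (the judge applies the rubric in-prompt); it uses JSON instead of YAML purely to stay self-contained.

```python
# Illustration of Rubric B: 40% structural validity + 60% content accuracy,
# with proportional partial credit over fields. Not the actual judge code.
import json

def rubric_b_score(expected: dict, response_text: str) -> float:
    """Score a structured-config answer on a 0-10 scale per Rubric B."""
    try:
        got = json.loads(response_text)
    except json.JSONDecodeError:
        return 0.0  # unparseable -> 0
    if not isinstance(got, dict):
        return 0.0  # invalid structure
    structural = 1.0  # parsed into the expected shape (40% axis)
    # Content accuracy (60% axis): ratio of correct fields to total fields.
    correct = sum(1 for k, v in expected.items() if got.get(k) == v)
    content = correct / len(expected) if expected else 1.0
    return round(10 * (0.4 * structural + 0.6 * content), 1)

expected = {"amf_port": 7777, "plmn": "00101", "tac": 1}
print(rubric_b_score(expected, '{"amf_port": 7777, "plmn": "00101", "tac": 7}'))  # 8.0
```

With 2 of 3 fields correct, the score is 10 × (0.4 + 0.6 × 2/3) = 8.0, matching the "valid structure, minor value differences" band.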
+ ### Judge Prompt Structure
+
+ Each judge invocation consists of two messages:
+
+ **System message:**
+ ```
+ You are a strict telecom evaluation judge. Score accurately based on the rubric.
+ Output ONLY the JSON object.
+ ```
+
+ **User message:**
+ ```
+ Question: {question}
+
+ Reference Answer: {reference_answer}
+
+ Model Response: {model_response}
+
+ Scoring Rubric:
+ {applicable_rubric}
+
+ Output JSON: {"score": <0-10>, "reasoning": "<brief explanation>"}
+ ```
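Assembled in code, a judge invocation might look like the sketch below. Only the two templates come from the documentation above; the OpenAI-style message dicts and the parse-failure fallback are assumptions.

```python
# Sketch of building the judge request and parsing its structured reply.
import json

SYSTEM = (
    "You are a strict telecom evaluation judge. Score accurately based on the rubric.\n"
    "Output ONLY the JSON object."
)

USER_TEMPLATE = """Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {{"score": <0-10>, "reasoning": "<brief explanation>"}}"""

def build_judge_messages(question, reference_answer, model_response, rubric):
    """Return the two-message chat payload described above."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": USER_TEMPLATE.format(
            question=question,
            reference_answer=reference_answer,
            model_response=model_response,
            applicable_rubric=rubric,
        )},
    ]

def parse_judge_output(raw: str) -> int:
    """Parse the judge's JSON reply; treat malformed output as a 0 score."""
    try:
        return int(json.loads(raw)["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return 0
```

Treating an unparseable judge reply as 0 is one plausible policy; temperature 0.1 plus the "Output ONLY the JSON object" instruction is what keeps such failures rare.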
+
+ ### Retry Policy
+
+ If the judge scores a response below a configurable threshold, the model is re-prompted up to **5 times**. The **best score across all attempts** is recorded. This measures the model's capability ceiling rather than single-shot performance, and is applied consistently across all models evaluated, including the baseline.
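A minimal sketch of the policy above (the threshold value of 7 and the call signatures are assumptions; only the attempt cap of 5 and the best-of recording come from the text):

```python
# Best-of-N retry sketch: re-prompt while below threshold, keep the best score.
def score_with_retries(prompt, generate, judge,
                       threshold: int = 7, max_attempts: int = 5) -> int:
    """Re-prompt up to max_attempts; record the best score across attempts."""
    best = 0
    for _ in range(max_attempts):
        best = max(best, judge(generate(prompt)))
        if best >= threshold:
            break  # threshold cleared: stop early
    return best

# Toy demo: judge scores improve across attempts; the best one is recorded.
scores = iter([4, 6, 9])
print(score_with_retries("q", generate=lambda p: p, judge=lambda r: next(scores)))  # 9
```

Because the best score is kept rather than the last, this measures a capability ceiling, which is why applying the identical policy to the baseline (as stated above) matters for a fair delta.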
+
+ ### Benchmark-to-Rubric Mapping
+
+ | Benchmark | Rubric | Deterministic Bypass |
+ |---|---|---|
+ | TeleQnA | A — Free-Text Technical | Where multiple-choice |
+ | TeleMath | A — Free-Text Technical | Numeric exact-match |
+ | TeleLogs | A — Free-Text Technical | Classification labels |
+ | TSG-3GPP | A — Free-Text Technical | Where multiple-choice |
+ | TeleYaml | B — Structured Configuration | N/A |
+ | TeleTables | B — Structured Configuration | N/A |
+ | srsRAN | A — Free-Text Technical | Where multiple-choice |
+ | ORAN | A — Free-Text Technical | Where multiple-choice |

  ## What We Did

  - **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
  - **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
  - **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
+ - **Base model**: NVIDIA Nemotron-3-Nano-30B-A3B (Megatron format)

  ## Training Data


  ### Domain Coverage

  - **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
  - **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
  - **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration

  ### Data Format

  ```json
  {
    "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",

  | Parameter | Value | Notes |
  |---|---|---|
+ | LoRA dim (rank) | 64 | Adapter capacity |
  | LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
  | LoRA dropout | 0.1 | Regularization to prevent overfitting |
  | Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |
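For reference, the adapter hyperparameters above can be collected as a PEFT-style config dict. This is a hedged sketch: the actual run used a custom NeMo/Megatron-Bridge LoRA wrapper rather than this dict, and the module names are the Megatron-side names kept verbatim.

```python
# Hypothetical PEFT-style view of the adapter hyperparameters in the table.
# The training itself used a custom Megatron-Bridge LoRA wrapper, not this dict.
lora_config = {
    "r": 64,              # LoRA dim (rank): adapter capacity
    "lora_alpha": 128,    # 2:1 alpha-to-rank ratio -> scaling = alpha / r = 2.0
    "lora_dropout": 0.1,  # regularization against overfitting
    "target_modules": [
        "linear_qkv", "linear_proj",  # attention projections
        "linear_fc1", "linear_fc2",   # MLP layers
        "in_proj", "out_proj",        # Mamba mixer projections
    ],
}

# Effective scaling applied to the adapter output under the usual LoRA rule:
print(lora_config["lora_alpha"] / lora_config["r"])  # 2.0
```

The 2:1 alpha-to-rank ratio yields a fixed scaling of 2.0, which is the "gentler gradient flow" noted in the table.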
 

  | Parameter | Value | Notes |
  |---|---|---|
+ | Base model | Nemotron-3-Nano-30B-A3B (Megatron) | |
  | Training iterations | 10,500 | ~1.03 epochs |
  | Learning rate | 5e-5 | Conservative to prevent forgetting |
  | LR warmup | 525 steps | 5% of total iterations |
 
  | Micro batch size | 4 | Per GPU |
  | Gradient accumulation | 8 steps | |
  | Max sequence length | 2,048 | |
+ | Precision | BF16 | |
  | Checkpoint interval | 1,000 steps | |

+ ### Infrastructure
+
+ | Property | Value |
+ |---|---|
+ | Hardware | 4x NVIDIA H100 NVL 94GB (NVLink connected) |
+ | Framework | NeMo/Megatron-Bridge with custom LoRA wrapper |
+ | Container | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` |
+ | Training time | ~3.5 days (~84 hours) |
+
+ ### Parallelism

  | Parameter | Value |
  |---|---|

  | Pipeline parallel | 1 |
  | MoE token dispatcher | alltoall |

  ## Training Progress

  | Checkpoint | Train Loss | Val Loss | Val PPL |

  | iter 3000 | 0.391 | 0.108 | 1.114 |
  | **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

+ ## Version History

  | Version | Dataset Size | Val Loss | Val PPL | Benchmark |
  |---|---|---|---|---|
+ | **AdaptKey-Nemotron-30b** (this model) | **1,303,277** | **0.150** | **1.162** | **596 composite** |
 

+ ### Key Improvements in This Version

+ - Augmented network slicing examples to address weak benchmark performance
  - Enhanced network function configuration coverage
  - Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
+ - +58-point (+10.8% relative) improvement on the composite benchmark over the NVIDIA baseline

  ## Post-Training Pipeline

  ```bash
  # Merge LoRA weights
  torchrun --nproc-per-node=4 \
    /opt/Megatron-Bridge/examples/peft/merge_lora.py \
+   --lora-checkpoint /models/AdaptKey-Nemotron-30b-lora/iter_0010500 \
    --hf-model-path /models/nemotron-30b \
+   --output /models/AdaptKey-Nemotron-30b-merged

  # Export to HuggingFace format
  python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
    --hf-model /models/nemotron-30b \
+   --megatron-path /models/AdaptKey-Nemotron-30b-merged \
+   --hf-path /models/AdaptKey-Nemotron-30b-hf-export
  ```

  ## Usage

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained(
+     "AdaptKey/AdaptKey-Nemotron-30b",
      trust_remote_code=True,
      torch_dtype="bfloat16",
  )
  tokenizer = AutoTokenizer.from_pretrained(
+     "AdaptKey/AdaptKey-Nemotron-30b",
      trust_remote_code=True,
  )

  from vllm import LLM, SamplingParams

  llm = LLM(
+     model="AdaptKey/AdaptKey-Nemotron-30b",
      trust_remote_code=True,
      tensor_parallel_size=1,
      gpu_memory_utilization=0.90,
 

  ```yaml
  services:
+   vllm-adaptkey:
      image: vllm/vllm-openai:latest
+     container_name: vllm-adaptkey-nemotron-30b
      runtime: nvidia
      environment:
        - NVIDIA_VISIBLE_DEVICES=0

      volumes:
        - /opt/models:/models:ro
      command: >
+       --model /models/AdaptKey-Nemotron-30b
        --trust-remote-code
        --max-model-len 8196
        --gpu-memory-utilization 0.90

      restart: unless-stopped
  ```

  ## Lessons Learned

  1. **Anti-forgetting strategy works**: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
  2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
  3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
+ 4. **Mixed datasets**: Combining diverse telecom subcategories prevents narrow specialization

  ## License

+ This model is derived from NVIDIA's Nemotron-3-Nano-30B and is subject to the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please review the license terms before use in commercial applications.

  ## Citation

  ```bibtex
+ @misc{adaptkey_nemotron_30b_2026,
+   title={AdaptKey-Nemotron-30b: A Telecom-Specialized Language Model},
    author={AdaptKey},
    year={2026},
    publisher={HuggingFace},
+   url={https://huggingface.co/AdaptKey/AdaptKey-Nemotron-30b}
  }
+ ```