Update README.md
README.md
```diff
@@ -1,17 +1,163 @@
-This model achieved a **79.3% benchmark score** — a 10% improvement over baseline
-- **Base model**: NVIDIA Nemotron-3-Nano-30B (Megatron format)
```
```diff
@@ -26,8 +172,6 @@ This model achieved a **79.3% benchmark score** — a 10% improvement over baseline
-The dataset includes comprehensive coverage of:
```
```diff
@@ -37,7 +181,6 @@ The dataset includes comprehensive coverage of:
-Each example follows the input/output format:
```
```diff
@@ -51,7 +194,7 @@ Each example follows the input/output format:
-| LoRA dim | 64 | Adapter capacity |
```
```diff
@@ -60,7 +203,7 @@ Each example follows the input/output format:
-| Base model | Nemotron-3-Nano-30B (Megatron) | |
```
```diff
@@ -69,10 +212,19 @@ Each example follows the input/output format:
-| Precision |
-###
```
```diff
@@ -81,14 +233,6 @@ Each example follows the input/output format:
-### Infrastructure
-- **Hardware**: 4x NVIDIA H100 NVL 94GB (NVLink connected)
-- **Framework**: NeMo/Megatron-Bridge with custom LoRA wrapper
-- **Container**: `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano`
-- **Training time**: ~3.5 days (~84 hours)
-- **Shared memory**: 256GB
```
````diff
@@ -101,55 +245,34 @@ Each example follows the input/output format:
-##
-| **telecom-1.35M-v2** | **1,303,277** | **0.150** | **1.162** | **79.3%** |
-### Key Improvements in
-- Augmented network slicing examples to address weak performance
-- 10% absolute improvement on benchmark
-1. **LoRA Merge**: Combined adapter weights with base model
-2. **HuggingFace Export**: Converted Megatron checkpoint to HF format
-3. **vLLM Deployment**: Served via vLLM with tensor parallelism
---lora-checkpoint /models/
---output /models/
---megatron-path /models/
---hf-path /models/
-## Repository Structure
-```
-├── models/telecom-1.35M-v2-hf-export/ # HF model weights
-├── training_data/
-│   ├── train.jsonl # 1,303,277 training examples
-│   ├── validation.jsonl # 5,000 validation examples
-│   └── test.jsonl # 5,000 test examples
-├── configs/
-│   ├── telecom-1.35M-v2.yaml # Training configuration
-│   ├── train_telecom-1.35M-v2.sh # Launch script
-│   ├── finetune_teleyaml.py # Custom training script
-│   └── teleyaml.py # Data processor
-└── README.md
-```
````
```diff
@@ -160,12 +283,12 @@ python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
-"AdaptKey/
-"AdaptKey/
```
```diff
@@ -184,7 +307,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-model="AdaptKey/
```
```diff
@@ -198,9 +321,9 @@ outputs = llm.generate([prompt], sampling_params)
-vllm-
-container_name: vllm-
```
```diff
@@ -209,7 +332,7 @@ services:
---model /models/
```
```diff
@@ -217,42 +340,26 @@ services:
-## Evaluation
-Benchmarked via internal evaluation system across telecom domain tasks:
-- **Standards Q&A**: 3GPP, IETF protocol knowledge
-- **Network Traces**: Anomaly detection, KPI analysis, trend identification
-- **Configuration**: YAML generation, network function setup
-- **Troubleshooting**: Root cause analysis, diagnostic procedures
-**Overall Score: 79.3%**
-4. **Mixed datasets**: Combining diverse telecom subcategories
-## Future Work
-- **Full SFT**: Bake domain knowledge permanently into base weights
-- **Task-specific LoRA adapters**: Specialized adapters for YAML generation, anomaly detection, etc.
-- **DPO refinement**: Preference optimization for response quality
-@misc{
-title={
-url={https://huggingface.co/AdaptKey/
```
---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/Nemotron-3-Nano-30B-A3B
tags:
- telecommunications
- 3gpp
- o-ran
- ietf
- telecom
- peft
- lora
- nemotron
- mixture-of-experts
- gsma
- network-slicing
- anomaly-detection
- srsran
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: AdaptKey-Nemotron-30b
  results:
  - task:
      type: text-generation
      name: Telecom Domain Benchmark
    metrics:
    - type: accuracy
      value: 596
      name: GSMA Open-Telco Composite Score (vs Baseline 538)
---

# AdaptKey/AdaptKey-Nemotron-30b

## Overview

**AdaptKey-Nemotron-30b** is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

This model achieved a **composite benchmark score of 596** — a **+58 point improvement (+10.8%)** over the NVIDIA Nemotron-3-Nano-30B-A3B baseline of 538 — while using conservative anti-forgetting training strategies to preserve general capabilities.

## Benchmark Results

Evaluated via the **TeleFlow** evaluation system on 2/9/2026. See [Evaluation Methodology](#evaluation-methodology) below for full details on scoring.

| Model | TeLogs | TeleMath | TeleQnA | 3GPPTSG | TeleYaml | TeleTables | srsRAN | ORAN | **Total** |
|---|---|---|---|---|---|---|---|---|---|
| **Baseline** — NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 48.8 | 66.4 | 86.1 | 44 | 62.5 | 61 | 85 | 84.1 | **538** |
| **AdaptKey-Nemotron-30b** (this model) | **61.6** | **74** | **88.2** | **48** | **79.3** | **72.8** | **86** | **86.4** | **596** |
| **Δ improvement** | +12.8 | +7.6 | +2.1 | +4.0 | +16.8 | +11.8 | +1.0 | +2.3 | **+58** |
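As a quick arithmetic check, each composite total is the rounded sum of its per-task scores:

```python
# Per-task scores, in table order: TeLogs, TeleMath, TeleQnA, 3GPPTSG,
# TeleYaml, TeleTables, srsRAN, ORAN.
baseline = [48.8, 66.4, 86.1, 44, 62.5, 61, 85, 84.1]
adaptkey = [61.6, 74, 88.2, 48, 79.3, 72.8, 86, 86.4]

baseline_total = round(sum(baseline))  # 538
adaptkey_total = round(sum(adaptkey))  # 596
delta = adaptkey_total - baseline_total
print(baseline_total, adaptkey_total, delta)  # 538 596 58
```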

### Strongest Gains

- **TeleYaml** +16.8 pts (+26.9%) — structured YAML generation for network configs
- **TeLogs** +12.8 pts (+26.2%) — network log analysis and fault diagnosis
- **TeleTables** +11.8 pts (+19.3%) — tabular reasoning over network parameters

---

## Evaluation Methodology

### Overview

AdaptKey uses a two-tier scoring system designed to minimize judge cost while maximizing evaluation accuracy:

1. **Deterministic scoring** — applied first whenever the answer is objectively verifiable (exact-match multiple choice, numeric answers). Scores are 10 (correct) or 0 (incorrect). The LLM judge is skipped entirely for these cases, eliminating variance and cost.
2. **LLM-as-a-Judge** — invoked for all remaining responses where deterministic checking cannot conclusively score quality.
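In code, the two-tier dispatch amounts to the following sketch (illustrative names, not the evaluator's actual implementation; a judge callable is supplied for the non-deterministic path):

```python
def score_response(reference: str, response: str, deterministic: bool, judge=None) -> int:
    """Two-tier scoring: objectively verifiable answers bypass the LLM judge."""
    if deterministic:
        # Tier 1: exact-match scoring, 10 (correct) or 0 (incorrect), no judge call.
        return 10 if response.strip().lower() == reference.strip().lower() else 0
    # Tier 2: free-text quality is delegated to the LLM judge.
    return judge(reference, response)

# Deterministic multiple-choice answers never invoke the judge:
print(score_response("B", "b", deterministic=True))  # 10
print(score_response("B", "C", deterministic=True))  # 0
```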

### Judge Model

| Property | Value |
|---|---|
| Model | `openai/gpt-oss-120b` |
| Temperature | 0.1 (near-deterministic for consistency) |
| Max output tokens | 300 |
| Output format | Structured JSON `{"score": <int>, "reasoning": "<str>"}` |

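Because the judge must emit only a JSON object, downstream parsing can stay minimal; a defensive sketch (illustrative, not the evaluator's actual code):

```python
import json

def parse_judge_reply(raw: str) -> int:
    """Parse the judge's structured JSON reply and clamp the score to the 0-10 range."""
    reply = json.loads(raw)
    score = int(reply["score"])
    return max(0, min(10, score))

print(parse_judge_reply('{"score": 7, "reasoning": "minor omissions"}'))  # 7
```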
### Scoring Rubrics

Two rubrics are applied depending on benchmark type:

#### Rubric A — Free-Text Technical Answers
*Applied to: TeleQnA, TeleMath, TeleLogs, TSG-3GPP*

The judge evaluates three criteria simultaneously:
- **Factual Accuracy** — Are the key technical facts correct?
- **Completeness** — Does the response cover the main points from the reference answer?
- **Correctness** — Are there any incorrect statements that would mislead an engineer?

| Score | Interpretation |
|---|---|
| 10 | All key facts present and correct |
| 7–9 | Mostly correct, minor omissions or imprecisions |
| 4–6 | Partially correct, some important errors or omissions |
| 1–3 | Mostly incorrect or very incomplete |
| 0 | Completely wrong, off-topic, or empty |

#### Rubric B — Structured Configuration Answers
*Applied to: TeleYaml, TeleTables*

The judge evaluates two weighted axes:
- **Structural Validity (40%)** — Is the output a valid configuration with correct syntax?
- **Content Accuracy (60%)** — Do field names and values match the expected configuration? Partial credit is awarded proportionally, based on the ratio of correct fields to total fields.

| Score | Interpretation |
|---|---|
| 10 | Perfect match — all fields correct |
| 8–9 | Valid structure, 1–2 minor value differences |
| 5–7 | Valid structure, several wrong values or missing fields |
| 1–4 | Invalid structure or mostly wrong |
| 0 | Empty, completely wrong, or unparseable |
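The 40/60 weighting with proportional partial credit reduces to a small formula; a sketch with illustrative names (in practice the judge applies this rubric, not code):

```python
def rubric_b_score(structurally_valid: bool, correct_fields: int, total_fields: int) -> int:
    """Rubric B: 40% structural validity plus 60% content accuracy,
    with partial credit proportional to the ratio of correct fields."""
    structural = 0.4 if structurally_valid else 0.0
    content = 0.6 * (correct_fields / total_fields) if total_fields else 0.0
    return round(10 * (structural + content))

print(rubric_b_score(True, 8, 8))   # 10 (perfect match)
print(rubric_b_score(True, 4, 8))   # 7  (valid structure, half the fields wrong)
print(rubric_b_score(False, 8, 8))  # 6  (correct values but invalid structure)
```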

### Judge Prompt Structure

Each judge invocation consists of two messages:

**System message:**
```
You are a strict telecom evaluation judge. Score accurately based on the rubric.
Output ONLY the JSON object.
```

**User message:**
```
Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {"score": <0-10>, "reasoning": "<brief explanation>"}
```

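Assembling those two messages is then mechanical; a sketch with illustrative names (not the evaluator's actual code):

```python
SYSTEM_MSG = (
    "You are a strict telecom evaluation judge. Score accurately based on the rubric.\n"
    "Output ONLY the JSON object."
)

# Double braces escape the literal JSON braces for str.format().
USER_TEMPLATE = """Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {{"score": <0-10>, "reasoning": "<brief explanation>"}}"""

def build_judge_messages(question, reference_answer, model_response, applicable_rubric):
    """Return the two chat messages for one judge invocation."""
    return [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": USER_TEMPLATE.format(
            question=question,
            reference_answer=reference_answer,
            model_response=model_response,
            applicable_rubric=applicable_rubric,
        )},
    ]
```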

### Retry Policy

If the judge scores a response below a configurable threshold, the model is re-prompted up to **5 times** and the **best score across all attempts** is recorded. This measures the model's capability ceiling rather than single-shot performance, and is applied consistently across all evaluated models, including the baseline.
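The best-of-5 policy amounts to a simple loop; a sketch where `generate` and `judge` stand in for the model and judge calls:

```python
def best_of_n_score(generate, judge, threshold: int = 7, max_attempts: int = 5) -> int:
    """Re-prompt until the judge score clears the threshold, recording the best score."""
    best = 0
    for _ in range(max_attempts):
        score = judge(generate())
        best = max(best, score)
        if best >= threshold:
            break  # early exit once the threshold is cleared
    return best

# With attempts scoring 4, 6, 9, ... the recorded score is 9 (third attempt clears 7):
attempts = iter([4, 6, 9, 2, 1])
print(best_of_n_score(lambda: next(attempts), lambda s: s))  # 9
```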

### Benchmark-to-Rubric Mapping

| Benchmark | Rubric | Deterministic Bypass |
|---|---|---|
| TeleQnA | A — Free-Text Technical | Where multiple-choice |
| TeleMath | A — Free-Text Technical | Numeric exact-match |
| TeleLogs | A — Free-Text Technical | Classification labels |
| TSG-3GPP | A — Free-Text Technical | Where multiple-choice |
| TeleYaml | B — Structured Configuration | N/A |
| TeleTables | B — Structured Configuration | N/A |
| srsRAN | A — Free-Text Technical | Where multiple-choice |
| ORAN | A — Free-Text Technical | Where multiple-choice |

## What We Did

- **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- **Base model**: NVIDIA Nemotron-3-Nano-30B-A3B (Megatron format)

## Training Data

### Domain Coverage

- **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
- **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration

### Data Format

```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
```

| Parameter | Value | Notes |
|---|---|---|
| LoRA dim (rank) | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |

| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B-A3B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | BF16 | |
| Checkpoint interval | 1,000 steps | |
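The "~1.03 epochs" figure is consistent with an effective global batch of 128 examples per iteration, assuming micro batch × gradient accumulation × 4 GPUs (this product is an inference from the numbers above, not a stated config value):

```python
# Assumed effective global batch: micro batch x grad accumulation x 4 GPUs.
micro_batch, grad_accum, num_gpus = 4, 8, 4
global_batch = micro_batch * grad_accum * num_gpus  # 128 examples per iteration

iterations = 10_500
dataset_size = 1_303_277
epochs = iterations * global_batch / dataset_size
print(f"{epochs:.2f}")  # 1.03
```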

### Infrastructure

| Property | Value |
|---|---|
| Hardware | 4x NVIDIA H100 NVL 94GB (NVLink connected) |
| Framework | NeMo/Megatron-Bridge with custom LoRA wrapper |
| Container | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` |
| Training time | ~3.5 days (~84 hours) |

### Parallelism

| Parameter | Value |
|---|---|
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |

## Training Progress

| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 3000 | 0.391 | 0.108 | 1.114 |
| **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

## Version History

| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| **AdaptKey-Nemotron-30b** (this model) | **1,303,277** | **0.150** | **1.162** | **596 composite** |

### Key Improvements in This Version

- Augmented network slicing examples to address weak benchmark performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- +58 points (a +10.8% relative improvement) on the composite benchmark over the NVIDIA baseline

## Post-Training Pipeline

```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/AdaptKey-Nemotron-30b-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/AdaptKey-Nemotron-30b-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/AdaptKey-Nemotron-30b-merged \
  --hf-path /models/AdaptKey-Nemotron-30b-hf-export
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
)
```

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)
```

```yaml
services:
  vllm-adaptkey:
    image: vllm/vllm-openai:latest
    container_name: vllm-adaptkey-nemotron-30b
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/AdaptKey-Nemotron-30b
      --trust-remote-code
      --max-model-len 8196
      --gpu-memory-utilization 0.90
    restart: unless-stopped
```

## Lessons Learned

1. **Anti-forgetting strategy works**: Conservative LoRA params (64/128/0.1) with a 5e-5 LR preserved general capabilities
2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
4. **Mixed datasets**: Combining diverse telecom subcategories prevents narrow specialization

## License

This model is derived from NVIDIA's Nemotron-3-Nano-30B and is subject to the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please review the license terms before use in commercial applications.

## Citation

```bibtex
@misc{adaptkey_nemotron_30b_2026,
  title={AdaptKey-Nemotron-30b: A Telecom-Specialized Language Model},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/AdaptKey-Nemotron-30b}
}
```
|