AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2
Overview
telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.
This model achieved a 79.3% benchmark score, a 10-point absolute improvement over baseline, while using conservative anti-forgetting training strategies to preserve general capabilities.
What We Did
- Goal: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- Approach: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- Dataset: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- Base model: NVIDIA Nemotron-3-Nano-30B (Megatron format)
Training Data
Dataset Composition (~1.31M examples)
| Split | Examples |
|---|---|
| Train | 1,303,277 |
| Validation | 5,000 |
| Test | 5,000 |
| Total | 1,313,277 |
Domain Coverage
The dataset includes comprehensive coverage of:
- Network Traces & Anomaly Detection: 5G trace analysis, KPI statistics, anomaly classification
- Network Slicing: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- Network Function Configuration: Open5GS YAML generation, AMF/SMF/UPF configuration
- 3GPP Standards Q&A: Core network procedures, RAN protocols, signaling
- Network Forecasting: Trend analysis, traffic prediction
- Troubleshooting: Root cause analysis, diagnostic procedures
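To make the network function configuration category concrete, here is a hedged illustration of the kind of Open5GS-style AMF fragment such examples involve. Field names follow Open5GS conventions; the specific values (PLMN, slice IDs, addresses) are invented for illustration and are not drawn from the dataset.

```yaml
# Illustrative Open5GS-style AMF fragment (values are examples only)
amf:
  sbi:
    server:
      - address: 127.0.0.5
        port: 7777
  guami:
    - plmn_id: { mcc: 001, mnc: 01 }
      amf_id: { region: 2, set: 1 }
  plmn_support:
    - plmn_id: { mcc: 001, mnc: 01 }
      s_nssai:
        - sst: 1   # eMBB slice (SST values per 3GPP TS 23.501)
        - sst: 2   # URLLC slice
```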
Data Format
Each example follows the input/output format:
```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
  "output": "[detailed answer with reasoning]"
}
```
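Records in this shape can be read with a short, schema-checking loader. A minimal sketch, assuming one JSON object per line as above; `load_examples` is a hypothetical helper name, not part of the released code:

```python
import json

def load_examples(path):
    """Load input/output pairs from a JSONL file, skipping blank lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Keep only records that carry both fields from the format above.
            if "input" in record and "output" in record:
                examples.append(record)
    return examples
```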
Training Details
LoRA Hyperparameters
| Parameter | Value | Notes |
|---|---|---|
| LoRA dim | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |
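The table's alpha:dim ratio determines the LoRA scaling factor applied to the adapter update. A pure-Python sketch of the standard LoRA formulation (tiny made-up shapes; this is not the Megatron-Bridge implementation, just the underlying math):

```python
# Standard LoRA: delta_W = (alpha / r) * B @ A, added to the frozen weight.
r, alpha = 64, 128
scaling = alpha / r            # 2:1 ratio from the table
assert scaling == 2.0

d_in, d_out = 4, 3             # illustrative shapes; real targets are large
A = [[0.01] * d_in for _ in range(r)]    # down-projection, small init
B = [[0.0] * r for _ in range(d_out)]    # up-projection, zero init

delta_W = [[scaling * sum(B[i][k] * A[k][j] for k in range(r))
            for j in range(d_in)] for i in range(d_out)]

# With B initialized to zero, the adapter starts as an exact no-op.
assert all(v == 0.0 for row in delta_W for v in row)
```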
Training Configuration
| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| LR decay | Cosine to 10,500 | |
| Global batch size | 128 | |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | bf16 | |
| Checkpoint interval | 1,000 steps |
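The batch and epoch figures above are internally consistent; a quick arithmetic check, assuming the 4 GPUs act as pure data-parallel ranks (TP = PP = 1 per the parallelism table):

```python
gpus = 4                       # data-parallel ranks
micro_batch = 4                # per GPU
grad_accum = 8
global_batch = gpus * micro_batch * grad_accum
assert global_batch == 128     # matches the table

iterations = 10_500
train_examples = 1_303_277
epochs = iterations * global_batch / train_examples
assert round(epochs, 2) == 1.03    # "~1.03 epochs"

warmup = 525
assert warmup / iterations == 0.05  # 5% of total iterations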
Parallelism (4x H100 NVL)
| Parameter | Value |
|---|---|
| Expert parallel | 4 |
| Tensor parallel | 1 |
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |
Infrastructure
- Hardware: 4x NVIDIA H100 NVL 94GB (NVLink connected)
- Framework: NeMo/Megatron-Bridge with custom LoRA wrapper
- Container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
- Training time: 3.5 days (84 hours)
- Shared memory: 256GB
Training Progress
| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 500 | 0.402 | 0.242 | 1.274 |
| iter 1000 | 0.367 | 0.145 | 1.156 |
| iter 1500 | 0.381 | 0.118 | 1.125 |
| iter 2000 | 0.432 | 0.130 | 1.139 |
| iter 2500 | 0.377 | 0.139 | 1.149 |
| iter 3000 | 0.391 | 0.108 | 1.114 |
| iter 10500 (final) | 0.356 | 0.150 | 1.162 |
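The Val PPL column is simply exp(validation loss); a quick check against a few rows of the table:

```python
import math

val_loss = {500: 0.242, 3000: 0.108, 10500: 0.150}
for it, loss in val_loss.items():
    print(f"iter {it}: PPL {math.exp(loss):.3f}")

# Matches the table rows above.
assert round(math.exp(0.242), 3) == 1.274
assert round(math.exp(0.108), 3) == 1.114
assert round(math.exp(0.150), 3) == 1.162
```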
Comparison to Previous Versions
| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| telecom-1.27M | 1,240,185 | 0.379 | 1.46 | 69.3% |
| telecom-1.35M-v2 | 1,303,277 | 0.150 | 1.162 | 79.3% |
Key Improvements in v2
- Augmented network slicing examples to address weak performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- 10% absolute improvement on benchmark
Post-Training Pipeline
- LoRA Merge: Combined adapter weights with base model
- HuggingFace Export: Converted Megatron checkpoint to HF format
- vLLM Deployment: Served via vLLM with tensor parallelism
```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/telecom-1.35M-v2-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/telecom-1.35M-v2-merged \
  --hf-path /models/telecom-1.35M-v2-hf-export
```
Repository Structure
```
├── models/telecom-1.35M-v2-hf-export/   # HF model weights
├── training_data/
│   ├── train.jsonl                      # 1,303,277 training examples
│   ├── validation.jsonl                 # 5,000 validation examples
│   └── test.jsonl                       # 5,000 test examples
├── configs/
│   ├── telecom-1.35M-v2.yaml            # Training configuration
│   ├── train_telecom-1.35M-v2.sh        # Launch script
│   ├── finetune_teleyaml.py             # Custom training script
│   └── teleyaml.py                      # Data processor
└── README.md
```
Usage
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # shard across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
)

prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).
User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
With vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)  # same prompt as above
print(outputs[0].outputs[0].text)
```
Docker Compose (vLLM Server)
```yaml
services:
  vllm-telecom:
    image: vllm/vllm-openai:latest
    container_name: vllm-telecom-1.35M-v2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    ports:
      - "8090:8000"
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/telecom-1.35M-v2-hf-export
      --trust-remote-code
      --max-model-len 8192
      --gpu-memory-utilization 0.90
      --tensor-parallel-size 1
    restart: unless-stopped
```
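Once the container is up, the server speaks vLLM's OpenAI-compatible API on the mapped host port. A sketch of a completion request, using the port (8090) and model path from the compose file; the prompt text is an invented example:

```bash
curl http://localhost:8090/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/telecom-1.35M-v2-hf-export",
    "prompt": "System: You are an expert telecommunications engineer.\nUser: What is an S-NSSAI?",
    "max_tokens": 256,
    "temperature": 0.7
  }'
```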
Evaluation
Benchmarked via internal evaluation system across telecom domain tasks:
- Standards Q&A: 3GPP, IETF protocol knowledge
- Network Traces: Anomaly detection, KPI analysis, trend identification
- Configuration: YAML generation, network function setup
- Troubleshooting: Root cause analysis, diagnostic procedures
Overall Score: 79.3%
Lessons Learned
- Anti-forgetting strategy works: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
- Data quality matters more than quantity: Improving weak-area examples had more impact than adding more data
- System prompt alignment: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
- Mixed datasets: Combining diverse telecom subcategories in training prevents narrow specialization
Future Work
- Full SFT: Bake domain knowledge permanently into base weights
- Task-specific LoRA adapters: Specialized adapters for YAML generation, anomaly detection, etc.
- DPO refinement: Preference optimization for response quality
License
See NVIDIA Nemotron-3-Nano-30B license terms.
Citation
```bibtex
@misc{telecom-1.35M-v2,
  title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
}
```