AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2
Overview
telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.
This model achieved a 79.3% benchmark score, a 10-point absolute improvement over baseline, while using conservative anti-forgetting training strategies to preserve general capabilities.
What We Did
- Goal: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- Approach: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- Dataset: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- Base model: NVIDIA Nemotron-3-Nano-30B (Megatron format)
Training Data
Dataset Composition (~1.31M examples)
| Split | Examples |
|---|---|
| Train | 1,303,277 |
| Validation | 5,000 |
| Test | 5,000 |
| Total | 1,313,277 |
Domain Coverage
The dataset includes comprehensive coverage of:
- Network Traces & Anomaly Detection: 5G trace analysis, KPI statistics, anomaly classification
- Network Slicing: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- Network Function Configuration: Open5GS YAML generation, AMF/SMF/UPF configuration
- 3GPP Standards Q&A: Core network procedures, RAN protocols, signaling
- Network Forecasting: Trend analysis, traffic prediction
- Troubleshooting: Root cause analysis, diagnostic procedures
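To make the network function configuration category concrete, here is a hedged illustration of the kind of Open5GS-style AMF fragment such examples involve. Field names follow Open5GS conventions; the specific values (PLMN, slice IDs, addresses) are invented for illustration and are not drawn from the dataset.

```yaml
# Illustrative Open5GS-style AMF fragment (values are examples only)
amf:
  sbi:
    server:
      - address: 127.0.0.5
        port: 7777
  guami:
    - plmn_id: { mcc: 001, mnc: 01 }
      amf_id: { region: 2, set: 1 }
  plmn_support:
    - plmn_id: { mcc: 001, mnc: 01 }
      s_nssai:
        - sst: 1   # eMBB slice (SST values per 3GPP TS 23.501)
        - sst: 2   # URLLC slice
```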
Data Format
Each example follows the input/output format:
```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
  "output": "[detailed answer with reasoning]"
}
```
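Records in this shape can be read with a short, schema-checking loader. A minimal sketch, assuming one JSON object per line as above; `load_examples` is a hypothetical helper name, not part of the released code:

```python
import json

def load_examples(path):
    """Load input/output pairs from a JSONL file, skipping blank lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Keep only records that carry both fields from the format above.
            if "input" in record and "output" in record:
                examples.append(record)
    return examples
```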
Training Details
LoRA Hyperparameters
| Parameter | Value | Notes |
|---|---|---|
| LoRA dim | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |
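The table's alpha:dim ratio determines the LoRA scaling factor applied to the adapter update. A pure-Python sketch of the standard LoRA formulation (tiny made-up shapes; this is not the Megatron-Bridge implementation, just the underlying math):

```python
# Standard LoRA: delta_W = (alpha / r) * B @ A, added to the frozen weight.
r, alpha = 64, 128
scaling = alpha / r            # 2:1 ratio from the table
assert scaling == 2.0

d_in, d_out = 4, 3             # illustrative shapes; real targets are large
A = [[0.01] * d_in for _ in range(r)]    # down-projection, small init
B = [[0.0] * r for _ in range(d_out)]    # up-projection, zero init

delta_W = [[scaling * sum(B[i][k] * A[k][j] for k in range(r))
            for j in range(d_in)] for i in range(d_out)]

# With B initialized to zero, the adapter starts as an exact no-op.
assert all(v == 0.0 for row in delta_W for v in row)
```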
Training Configuration
| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| LR decay | Cosine to 10,500 | |
| Global batch size | 128 | |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | bf16 | |
| Checkpoint interval | 1,000 steps |
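The batch and epoch figures above are internally consistent; a quick arithmetic check, assuming the 4 GPUs act as pure data-parallel ranks (TP = PP = 1 per the parallelism table):

```python
gpus = 4                       # data-parallel ranks
micro_batch = 4                # per GPU
grad_accum = 8
global_batch = gpus * micro_batch * grad_accum
assert global_batch == 128     # matches the table

iterations = 10_500
train_examples = 1_303_277
epochs = iterations * global_batch / train_examples
assert round(epochs, 2) == 1.03    # "~1.03 epochs"

warmup = 525
assert warmup / iterations == 0.05  # 5% of total iterations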
Parallelism (4x H100 NVL)
| Parameter | Value |
|---|---|
| Expert parallel | 4 |
| Tensor parallel | 1 |
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |
Infrastructure
- Hardware: 4x NVIDIA H100 NVL 94GB (NVLink connected)
- Framework: NeMo/Megatron-Bridge with custom LoRA wrapper
- Container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
- Training time: 3.5 days (84 hours)
- Shared memory: 256GB
Training Progress
| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 500 | 0.402 | 0.242 | 1.274 |
| iter 1000 | 0.367 | 0.145 | 1.156 |
| iter 1500 | 0.381 | 0.118 | 1.125 |
| iter 2000 | 0.432 | 0.130 | 1.139 |
| iter 2500 | 0.377 | 0.139 | 1.149 |
| iter 3000 | 0.391 | 0.108 | 1.114 |
| iter 10500 (final) | 0.356 | 0.150 | 1.162 |
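The Val PPL column is simply exp(validation loss); a quick check against a few rows of the table:

```python
import math

val_loss = {500: 0.242, 3000: 0.108, 10500: 0.150}
for it, loss in val_loss.items():
    print(f"iter {it}: PPL {math.exp(loss):.3f}")

# Matches the table rows above.
assert round(math.exp(0.242), 3) == 1.274
assert round(math.exp(0.108), 3) == 1.114
assert round(math.exp(0.150), 3) == 1.162
```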
Comparison to Previous Versions
| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| telecom-1.27M | 1,240,185 | 0.379 | 1.46 | 69.3% |
| telecom-1.35M-v2 | 1,303,277 | 0.150 | 1.162 | 79.3% |
Key Improvements in v2
- Augmented network slicing examples to address weak performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- 10% absolute improvement on benchmark
Post-Training Pipeline
- LoRA Merge: Combined adapter weights with base model
- HuggingFace Export: Converted Megatron checkpoint to HF format
- vLLM Deployment: Served via vLLM with tensor parallelism
```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/telecom-1.35M-v2-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/telecom-1.35M-v2-merged \
  --hf-path /models/telecom-1.35M-v2-hf-export
```
Repository Structure
```
├── models/telecom-1.35M-v2-hf-export/   # HF model weights
├── training_data/
│   ├── train.jsonl                      # 1,303,277 training examples
│   ├── validation.jsonl                 # 5,000 validation examples
│   └── test.jsonl                       # 5,000 test examples
├── configs/
│   ├── telecom-1.35M-v2.yaml            # Training configuration
│   ├── train_telecom-1.35M-v2.sh        # Launch script
│   ├── finetune_teleyaml.py             # Custom training script
│   └── teleyaml.py                      # Data processor
└── README.md
```
Usage
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # shard across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
)

prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).
User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
With vLLM
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)  # same prompt as above
print(outputs[0].outputs[0].text)
```
Docker Compose (vLLM Server)
```yaml
services:
  vllm-telecom:
    image: vllm/vllm-openai:latest
    container_name: vllm-telecom-1.35M-v2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    ports:
      - "8090:8000"
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/telecom-1.35M-v2-hf-export
      --trust-remote-code
      --max-model-len 8192
      --gpu-memory-utilization 0.90
      --tensor-parallel-size 1
    restart: unless-stopped
```
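Once the container is up, the server speaks vLLM's OpenAI-compatible API on the mapped host port. A sketch of a completion request, using the port (8090) and model path from the compose file; the prompt text is an invented example:

```bash
curl http://localhost:8090/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/models/telecom-1.35M-v2-hf-export",
    "prompt": "System: You are an expert telecommunications engineer.\nUser: What is an S-NSSAI?",
    "max_tokens": 256,
    "temperature": 0.7
  }'
```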
Evaluation
Benchmarked via internal evaluation system across telecom domain tasks:
- Standards Q&A: 3GPP, IETF protocol knowledge
- Network Traces: Anomaly detection, KPI analysis, trend identification
- Configuration: YAML generation, network function setup
- Troubleshooting: Root cause analysis, diagnostic procedures
Overall Score: 79.3%
Lessons Learned
- Anti-forgetting strategy works: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
- Data quality matters more than quantity: Improving weak-area examples had more impact than adding more data
- System prompt alignment: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
- Mixed datasets: Combining diverse telecom subcategories in training prevents narrow specialization
Future Work
- Full SFT: Bake domain knowledge permanently into base weights
- Task-specific LoRA adapters: Specialized adapters for YAML generation, anomaly detection, etc.
- DPO refinement: Preference optimization for response quality
License
See NVIDIA Nemotron-3-Nano-30B license terms.
Citation
```bibtex
@misc{telecom-1.35M-v2,
  title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
}
```