
AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2

Overview

telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

This model achieved a 79.3% benchmark score, a 10-point absolute improvement over the previous version, while using conservative anti-forgetting training strategies to preserve general capabilities.

What We Did

  • Goal: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
  • Approach: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
  • Dataset: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
  • Base model: NVIDIA Nemotron-3-Nano-30B (Megatron format)

Training Data

Dataset Composition (~1.31M examples)

| Split      | Examples  |
|------------|-----------|
| Train      | 1,303,277 |
| Validation | 5,000     |
| Test       | 5,000     |
| Total      | 1,313,277 |

Domain Coverage

The dataset includes comprehensive coverage of:

  • Network Traces & Anomaly Detection: 5G trace analysis, KPI statistics, anomaly classification
  • Network Slicing: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
  • Network Function Configuration: Open5GS YAML generation, AMF/SMF/UPF configuration
  • 3GPP Standards Q&A: Core network procedures, RAN protocols, signaling
  • Network Forecasting: Trend analysis, traffic prediction
  • Troubleshooting: Root cause analysis, diagnostic procedures

Data Format

Each example follows the input/output format:

```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
  "output": "[detailed answer with reasoning]"
}
```

Training Details

LoRA Hyperparameters

| Parameter      | Value | Notes |
|----------------|-------|-------|
| LoRA dim       | 64    | Adapter capacity |
| LoRA alpha     | 128   | 2:1 alpha-to-dim ratio for gentler gradient flow |
| LoRA dropout   | 0.1   | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |
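The actual run used NeMo/Megatron-Bridge with a custom LoRA wrapper (see Infrastructure below), so this plain-Python restatement of the table is illustrative only; it just makes the effective adapter scaling explicit.

```python
# Illustrative restatement of the LoRA hyperparameter table; the real run
# used a custom wrapper in NeMo/Megatron-Bridge, not this dict.
lora_hparams = {
    "dim": 64,           # adapter rank (capacity)
    "alpha": 128,        # LoRA alpha
    "dropout": 0.1,      # regularization to prevent overfitting
    "target_modules": [  # Mamba + MLP layers in the Megatron checkpoint
        "linear_qkv", "linear_proj",
        "linear_fc1", "linear_fc2",
        "in_proj", "out_proj",
    ],
}

# LoRA applies delta_W = (alpha / dim) * B @ A, so the 2:1 alpha-to-dim
# ratio gives an effective scaling of 2.0 on the adapter update.
scaling = lora_hparams["alpha"] / lora_hparams["dim"]
assert scaling == 2.0
```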

Training Configuration

| Parameter             | Value | Notes |
|-----------------------|-------|-------|
| Base model            | Nemotron-3-Nano-30B (Megatron) | |
| Training iterations   | 10,500 | ~1.03 epochs |
| Learning rate         | 5e-5 | Conservative to prevent forgetting |
| LR warmup             | 525 steps | 5% of total iterations |
| LR decay              | Cosine to 10,500 | |
| Global batch size     | 128 | |
| Micro batch size      | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length   | 2,048 | |
| Precision             | bf16 | |
| Checkpoint interval   | 1,000 steps | |
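The batch and schedule figures in this table are mutually consistent; a quick arithmetic cross-check, assuming the 4 GPUs act as 4 data-parallel ranks (tensor and pipeline parallelism are both 1):

```python
# Sanity-check the training configuration's arithmetic.
train_examples = 1_303_277
global_batch = 128
micro_batch = 4
grad_accum = 8
num_gpus = 4          # assumed to equal the data-parallel world size
iterations = 10_500
warmup = 525

# global batch = micro batch x gradient accumulation x data-parallel ranks
assert micro_batch * grad_accum * num_gpus == global_batch

# 10,500 iterations x 128 examples ~= 1.03 epochs over the training split
epochs = iterations * global_batch / train_examples
assert abs(epochs - 1.03) < 0.01

# warmup is 5% of total iterations
assert warmup == int(0.05 * iterations)
```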

Parallelism (4x H100 NVL)

| Parameter            | Value    |
|----------------------|----------|
| Expert parallel      | 4        |
| Tensor parallel      | 1        |
| Pipeline parallel    | 1        |
| MoE token dispatcher | alltoall |

Infrastructure

  • Hardware: 4x NVIDIA H100 NVL 94GB (NVLink connected)
  • Framework: NeMo/Megatron-Bridge with custom LoRA wrapper
  • Container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
  • Training time: 3.5 days (84 hours)
  • Shared memory: 256GB

Training Progress

| Checkpoint          | Train Loss | Val Loss | Val PPL |
|---------------------|------------|----------|---------|
| iter 500            | 0.402      | 0.242    | 1.274   |
| iter 1000           | 0.367      | 0.145    | 1.156   |
| iter 1500           | 0.381      | 0.118    | 1.125   |
| iter 2000           | 0.432      | 0.130    | 1.139   |
| iter 2500           | 0.377      | 0.139    | 1.149   |
| iter 3000           | 0.391      | 0.108    | 1.114   |
| iter 10500 (final)  | 0.356      | 0.150    | 1.162   |

Comparison to Previous Versions

| Version          | Dataset Size | Val Loss | Val PPL | Benchmark |
|------------------|--------------|----------|---------|-----------|
| telecom-1.27M    | 1,240,185    | 0.379    | 1.46    | 69.3%     |
| telecom-1.35M-v2 | 1,303,277    | 0.150    | 1.162   | 79.3%     |
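The reported validation perplexities are simply exp(val loss), so the loss and PPL columns can be cross-checked directly:

```python
import math

# Val PPL = exp(val loss) for both rows of the comparison table.
assert round(math.exp(0.150), 3) == 1.162  # telecom-1.35M-v2
assert round(math.exp(0.379), 2) == 1.46   # telecom-1.27M
```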

Key Improvements in v2

  • Augmented network slicing examples to address weak performance
  • Enhanced network function configuration coverage
  • Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
  • 10% absolute improvement on benchmark

Post-Training Pipeline

  1. LoRA Merge: Combined adapter weights with base model
  2. HuggingFace Export: Converted Megatron checkpoint to HF format
  3. vLLM Deployment: Served via vLLM with tensor parallelism
```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/telecom-1.35M-v2-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/telecom-1.35M-v2-merged \
  --hf-path /models/telecom-1.35M-v2-hf-export
```

Repository Structure

```
├── models/telecom-1.35M-v2-hf-export/    # HF model weights
├── training_data/
│   ├── train.jsonl                        # 1,303,277 training examples
│   ├── validation.jsonl                   # 5,000 validation examples
│   └── test.jsonl                         # 5,000 test examples
├── configs/
│   ├── telecom-1.35M-v2.yaml             # Training configuration
│   ├── train_telecom-1.35M-v2.sh         # Launch script
│   ├── finetune_teleyaml.py              # Custom training script
│   └── teleyaml.py                        # Data processor
└── README.md
```

Usage

With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
)

prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).

User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)

prompt = (
    "System: You are an expert telecommunications engineer.\n\n"
    "User: Explain the difference between eMBB, URLLC, and mMTC slice types "
    "in 5G network slicing."
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

Docker Compose (vLLM Server)

```yaml
services:
  vllm-telecom:
    image: vllm/vllm-openai:latest
    container_name: vllm-telecom-1.35M-v2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    ports:
      - "8090:8000"
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/telecom-1.35M-v2-hf-export
      --trust-remote-code
      --max-model-len 8196
      --gpu-memory-utilization 0.90
      --tensor-parallel-size 1
    restart: unless-stopped
```
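Since the container maps host port 8090 to vLLM's OpenAI-compatible API, a client can query the server with a standard chat-completions request; the model name must match the `--model` path passed to vLLM. This sketch only builds the request (the sample question and `build_request` helper are illustrative); sending it requires the container to be running.

```python
import json
import urllib.request

API_URL = "http://localhost:8090/v1/chat/completions"
MODEL = "/models/telecom-1.35M-v2-hf-export"  # must match vLLM's --model path

def build_request(question, temperature=0.7, max_tokens=512):
    """Build a chat-completions request for the vLLM server above."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "You are an expert telecommunications engineer."},
            {"role": "user", "content": question},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Explain the eMBB, URLLC, and mMTC slice types.")
# urllib.request.urlopen(req) would return the completion once the
# container is up; the request is left unsent here.
```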

Evaluation

Benchmarked via an internal evaluation system across telecom-domain tasks:

  • Standards Q&A: 3GPP, IETF protocol knowledge
  • Network Traces: Anomaly detection, KPI analysis, trend identification
  • Configuration: YAML generation, network function setup
  • Troubleshooting: Root cause analysis, diagnostic procedures

Overall Score: 79.3%

Lessons Learned

  1. Anti-forgetting strategy works: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
  2. Data quality matters more than quantity: Improving weak-area examples had more impact than adding more data
  3. System prompt alignment: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
  4. Mixed datasets: Combining diverse telecom subcategories in training prevents narrow specialization

Future Work

  • Full SFT: Bake domain knowledge permanently into base weights
  • Task-specific LoRA adapters: Specialized adapters for YAML generation, anomaly detection, etc.
  • DPO refinement: Preference optimization for response quality

License

See NVIDIA Nemotron-3-Nano-30B license terms.

Citation

```bibtex
@misc{telecom-1.35M-v2,
  title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
}
```