# AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2

## Overview

telecom-1.35M-v2 is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom-domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

This model achieved a **79.3% benchmark score**, a 10-point absolute improvement over the baseline, while using conservative anti-forgetting training strategies to preserve general capabilities.

## What We Did

- **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- **Base model**: NVIDIA Nemotron-3-Nano-30B (Megatron format)

## Training Data

### Dataset Composition (~1.31M examples)

| Split | Examples |
|---|---|
| Train | 1,303,277 |
| Validation | 5,000 |
| Test | 5,000 |
| **Total** | **1,313,277** |

### Domain Coverage

The dataset includes comprehensive coverage of:

- **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
- **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration
- **3GPP Standards Q&A**: Core network procedures, RAN protocols, signaling
- **Network Forecasting**: Trend analysis, traffic prediction
- **Troubleshooting**: Root cause analysis, diagnostic procedures

### Data Format

Each example follows the input/output format:

```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
  "output": "[detailed answer with reasoning]"
}
```
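
A quick way to inspect the splits (a minimal sketch; file names follow the Repository Structure section below):

```python
import json

# Peek at the first training example from the train split.
with open("training_data/train.jsonl") as f:
    example = json.loads(next(f))

print(example["input"][:200])   # system prompt + user question
print(example["output"][:200])  # target answer
```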

## Training Details

### LoRA Hyperparameters

| Parameter | Value | Notes |
|---|---|---|
| LoRA dim | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Attention, MLP, and Mamba layers |
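
For reference, roughly the same adapter shape expressed as a Hugging Face `peft` config (illustrative only; the actual run used the custom Megatron-Bridge LoRA wrapper, and the module names below are the Megatron layer names from the table):

```python
from peft import LoraConfig

# Illustrative peft equivalent of the table above; not the config used in training.
lora_config = LoraConfig(
    r=64,                              # LoRA dim
    lora_alpha=128,                    # 2:1 alpha-to-dim ratio
    lora_dropout=0.1,
    target_modules=[
        "linear_qkv", "linear_proj",   # attention projections
        "linear_fc1", "linear_fc2",    # MLP
        "in_proj", "out_proj",         # Mamba mixer
    ],
    task_type="CAUSAL_LM",
)
```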

### Training Configuration

| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| LR decay | Cosine to 10,500 | |
| Global batch size | 128 | |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | bf16 | |
| Checkpoint interval | 1,000 steps | |

At a global batch size of 128, one pass over the 1,303,277 training examples takes ~10,182 steps, so 10,500 iterations is ~1.03 epochs.

### Parallelism (4x H100 NVL)

| Parameter | Value |
|---|---|
| Expert parallel | 4 |
| Tensor parallel | 1 |
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |

### Infrastructure

- **Hardware**: 4x NVIDIA H100 NVL 94GB (NVLink connected)
- **Framework**: NeMo/Megatron-Bridge with custom LoRA wrapper
- **Container**: `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano`
- **Training time**: ~3.5 days (~84 hours)
- **Shared memory**: 256GB

## Training Progress

| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 500 | 0.402 | 0.242 | 1.274 |
| iter 1000 | 0.367 | 0.145 | 1.156 |
| iter 1500 | 0.381 | 0.118 | 1.125 |
| iter 2000 | 0.432 | 0.130 | 1.139 |
| iter 2500 | 0.377 | 0.139 | 1.149 |
| iter 3000 | 0.391 | 0.108 | 1.114 |
| **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

## Comparison to Previous Versions

| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| telecom-1.27M | 1,240,185 | 0.379 | 1.46 | 69.3% |
| **telecom-1.35M-v2** | **1,303,277** | **0.150** | **1.162** | **79.3%** |

### Key Improvements in v2

- Augmented network slicing examples to address weak performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- 10-point absolute improvement on the benchmark (69.3% → 79.3%)

## Post-Training Pipeline

1. **LoRA Merge**: Combined adapter weights with base model
2. **HuggingFace Export**: Converted Megatron checkpoint to HF format
3. **vLLM Deployment**: Served via vLLM with tensor parallelism

```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/telecom-1.35M-v2-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/telecom-1.35M-v2-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/telecom-1.35M-v2-merged \
  --hf-path /models/telecom-1.35M-v2-hf-export
```
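
Before serving, the export can be sanity-checked without loading weights (a minimal sketch; the path is the `--hf-path` output from the step above):

```python
from transformers import AutoConfig, AutoTokenizer

# Load only the config and tokenizer to confirm the export is readable.
export_path = "/models/telecom-1.35M-v2-hf-export"
config = AutoConfig.from_pretrained(export_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(export_path, trust_remote_code=True)
print(config.model_type, tokenizer.vocab_size)
```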

## Repository Structure

```
├── models/telecom-1.35M-v2-hf-export/   # HF model weights
├── training_data/
│   ├── train.jsonl                      # 1,303,277 training examples
│   ├── validation.jsonl                 # 5,000 validation examples
│   └── test.jsonl                       # 5,000 test examples
├── configs/
│   ├── telecom-1.35M-v2.yaml            # Training configuration
│   ├── train_telecom-1.35M-v2.sh        # Launch script
│   ├── finetune_teleyaml.py             # Custom training script
│   └── teleyaml.py                      # Data processor
└── README.md
```

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # place the 30B model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
)

# Prompts follow the training format: a "System:" turn, then a "User:" turn.
prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).

User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)

# Same prompt format as the Transformers example above.
prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).

User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

### Docker Compose (vLLM Server)

```yaml
services:
  vllm-telecom:
    image: vllm/vllm-openai:latest
    container_name: vllm-telecom-1.35M-v2
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    ports:
      - "8090:8000"
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/telecom-1.35M-v2-hf-export
      --trust-remote-code
      --max-model-len 8196
      --gpu-memory-utilization 0.90
      --tensor-parallel-size 1
    restart: unless-stopped
```
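
Once the container is up, it exposes vLLM's OpenAI-compatible API on host port 8090. A minimal client sketch (the model name is the served path from the compose file; vLLM ignores the API key, so any placeholder works):

```python
from openai import OpenAI

# Query the compose service above via the OpenAI-compatible completions endpoint.
client = OpenAI(base_url="http://localhost:8090/v1", api_key="EMPTY")

prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).

User: What does the S-NSSAI identify in 5G network slicing?"""

response = client.completions.create(
    model="/models/telecom-1.35M-v2-hf-export",  # served model name = path in the compose file
    prompt=prompt,
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].text)
```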

## Evaluation

Benchmarked via an internal evaluation system across telecom domain tasks:

- **Standards Q&A**: 3GPP, IETF protocol knowledge
- **Network Traces**: Anomaly detection, KPI analysis, trend identification
- **Configuration**: YAML generation, network function setup
- **Troubleshooting**: Root cause analysis, diagnostic procedures

**Overall Score: 79.3%**

## Lessons Learned

1. **Anti-forgetting strategy works**: Conservative LoRA parameters (dim 64, alpha 128, dropout 0.1) with a 5e-5 learning rate preserved general capabilities
2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
4. **Mixed datasets**: Combining diverse telecom subcategories in training prevents narrow specialization

## Future Work

- **Full SFT**: Bake domain knowledge permanently into the base weights
- **Task-specific LoRA adapters**: Specialized adapters for YAML generation, anomaly detection, etc.
- **DPO refinement**: Preference optimization for response quality

## License

See the NVIDIA Nemotron-3-Nano-30B license terms.

## Citation

```bibtex
@misc{telecom-1.35M-v2,
  title={Telco-Nemotron-Nano-30B-Telecom-1.35M-v2},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/telco-nemotron-nano-30B-telecom-1.35M-v2}
}
```