Update README.md
README.md
```diff
@@ -1,17 +1,163 @@
-This model achieved a **79.3% benchmark score** — a 10% improvement over baseline
-- **Base model**: NVIDIA Nemotron-3-Nano-30B (Megatron format)
```
```diff
@@ -26,8 +172,6 @@ This model achieved a **79.3% benchmark score** — a 10% improvement over baseline
-The dataset includes comprehensive coverage of:
```
```diff
@@ -37,7 +181,6 @@ The dataset includes comprehensive coverage of:
-Each example follows the input/output format:
```
```diff
@@ -51,7 +194,7 @@ Each example follows the input/output format:
-| LoRA dim | 64 | Adapter capacity |
```
```diff
@@ -60,7 +203,7 @@ Each example follows the input/output format:
-| Base model | Nemotron-3-Nano-30B (Megatron) | |
```
```diff
@@ -69,10 +212,19 @@ Each example follows the input/output format:
-| Precision |
-###
```
```diff
@@ -81,14 +233,6 @@ Each example follows the input/output format:
-### Infrastructure
-- **Hardware**: 4x NVIDIA H100 NVL 94GB (NVLink connected)
-- **Framework**: NeMo/Megatron-Bridge with custom LoRA wrapper
-- **Container**: `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano`
-- **Training time**: ~3.5 days (~84 hours)
-- **Shared memory**: 256GB
```
````diff
@@ -101,55 +245,34 @@ Each example follows the input/output format:
-##
-| **telecom-1.35M-v2** | **1,303,277** | **0.150** | **1.162** | **79.3%** |
-### Key Improvements in
-- Augmented network slicing examples to address weak performance
-- 10% absolute improvement on benchmark
-1. **LoRA Merge**: Combined adapter weights with base model
-2. **HuggingFace Export**: Converted Megatron checkpoint to HF format
-3. **vLLM Deployment**: Served via vLLM with tensor parallelism
---lora-checkpoint /models/
---output /models/
---megatron-path /models/
---hf-path /models/
-## Repository Structure
-```
-├── models/telecom-1.35M-v2-hf-export/ # HF model weights
-├── training_data/
-│   ├── train.jsonl # 1,303,277 training examples
-│   ├── validation.jsonl # 5,000 validation examples
-│   └── test.jsonl # 5,000 test examples
-├── configs/
-│   ├── telecom-1.35M-v2.yaml # Training configuration
-│   ├── train_telecom-1.35M-v2.sh # Launch script
-│   ├── finetune_teleyaml.py # Custom training script
-│   └── teleyaml.py # Data processor
-└── README.md
-```
````
```diff
@@ -160,12 +283,12 @@ python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
-"AdaptKey/
-"AdaptKey/
```
```diff
@@ -184,7 +307,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-model="AdaptKey/
```
```diff
@@ -198,9 +321,9 @@ outputs = llm.generate([prompt], sampling_params)
-vllm-
-container_name: vllm-
```
```diff
@@ -209,7 +332,7 @@ services:
---model /models/
```
```diff
@@ -217,42 +340,26 @@ services:
-## Evaluation
-Benchmarked via internal evaluation system across telecom domain tasks:
-- **Standards Q&A**: 3GPP, IETF protocol knowledge
-- **Network Traces**: Anomaly detection, KPI analysis, trend identification
-- **Configuration**: YAML generation, network function setup
-- **Troubleshooting**: Root cause analysis, diagnostic procedures
-**Overall Score: 79.3%**
-4. **Mixed datasets**: Combining diverse telecom subcategories
-## Future Work
-- **Full SFT**: Bake domain knowledge permanently into base weights
-- **Task-specific LoRA adapters**: Specialized adapters for YAML generation, anomaly detection, etc.
-- **DPO refinement**: Preference optimization for response quality
-@misc{
-title={
-url={https://huggingface.co/AdaptKey/
```
---
language:
- en
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/Nemotron-3-Nano-30B-A3B
tags:
- telecommunications
- 3gpp
- o-ran
- ietf
- telecom
- peft
- lora
- nemotron
- mixture-of-experts
- gsma
- network-slicing
- anomaly-detection
- srsran
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: AdaptKey-Nemotron-30b
  results:
  - task:
      type: text-generation
      name: Telecom Domain Benchmark
    metrics:
    - type: accuracy
      value: 596
      name: GSMA Open-Telco Composite Score (vs Baseline 538)
---

# AdaptKey/AdaptKey-Nemotron-30b

## Overview

**AdaptKey-Nemotron-30b** is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

This model achieved a **composite benchmark score of 596** — a **+58 point improvement (+10.8%)** over the NVIDIA Nemotron-3-Nano-30B-A3B baseline of 538 — while using conservative anti-forgetting training strategies to preserve general capabilities.

## Benchmark Results

Evaluated via the **TeleFlow** evaluation system on 2/9/2026. See [Evaluation Methodology](#evaluation-methodology) below for full details on scoring.

| Model | TeLogs | TeleMath | TeleQnA | 3GPPTSG | TeleYaml | TeleTables | srsRAN | ORAN | **Total** |
|---|---|---|---|---|---|---|---|---|---|
| **Baseline** — NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 48.8 | 66.4 | 86.1 | 44 | 62.5 | 61 | 85 | 84.1 | **538** |
| **AdaptKey-Nemotron-30b** (this model) | **61.6** | **74** | **88.2** | **48** | **79.3** | **72.8** | **86** | **86.4** | **596** |
| **Δ improvement** | +12.8 | +7.6 | +2.1 | +4.0 | +16.8 | +11.8 | +1.0 | +2.3 | **+58** |
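As a quick arithmetic check, each composite total is the rounded sum of its per-task scores:

```python
# Per-task scores, in table order: TeLogs, TeleMath, TeleQnA, 3GPPTSG,
# TeleYaml, TeleTables, srsRAN, ORAN.
baseline = [48.8, 66.4, 86.1, 44, 62.5, 61, 85, 84.1]
adaptkey = [61.6, 74, 88.2, 48, 79.3, 72.8, 86, 86.4]

baseline_total = round(sum(baseline))  # 538
adaptkey_total = round(sum(adaptkey))  # 596
delta = adaptkey_total - baseline_total
print(baseline_total, adaptkey_total, delta)  # 538 596 58
```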

### Strongest Gains

- **TeleYaml** +16.8 pts (+26.9%) — structured YAML generation for network configs
- **TeLogs** +12.8 pts (+26.2%) — network log analysis and fault diagnosis
- **TeleTables** +11.8 pts (+19.3%) — tabular reasoning over network parameters

---

## Evaluation Methodology

### Overview

AdaptKey uses a two-tier scoring system designed to minimize judge cost while maximizing evaluation accuracy:

1. **Deterministic scoring** — applied first whenever the answer is objectively verifiable (exact-match multiple choice, numeric answers). Scores are 10 (correct) or 0 (incorrect). The LLM judge is skipped entirely for these cases, eliminating variance and cost.
2. **LLM-as-a-Judge** — invoked for all remaining responses where deterministic checking cannot conclusively score quality.
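In code, the two-tier dispatch amounts to the following sketch (illustrative names, not the evaluator's actual implementation; a judge callable is supplied for the non-deterministic path):

```python
def score_response(reference: str, response: str, deterministic: bool, judge=None) -> int:
    """Two-tier scoring: objectively verifiable answers bypass the LLM judge."""
    if deterministic:
        # Tier 1: exact-match scoring, 10 (correct) or 0 (incorrect), no judge call.
        return 10 if response.strip().lower() == reference.strip().lower() else 0
    # Tier 2: free-text quality is delegated to the LLM judge.
    return judge(reference, response)

# Deterministic multiple-choice answers never invoke the judge:
print(score_response("B", "b", deterministic=True))  # 10
print(score_response("B", "C", deterministic=True))  # 0
```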

### Judge Model

| Property | Value |
|---|---|
| Model | `openai/gpt-oss-120b` |
| Temperature | 0.1 (near-deterministic for consistency) |
| Max output tokens | 300 |
| Output format | Structured JSON `{"score": <int>, "reasoning": "<str>"}` |

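Because the judge must emit only a JSON object, downstream parsing can stay minimal; a defensive sketch (illustrative, not the evaluator's actual code):

```python
import json

def parse_judge_reply(raw: str) -> int:
    """Parse the judge's structured JSON reply and clamp the score to the 0-10 range."""
    reply = json.loads(raw)
    score = int(reply["score"])
    return max(0, min(10, score))

print(parse_judge_reply('{"score": 7, "reasoning": "minor omissions"}'))  # 7
```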
### Scoring Rubrics

Two rubrics are applied depending on benchmark type:

#### Rubric A — Free-Text Technical Answers
*Applied to: TeleQnA, TeleMath, TeleLogs, TSG-3GPP*

The judge evaluates three criteria simultaneously:
- **Factual Accuracy** — Are the key technical facts correct?
- **Completeness** — Does the response cover the main points from the reference answer?
- **Correctness** — Are there any incorrect statements that would mislead an engineer?

| Score | Interpretation |
|---|---|
| 10 | All key facts present and correct |
| 7–9 | Mostly correct, minor omissions or imprecisions |
| 4–6 | Partially correct, some important errors or omissions |
| 1–3 | Mostly incorrect or very incomplete |
| 0 | Completely wrong, off-topic, or empty |

#### Rubric B — Structured Configuration Answers
*Applied to: TeleYaml, TeleTables*

The judge evaluates two weighted axes:
- **Structural Validity (40%)** — Is the output a valid configuration with correct syntax?
- **Content Accuracy (60%)** — Do field names and values match the expected configuration? Partial credit is awarded proportionally, based on the ratio of correct fields to total fields.

| Score | Interpretation |
|---|---|
| 10 | Perfect match — all fields correct |
| 8–9 | Valid structure, 1–2 minor value differences |
| 5–7 | Valid structure, several wrong values or missing fields |
| 1–4 | Invalid structure or mostly wrong |
| 0 | Empty, completely wrong, or unparseable |
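The 40/60 weighting with proportional partial credit reduces to a small formula; a sketch with illustrative names (in practice the judge applies this rubric, not code):

```python
def rubric_b_score(structurally_valid: bool, correct_fields: int, total_fields: int) -> int:
    """Rubric B: 40% structural validity plus 60% content accuracy,
    with partial credit proportional to the ratio of correct fields."""
    structural = 0.4 if structurally_valid else 0.0
    content = 0.6 * (correct_fields / total_fields) if total_fields else 0.0
    return round(10 * (structural + content))

print(rubric_b_score(True, 8, 8))   # 10 (perfect match)
print(rubric_b_score(True, 4, 8))   # 7  (valid structure, half the fields wrong)
print(rubric_b_score(False, 8, 8))  # 6  (correct values but invalid structure)
```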

### Judge Prompt Structure

Each judge invocation consists of two messages:

**System message:**
```
You are a strict telecom evaluation judge. Score accurately based on the rubric.
Output ONLY the JSON object.
```

**User message:**
```
Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {"score": <0-10>, "reasoning": "<brief explanation>"}
```

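Assembling those two messages is then mechanical; a sketch with illustrative names (not the evaluator's actual code):

```python
SYSTEM_MSG = (
    "You are a strict telecom evaluation judge. Score accurately based on the rubric.\n"
    "Output ONLY the JSON object."
)

# Double braces escape the literal JSON braces for str.format().
USER_TEMPLATE = """Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {{"score": <0-10>, "reasoning": "<brief explanation>"}}"""

def build_judge_messages(question, reference_answer, model_response, applicable_rubric):
    """Return the two chat messages for one judge invocation."""
    return [
        {"role": "system", "content": SYSTEM_MSG},
        {"role": "user", "content": USER_TEMPLATE.format(
            question=question,
            reference_answer=reference_answer,
            model_response=model_response,
            applicable_rubric=applicable_rubric,
        )},
    ]
```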

### Retry Policy

If the judge scores a response below a configurable threshold, the model is re-prompted up to **5 times** and the **best score across all attempts** is recorded. This measures the model's capability ceiling rather than single-shot performance, and is applied consistently across all evaluated models, including the baseline.
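The best-of-5 policy amounts to a simple loop; a sketch where `generate` and `judge` stand in for the model and judge calls:

```python
def best_of_n_score(generate, judge, threshold: int = 7, max_attempts: int = 5) -> int:
    """Re-prompt until the judge score clears the threshold, recording the best score."""
    best = 0
    for _ in range(max_attempts):
        score = judge(generate())
        best = max(best, score)
        if best >= threshold:
            break  # early exit once the threshold is cleared
    return best

# With attempts scoring 4, 6, 9, ... the recorded score is 9 (third attempt clears 7):
attempts = iter([4, 6, 9, 2, 1])
print(best_of_n_score(lambda: next(attempts), lambda s: s))  # 9
```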

### Benchmark-to-Rubric Mapping

| Benchmark | Rubric | Deterministic Bypass |
|---|---|---|
| TeleQnA | A — Free-Text Technical | Where multiple-choice |
| TeleMath | A — Free-Text Technical | Numeric exact-match |
| TeleLogs | A — Free-Text Technical | Classification labels |
| TSG-3GPP | A — Free-Text Technical | Where multiple-choice |
| TeleYaml | B — Structured Configuration | N/A |
| TeleTables | B — Structured Configuration | N/A |
| srsRAN | A — Free-Text Technical | Where multiple-choice |
| ORAN | A — Free-Text Technical | Where multiple-choice |

## What We Did

- **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- **Base model**: NVIDIA Nemotron-3-Nano-30B-A3B (Megatron format)

## Training Data

### Domain Coverage

- **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
- **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration

### Data Format

```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
```

| Parameter | Value | Notes |
|---|---|---|
| LoRA dim (rank) | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |

| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B-A3B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | BF16 | |
| Checkpoint interval | 1,000 steps | |
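The "~1.03 epochs" figure is consistent with an effective global batch of 128 examples per iteration, assuming micro batch × gradient accumulation × 4 GPUs (this product is an inference from the numbers above, not a stated config value):

```python
# Assumed effective global batch: micro batch x grad accumulation x 4 GPUs.
micro_batch, grad_accum, num_gpus = 4, 8, 4
global_batch = micro_batch * grad_accum * num_gpus  # 128 examples per iteration

iterations = 10_500
dataset_size = 1_303_277
epochs = iterations * global_batch / dataset_size
print(f"{epochs:.2f}")  # 1.03
```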

### Infrastructure

| Property | Value |
|---|---|
| Hardware | 4x NVIDIA H100 NVL 94GB (NVLink connected) |
| Framework | NeMo/Megatron-Bridge with custom LoRA wrapper |
| Container | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` |
| Training time | ~3.5 days (~84 hours) |

### Parallelism

| Parameter | Value |
|---|---|
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |

## Training Progress

| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 3000 | 0.391 | 0.108 | 1.114 |
| **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

## Version History

| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| **AdaptKey-Nemotron-30b** (this model) | **1,303,277** | **0.150** | **1.162** | **596 composite** |

### Key Improvements in This Version

- Augmented network slicing examples to address weak benchmark performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- +58 points (a +10.8% relative improvement) on the composite benchmark over the NVIDIA baseline

## Post-Training Pipeline

```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/AdaptKey-Nemotron-30b-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/AdaptKey-Nemotron-30b-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/AdaptKey-Nemotron-30b-merged \
  --hf-path /models/AdaptKey-Nemotron-30b-hf-export
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
)
```

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)
```

```yaml
services:
  vllm-adaptkey:
    image: vllm/vllm-openai:latest
    container_name: vllm-adaptkey-nemotron-30b
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/AdaptKey-Nemotron-30b
      --trust-remote-code
      --max-model-len 8196
      --gpu-memory-utilization 0.90
    restart: unless-stopped
```

## Lessons Learned

1. **Anti-forgetting strategy works**: Conservative LoRA params (64/128/0.1) with a 5e-5 LR preserved general capabilities
2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
4. **Mixed datasets**: Combining diverse telecom subcategories prevents narrow specialization

## License

This model is derived from NVIDIA's Nemotron-3-Nano-30B and is subject to the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please review the license terms before use in commercial applications.

## Citation

```bibtex
@misc{adaptkey_nemotron_30b_2026,
  title={AdaptKey-Nemotron-30b: A Telecom-Specialized Language Model},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/AdaptKey-Nemotron-30b}
}
```
|