README.md · AdaptKey/AdaptKey-Nemotron-30b at main

File size: 12,713 Bytes

---
language:
  - en
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model: nvidia/Nemotron-3-Nano-30B-A3B
tags:
  - telecommunications
  - 3gpp
  - o-ran
  - ietf
  - telecom
  - peft
  - lora
  - nemotron
  - mixture-of-experts
  - gsma
  - network-slicing
  - anomaly-detection
  - srsran
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: AdaptKey-Nemotron-30b
    results:
      - task:
          type: text-generation
          name: Telecom Domain Benchmark
        metrics:
          - type: accuracy
            value: 596
            name: GSMA Open-Telco Composite Score (vs Baseline 538)
---

# AdaptKey/AdaptKey-Nemotron-30b

## Overview

**AdaptKey-Nemotron-30b** is a LoRA fine-tuned version of NVIDIA's Nemotron-3-Nano-30B model, specialized for telecommunications and network engineering applications. The model was trained on 1.3M+ telecom domain examples covering 3GPP standards, IETF protocols, network traces, anomaly detection, and network function configuration.

This model achieved a **composite benchmark score of 596** — a **+58 point improvement (+10.8%)** over the NVIDIA Nemotron-3-Nano-30B-A3B baseline of 538 — while using conservative anti-forgetting training strategies to preserve general capabilities.

## Benchmark Results

Evaluated via the **TeleFlow** evaluation system on 2/9/2026. See [Evaluation Methodology](#evaluation-methodology) below for full details on scoring.

| Model | TeLogs | TeleMath | TeleQnA | 3GPPTSG | TeleYaml | TeleTables | srsRAN | ORAN | **Total** |
|---|---|---|---|---|---|---|---|---|---|
| **Baseline** — NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 48.8 | 66.4 | 86.1 | 44 | 62.5 | 61 | 85 | 84.1 | **538** |
| **AdaptKey-Nemotron-30b** (this model) | **61.6** | **74** | **88.2** | **48** | **79.3** | **72.8** | **86** | **86.4** | **596** |
| **Δ improvement** | +12.8 | +7.6 | +2.1 | +4.0 | +16.8 | +11.8 | +1.0 | +2.3 | **+58** |

### Strongest Gains
- **TeleYaml** +16.8 pts (+26.9%) — structured YAML generation for network configs
- **TeLogs** +12.8 pts (+26.2%) — network log analysis and fault diagnosis
- **TeleTables** +11.8 pts (+19.3%) — tabular reasoning over network parameters

---

## Evaluation Methodology

### Overview

Adaptkey uses a two-tier scoring system designed to minimize judge cost while maximizing evaluation accuracy:

1. **Deterministic scoring** — applied first whenever the answer is objectively verifiable (exact-match multiple choice, numeric answers). Scores are 10 (correct) or 0 (incorrect). The LLM judge is skipped entirely for these cases, eliminating variance and cost.
2. **LLM-as-a-Judge** — invoked for all remaining responses where deterministic checking cannot conclusively score quality.

### Judge Model

| Property | Value |
|---|---|
| Model | `openai/gpt-oss-120b` |
| Temperature | 0.1 (near-deterministic for consistency) |
| Max output tokens | 300 |
| Output format | Structured JSON `{"score": <int>, "reasoning": "<str>"}` |

### Scoring Rubrics

Two rubrics are applied depending on benchmark type:

#### Rubric A — Free-Text Technical Answers
*Applied to: TeleQnA, TeleMath, TeleLogs, TSG-3GPP*

The judge evaluates three criteria simultaneously:
- **Factual Accuracy** — Are the key technical facts correct?
- **Completeness** — Does the response cover the main points from the reference answer?
- **Correctness** — Are there any incorrect statements that would mislead an engineer?

| Score | Interpretation |
|---|---|
| 10 | All key facts present and correct |
| 7–9 | Mostly correct, minor omissions or imprecisions |
| 4–6 | Partially correct, some important errors or omissions |
| 1–3 | Mostly incorrect or very incomplete |
| 0 | Completely wrong, off-topic, or empty |

#### Rubric B — Structured Configuration Answers
*Applied to: TeleYaml, TeleTables*

The judge evaluates two weighted axes:
- **Structural Validity (40%)** — Is the output a valid configuration with correct syntax?
- **Content Accuracy (60%)** — Do field names and values match the expected configuration? Partial credit awarded proportionally based on ratio of correct fields to total fields.

| Score | Interpretation |
|---|---|
| 10 | Perfect match — all fields correct |
| 8–9 | Valid structure, 1–2 minor value differences |
| 5–7 | Valid structure, several wrong values or missing fields |
| 1–4 | Invalid structure or mostly wrong |
| 0 | Empty, completely wrong, or unparseable |

### Judge Prompt Structure

Each judge invocation consists of two messages:

**System message:**
```
You are a strict telecom evaluation judge. Score accurately based on the rubric.
Output ONLY the JSON object.
```

**User message:**
```
Question: {question}

Reference Answer: {reference_answer}

Model Response: {model_response}

Scoring Rubric:
{applicable_rubric}

Output JSON: {"score": <0-10>, "reasoning": "<brief explanation>"}
```

### Retry Policy

If the judge scores a response below a configurable threshold, the model is re-prompted up to **5 times**. The **best score across all attempts** is recorded. This measures the model's capability ceiling rather than single-shot performance, and is applied consistently across all models evaluated including the baseline.

### Benchmark-to-Rubric Mapping

| Benchmark | Rubric | Deterministic Bypass |
|---|---|---|
| TeleQnA | A — Free-Text Technical | Where multiple-choice |
| TeleMath | A — Free-Text Technical | Numeric exact-match |
| TeleLogs | A — Free-Text Technical | Classification labels |
| TSG-3GPP | A — Free-Text Technical | Where multiple-choice |
| TeleYaml | B — Structured Configuration | N/A |
| TeleTables | B — Structured Configuration | N/A |
| srsRAN | A — Free-Text Technical | Where multiple-choice |
| ORAN | A — Free-Text Technical | Where multiple-choice |

## What We Did

- **Goal**: Create a specialized telecom AI assistant with expert-level knowledge of 3GPP, IETF, ITU, and TM Forum standards
- **Approach**: LoRA fine-tuning with conservative hyperparameters to prevent catastrophic forgetting
- **Dataset**: 1.3M+ telecom Q&A examples with augmented network slicing and network function configuration data
- **Base model**: NVIDIA Nemotron-3-Nano-30B-A3B (Megatron format)

## Training Data

### Dataset Composition (~1.31M examples)

| Split | Examples |
|---|---|
| Train | 1,303,277 |
| Validation | 5,000 |
| Test | 5,000 |
| **Total** | **1,313,277** |

### Domain Coverage

- **Network Traces & Anomaly Detection**: 5G trace analysis, KPI statistics, anomaly classification
- **Network Slicing**: S-NSSAI configuration, slice types (eMBB, URLLC, mMTC), resource allocation
- **Network Function Configuration**: Open5GS YAML generation, AMF/SMF/UPF configuration
- **3GPP Standards Q&A**: Core network procedures, RAN protocols, signaling
- **Network Forecasting**: Trend analysis, traffic prediction
- **Troubleshooting**: Root cause analysis, diagnostic procedures

### Data Format

```json
{
  "input": "System: You are an expert telecommunications engineer...\nUser: [question with context]",
  "output": "[detailed answer with reasoning]"
}
```

## Training Details

### LoRA Hyperparameters

| Parameter | Value | Notes |
|---|---|---|
| LoRA dim (rank) | 64 | Adapter capacity |
| LoRA alpha | 128 | 2:1 ratio for gentler gradient flow |
| LoRA dropout | 0.1 | Regularization to prevent overfitting |
| Target modules | linear_qkv, linear_proj, linear_fc1, linear_fc2, in_proj, out_proj | Mamba + MLP layers |

### Training Configuration

| Parameter | Value | Notes |
|---|---|---|
| Base model | Nemotron-3-Nano-30B-A3B (Megatron) | |
| Training iterations | 10,500 | ~1.03 epochs |
| Learning rate | 5e-5 | Conservative to prevent forgetting |
| LR warmup | 525 steps | 5% of total iterations |
| LR decay | Cosine to 10,500 | |
| Global batch size | 128 | |
| Micro batch size | 4 | Per GPU |
| Gradient accumulation | 8 steps | |
| Max sequence length | 2,048 | |
| Precision | BF16 | |
| Checkpoint interval | 1,000 steps | |

### Infrastructure

| Property | Value |
|---|---|
| Hardware | 4x NVIDIA H100 NVL 94GB (NVLink connected) |
| Framework | NeMo/Megatron-Bridge with custom LoRA wrapper |
| Container | `nvcr.io/nvidia/nemo:25.11.nemotron_3_nano` |
| Training time | 84 hours |

### Parallelism

| Parameter | Value |
|---|---|
| Expert parallel | 4 |
| Tensor parallel | 1 |
| Pipeline parallel | 1 |
| MoE token dispatcher | alltoall |

## Training Progress

| Checkpoint | Train Loss | Val Loss | Val PPL |
|---|---|---|---|
| iter 500 | 0.402 | 0.242 | 1.274 |
| iter 1000 | 0.367 | 0.145 | 1.156 |
| iter 1500 | 0.381 | 0.118 | 1.125 |
| iter 2000 | 0.432 | 0.130 | 1.139 |
| iter 2500 | 0.377 | 0.139 | 1.149 |
| iter 3000 | 0.391 | 0.108 | 1.114 |
| **iter 10500 (final)** | **0.356** | **0.150** | **1.162** |

## Version History

| Version | Dataset Size | Val Loss | Val PPL | Benchmark |
|---|---|---|---|---|
| **AdaptKey-Nemotron-30b** (this model) | **1,303,277** | **0.150** | **1.162** | **596 composite** |

### Key Improvements in This Version

- Augmented network slicing examples to address weak benchmark performance
- Enhanced network function configuration coverage
- Improved system prompts (removed misleading "telco expert" framing for non-telco questions)
- +10.8% absolute improvement on composite benchmark over NVIDIA baseline

## Post-Training Pipeline

```bash
# Merge LoRA weights
torchrun --nproc-per-node=4 \
  /opt/Megatron-Bridge/examples/peft/merge_lora.py \
  --lora-checkpoint /models/AdaptKey-Nemotron-30b-lora/iter_0010500 \
  --hf-model-path /models/nemotron-30b \
  --output /models/AdaptKey-Nemotron-30b-merged

# Export to HuggingFace format
python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py export \
  --hf-model /models/nemotron-30b \
  --megatron-path /models/AdaptKey-Nemotron-30b-merged \
  --hf-path /models/AdaptKey-Nemotron-30b-hf-export
```

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(
    "AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
)

prompt = """System: You are an expert telecommunications engineer. Answer questions accurately based on your knowledge of telecom standards (3GPP, IETF, ITU, TM Forum).

User: Explain the difference between eMBB, URLLC, and mMTC slice types in 5G network slicing."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="AdaptKey/AdaptKey-Nemotron-30b",
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.90,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
```

### Docker Compose (vLLM Server)

```yaml
services:
  vllm-adaptkey:
    image: vllm/vllm-openai:latest
    container_name: vllm-adaptkey-nemotron-30b
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
    ports:
      - "8090:8000"
    volumes:
      - /opt/models:/models:ro
    command: >
      --model /models/AdaptKey-Nemotron-30b
      --trust-remote-code
      --max-model-len 8196
      --gpu-memory-utilization 0.90
      --tensor-parallel-size 1
    restart: unless-stopped
```

## Lessons Learned

1. **Anti-forgetting strategy works**: Conservative LoRA params (64/128/0.1) with 5e-5 LR preserved general capabilities
2. **Data quality matters more than quantity**: Improving weak-area examples had more impact than adding more data
3. **System prompt alignment**: Mismatched system prompts (e.g., "telco expert" for ethics questions) hurt performance
4. **Mixed datasets**: Combining diverse telecom subcategories prevents narrow specialization


## License

This model is derived from NVIDIA's Nemotron-3-Nano-30B and is subject to the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please review the license terms before use in commercial applications.

## Citation

```bibtex
@misc{adaptkey_nemotron_30b_2026,
  title={AdaptKey-Nemotron-30b: A Telecom-Specialized Language Model},
  author={AdaptKey},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/AdaptKey/AdaptKey-Nemotron-30b}
}
```