OTel-Reranker-4B
OTel-Reranker-4B is a telecom reranker produced by full-parameter fine-tuning of Qwen/Qwen3-4B on OTel telecommunications data. It is part of the OTel Family of Models, an open-source initiative to build reference AI resources for the global telecommunications sector.
Across the OTel reranker baselines, OTel fine-tuning improves MRR@10 by +0.535 to +0.598 over the base checkpoints.
Community Use
As of June 23, 2026, the released OTel models had more than 18 million downloads, and the Open Telco AI project had received 157+ pieces of media coverage worldwide.
Model Details
| Attribute | Value |
|---|---|
| Base model | Qwen/Qwen3-4B |
| Parameters | 4B |
| OTel training dataset | OTel-Reranker |
| Dataset fields | sentence_0, sentence_1, label |
| Training method | Full-parameter post-training / fine-tuning |
| Language | English |
| OTel release license | Apache 2.0 |
Model Lineage
Qwen/Qwen3-4B -> OTel-Reranker full-parameter post-training -> farbodtavakkoli/OTel-Reranker-4B
OTel vs. Base Model
| Metric | Base model | OTel fine-tuned | Delta | Evaluation split |
|---|---|---|---|---|
| MRR@10 | 0.407 | 0.948 +/- 0.005 | +0.541 | OTel-Reranker held-out 5% |
Standard errors are computed with bootstrap resampling (n=10) over the held-out OTel evaluation partition. MRR@10 is the mean, over queries, of the reciprocal rank of the first relevant passage within the top 10 reranked results; higher values mean the first relevant telecom passage appears closer to the top of the list.
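As a rough illustration of the metric, the sketch below computes MRR@10 from per-query relevance labels that are already sorted by reranker score. The function name `mrr_at_k` and the toy labels are illustrative assumptions and are not taken from the OTel evaluation code.

```python
from typing import Sequence


def mrr_at_k(ranked_labels: Sequence[Sequence[int]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant item within the top k.

    ranked_labels[i] holds the 0/1 relevance labels for query i, ordered
    from the highest-scored passage to the lowest. A query with no relevant
    passage in the top k contributes 0 to the mean.
    """
    if not ranked_labels:
        raise ValueError("need at least one query")
    total = 0.0
    for labels in ranked_labels:
        for rank, label in enumerate(labels[:k], start=1):
            if label:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


# Hypothetical toy labels: first relevant hit at rank 1, at rank 2, and never.
toy = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(mrr_at_k(toy))  # (1 + 0.5 + 0) / 3 = 0.5
```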
Evaluation Caveats
- Reranker results are measured on held-out OTel reranking partitions.
- Reported standard errors come from bootstrap resampling over the held-out evaluation partitions.
- Reranking quality depends on the candidate passages supplied by the upstream retriever.
- External benchmark transfer, multilingual performance, and per-subdomain performance should be evaluated separately for production settings.
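For readers who want to reproduce a comparable uncertainty estimate on their own evaluation data, here is a minimal sketch of a bootstrap standard error. It assumes the resampling unit is the per-query reciprocal rank and uses an arbitrary seed; the source states only that bootstrap resampling with n=10 was used, so the exact OTel procedure may differ.

```python
import random
import statistics
from typing import Sequence


def bootstrap_standard_error(
    per_query_scores: Sequence[float], n_resamples: int = 10, seed: int = 0
) -> float:
    """Standard deviation of the mean score across bootstrap resamples."""
    if not per_query_scores:
        raise ValueError("need at least one per-query score")
    if n_resamples < 2:
        raise ValueError("need at least two resamples")
    rng = random.Random(seed)  # arbitrary seed, an assumption of this sketch
    size = len(per_query_scores)
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        sample = rng.choices(per_query_scores, k=size)
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Hypothetical per-query reciprocal ranks, for illustration only.
reciprocal_ranks = [1.0, 1.0, 0.5, 1.0, 0.0, 1.0, 0.25, 1.0]
print(bootstrap_standard_error(reciprocal_ranks))
```

With only 10 resamples the estimate is itself noisy, so a larger `n_resamples` may be worth trying when comparing models whose scores are close.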
Training Data
The model was trained on telecom-focused data curated by 100+ domain experts. The raw corpus contained roughly 1.1M training examples and was filtered down to 326,767 higher-confidence examples.
| Source | Contributor |
|---|---|
| arXiv telecom papers, 3GPP standards, telecom Wikipedia, telecom Common Crawl | Yale University |
| GSMA Permanent Reference Documents, Discover portal | GSMA |
| IETF RFC series | NetoAI |
| Industry whitepapers | Khalifa University |
| O-RAN specifications (working groups 1, 2, 4, 5, 6, 7, 8, 9, 10) | University of Leeds |
| O-RAN documents across working groups | The University of Texas at Dallas |
Released datasets: OTel-LLM, OTel-Embedding, OTel-Reranker, and OTel-Safety.
The OTel datasets release derived QA/retrieval/reranking examples rather than the raw source documents.
Each released dataset includes a dataset card and Croissant metadata with Responsible AI fields for data limitations, biases, sensitive-information considerations, use cases, social impact, synthetic-data status, and provenance.
Representative Training Row
OTel-Reranker rows are pointwise cross-encoder relevance examples.
{
  "sentence_0": "The Fronthaul Gateway can translate FH protocol from an O-DUx with split option 7-2 to an O-RUy with split option 8.",
  "sentence_1": "Fronthaul Gateway that can translate FH protocol from an O-DUx with split option x to an O-RUy with split option y, with currently available option 7-2 to 8.",
  "label": 1.0
}
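A row like the one above could be read into a typed record before being passed to a cross-encoder. The sketch below is one way to do that; the field names come from the dataset fields listed in Model Details, while the `RerankerRow` class, the `parse_row` helper, and the short toy row are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RerankerRow:
    """One pointwise example: two texts plus a relevance label."""

    sentence_0: str
    sentence_1: str
    label: float

    def as_pair(self) -> Tuple[str, str]:
        # A cross-encoder scores both texts jointly, so keep them together.
        return (self.sentence_0, self.sentence_1)


def parse_row(line: str) -> RerankerRow:
    """Parse one JSON object with sentence_0, sentence_1, and label keys."""
    record = json.loads(line)
    return RerankerRow(
        sentence_0=str(record["sentence_0"]),
        sentence_1=str(record["sentence_1"]),
        label=float(record["label"]),
    )


# Hypothetical toy row, not taken from the released dataset.
raw = '{"sentence_0": "What does the F1 interface connect?", "sentence_1": "F1 links the O-DU and the O-CU.", "label": 1.0}'
row = parse_row(raw)
print(row.as_pair(), row.label)
```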
Intended Use
This model is intended to re-score telecom query-passage pairs after an initial retrieval step. It is designed for the reranking stage of a telecom RAG pipeline, where the goal is to promote the most relevant retrieved passages before answer generation.
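The reranking stage described above can be sketched independently of any particular model. The example below assumes a scoring callable that returns one relevance score per query-passage pair; `rerank` and `toy_scorer` are hypothetical names, and the word-overlap scorer is only a stand-in for the actual reranker.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every retrieved passage against the query and keep the best top_k."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    if len(scores) != len(passages):
        raise ValueError("scorer must return one score per passage")
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_scorer(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in scorer: number of shared lowercase words."""
    scores = []
    for query, passage in pairs:
        shared = set(query.lower().split()) & set(passage.lower().split())
        scores.append(float(len(shared)))
    return scores


retrieved = [
    "5G networks use millimeter wave frequencies.",
    "The F1 interface connects O-DU to O-CU in O-RAN architecture.",
]
for passage, score in rerank("What is the F1 interface?", retrieved, toy_scorer, top_k=2):
    print(score, passage)
```

In a real pipeline, `score_pairs` would wrap the model call and the surviving passages would be passed on to answer generation.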
Training Recipe
| Item | Value |
|---|---|
| Framework | ScalarLM |
| Optimizer | AdamW, 8-bit |
| Learning-rate schedule | Cosine decay with warmup |
| Weight decay | 0.01 |
| Warmup steps | 100 |
| Random seed | 42 |
| Maximum sequence length | 1500 tokens |
| Precision | BF16 |
| Attention | Flash Attention 2 |
| Distributed training | Fully Sharded Data Parallel |
| Gradient checkpointing | Enabled |
| Epochs | 3 for LLM/embedding models; 2 for rerankers |
| Compute | AMD MI300X/MI325X/MI355X and NVIDIA A100/H100 GPUs |
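To make the learning-rate schedule in the table concrete, here is a minimal sketch of cosine decay with warmup. The 100 warmup steps come from the table; the linear warmup shape, the peak learning rate, the total step count, and the function name `lr_at_step` are assumptions of this sketch, and the scheduler actually used in ScalarLM may differ.

```python
import math


def lr_at_step(
    step: int,
    peak_lr: float,
    total_steps: int,
    warmup_steps: int = 100,
    min_lr: float = 0.0,
) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Linear ramp; the warmup shape is an assumption of this sketch.
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical values: neither the peak rate nor the step count is from the source.
for step in (0, 50, 99, 100, 550, 1000):
    print(step, lr_at_step(step, peak_lr=1e-5, total_steps=1000))
```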
Usage
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "farbodtavakkoli/OTel-Reranker-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    trust_remote_code=True,
)

query = "What is the F1 interface?"
documents = [
    "The F1 interface connects O-DU to O-CU in O-RAN architecture.",
    "5G networks use millimeter wave frequencies.",
]

# Each (query, document) pair is encoded together, cross-encoder style.
pairs = [[query, doc] for doc in documents]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # squeeze(-1) drops only the trailing logit dimension, so the batch
    # dimension is kept even when a single document is scored.
    scores = model(**inputs).logits.squeeze(-1)
print(scores)
Limitations and Responsible Use
- OTel models are domain-specific to telecommunications and should not be treated as general-purpose models.
- The current release is English-only and primarily text-centric.
- The reported OTel performance results use held-out OTel evaluation partitions and should not be interpreted as results from a fully independent external benchmark suite.
- Aggregate scores can hide subdomain variation; collaborator stress tests suggest O-RAN retrieval is comparatively strong, while academic-paper and GSMA PRD examples need further curation.
- Generated telecom content should be verified before operational, customer-facing, regulatory, safety, or network-configuration use.
- Users must comply with both the OTel release license and the upstream base-model license or terms.
Project Resources
- Project page: https://huggingface.co/farbodtavakkoli
- Code: https://github.com/farbodtavakkoli/OTel
- Media coverage list: https://github.com/farbodtavakkoli/OTel/blob/main/docs/media_coverage.md
Citation
@misc{otel_models_2026,
  title  = {OTel: Open Telco AI Datasets, Benchmarks, and Models},
  author = {Tavakkoli, Farbod and others},
  year   = {2026},
  note   = {Open Telco (OTel) model release},
  url    = {https://huggingface.co/farbodtavakkoli}
}
Contact
For technical questions, contact farbod.tavakkoli@att.com or farbodtavakoli@gmail.com.