OTel-Reranker-4B

OTel-Reranker-4B is a telecom reranker produced by full-parameter fine-tuning of Qwen/Qwen3-4B on OTel telecommunications data. It is part of the OTel Family of Models, an open-source initiative to build reference AI resources for the global telecommunications sector.

Across the OTel reranker baselines, OTel fine-tuning improves MRR@10 by +0.535 to +0.598 over the base checkpoints.

Community Use

As of June 23, 2026, the released OTel models had more than 18 million downloads, and the Open Telco AI project had received 157+ pieces of media coverage worldwide.

Model Details

Attribute	Value
Base model	Qwen/Qwen3-4B
Parameters	4B
OTel training dataset	OTel-Reranker
Dataset fields	`sentence_0`, `sentence_1`, `label`
Training method	Full-parameter post-training / fine-tuning
Language	English
OTel release license	Apache 2.0

Model Lineage

Qwen/Qwen3-4B -> OTel-Reranker full-parameter post-training -> farbodtavakkoli/OTel-Reranker-4B

OTel vs. Base Model

Metric	Base model	OTel fine-tuned	Delta	Evaluation split
MRR@10	0.407	0.948 +/- 0.005	+0.541	OTel-Reranker held-out 5%

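Standard errors are computed with bootstrap resampling (n=10) over the held-out OTel evaluation partition. MRR@10 is the mean, over evaluation queries, of the reciprocal rank of the first relevant telecom passage within the top 10 reranked results; a query with no relevant passage in the top 10 contributes 0, so higher values mean relevant passages surface closer to the top of the list.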
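To make the metric concrete, the following is a minimal sketch of MRR@10 and a bootstrap standard error. It is not the project's evaluation script: the function names, the toy labels, the seed, and the reading of n=10 as the number of bootstrap resamples are all assumptions made for illustration.

```python
import random
import statistics

def reciprocal_rank_at_k(ranked_labels, k=10):
    """Return 1/rank of the first relevant item in the top k, or 0.0 if none."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0

def mrr_at_k(queries, k=10):
    """Mean reciprocal rank; each query is a list of labels in reranked order."""
    return statistics.fmean([reciprocal_rank_at_k(q, k) for q in queries])

def bootstrap_standard_error(queries, n_resamples=10, k=10, seed=42):
    """Standard deviation of MRR@k over resamples drawn with replacement.

    n_resamples=10 is an assumed reading of the card's "n=10".
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(queries, k=len(queries))
        estimates.append(mrr_at_k(sample, k))
    return statistics.stdev(estimates)

# Hypothetical toy data: relevance labels listed in reranked order per query.
toy_queries = [
    [1, 0, 0],  # first relevant passage at rank 1 -> 1.0
    [0, 1, 0],  # rank 2 -> 0.5
    [0, 0, 0],  # no relevant passage -> 0.0
    [0, 0, 1],  # rank 3 -> 1/3
]
print(f"MRR@10 = {mrr_at_k(toy_queries):.3f}")
print(f"bootstrap SE = {bootstrap_standard_error(toy_queries):.3f}")
```

With only a handful of resamples the standard-error estimate is itself noisy, so it is best read as a rough indication of spread.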

Evaluation Caveats

Reranker results are measured on held-out OTel reranking partitions.
Reported standard errors come from bootstrap resampling over the held-out evaluation partitions.
Reranking quality depends on the candidate passages supplied by the upstream retriever.
External benchmark transfer, multilingual performance, and per-subdomain performance should be evaluated separately for production settings.

Training Data

The model was trained on telecom-focused data curated by 100+ domain experts. The raw corpus contained roughly 1.1M training examples and was filtered down to 326,767 higher-confidence examples.

Source	Contributor
arXiv telecom papers, 3GPP standards, telecom Wikipedia, telecom Common Crawl	Yale University
GSMA Permanent Reference Documents, Discover portal	GSMA
IETF RFC series	NetoAI
Industry whitepapers	Khalifa University
O-RAN specifications (working groups 1, 2, 4, 5, 6, 7, 8, 9, 10)	University of Leeds
O-RAN documents across working groups	The University of Texas at Dallas

Released datasets: OTel-LLM, OTel-Embedding, OTel-Reranker, and OTel-Safety.

The OTel datasets release derived QA/retrieval/reranking examples rather than the raw source documents.

Each released dataset includes a dataset card and Croissant metadata with Responsible AI fields for data limitations, biases, sensitive-information considerations, use cases, social impact, synthetic-data status, and provenance.

Representative Training Row

OTel-Reranker rows are pointwise cross-encoder relevance examples: each row pairs two texts (`sentence_0` and `sentence_1`) with a relevance `label`.

{
  "sentence_0": "The Fronthaul Gateway can translate FH protocol from an O-DUx with split option 7-2 to an O-RUy with split option 8.",
  "sentence_1": "Fronthaul Gateway that can translate FH protocol from an O-DUx with split option x to an O-RUy with split option y, with currently available option 7-2 to 8.",
  "label": 1.0
}
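As a rough illustration of how such a row could be consumed, the sketch below parses one JSON row, checks the three documented fields, and returns the text pair plus target. The `RerankerRow` class, the helper names, and the example row are hypothetical and are not part of the released dataset tooling.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"sentence_0", "sentence_1", "label"}

@dataclass(frozen=True)
class RerankerRow:
    """Hypothetical container for one pointwise relevance example."""
    sentence_0: str
    sentence_1: str
    label: float

def parse_row(raw: str) -> RerankerRow:
    """Parse one JSON row and verify that the documented fields are present."""
    record = json.loads(raw)
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return RerankerRow(
        sentence_0=str(record["sentence_0"]),
        sentence_1=str(record["sentence_1"]),
        label=float(record["label"]),
    )

def to_model_input(row: RerankerRow):
    """Return ([text_a, text_b], target), the pair layout used for cross-encoders."""
    return [row.sentence_0, row.sentence_1], row.label

# Illustrative row only; the label here is assumed, not taken from the dataset.
example = json.dumps({
    "sentence_0": "What is the F1 interface?",
    "sentence_1": "The F1 interface connects O-DU to O-CU in O-RAN architecture.",
    "label": 1.0,
})
pair, target = to_model_input(parse_row(example))
print(pair, target)
```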

Intended Use

This model is intended to re-score telecom query-passage pairs after an initial retrieval step. It is designed for the reranking stage of a telecom RAG pipeline, where the goal is to promote the most relevant retrieved passages before answer generation.
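The reranking stage described above can be sketched as a function that scores each retrieved passage against the query and keeps the top results. The `rerank` helper and the `word_overlap_scorer` stand-in are hypothetical; the stand-in only counts shared words so the sketch runs without downloading a model, and in practice it would be replaced by a function that returns the reranker's scores.

```python
def rerank(query, candidates, score_pairs, top_k=3):
    """Score every (query, passage) pair and return the top_k (passage, score) tuples."""
    pairs = [[query, passage] for passage in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

def word_overlap_scorer(pairs):
    """Stand-in scorer (NOT the OTel model): number of shared lowercase words."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]

# Candidates as they might arrive from an upstream retriever, in retrieval order.
candidates = [
    "5G networks use millimeter wave frequencies.",
    "The F1 interface connects O-DU to O-CU in O-RAN architecture.",
]
top = rerank("What is the F1 interface?", candidates, word_overlap_scorer, top_k=1)
for passage, score in top:
    print(f"{score:.1f}  {passage}")
```

Passing the scorer as an argument keeps the pipeline logic independent of how scores are produced, which also makes it straightforward to compare the fine-tuned reranker against a baseline.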

Training Recipe

Item	Value
Framework	ScalarLM
Optimizer	AdamW, 8-bit
Learning-rate schedule	Cosine decay with warmup
Weight decay	0.01
Warmup steps	100
Random seed	42
Maximum sequence length	1500 tokens
Precision	BF16
Attention	Flash Attention 2
Distributed training	Fully Sharded Data Parallel
Gradient checkpointing	Enabled
Epochs	2 (rerankers; LLM and embedding models in the family use 3)
Compute	AMD MI300X/MI325X/MI355X and NVIDIA A100/H100 GPUs
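For readers unfamiliar with the learning-rate schedule listed in the table, here is a minimal sketch of cosine decay with warmup. Only the 100 warmup steps come from the recipe; the linear warmup shape, the peak learning rate, the total step count, and the function name are assumptions, and this is not the ScalarLM implementation.

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr, warmup_steps=100, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical values chosen only to show the shape of the curve.
PEAK_LR = 1e-5
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, TOTAL_STEPS, PEAK_LR))
```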

Usage

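import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "farbodtavakkoli/OTel-Reranker-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    trust_remote_code=True,
)

query = "What is the F1 interface?"
documents = [
    "The F1 interface connects O-DU to O-CU in O-RAN architecture.",
    "5G networks use millimeter wave frequencies.",
]

# Score each (query, document) pair jointly, cross-encoder style.
pairs = [[query, doc] for doc in documents]
inputs = tokenizer(
    pairs,
    padding=True,
    truncation=True,
    max_length=1500,  # matches the training maximum sequence length
    return_tensors="pt",
)
with torch.no_grad():
    # squeeze(-1) drops only the trailing label dimension (assuming one
    # relevance logit per pair), so a single document still yields a
    # 1-element tensor instead of a scalar.
    scores = model(**inputs).logits.squeeze(-1)
print(scores)  # higher score = more relevant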

Limitations and Responsible Use

OTel models are domain-specific to telecommunications and should not be treated as general-purpose models.
The current release is English-only and primarily text-centric.
The reported OTel performance results use held-out OTel evaluation partitions and should not be interpreted as results from a fully independent external benchmark suite.
Aggregate scores can hide subdomain variation; collaborator stress tests suggest O-RAN retrieval is comparatively strong, while academic-paper and GSMA PRD examples need further curation.
Generated telecom content should be verified before operational, customer-facing, regulatory, safety, or network-configuration use.
Users must comply with both the OTel release license and the upstream base-model license or terms.

Project Resources

Project page: https://huggingface.co/farbodtavakkoli
Code: https://github.com/farbodtavakkoli/OTel
Media coverage list: https://github.com/farbodtavakkoli/OTel/blob/main/docs/media_coverage.md

Citation

@misc{otel_models_2026,
  title  = {OTel: Open Telco AI Datasets, Benchmarks, and Models},
  author = {Tavakkoli, Farbod and others},
  year   = {2026},
  note   = {Open Telco (OTel) model release},
  url    = {https://huggingface.co/farbodtavakkoli}
}