IndoBERT Relevancy Classifier

Context-conditioned relevancy classifier for Indonesian text. Given a context (topic) and a text (news headline or social media post), predicts whether the text is relevant to the context.

Model Details

  • Base Model: indobenchmark/indobert-large-p2 (335M params)
  • Task: Binary classification (RELEVANT / NOT_RELEVANT)
  • Input: [CLS] context [SEP] text [SEP]
  • Language: Indonesian (Bahasa Indonesia)
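
The input layout listed above is the standard BERT sentence-pair format. A minimal sketch of how the two segments line up is shown here; it assumes whitespace splitting as a stand-in for the real WordPiece tokenizer, and `build_pair_layout` is a hypothetical helper for illustration, not part of the model's code.

```python
def build_pair_layout(context_tokens, text_tokens):
    """Lay out a (context, text) pair the way BERT-style models expect."""
    tokens = ["[CLS]", *context_tokens, "[SEP]", *text_tokens, "[SEP]"]
    # Segment 0 covers [CLS], the context, and the first [SEP];
    # segment 1 covers the text and the final [SEP].
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids


# Whitespace splitting is an assumption made for illustration only.
context = "Kebijakan moneter Bank Indonesia".split()
text = "BI tahan suku bunga acuan 6 persen".split()
tokens, segment_ids = build_pair_layout(context, text)
print(tokens)
print(segment_ids)
```

In practice, passing the context and the text as two arguments to the tokenizer produces this layout automatically, so no manual concatenation is needed.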

Performance

Metric      Score
Accuracy    96.5%
F1          0.948
Precision   94.8%
Recall      94.8%
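
As a quick consistency check on the reported numbers, F1 is the harmonic mean of precision and recall, so equal precision and recall of 0.948 should give an F1 of 0.948. The sketch below only re-derives that figure from the reported values; it does not reproduce the evaluation itself.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the reported metrics, expressed as fractions.
precision, recall = 0.948, 0.948
print(f"F1: {f1_score(precision, recall):.3f}")
```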

Training Data

31,360 samples from 3 sources:

  • News headlines (18.8K): scraped from CNBC Indonesia, labeled by GPT-4o-mini
  • Social media text (7.9K): scraped from Twitter/X, labeled by GPT-4o-mini
  • Implicit text (4.7K): LLM-generated informal text without explicit keywords

188 unique contexts covering economics, politics, health, technology, sports, entertainment, and more.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "apriandito/indobert-relevancy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

context = "Kebijakan moneter Bank Indonesia"
text = "BI tahan suku bunga acuan 6 persen"

inputs = tokenizer(context, text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    score = probs[0][1].item()  # probability of class 1 (RELEVANT)

print(f"Relevancy score: {score:.3f}")
print(f"Relevant: {score > 0.5}")

Training Details

  • Epochs: 5
  • Batch size: 16
  • Learning rate: 2e-5
  • Max length: 256
  • Class weights: Applied (inverse frequency) to handle class imbalance
  • Early stopping: Patience 2, metric F1
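
Two of the settings above can be sketched in plain Python. The card does not state which normalization was used for the inverse-frequency weights, so the formula n_samples / (n_classes * class_count) is an assumption, and the early-stopping check is a generic patience rule rather than the author's training loop. Both helper names and all sample values are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed formula)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


def should_stop(f1_history, patience=2):
    """Return True once F1 has not improved for `patience` consecutive epochs."""
    if len(f1_history) <= patience:
        return False
    best_before = max(f1_history[:-patience])
    # No epoch in the last `patience` epochs beat the earlier best.
    return max(f1_history[-patience:]) <= best_before


# Hypothetical label distribution, not the real dataset: 70 NOT_RELEVANT, 30 RELEVANT.
labels = [0] * 70 + [1] * 30
print(inverse_frequency_weights(labels))

# Hypothetical per-epoch validation F1 values.
print(should_stop([0.90, 0.93, 0.92, 0.93]))
```

With PyTorch, weights like these could be passed as a tensor to `torch.nn.CrossEntropyLoss(weight=...)`, and with the Hugging Face `Trainer` the patience rule corresponds to `EarlyStoppingCallback(early_stopping_patience=2)`. Whether the author used either of these is not stated in the card.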

Labels

Label          ID
NOT_RELEVANT   0
RELEVANT       1
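
The label mapping above determines how the two output logits are read. A minimal sketch, assuming a logit pair ordered by label ID and a 0.5 decision threshold; `classify` is a hypothetical helper and the logits shown are made up, not real model output.

```python
import math

# Mapping taken from the label table.
ID2LABEL = {0: "NOT_RELEVANT", 1: "RELEVANT"}


def classify(logits, threshold=0.5):
    """Map a [not_relevant, relevant] logit pair to (label, relevancy score)."""
    # Numerically stable softmax over the two logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    score = exps[1] / sum(exps)
    label_id = 1 if score > threshold else 0
    return ID2LABEL[label_id], score


# Hypothetical logits for illustration.
label, score = classify([-1.2, 2.3])
print(label, f"{score:.3f}")
```

Raising `threshold` trades recall for precision, which may be useful when false positives are more costly than missed items.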

Citation

If you use this model in your research, please cite:

@article{saputra2026indobert,
  title={IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text},
  author={Saputra, Muhammad Apriandito Arya},
  year={2026},
  doi={10.5281/zenodo.19237938},
  url={https://zenodo.org/records/19237938}
}