IndoBERT Relevancy Classifier
Context-conditioned relevancy classifier for Indonesian text. Given a context (topic) and a text (news headline or social media post), predicts whether the text is relevant to the context.
Model Details
- Base Model: indobenchmark/indobert-large-p2 (335M params)
- Task: Binary classification (RELEVANT / NOT_RELEVANT)
- Input: `[CLS] context [SEP] text [SEP]`
- Language: Indonesian (Bahasa Indonesia)
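The input layout above can be sketched in plain Python. This is only an illustration of the pair format: the helper name `build_pair_layout` is hypothetical, and whitespace splitting stands in for the model's real WordPiece tokenizer, so actual token counts will differ.

```python
def build_pair_layout(context: str, text: str):
    """Lay out a (context, text) pair the way a BERT-style pair input is arranged."""
    # Assumption: whitespace splitting replaces WordPiece purely for illustration.
    context_tokens = context.split()
    text_tokens = text.split()

    # [CLS] context [SEP] text [SEP]
    tokens = ["[CLS]", *context_tokens, "[SEP]", *text_tokens, "[SEP]"]

    # Segment 0 covers [CLS], the context, and the first [SEP];
    # segment 1 covers the text and the final [SEP].
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_layout(
    "Kebijakan moneter Bank Indonesia",
    "BI tahan suku bunga acuan 6 persen",
)
print(" ".join(tokens))
print(segment_ids)
```

In practice the tokenizer produces this layout itself when it is called with the context as the first argument and the text as the second.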
Performance
| Metric | Score |
|---|---|
| Accuracy | 96.5% |
| F1 | 0.948 |
| Precision | 94.8% |
| Recall | 94.8% |
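As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so equal precision and recall should give the same F1. The sketch below assumes the reported figures refer to a single class (presumably RELEVANT); the card does not state the averaging mode.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the table, expressed as fractions.
reported_f1 = f1_score(0.948, 0.948)
print(f"F1 = {reported_f1:.3f}")
```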
Training Data
31,360 samples from 3 sources:
- News headlines (18.8K): scraped from CNBC Indonesia, labeled by GPT-4o-mini
- Social media text (7.9K): scraped from Twitter/X, labeled by GPT-4o-mini
- Implicit text (4.7K): LLM-generated informal text without explicit keywords
188 unique contexts covering economics, politics, health, technology, sports, entertainment, and more.
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
model_name = "apriandito/indobert-relevancy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
context = "Kebijakan moneter Bank Indonesia"
text = "BI tahan suku bunga acuan 6 persen"
inputs = tokenizer(context, text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
score = probs[0][1].item() # relevancy score
print(f"Relevancy score: {score:.3f}")
print(f"Relevant: {score > 0.5}")
Training Details
- Epochs: 5
- Batch size: 16
- Learning rate: 2e-5
- Max length: 256
- Class weights: Applied (inverse frequency) to handle class imbalance
- Early stopping: Patience 2, metric F1
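Two of the settings above, inverse-frequency class weights and early stopping on F1 with patience 2, can be sketched in plain Python. This is not the author's training code. The weight formula `N / (K * n_c)` is one common inverse-frequency scheme (the same heuristic as scikit-learn's "balanced" mode) and is an assumption here, as are the label counts, the F1 history, and the names `inverse_frequency_weights` and `EarlyStopping`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * n_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in sorted(counts.items())}


class EarlyStopping:
    """Stop once the monitored metric fails to improve for `patience` epochs in a row."""

    def __init__(self, patience: int = 2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical label distribution (0 = NOT_RELEVANT, 1 = RELEVANT).
labels = [0] * 70 + [1] * 30
weights = inverse_frequency_weights(labels)
print(weights)

# Hypothetical validation F1 per epoch.
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, f1 in enumerate([0.90, 0.93, 0.92, 0.93, 0.94], start=1):
    if stopper.step(f1):
        stopped_at = epoch
        break
print(stopped_at)
```

Weights of this kind are typically passed to the loss function, for example as the `weight` argument of `torch.nn.CrossEntropyLoss`.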
Labels
| Label | ID |
|---|---|
| NOT_RELEVANT | 0 |
| RELEVANT | 1 |
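The label IDs above determine which softmax output is read as the relevancy score. A minimal sketch, in plain Python, of turning a pair of logits into a label: the `decide` helper and the example logits are hypothetical, and the 0.5 threshold mirrors the usage example rather than a tuned value.

```python
import math

ID2LABEL = {0: "NOT_RELEVANT", 1: "RELEVANT"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold: float = 0.5):
    """Return (label, relevancy score), where the score is the probability of ID 1."""
    score = softmax(logits)[1]
    label_id = 1 if score > threshold else 0
    return ID2LABEL[label_id], score


# Hypothetical logits in ID order: [NOT_RELEVANT, RELEVANT].
label, score = decide([-1.2, 2.3])
print(label, round(score, 3))
```

Raising `threshold` trades recall for precision, which may be useful when false positives are costly.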
Citation
If you use this model in your research, please cite:
@article{saputra2026indobert,
title={IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text},
author={Saputra, Muhammad Apriandito Arya},
year={2026},
doi={10.5281/zenodo.19237938},
url={https://zenodo.org/records/19237938}
}