IndoBERT Relevancy Classifier

A context-conditioned relevancy classifier for Indonesian text. Given a context (topic) and a text (a news headline or social media post), the model predicts whether the text is relevant to the context.

Model Details

Base Model: indobenchmark/indobert-large-p2 (335M params)
Task: Binary classification (RELEVANT / NOT_RELEVANT)
Input: [CLS] context [SEP] text [SEP]
Language: Indonesian (Bahasa Indonesia)
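
The input layout listed above follows the standard BERT sentence-pair convention, which can be illustrated with a minimal sketch. The helper name build_pair is hypothetical, and whitespace splitting stands in for the real WordPiece tokenizer, so the actual token boundaries will differ. The sketch only shows where the special tokens go and how the two segments are marked.

```python
def build_pair(context: str, text: str):
    """Lay out a (context, text) pair the way a BERT-style tokenizer does.

    Assumption: whitespace splitting replaces WordPiece, for illustration only.
    """
    context_tokens = context.split()
    text_tokens = text.split()
    tokens = ["[CLS]", *context_tokens, "[SEP]", *text_tokens, "[SEP]"]
    # Segment 0 covers [CLS], the context, and the first [SEP];
    # segment 1 covers the text and the final [SEP].
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair(
    "Kebijakan moneter Bank Indonesia",
    "BI tahan suku bunga acuan 6 persen",
)
print(" ".join(tokens))
print(segment_ids)
```

In practice the tokenizer produces this layout automatically when the context and text are passed as two separate arguments.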

Performance

Metric	Score
Accuracy	96.5%
F1	0.948
Precision	94.8%
Recall	94.8%
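
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so equal precision and recall give the same F1. The function name f1_from_pr below is illustrative and is not part of the model's evaluation code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, expressed as fractions
f1 = f1_from_pr(0.948, 0.948)
print(f"F1: {f1:.3f}")
```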

Training Data

31,360 samples from 3 sources:

News headlines (18.8K) — scraped from CNBC Indonesia, labeled by GPT-4o-mini
Social media text (7.9K) — scraped from Twitter/X, labeled by GPT-4o-mini
Implicit text (4.7K) — LLM-generated informal text without explicit keywords

188 unique contexts covering economics, politics, health, technology, sports, entertainment, and more.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_name = "apriandito/indobert-relevancy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

context = "Kebijakan moneter Bank Indonesia"
text = "BI tahan suku bunga acuan 6 persen"

# Passing two strings encodes them as a pair: [CLS] context [SEP] text [SEP]
inputs = tokenizer(context, text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    score = probs[0][1].item()  # probability of RELEVANT (label ID 1)

print(f"Relevancy score: {score:.3f}")
print(f"Relevant: {score > 0.5}")
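
To make the scoring step in the usage snippet concrete, here is a dependency-free sketch of how a relevancy score follows from the two output logits. The logit values are made up for illustration and are not real model outputs. The helper name relevancy_score is hypothetical.

```python
import math


def relevancy_score(logits):
    """Softmax over [NOT_RELEVANT, RELEVANT] logits; returns P(RELEVANT)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


example_logits = [-1.2, 2.3]  # hypothetical values, not real model output
score = relevancy_score(example_logits)
print(f"Relevancy score: {score:.3f}")
print(f"Relevant: {score > 0.5}")
```

With only two classes, the score exceeds 0.5 exactly when the RELEVANT logit is larger than the NOT_RELEVANT logit.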

Training Details

Epochs: 5
Batch size: 16
Learning rate: 2e-5
Max length: 256
Class weights: Applied (inverse frequency) to handle class imbalance
Early stopping: Patience 2, metric F1
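
The card states only that inverse-frequency class weights were applied; the exact normalization is not given. The sketch below assumes the common "balanced" form, weight = n_samples / (n_classes * class_count), and uses a toy label list rather than the real training labels. The function name inverse_frequency_weights is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class frequency.

    Assumption: 'balanced' normalization, n_samples / (n_classes * count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


toy_labels = [0, 0, 0, 1]  # toy data: three NOT_RELEVANT, one RELEVANT
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

Weights of this kind are typically passed to the loss function, for example through the weight argument of torch.nn.CrossEntropyLoss, so that errors on the rarer class cost more.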

Labels

Label	ID
NOT_RELEVANT	0
RELEVANT	1
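
The label table can be mirrored in code when turning a score into a label name. This is a small sketch: the 0.5 threshold matches the usage snippet, while the helper name label_from_score is hypothetical and the threshold may need tuning for a given application.

```python
ID2LABEL = {0: "NOT_RELEVANT", 1: "RELEVANT"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Map P(RELEVANT) to a label name; scores at or below the threshold are NOT_RELEVANT."""
    return ID2LABEL[int(score > threshold)]


print(label_from_score(0.93))
print(label_from_score(0.12))
```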

Citation

If you use this model in your research, please cite:

@article{saputra2026indobert,
  title={IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text},
  author={Saputra, Muhammad Apriandito Arya},
  year={2026},
  doi={10.5281/zenodo.19237938},
  url={https://zenodo.org/records/19237938}
}
