Indo-Religiolect-BERT

A fine-tuned Indonesian BERT model for classifying religious texts into:

  • Islam
  • Catholic
  • Protestant

Model Details

  • Base Model: indolem/indobert-base-uncased
  • Task: Sequence Classification
  • Language: Indonesian
  • Labels: Islam (0), Catholic (1), Protestant (2)

Training Data

Trained on ~2 million Indonesian sentences collected from:

  • Catholic websites
  • Islamic websites
  • Protestant websites

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()

# Predict
text = "Allah adalah Tuhan yang Maha Esa"  # "Allah is the one and only God"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: 'Islam', 1: 'Catholic', 2: 'Protestant'}
print(f"Prediction: {label_map[prediction]}")
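Beyond the top-1 label, you can recover per-class confidence scores by applying a softmax over the logits. A minimal sketch; the logits tensor below is a hypothetical example standing in for `outputs.logits` from the snippet above, not real model output:

```python
import torch
import torch.nn.functional as F

label_map = {0: 'Islam', 1: 'Catholic', 2: 'Protestant'}

# Hypothetical logits for one sentence; in practice use outputs.logits.
logits = torch.tensor([[2.1, 0.3, -1.0]])

probs = F.softmax(logits, dim=-1)          # normalize logits to probabilities
pred = torch.argmax(probs, dim=-1).item()  # index of the most likely class

print(f"Prediction: {label_map[pred]} ({probs[0, pred].item():.2%} confidence)")
```

Reporting the softmax probability alongside the label is useful for filtering low-confidence predictions on out-of-domain text.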

Performance

Model performance metrics are available in the training logs.

Citation

If you use this model, please cite:

@misc{indo-religiolect-bert,
  author = {Dan Sachs},
  title = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}