# Indo-Religiolect-BERT
A fine-tuned Indonesian BERT model that classifies religious texts into three religiolect classes:
- **Islam**
- **Catholic**
- **Protestant**
## Model Details
- **Base Model**: `indolem/indobert-base-uncased`
- **Task**: Sequence Classification
- **Language**: Indonesian
- **Labels**: Islam (0), Catholic (1), Protestant (2)
## Training Data
Trained on ~2 million Indonesian sentences collected from:
- Catholic websites
- Islamic websites
- Protestant websites
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()  # disable dropout for inference

# Predict
text = "Allah adalah Tuhan yang Maha Esa"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed for inference
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: "Islam", 1: "Catholic", 2: "Protestant"}
print(f"Prediction: {label_map[prediction]}")
```
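The snippet above prints only the argmax label. If you also want a confidence score, you can softmax the logits and abstain on low-confidence inputs. A minimal sketch using only the standard library (the `softmax`/`classify` helpers and the 0.5 threshold are illustrative assumptions, not part of the released model; in practice you would pass `outputs.logits[0].tolist()` from the snippet above):

```python
import math

LABELS = ["Islam", "Catholic", "Protestant"]  # matches the model's label order

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Return (label, confidence); fall back to 'uncertain' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "uncertain", probs[best]
    return LABELS[best], probs[best]

label, confidence = classify([5.0, 1.0, 0.5])
print(f"{label} ({confidence:.2%})")
```

The threshold is a design knob: raising it trades coverage for precision, which matters if the three religiolects share much vocabulary.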
## Performance
Model performance metrics are available in the training logs.
## Citation
If you use this model, please cite:
```
@misc{indo-religiolect-bert,
  author       = {Dan Sachs},
  title        = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}
```