# Indo-Religiolect-BERT

A fine-tuned Indonesian BERT model for classifying religious texts into:

- **Islam**
- **Catholic**
- **Protestant**

## Model Details

- **Base Model**: `indolem/indobert-base-uncased`
- **Task**: Sequence Classification
- **Language**: Indonesian
- **Labels**: Islam (0), Catholic (1), Protestant (2)

## Training Data

Trained on ~2 million Indonesian sentences collected from:

- Catholic websites
- Islamic websites
- Protestant websites

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()  # disable dropout for inference

# Predict
text = "Allah adalah Tuhan yang Maha Esa"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: 'Islam', 1: 'Catholic', 2: 'Protestant'}
print(f"Prediction: {label_map[prediction]}")
```

## Performance

Model performance metrics are available in the training logs.

## Citation

If you use this model, please cite:

```
@misc{indo-religiolect-bert,
  author = {Dan Sachs},
  title = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}
```
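
## Confidence Scores

The `argmax` in the usage example returns only the predicted label. If you also want a confidence score, the raw logits can be passed through a softmax to get per-label probabilities. The sketch below shows that post-processing step in plain Python; the logit values and the `threshold` parameter are illustrative assumptions, not real model output:

```python
import math

# Label order as documented in Model Details: Islam (0), Catholic (1), Protestant (2)
LABELS = ["Islam", "Catholic", "Protestant"]

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Return (label, probability); return (None, probability) below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return LABELS[best], probs[best]

# Hypothetical logits standing in for outputs.logits[0].tolist()
label, prob = classify([3.1, 0.4, -0.7])
print(label, round(prob, 3))
```

In practice you would replace the hypothetical list with `outputs.logits[0].tolist()` (or use `torch.softmax(outputs.logits, dim=-1)` directly); the threshold is a design choice for rejecting low-confidence inputs, e.g. texts that are not religious at all.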