# Indo-Religiolect-BERT

A fine-tuned Indonesian BERT model for classifying religious texts into three classes:

- **Islam**
- **Catholic**
- **Protestant**
## Model Details

- **Base Model**: `indolem/indobert-base-uncased`
- **Task**: Sequence Classification
- **Language**: Indonesian
- **Labels**: Islam (0), Catholic (1), Protestant (2)
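
The integer IDs above can also be attached to the model configuration so that downstream tooling reports readable label names. Whether the hosted config already contains these names is not guaranteed; a minimal sketch that sets them explicitly:

```python
from transformers import AutoModelForSequenceClassification

# Attach readable names to the class IDs, mirroring the mapping listed above
# (0=Islam, 1=Catholic, 2=Protestant).
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.config.id2label = {0: "Islam", 1: "Catholic", 2: "Protestant"}
model.config.label2id = {"Islam": 0, "Catholic": 1, "Protestant": 2}
```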
## Training Data

Trained on ~2 million Indonesian sentences collected from:

- Catholic websites
- Islamic websites
- Protestant websites
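
The exact training script and hyperparameters are not documented in this card. The sketch below shows one plausible way to fine-tune the base model on such sentence-level data with the `Trainer` API; the file names, column names, and hyperparameters are assumptions for illustration only, not the original setup.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV files with "text" and "label" columns (0=Islam, 1=Catholic, 2=Protestant)
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})

tokenizer = AutoTokenizer.from_pretrained("indolem/indobert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "indolem/indobert-base-uncased", num_labels=3
)

args = TrainingArguments(
    output_dir="indo-religiolect-bert",
    learning_rate=2e-5,            # illustrative hyperparameters, not the original ones
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,           # enables dynamic padding via the default data collator
)
trainer.train()
```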
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()

# Classify a single sentence
text = "Allah adalah Tuhan yang Maha Esa"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: "Islam", 1: "Catholic", 2: "Protestant"}
print(f"Prediction: {label_map[prediction]}")
```
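
To score several sentences at once with class probabilities, the same model can be called on a padded batch followed by a softmax over the logits. This continues from the snippet above (reusing `tokenizer`, `model`, and `label_map`); the second example sentence is illustrative only.

```python
import torch.nn.functional as F

texts = [
    "Allah adalah Tuhan yang Maha Esa",
    "Misa Kudus dirayakan setiap hari Minggu",  # illustrative input
]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**batch).logits
probs = F.softmax(logits, dim=-1)

for text, p in zip(texts, probs):
    pred = int(p.argmax())
    print(f"{text!r} -> {label_map[pred]} ({p[pred]:.2f})")
```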
## Performance

Model performance metrics are available in the training logs.
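
No quantitative results are reproduced in this card. To measure accuracy and per-class F1 on your own labeled data, a sketch like the following can be used (reusing `tokenizer` and `model` from the Usage section; `test.csv` and its columns are assumptions, not files shipped with the model):

```python
import pandas as pd
import torch
from sklearn.metrics import classification_report

df = pd.read_csv("test.csv")  # hypothetical file with "text" and integer "label" columns

preds = []
for text in df["text"]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    preds.append(int(logits.argmax(dim=-1)))

print(classification_report(df["label"], preds, target_names=["Islam", "Catholic", "Protestant"]))
```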
## Citation

If you use this model, please cite:

```bibtex
@misc{indo-religiolect-bert,
  author       = {Dan Sachs},
  title        = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}
```