# Indo-Religiolect-BERT

A fine-tuned Indonesian BERT model for classifying religious texts into:

- **Islam**
- **Catholic**
- **Protestant**

## Model Details

- **Base Model**: `indolem/indobert-base-uncased`
- **Task**: Sequence Classification
- **Language**: Indonesian
- **Labels**: Islam (0), Catholic (1), Protestant (2)

## Training Data

Trained on ~2 million Indonesian sentences collected from:

- Catholic websites
- Islamic websites
- Protestant websites

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned model
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()  # disable dropout for inference

# Predict
text = "Allah adalah Tuhan yang Maha Esa"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: 'Islam', 1: 'Catholic', 2: 'Protestant'}
print(f"Prediction: {label_map[prediction]}")
```

## Performance

Model performance metrics are available in the training logs.

## Citation

If you use this model, please cite:

```
@misc{indo-religiolect-bert,
  author = {Dan Sachs},
  title = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}
```
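
## Confidence Scores

The `argmax` in the usage example returns only the predicted label. If you also want a confidence score, the raw logits can be passed through a softmax to get per-label probabilities. The sketch below shows that post-processing step in plain Python; the logit values and the `threshold` parameter are illustrative assumptions, not real model output:

```python
import math

# Label order as documented in Model Details: Islam (0), Catholic (1), Protestant (2)
LABELS = ["Islam", "Catholic", "Protestant"]

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Return (label, probability); return (None, probability) below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return LABELS[best], probs[best]

# Hypothetical logits standing in for outputs.logits[0].tolist()
label, prob = classify([3.1, 0.4, -0.7])
print(label, round(prob, 3))
```

In practice you would replace the hypothetical list with `outputs.logits[0].tolist()` (or use `torch.softmax(outputs.logits, dim=-1)` directly); the threshold is a design choice for rejecting low-confidence inputs, e.g. texts that are not religious at all.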