# Indo-Religiolect-BERT

A fine-tuned Indonesian BERT model for classifying religious texts into three classes:

- **Islam**
- **Catholic**
- **Protestant**
## Model Details

- **Base Model**: `indolem/indobert-base-uncased`
- **Task**: Sequence Classification
- **Language**: Indonesian
- **Labels**: Islam (0), Catholic (1), Protestant (2)
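
The integer IDs above can also be attached to the model configuration so that downstream tooling reports readable label names. Whether the hosted config already contains these names is not guaranteed; a minimal sketch that sets them explicitly:

```python
from transformers import AutoModelForSequenceClassification

# Attach readable names to the class IDs, mirroring the mapping listed above
# (0=Islam, 1=Catholic, 2=Protestant).
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.config.id2label = {0: "Islam", 1: "Catholic", 2: "Protestant"}
model.config.label2id = {"Islam": 0, "Catholic": 1, "Protestant": 2}
```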
## Training Data

Trained on ~2 million Indonesian sentences collected from:

- Catholic websites
- Islamic websites
- Protestant websites
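
The exact training script and hyperparameters are not documented in this card. The sketch below shows one plausible way to fine-tune the base model on such sentence-level data with the `Trainer` API; the file names, column names, and hyperparameters are assumptions for illustration only, not the original setup.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV files with "text" and "label" columns (0=Islam, 1=Catholic, 2=Protestant)
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})

tokenizer = AutoTokenizer.from_pretrained("indolem/indobert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

data = data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "indolem/indobert-base-uncased", num_labels=3
)

args = TrainingArguments(
    output_dir="indo-religiolect-bert",
    learning_rate=2e-5,            # illustrative hyperparameters, not the original ones
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,           # enables dynamic padding via the default data collator
)
trainer.train()
```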
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
tokenizer = AutoTokenizer.from_pretrained("dansachs/indo-religiolect-bert")
model = AutoModelForSequenceClassification.from_pretrained("dansachs/indo-religiolect-bert")
model.eval()

# Classify a single sentence
text = "Allah adalah Tuhan yang Maha Esa"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()

label_map = {0: "Islam", 1: "Catholic", 2: "Protestant"}
print(f"Prediction: {label_map[prediction]}")
```
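
To score several sentences at once with class probabilities, the same model can be called on a padded batch followed by a softmax over the logits. This continues from the snippet above (reusing `tokenizer`, `model`, and `label_map`); the second example sentence is illustrative only.

```python
import torch.nn.functional as F

texts = [
    "Allah adalah Tuhan yang Maha Esa",
    "Misa Kudus dirayakan setiap hari Minggu",  # illustrative input
]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**batch).logits
probs = F.softmax(logits, dim=-1)

for text, p in zip(texts, probs):
    pred = int(p.argmax())
    print(f"{text!r} -> {label_map[pred]} ({p[pred]:.2f})")
```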
## Performance

Model performance metrics are available in the training logs.
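
No quantitative results are reproduced in this card. To measure accuracy and per-class F1 on your own labeled data, a sketch like the following can be used (reusing `tokenizer` and `model` from the Usage section; `test.csv` and its columns are assumptions, not files shipped with the model):

```python
import pandas as pd
import torch
from sklearn.metrics import classification_report

df = pd.read_csv("test.csv")  # hypothetical file with "text" and integer "label" columns

preds = []
for text in df["text"]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    preds.append(int(logits.argmax(dim=-1)))

print(classification_report(df["label"], preds, target_names=["Islam", "Catholic", "Protestant"]))
```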
## Citation

If you use this model, please cite:

```bibtex
@misc{indo-religiolect-bert,
  author       = {Dan Sachs},
  title        = {Indo-Religiolect-BERT: Indonesian Religious Text Classifier},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/dansachs/indo-religiolect-bert}}
}
```