# Malay Claim Classifier
This model is fine-tuned on a dataset of Malaysian claims to classify them into topical categories for fact-checking. It assigns each claim written in Bahasa Malaysia to one of 9 main categories.
## Model Description
- **Model Type:** BERT-based sequence classification
- **Language:** Malay/Bahasa Malaysia
- **Base Model:** rmtariq/malay_classification
- **Number of Labels:** 9
- **Labels:** agama, alam sekitar, ekonomi, kesihatan, pendidikan, pengguna, politik, sosial, teknologi
- **Model Size:** 178M parameters
- **Tensor Type:** F32
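As a rough sizing check: F32 stores each parameter in 4 bytes, so, taking the rounded 178M figure above at face value, the weights alone come to approximately

$$178 \times 10^{6}\ \text{parameters} \times 4\ \tfrac{\text{bytes}}{\text{parameter}} \approx 7.1 \times 10^{8}\ \text{bytes} \approx 712\ \text{MB} \approx 0.66\ \text{GiB}$$

This excludes activations and framework overhead, so actual memory use at inference time will be somewhat higher.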
## Category Descriptions
- **agama:** Religious claims, including halal/haram issues
- **alam sekitar:** Environmental claims, including climate, weather, and natural disasters
- **ekonomi:** Economic claims, including business, finance, and trade
- **kesihatan:** Health claims, including diseases, treatments, and mental health
- **pendidikan:** Education claims, including schools, universities, and exams
- **pengguna:** Consumer product claims, including brands, quality, and safety
- **politik:** Political claims, including government, policies, and elections
- **sosial:** Social claims, including culture, entertainment, sports, and crime
- **teknologi:** Technology claims, including digital services, the internet, and innovations
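Some claims plausibly touch more than one of these categories (the halal food-dye claim in the usage example below involves both religion and consumer products), so it can help to inspect the full probability distribution rather than only the top prediction. Below is a minimal plain-Python sketch, assuming you pass the raw logits for a single claim as a list of floats (for example `outputs.logits[0].tolist()`) together with the model's `model.config.id2label` mapping. `rank_categories` is a hypothetical helper, not part of this model or of `transformers`, and the logits in the demo are dummy values, not real model outputs.

```python
import math


def rank_categories(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely.

    logits:   raw scores for one claim, one float per class index.
    id2label: maps class index -> label name (e.g. model.config.id2label).
    """
    # Softmax: subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked]


if __name__ == "__main__":
    # Hypothetical index order for illustration only; read the real order
    # from model.config.id2label instead of assuming it.
    labels = ["agama", "alam sekitar", "ekonomi", "kesihatan", "pendidikan",
              "pengguna", "politik", "sosial", "teknologi"]
    id2label = dict(enumerate(labels))
    # Dummy logits, NOT actual model output.
    dummy_logits = [2.0, -1.0, 0.0, 0.5, -1.0, 1.5, -0.5, 0.0, -1.0]
    # Print the three most likely categories with their probabilities.
    for label, p in rank_categories(dummy_logits, id2label)[:3]:
        print(f"{label}: {p:.3f}")
```

Applying softmax to the logits does not change which class wins, since softmax preserves order, but it turns the scores into values that sum to 1. That makes it easier to spot claims where two categories are close.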
## Usage
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = BertTokenizer.from_pretrained("rmtariq/malay_claim_classifier_v2")
model = BertForSequenceClassification.from_pretrained("rmtariq/malay_claim_classifier_v2")
# Prepare input
example_claim = "Benarkah pewarna merah yang digunakan dalam makanan ringan dihasilkan daripada serangga dan tidak halal?"
inputs = tokenizer(example_claim, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)

# Logits have shape (batch_size, num_labels); take the highest-scoring class
predictions = outputs.logits
predicted_class = torch.argmax(predictions, dim=1).item()
label = model.config.id2label[predicted_class]

print(f"Claim: {example_claim}")
print(f"Predicted Category: {label}")
```