# Malay Claim Classifier

This model is fine-tuned on a dataset of Malaysian claims for fact-checking purposes. It is designed to classify claims written in Bahasa Malaysia into 9 main categories.

## Model Description

- **Model Type:** BERT-based sequence classification
- **Language:** Malay/Bahasa Malaysia
- **Base Model:** rmtariq/malay_classification
- **Number of Labels:** 9
- **Labels:** agama, alam sekitar, ekonomi, kesihatan, pendidikan, pengguna, politik, sosial, teknologi
- **Model Size:** 178M parameters
- **Tensor Type:** F32

## Category Descriptions

- **agama:** Religious claims, including halal/haram issues
- **alam sekitar:** Environmental claims, climate, weather, natural disasters
- **ekonomi:** Economic claims, business, finance, trade
- **kesihatan:** Health claims, diseases, treatments, mental health
- **pendidikan:** Education claims, schools, universities, exams
- **pengguna:** Consumer product claims, brands, quality, safety
- **politik:** Political claims, government, policies, elections
- **sosial:** Social claims, culture, entertainment, sports, crime
- **teknologi:** Technology claims, digital, internet, innovations
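For downstream display or logging, the categories above can be kept as a small lookup table. This is a convenience sketch, not part of the model itself; the gloss strings are our own English summaries of the descriptions above, while the label keys match the model's label strings:

```python
# English glosses for the model's 9 Malay category labels
# (glosses are our own summaries; keys match the label strings listed above).
CATEGORY_GLOSSES = {
    "agama": "religious claims, including halal/haram issues",
    "alam sekitar": "environmental claims: climate, weather, natural disasters",
    "ekonomi": "economic claims: business, finance, trade",
    "kesihatan": "health claims: diseases, treatments, mental health",
    "pendidikan": "education claims: schools, universities, exams",
    "pengguna": "consumer product claims: brands, quality, safety",
    "politik": "political claims: government, policies, elections",
    "sosial": "social claims: culture, entertainment, sports, crime",
    "teknologi": "technology claims: digital, internet, innovations",
}

print(len(CATEGORY_GLOSSES))  # 9 categories
```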

## Usage

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = BertTokenizer.from_pretrained("rmtariq/malay_claim_classifier_v2")
model = BertForSequenceClassification.from_pretrained("rmtariq/malay_claim_classifier_v2")

# Prepare input
example_claim = "Benarkah pewarna merah yang digunakan dalam makanan ringan dihasilkan daripada serangga dan tidak halal?"
inputs = tokenizer(example_claim, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits
    predicted_class = torch.argmax(predictions, dim=1).item()
    label = model.config.id2label[predicted_class]

print(f"Claim: {example_claim}")
print(f"Predicted Category: {label}")
```
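If you need a confidence score rather than only the top label, apply a softmax to the logits. A minimal sketch using dummy logits standing in for `outputs.logits` from the snippet above (the real tensor has shape `(batch_size, 9)`):

```python
import torch

# Dummy logits standing in for outputs.logits (shape: batch_size x 9 labels)
logits = torch.tensor([[2.5, -1.0, 0.3, 4.1, -0.7, 0.0, 1.2, -2.0, 0.5]])

# Softmax converts raw logits into a probability distribution over the 9 labels
probs = torch.softmax(logits, dim=1)

# The highest-probability label and its confidence
top_prob, top_idx = torch.max(probs, dim=1)
print(f"Predicted class index: {top_idx.item()}, confidence: {top_prob.item():.3f}")
```

In the real pipeline, map `top_idx.item()` through `model.config.id2label` exactly as in the snippet above to recover the category name.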