rmtariq committed
Commit f69c15d · verified · 1 Parent(s): 2cb076d

Model save

Files changed (2)
  1. README.md +77 -119
  2. model.safetensors +1 -1
README.md CHANGED
@@ -1,119 +1,77 @@
- # 🇲🇾 Malay Claim Classification Model
-
- This is a fine-tuned BERT model built to classify claims in Malay (and English) into 21 categories.
-
- ## 📊 Categories
-
- The model classifies claims into the following categories:
-
- - `Politik` (Politics)
- - `Perpaduan` (Unity)
- - `Keluarga` (Family)
- - `Belia` (Youth)
- - `Perumahan` (Housing)
- - `Internet` (Internet)
- - `Pengguna` (Consumer)
- - `Makanan` (Food)
- - `Pekerjaan` (Employment)
- - `Pengangkutan` (Transportation)
- - `Sukan` (Sports)
- - `Ekonomi` (Economy)
- - `Hiburan` (Entertainment)
- - `Jenayah` (Crime)
- - `Alam Sekitar` (Environment)
- - `Teknologi` (Technology)
- - `Pendidikan` (Education)
- - `Agama` (Religion)
- - `Sosial` (Social)
- - `Kesihatan` (Health)
- - `Halal` (Halal)
-
- ## 🧠 Base Model
-
- Fine-tuned from `bert-base-multilingual-cased`, which supports both Malay and English text.
-
- ## 🧪 Example Usage
-
- ```python
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
- import torch
-
- # Load model and tokenizer
- model_name = "rmtariq/malay_classification"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
-
- # Function to classify a claim
- def classify_claim(claim):
-     # Prepare the input
-     inputs = tokenizer(claim, return_tensors="pt", truncation=True, max_length=128)
-
-     # Get the prediction
-     with torch.no_grad():
-         outputs = model(**inputs)
-
-     # Get the predicted class
-     logits = outputs.logits
-     predicted_class_id = logits.argmax().item()
-
-     # Get the confidence score
-     probabilities = torch.nn.functional.softmax(logits, dim=1)[0]
-     confidence = probabilities[predicted_class_id].item()
-
-     # Map to category
-     category = model.config.id2label[predicted_class_id]
-
-     return category, confidence
-
- # Example claims
- examples = [
-     "Projek mega kerajaan penuh dengan ketirisan.",
-     "Harga barang keperluan naik setiap bulan.",
-     "Program vaksinasi tidak mencakupi golongan luar bandar.",
-     "Makanan di hotel lima bintang tidak jelas status halalnya."
- ]
-
- # Classify each example
- for claim in examples:
-     category, confidence = classify_claim(claim)
-     print(f"Claim: {claim}")
-     print(f"Category: {category}")
-     print(f"Confidence: {confidence:.4f}")
-     print("-" * 50)
- ```
-
- ## 📚 Dataset
-
- Fine-tuned on a custom dataset with 3,675 claims labeled by category, with an 80/20 train/test split.
-
- ## 🔍 Evaluation
-
- The model achieves high accuracy on the test set, with most predictions having confidence scores above 0.95.
-
- ## 🎯 Specific Claim Patterns
-
- The model includes special handling for specific claim patterns:
-
- 1. **Police-related claims**: Claims about the police chief, summons, or threats
-    - Example: "Ketua Polis Negara (KPN) Tan Sri Razarudin Husain hantar e-mel berkaitan saman dan berbaur ugutan kepada orang awam"
-    - Category: Jenayah (Crime)
-
- 2. **Zakat-related claims**: Claims about zakat fitrah, rice types, or payment validity
-    - Example: "Zakat fitrah tidak sah jika dibayar tidak mengikut jenis beras yang dimakan"
-    - Category: Agama (Religion)
-
- 3. **Tax-related claims**: Claims about government taxes, especially on palm oil
-    - Example: "Kerajaan akan memperkenalkan cukai khas minyak sawit mentah"
-    - Category: Ekonomi (Economy)
-
- 4. **Consumer product claims**: Claims about contact lenses or online sales
-    - Example: "Kanta lekap tidak boleh dijual secara dalam talian"
-    - Category: Pengguna (Consumer)
-
- 5. **National security claims**: Claims about ammunition, colonization, or enemies
-    - Example: "Penemuan 50 tan kelongsong dan peluru petanda negara bakal dijajah musuh"
-    - Category: Politik (Politics)
- ## 📋 License
-
- MIT License
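The `classify_claim` function in the removed README boils down to a softmax over the logits followed by an argmax. A dependency-free sketch of that step, using hypothetical logits and a four-label subset of the card's 21 categories (the real `id2label` mapping lives in `model.config` and is larger):

```python
import math

# Hypothetical 4-way subset of the model's 21 categories (illustration only)
id2label = {0: "Politik", 1: "Ekonomi", 2: "Agama", 3: "Pengguna"}

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, standing in for model(**inputs).logits[0]
logits = [0.3, 4.1, 0.2, 1.0]
probs = softmax(logits)

predicted_id = max(range(len(probs)), key=probs.__getitem__)
category = id2label[predicted_id]        # predicted label
confidence = probs[predicted_id]         # probability of that label
```

The sketch mirrors the README code's two outputs: the mapped category name and the softmax probability reported as the confidence score.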
 
+ ---
+ library_name: transformers
+ base_model: rmtariq/malay_classification
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: malay_classification
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # malay_classification
+
+ This model is a fine-tuned version of [rmtariq/malay_classification](https://huggingface.co/rmtariq/malay_classification) on an unspecified dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0024
+ - Accuracy: 0.9990
+ - F1: 0.9990
+ - Precision: 0.9991
+ - Recall: 0.9990
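The Accuracy, F1, Precision, and Recall figures above are standard multi-class classification metrics. A minimal pure-Python sketch of how the per-class versions are computed, on toy labels drawn from the category set (the averaging scheme the training script uses across the 21 classes is not stated in the card):

```python
def per_class_metrics(y_true, y_pred, label):
    # Precision, recall, and F1 for a single class, from raw label lists
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gold and predicted labels (hypothetical, for illustration)
y_true = ["Politik", "Ekonomi", "Agama", "Ekonomi", "Politik"]
y_pred = ["Politik", "Ekonomi", "Ekonomi", "Ekonomi", "Politik"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_metrics(y_true, y_pred, "Ekonomi")
```

Per-class results are then averaged (macro, micro, or weighted) to give the single numbers reported in the card.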
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 8
+ - eval_batch_size: 16
+ - seed: 42
+ - optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 3
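With `lr_scheduler_type: linear` and 500 warmup steps, the learning rate climbs from 0 to 5e-05 over the first 500 steps and then decays linearly to 0 by the last step. A sketch of that shape (the total step count of roughly 5,500 is an assumption read off the last row of the training-results table, not a value stated by the card):

```python
def linear_warmup_lr(step, base_lr=5e-05, warmup_steps=500, total_steps=5500):
    # Linear warmup from 0 to base_lr, then linear decay back to 0, matching
    # the shape of transformers' linear schedule with warmup.
    # total_steps=5500 is an assumption for illustration.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

Halfway through warmup the rate is half of `base_lr`; after the warmup peak it falls off linearly until training ends.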
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
+ |:-------------:|:------:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
+ | 0.1691 | 0.2720 | 500 | 0.1373 | 0.9717 | 0.9717 | 0.9730 | 0.9717 |
+ | 0.0493 | 0.5441 | 1000 | 0.0369 | 0.9943 | 0.9943 | 0.9945 | 0.9943 |
+ | 0.0669 | 0.8161 | 1500 | 0.0406 | 0.9952 | 0.9952 | 0.9954 | 0.9952 |
+ | 0.0287 | 1.0881 | 2000 | 0.0276 | 0.9943 | 0.9944 | 0.9948 | 0.9943 |
+ | 0.0061 | 1.3602 | 2500 | 0.0168 | 0.9971 | 0.9971 | 0.9972 | 0.9971 |
+ | 0.0137 | 1.6322 | 3000 | 0.0128 | 0.9981 | 0.9981 | 0.9981 | 0.9981 |
+ | 0.0178 | 1.9042 | 3500 | 0.0179 | 0.9968 | 0.9968 | 0.9969 | 0.9968 |
+ | 0.0112 | 2.1763 | 4000 | 0.0110 | 0.9975 | 0.9975 | 0.9975 | 0.9975 |
+ | 0.0001 | 2.4483 | 4500 | 0.0079 | 0.9987 | 0.9987 | 0.9988 | 0.9987 |
+ | 0.0001 | 2.7203 | 5000 | 0.0021 | 0.9987 | 0.9987 | 0.9987 | 0.9987 |
+ | 0.0003 | 2.9924 | 5500 | 0.0024 | 0.9990 | 0.9990 | 0.9991 | 0.9990 |
+
+ ### Framework versions
+
+ - Transformers 4.53.1
+ - Pytorch 2.7.1
+ - Datasets 3.6.0
+ - Tokenizers 0.21.2
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6c1d91e1af8ac7d950b8aaa56fe4210b91e19b4da2a78b94275fe0e46baf0a90
+ oid sha256:4b2d031437b7b3ceed085985b1a9a59ced72928e3b8c09fa62fa3969391c8b34
  size 711501908