---
license: mit
language:
- id
metrics:
- accuracy
- f1
base_model: indobenchmark/indobert-base-p1
pipeline_tag: text-classification
library_name: transformers
tags:
- indoBERT
- classification
- aduan
- indonesian
model-index:
- name: aduan-model
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Labeled Aduan Dataset
      type: private
      split: validation
    metrics:
    - type: accuracy
      value: 0.9389
    - type: f1
      value: 0.9389
---

# 📊 Indonesian Complaint Classification Model (IndoBERT)

[![Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-yellow)](https://huggingface.co/Zulkifli1409/aduan-model)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Language](https://img.shields.io/badge/Language-Indonesian-red.svg)](https://en.wikipedia.org/wiki/Indonesian_language)

A text classification model for public complaints (*aduan*) written in Indonesian, fine-tuned from **IndoBERT (indobenchmark/indobert-base-p1)**. It assigns each complaint to one of **5 categories** and reaches **96.10%** accuracy on the validation split.
---

## 📑 Classification Categories

| Label | Description | Examples |
|-------|-------------|----------|
| **PINALTI** | Penalized content: profanity, SARA (ethnic, religious, racial, or inter-group) attacks, pornography, hate speech, or other norm violations | "Kampret pejabat koruptor!", "Konten porno beredar", "Rasis banget pemerintah" |
| **DARURAT** | Emergencies that need an immediate response (fires, accidents, disasters, threats to life) | "Ada kebakaran besar di pasar!", "Kecelakaan beruntun di tol", "Banjir bandang melanda desa" |
| **PRIORITAS** | Priority issues that need prompt handling (damaged infrastructure, sanitation, public services) | "Jalan berlubang berbahaya", "Sampah menumpuk seminggu", "Lampu jalan mati semua" |
| **UMUM** | General requests: information questions, suggestions, or non-urgent complaints | "Bagaimana cara mengurus KTP?", "Kapan jadwal posyandu?", "Saran untuk program desa" |
| **LAINNYA** | Other: complaints that fit none of the categories above | "Terima kasih atas pelayanannya", "Hanya ingin menyampaikan apresiasi" |

---

## 🎯 Model Performance

### **Overall Metrics**

- **Validation Accuracy**: **96.10%**
- **Macro F1-Score**: **0.9608**
- **Weighted F1-Score**: **0.9610**
- **Average Confidence**: **93.90%**

### **Per-Class Performance**

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| Pinalti | 0.9588 | 0.9645 | 0.9617 | 169 |
| Darurat | 0.9453 | 0.9603 | 0.9528 | 126 |
| Prioritas | 0.9675 | 0.9675 | 0.9675 | 123 |
| Umum | 0.9752 | 0.9593 | 0.9672 | 123 |
| Lainnya | 0.9596 | 0.9500 | 0.9548 | 100 |

### **Confusion Matrix**

Rows are actual labels and columns are predicted labels, both counted on the 641 validation samples.

```
              Predicted
              Pin  Dar  Pri  Umu  Lai
Actual  Pin   163    2    1    0    3
        Dar     2  121    2    0    1
        Pri     0    3  119    1    0
        Umu     2    2    1  118    0
        Lai     3    0    0    2   95
```

---

## 📊 Dataset Information

- **Total Labeled Samples (train + validation)**: 3,204
  - Pinalti: 844
  - Darurat: 630
  - Prioritas: 612
  - Umum: 616
  - Lainnya: 502
- **Train/Val Split**: 80% / 20% (2,563 / 641)
- **Augmentation**: Applied to balance classes
- **Language**: Indonesian (Bahasa Indonesia)

---

## 🚀 Quick Start
### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Zulkifli1409/aduan-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Ada kebakaran besar di pasar, tolong kirim pemadam segera!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Predict (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    # argmax over the class dimension; the batch holds a single text
    pred_idx = torch.argmax(probs, dim=1).item()

# Labels, listed in the class-index order used by the model
labels = ["PINALTI", "DARURAT", "PRIORITAS", "UMUM", "LAINNYA"]

print(f"Prediction: {labels[pred_idx]}")
print(f"Confidence: {probs[0][pred_idx].item():.2%}")
print("\nAll probabilities:")
for label, prob in zip(labels, probs[0]):
    print(f"  {label}: {prob.item():.2%}")
```

**Output:**

```
Prediction: DARURAT
Confidence: 96.03%

All probabilities:
  PINALTI: 0.21%
  DARURAT: 96.03%
  PRIORITAS: 2.89%
  UMUM: 0.45%
  LAINNYA: 0.42%
```

---

## 🧪 Example Predictions

| Input Text | Prediction | Confidence |
|------------|------------|------------|
| "Brengsek! Pejabat korup semua!" | **PINALTI** | 94.23% |
| "Ada orang kecelakaan parah butuh ambulans" | **DARURAT** | 95.67% |
| "Jalan berlubang perlu diperbaiki segera" | **PRIORITAS** | 92.34% |
| "Bagaimana cara mengurus surat izin usaha?" | **UMUM** | 89.45% |
| "Terima kasih atas bantuannya" | **LAINNYA** | 88.91% |
| "Konten porno tersebar di grup WhatsApp" | **PINALTI** | 91.78% |
| "Banjir tinggi merendam rumah warga" | **DARURAT** | 93.12% |
| "Sampah menumpuk di jalan sejak seminggu lalu" | **PRIORITAS** | 90.56% |

---

## 🔧 Batch Prediction

```python
# Reuses `tokenizer`, `model`, and `torch` from Basic Usage
texts = [
    "Ada kebakaran di gedung!",
    "Jalan rusak parah",
    "Dasar bodoh pemerintah!",
    "Kapan jadwal vaksinasi?",
]

# Tokenize batch (padding aligns the texts to the longest one)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    predictions = torch.argmax(probs, dim=1)

labels = ["PINALTI", "DARURAT", "PRIORITAS", "UMUM", "LAINNYA"]

# tolist() turns the index tensor into plain Python ints
for text, pred_idx, prob in zip(texts, predictions.tolist(), probs):
    pred_label = labels[pred_idx]
    confidence = prob[pred_idx].item()
    print(f"Text: {text}")
    print(f"Prediction: {pred_label} ({confidence:.2%})\n")
```

---

## 🌐 API Deployment

The model is also available as a REST API hosted on Railway:

**Base URL**: `https://api-klasifikasi-aduan.up.railway.app`

### cURL Example

```bash
curl -X POST https://api-klasifikasi-aduan.up.railway.app/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Ada kebakaran di pasar"}'
```

### Response

```json
{
  "label": "DARURAT",
  "confidence": 0.9603,
  "all_scores": {
    "PINALTI": 0.0021,
    "DARURAT": 0.9603,
    "PRIORITAS": 0.0289,
    "UMUM": 0.0045,
    "LAINNYA": 0.0042
  }
}
```

---

## 🛠️ Training Details

### Model Architecture

- **Base Model**: `indobenchmark/indobert-base-p1`
- **Task**: Sequence Classification (5 classes)
- **Max Sequence Length**: 128 tokens
- **Hidden Size**: 768
- **Attention Heads**: 12
- **Layers**: 12

### Training Configuration

- **GPU**: Tesla T4 (14.74 GB VRAM)
- **Precision**: FP16 (Mixed Precision)
- **Gradient Checkpointing**: Enabled
- **Batch Size**: 2
- **Learning Rate**: 1.5e-5
- **Epochs**: 5
- **Optimizer**: AdamW
- **Best Epoch**: 5

### Training Progress

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | Val F1 |
|-------|------------|-----------|----------|---------|--------|
| 1 | 0.3688 | 74.87% | 0.0825 | 93.45% | 0.9346 |
| 2 | 0.0586 | 95.86% | 0.0604 | 96.10% | 0.9609 |
| 3 | 0.0179 | 98.52% | 0.0635 | 96.41% | 0.9641 |
| 4 | 0.0069 | 99.38% | 0.0668 | 96.10% | 0.9611 |
| 5 | 0.0021 | 99.88% | 0.0623 | **96.10%** | **0.9610** |

---

## ⚠️ Important Notes
### Content Moderation (PINALTI)

The model can detect inappropriate content, but it is **not perfect**. For sensitive production applications, consider:

- An additional moderation layer
- Human review for borderline cases
- An explicit keyword whitelist/blacklist
- Combining the model with rule-based filtering

### Limitations

- The model was trained on public complaint data from Indonesia
- Performance is best on texts of 10-100 words
- Slang and some regional dialects may be classified less accurately
- Ambiguous context can lead to less accurate predictions

---

## 📄 License

This model is licensed under the **Apache 2.0 License**.

---

## 📧 Citation & Contact

- **Developer**: Zulkifli1409
- **Hugging Face**: [@Zulkifli1409](https://huggingface.co/Zulkifli1409)

If you use this model in research or in an application, please give appropriate credit.

### BibTeX

```bibtex
@misc{zulkifli2025aduan,
  author = {Zulkifli},
  title = {Indonesian Complaint Classification Model with IndoBERT},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Zulkifli1409/aduan-model}}
}
```

---

## 🤝 Contributing

Feedback, bug reports, and contributions are very welcome! Please open an *issue* in the repository or get in touch via Hugging Face.

---

**© 2025 - Klasifikasi Aduan Model**
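The safeguards listed under Content Moderation (PINALTI) can be combined into a small routing step that runs after the classifier. The sketch below is illustrative only: the `route` function, the 0.80 review threshold, and the blocklist entries are hypothetical choices, not part of the released model or its API. It takes the raw text together with the predicted label and confidence (as produced in Basic Usage), forces `PINALTI` when an explicit keyword matches, and flags uncertain cases for human review.

```python
import re

# Hypothetical values: tune both on your own data before relying on them.
REVIEW_THRESHOLD = 0.80
BLOCKLIST = {"kampret", "brengsek"}  # sample words borrowed from this card's examples


def route(text: str, label: str, confidence: float) -> dict:
    """Combine a model prediction with a keyword rule and a review flag."""
    # Lowercase word tokens, so matching ignores case and punctuation.
    words = set(re.findall(r"\w+", text.lower()))
    hits = sorted(words & BLOCKLIST)

    # Rule-based override: an explicit keyword match forces PINALTI.
    final_label = "PINALTI" if hits else label

    # Ask for human review when the model is unsure, or when the
    # keyword rule disagrees with the model's own prediction.
    needs_review = confidence < REVIEW_THRESHOLD or final_label != label

    return {
        "label": final_label,
        "needs_review": needs_review,
        "matched_keywords": hits,
    }


if __name__ == "__main__":
    print(route("Ada kebakaran besar di pasar!", "DARURAT", 0.96))
    print(route("Kampret, jalan rusak terus!", "PRIORITAS", 0.71))
```

Keeping the rule layer outside the model means the threshold and word list can be adjusted without retraining. A whole-word match is used here so that a blocked word embedded in a longer, harmless word does not trigger the override.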