# Model Card: Chatbot Caca Retrieval
## Model Description
Lightweight retrieval-based QA system for Indonesian (Bahasa Indonesia).
### Training Data
- **Source:** datasets-caca-3500
- **Size:** 3,500 conversational QA pairs
- **Language:** Indonesian
- **Format:** User-Assistant conversations
### Architecture
- **Algorithm:** Hybrid scoring system
  - BM25 (40% weight) - keyword matching
  - TF-IDF + Cosine Similarity (50% weight) - semantic matching
  - Fuzzy String Matching (10% weight) - typo tolerance
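
The weighted combination above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual implementation: only the 40/50/10 weights come from this card. The `HybridRetriever` class, the whitespace tokenizer, the BM25 parameters `k1` and `b`, the scaling of BM25 by the best score in the candidate set, the use of `difflib.SequenceMatcher` for fuzzy matching, and the sample questions are all assumptions made for the example.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Weights taken from the card; everything else here is an illustrative assumption.
W_BM25, W_TFIDF, W_FUZZY = 0.4, 0.5, 0.1


def tokenize(text):
    """Lowercase whitespace tokenization (assumed; the real tokenizer is not documented)."""
    return text.lower().split()


class HybridRetriever:
    """Hypothetical scorer over a non-empty list of stored questions."""

    def __init__(self, questions, k1=1.5, b=0.75):
        self.questions = questions
        self.docs = [tokenize(q) for q in questions]
        self.k1, self.b = k1, b
        self.avg_len = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        # Smoothed IDF, always positive, so every partial score is non-negative.
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}
        self.doc_vecs = [self._tfidf(d) for d in self.docs]

    def _tfidf(self, tokens):
        tf = Counter(tokens)
        return {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def _bm25(self, query, doc):
        tf = Counter(doc)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avg_len)
            score += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return score

    def rank(self, query):
        """Return (score, question_index) pairs, best match first."""
        q = tokenize(query)
        bm25 = [self._bm25(q, d) for d in self.docs]
        top = max(bm25) or 1.0  # scale BM25 into [0, 1] so the weights are comparable
        q_vec = self._tfidf(q)
        scored = []
        for i, text in enumerate(self.questions):
            fuzzy = SequenceMatcher(None, query.lower(), text.lower()).ratio()
            total = (W_BM25 * bm25[i] / top
                     + W_TFIDF * self._cosine(q_vec, self.doc_vecs[i])
                     + W_FUZZY * fuzzy)
            scored.append((total, i))
        return sorted(scored, reverse=True)


# Made-up sample questions; the query contains a deliberate typo ("nma").
sample_questions = ["siapa nama kamu", "apa kabar hari ini", "jam berapa sekarang"]
retriever = HybridRetriever(sample_questions)
best_score, best_index = retriever.rank("siapa nma kamu")[0]
print(sample_questions[best_index], round(best_score, 3))
```

BM25 scores are unbounded while cosine similarity and the fuzzy ratio both lie in [0, 1], so some normalization is needed before fixed weights mean anything. Dividing by the top BM25 score is one simple choice; the actual system may normalize differently.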
### Performance

| Metric | Value |
|--------|-------|
| Model Size | 2.69 MB |
| Query Latency | < 10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High (qualitative; no figure reported) |
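
A latency figure like the one in the table can be checked on your own hardware with a small timing harness. The sketch below is an assumption-laden example: `measure_latency_ms` and `dummy_answer` are hypothetical names, and `dummy_answer` is only a stand-in for whatever function actually answers a query in your setup.

```python
import statistics
import time


def measure_latency_ms(answer_fn, queries, repeats=100):
    """Return (median, p95) wall-clock latency per call, in milliseconds."""
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            answer_fn(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


# Stand-in for the real retriever call (hypothetical; swap in the actual query function).
def dummy_answer(query):
    return query.lower()


median_ms, p95_ms = measure_latency_ms(dummy_answer, ["apa kabar", "siapa kamu"])
print(f"median={median_ms:.4f} ms, p95={p95_ms:.4f} ms")
```

Reporting the 95th percentile alongside the median helps show whether occasional slow queries stay under the stated bound, not just the typical one.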
### Limitations
- Only answers questions that appear in the dataset or are close paraphrases of them
- No generative capability; it can only return stored responses
- Limited to the Indonesian language
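
Because the system can only return stored responses, a caller may want to detect questions that fall outside the dataset instead of returning a poor match. The sketch below shows one possible guard. It is hypothetical throughout: this card does not say whether a score threshold is used, so `MIN_SCORE`, `FALLBACK`, the `answer` function, and the sample pair are all invented for illustration, and `difflib.SequenceMatcher` stands in for the real hybrid score.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff; the card does not state whether a threshold exists or what its value is.
MIN_SCORE = 0.6
# Indonesian for "Sorry, I can't answer that question yet." (made-up fallback text)
FALLBACK = "Maaf, saya belum bisa menjawab pertanyaan itu."


def answer(query, qa_pairs, min_score=MIN_SCORE):
    """Return the stored reply for the closest question, or FALLBACK if nothing is close enough."""
    best_score, best_reply = 0.0, FALLBACK
    for question, reply in qa_pairs:
        score = SequenceMatcher(None, query.lower(), question.lower()).ratio()
        if score > best_score:
            best_score, best_reply = score, reply
    return best_reply if best_score >= min_score else FALLBACK


# Made-up QA pair for demonstration only.
sample_pairs = [("siapa nama kamu", "Aku Caca!")]
print(answer("siapa nama kamu?", sample_pairs))
print(answer("zzz qqq", sample_pairs))
```

The right cutoff depends on the scoring function and the data, so any real value would need to be tuned against held-out questions.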
### Ethical Considerations
- Responses reflect the training data (datasets-caca-3500)
- The bot's persona may include sarcasm or humor
- Not suitable for critical applications
### License
MIT License