File size: 1,003 Bytes
d646643 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
# Model Card: Chatbot Caca Retrieval
## Model Description
Lightweight retrieval-based QA system untuk Bahasa Indonesia.
### Training Data
- **Source:** datasets-caca-3500
- **Size:** 3,500 conversational QA pairs
- **Language:** Indonesian
- **Format:** User-Assistant conversations
### Architecture
- **Algorithm:** Hybrid scoring system
- BM25 (40% weight) - keyword matching
- TF-IDF + Cosine Similarity (50% weight) - semantic matching
- Fuzzy String Matching (10% weight) - typo tolerance
### Performance
| Metric | Value |
|--------|-------|
| Model Size | 2.69 MB |
| Query Latency | <10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High |
### Limitations
- Only works for questions in dataset or similar paraphrases
- No generative capability
- Limited to Indonesian language
### Ethical Considerations
- Responses reflect training data (datasets-caca-3500)
- Personality may include sarcasm/humor
- Not suitable for critical applications
### License
MIT License |