# Model Card: Chatbot Caca Retrieval
## Model Description
A lightweight, retrieval-based question-answering system for Indonesian (Bahasa Indonesia).
### Training Data
- **Source:** datasets-caca-3500
- **Size:** 3,500 conversational QA pairs
- **Language:** Indonesian
- **Format:** User-Assistant conversations
### Architecture
- **Algorithm:** Hybrid scoring system that combines three similarity scores:
  - BM25 (40% weight): keyword matching
  - TF-IDF + cosine similarity (50% weight): weighted term-overlap (lexical) matching
  - Fuzzy string matching (10% weight): typo tolerance
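The hybrid score can be sketched as a weighted sum, `score = 0.4 * BM25_norm + 0.5 * cosine_tfidf + 0.1 * fuzzy`, computed for every stored question, with the highest-scoring question's answer returned. This is a minimal, stdlib-only sketch under stated assumptions, not the project's actual code. The card gives only the weights. The regex tokenizer, the BM25 parameters `k1=1.5` and `b=0.75`, the smoothed IDF, min-max normalization of the unbounded BM25 scores, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions. Function names such as `hybrid_rank` are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Weights from the model card: BM25 40%, TF-IDF cosine 50%, fuzzy 10%.
WEIGHTS = {"bm25": 0.4, "tfidf": 0.5, "fuzzy": 0.1}


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, then keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of the query against each tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def tfidf_cosine_scores(query: list[str], docs: list[list[str]]) -> list[float]:
    """Cosine similarity between TF-IDF vectors (smoothed IDF), in [0, 1]."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

    def vec(tokens: list[str]) -> dict[str, float]:
        # Query terms unseen in the corpus have no IDF and are dropped.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def norm(v: dict[str, float]) -> float:
        return math.sqrt(sum(x * x for x in v.values()))

    q = vec(query)
    q_norm = norm(q)
    scores = []
    for d in docs:
        v = vec(d)
        dot = sum(w * v.get(t, 0.0) for t, w in q.items())
        denom = q_norm * norm(v)
        scores.append(dot / denom if denom else 0.0)
    return scores


def min_max(xs: list[float]) -> list[float]:
    """Rescale to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_text: str, questions: list[str]) -> tuple[int, list[float]]:
    """Return the index of the best-matching question and all hybrid scores."""
    q = tokenize(query_text)
    docs = [tokenize(x) for x in questions]
    bm25 = min_max(bm25_scores(q, docs))
    tfidf = tfidf_cosine_scores(q, docs)
    fuzzy = [SequenceMatcher(None, query_text.lower(), x.lower()).ratio()
             for x in questions]
    combined = [WEIGHTS["bm25"] * a + WEIGHTS["tfidf"] * t + WEIGHTS["fuzzy"] * f
                for a, t, f in zip(bm25, tfidf, fuzzy)]
    best = max(range(len(questions)), key=combined.__getitem__)
    return best, combined
```

For example, with three hypothetical stored questions (not taken from datasets-caca-3500), the reordered query `"nama kamu siapa?"` should select the first one:

```python
questions = ["siapa nama kamu", "apa kabar hari ini", "kamu tinggal di mana"]
best, scores = hybrid_rank("nama kamu siapa?", questions)
print(questions[best])  # siapa nama kamu
```

Normalizing BM25 before mixing matters because its raw scores are unbounded, while the cosine and fuzzy scores already lie in [0, 1]. Without that step, the nominal 40/50/10 weights would not reflect each component's actual influence.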
### Performance
| Metric | Value |
|--------|-------|
| Model Size | 2.69 MB |
| Query Latency | <10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High (qualitative; no benchmark reported) |
### Limitations
- Only answers questions that appear in the dataset or are close paraphrases of them
- No generative capability: every response is retrieved from the dataset
- Limited to Indonesian
### Ethical Considerations
- Responses reflect training data (datasets-caca-3500)
- The assistant's persona may include sarcasm and humor
- Not suitable for high-stakes or safety-critical applications
### License
MIT License