# Model Card: Chatbot Caca Retrieval

## Model Description

A lightweight retrieval-based QA system for Indonesian (Bahasa Indonesia).

### Training Data

- **Source:** datasets-caca-3500
- **Size:** 3,500 conversational QA pairs
- **Language:** Indonesian
- **Format:** User-Assistant conversations

### Architecture

- **Algorithm:** Hybrid scoring system
  - BM25 (40% weight) - keyword matching
  - TF-IDF + Cosine Similarity (50% weight) - weighted term-overlap similarity
  - Fuzzy String Matching (10% weight) - typo tolerance
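
The weighted combination above can be sketched as follows. This is a minimal, standard-library-only illustration, not the model's actual implementation: only the 40/50/10 weights come from this card. The function names (`hybrid_rank`, `bm25_scores`, and so on), the BM25 parameters `k1` and `b`, the max-normalization of BM25 scores, and the use of `difflib.SequenceMatcher` for fuzzy matching are all assumptions made for the example.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Weights from the model card; everything else here is an illustrative assumption.
WEIGHTS = {"bm25": 0.4, "tfidf": 0.5, "fuzzy": 0.1}


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; the real tokenizer is not documented)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of the tokenized query against each tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    """Cosine similarity between TF-IDF vectors of the query and each document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(tokens):
        # Terms unseen in the corpus are dropped from the vector.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs]


def fuzzy_scores(query_text, doc_texts):
    """Character-level similarity ratio in [0, 1], giving some typo tolerance."""
    return [SequenceMatcher(None, query_text.lower(), t.lower()).ratio() for t in doc_texts]


def normalize(scores):
    """Scale scores by their maximum so unbounded BM25 values land in [0, 1]."""
    top = max(scores)
    return [s / top if top > 0 else 0.0 for s in scores]


def hybrid_rank(query_text, questions):
    """Return (index, combined score) of the stored question that best matches the query."""
    docs = [tokenize(q) for q in questions]
    query = tokenize(query_text)
    bm25 = normalize(bm25_scores(query, docs))
    tfidf = tfidf_cosine_scores(query, docs)
    fuzzy = fuzzy_scores(query_text, questions)
    combined = [
        WEIGHTS["bm25"] * bm + WEIGHTS["tfidf"] * tf + WEIGHTS["fuzzy"] * fz
        for bm, tf, fz in zip(bm25, tfidf, fuzzy)
    ]
    best = max(range(len(questions)), key=combined.__getitem__)
    return best, combined[best]


if __name__ == "__main__":
    # Hypothetical sample questions, not taken from datasets-caca-3500.
    questions = ["siapa nama kamu", "apa kabar hari ini", "jam berapa sekarang"]
    print(hybrid_rank("nama kamu siapa", questions))
```

In a setup like this, the returned index would be used to look up the stored answer paired with the matched question. Because each component is scaled to the range 0 to 1 before weighting, the combined score also stays in that range, which makes it easier to compare across queries.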

### Performance

| Metric | Value |
|--------|-------|
| Model Size | 2.69 MB |
| Query Latency | <10 ms |
| Memory Usage | ~5 MB RAM |
| Paraphrase Accuracy | High (qualitative; no numeric benchmark reported) |

### Limitations

- Only answers questions that appear in the dataset or are close paraphrases of them
- No generative capability
- Limited to Indonesian

### Ethical Considerations

- Responses reflect training data (datasets-caca-3500)
- Personality may include sarcasm/humor
- Not suitable for critical applications

### License

MIT License