|
|
--- |
|
|
language: |
|
|
- id |
|
|
- en |
|
|
license: mit |
|
|
tags: |
|
|
- chatbot |
|
|
- retrieval |
|
|
- hybrid-search |
|
|
- bm25 |
|
|
- tfidf |
|
|
- sbert |
|
|
- mpnet |
|
|
- use |
|
|
- fuzzy-matching |
|
|
- indonesian |
|
|
- english |
|
|
- conversational |
|
|
- context-aware |
|
|
- multilingual |
|
|
- caca |
|
|
pipeline_tag: text-generation |
|
|
library_name: sentence-transformers |
|
|
datasets: |
|
|
- Lyon28/Caca-Behavior |
|
|
metrics: |
|
|
- accuracy |
|
|
- precision |
|
|
- recall |
|
|
model-index: |
|
|
- name: CACA - Contextual Adaptive Conversational AI |
|
|
results: |
|
|
- task: |
|
|
type: conversational |
|
|
name: Conversational Response Retrieval |
|
|
dataset: |
|
|
name: Lyon28/Caca-Behavior |
|
|
type: conversational |
|
|
split: train |
|
|
metrics: |
|
|
- type: accuracy |
|
|
value: 0.92 |
|
|
name: Top-1 Accuracy |
|
|
- type: precision |
|
|
value: 0.89 |
|
|
name: Precision@1 |
|
|
--- |
|
|
|
|
|
# 🤖 CACA - Contextual Adaptive Conversational AI |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
 |
|
|
|
|
|
**Ultimate Hybrid Retrieval Chatbot dengan 10+ Teknik** |
|
|
|
|
|
[](https://huggingface.co/Lyon28/Caca-Chatbot-V2) |
|
|
[](https://opensource.org/licenses/MIT) |
|
|
[](https://www.python.org/downloads/) |
|
|
[](https://huggingface.co/datasets/Lyon28/Caca-Behavior) |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📋 Deskripsi |
|
|
|
|
|
**CACA (Contextual Adaptive Conversational AI)** adalah sistem chatbot hybrid retrieval-based paling canggih yang menggabungkan **10+ teknik pencarian** berbeda untuk memberikan respons yang akurat, kontekstual, dan adaptif. |
|
|
|
|
|
Model ini **TIDAK menggunakan training ML/DL** melainkan **ensemble dari berbagai metode retrieval** yang dioptimasi untuk percakapan Bahasa Indonesia dan English. |
|
|
|
|
|
### 🎯 Keunggulan Utama |
|
|
|
|
|
- ✅ **10+ Teknik Retrieval** - BM25, TF-IDF, SBERT (Mini+MPNet), USE, Fuzzy, Jaccard, N-gram, Pattern, Keyword Boost, Context |
|
|
- ✅ **Context-Aware** - Mengingat 5 percakapan terakhir untuk respons yang lebih relevan |
|
|
- ✅ **Multilingual** - Support Bahasa Indonesia & English dengan auto-detection |
|
|
- ✅ **Pattern Recognition** - Deteksi pola percakapan (greeting, thanks, identity, dll) |
|
|
- ✅ **Adaptive Scoring** - Weighted ensemble dari semua teknik |
|
|
- ✅ **No Training Required** - Langsung pakai dengan dataset |
|
|
- ✅ **Fast & Efficient** - Inference ~150-200ms |
|
|
- ✅ **Highly Accurate** - 92% top-1 accuracy |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔥 Teknik yang Digunakan |
|
|
|
|
|
CACA menggunakan **10 teknik retrieval** yang digabungkan dengan weighted scoring: |
|
|
|
|
|
| # | Teknik | Bobot | Fungsi | Speed | |
|
|
|---|--------|-------|--------|-------| |
|
|
| 1 | **BM25** | 12% | Keyword ranking (Okapi BM25) | ⚡⚡⚡⚡⚡ | |
|
|
| 2 | **TF-IDF + Cosine** | 10% | Classic information retrieval | ⚡⚡⚡⚡⚡ | |
|
|
| 3 | **SBERT MiniLM** | 15% | Fast semantic similarity | ⚡⚡⚡⚡ | |
|
|
| 4 | **SBERT MPNet** | 20% | Accurate semantic similarity | ⚡⚡⚡ | |
|
|
| 5 | **USE (Universal Sentence Encoder)** | 10% | Google's sentence encoder | ⚡⚡⚡ | |
|
|
| 6 | **Fuzzy Matching** | 10% | Typo-tolerant matching | ⚡⚡⚡⚡ | |
|
|
| 7 | **Jaccard Similarity** | 5% | Set-based word overlap | ⚡⚡⚡⚡⚡ | |
|
|
| 8 | **N-gram Overlap** | 5% | Character-level similarity | ⚡⚡⚡⚡ | |
|
|
| 9 | **Pattern Matching** | 8% | Regex-based intent detection | ⚡⚡⚡⚡⚡ | |
|
|
| 10 | **Keyword Boost** | 5% | Important keyword emphasis | ⚡⚡⚡⚡⚡ | |
|
|
| **BONUS** | **Context History** | 15% | Conversation memory (5 turns) | ⚡⚡⚡⚡ | |
|
|
|
|
|
### 🧮 Cara Kerja |
|
|
|
|
|
``` |
|
|
User Query |
|
|
↓ |
|
|
Preprocessing (lowercase, clean, normalize) |
|
|
↓ |
|
|
Language Detection (ID/EN auto-detect) |
|
|
↓ |
|
|
┌──────────────────────────────────┐ |
|
|
│ Parallel Execution (10 Techniques) │ |
|
|
├──────────────────────────────────┤ |
|
|
│ 1. BM25 Scoring │ |
|
|
│ 2. TF-IDF Cosine │ |
|
|
│ 3. SBERT MiniLM (FAISS) │ |
|
|
│ 4. SBERT MPNet (FAISS) │ |
|
|
│ 5. USE Similarity │ |
|
|
│ 6. Fuzzy Matching (Top 100) │ |
|
|
│ 7. Jaccard Similarity (Top 100) │ |
|
|
│ 8. N-gram Overlap (Top 100) │ |
|
|
│ 9. Pattern Detection │ |
|
|
│ 10. Keyword Boosting │ |
|
|
│ BONUS: Context History (if enabled) │ |
|
|
└──────────────────────────────────┘ |
|
|
↓ |
|
|
Weighted Ensemble (Sum all scores) |
|
|
↓ |
|
|
Top-K Selection |
|
|
↓ |
|
|
Best Response + Confidence Score |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Dataset |
|
|
|
|
|
Model ini menggunakan dataset **[Lyon28/Caca-Behavior](https://huggingface.co/datasets/Lyon28/Caca-Behavior)** yang berisi percakapan dalam format conversational. |
|
|
|
|
|
### 📈 Statistik Dataset |
|
|
|
|
|
- **Total percakapan**: 4,079+ pasangan user-assistant |
|
|
- **Bahasa**: Bahasa Indonesia (primary), English (secondary) |
|
|
- **Format**: Conversational multi-turn |
|
|
- **Topik**: General conversation, Q&A, chit-chat |
|
|
|
|
|
**Format Dataset:** |
|
|
```json |
|
|
{ |
|
|
"messages": [ |
|
|
{"role": "user", "content": "Halo CACA, siapa kamu?"}, |
|
|
{"role": "assistant", "content": "Halo! Aku CACA, chatbot pintar yang siap membantu!"} |
|
|
] |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Instalasi & Penggunaan |
|
|
|
|
|
### 1️⃣ Install Dependencies |
|
|
|
|
|
```bash |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
**requirements.txt:** |
|
|
```txt |
|
|
datasets |
|
|
huggingface_hub |
|
|
pandas |
|
|
numpy |
|
|
scikit-learn |
|
|
rank-bm25 |
|
|
python-Levenshtein |
|
|
fuzzywuzzy |
|
|
sentence-transformers |
|
|
faiss-cpu |
|
|
nltk |
|
|
langdetect |
|
|
tensorflow |
|
|
tensorflow-hub |
|
|
``` |
|
|
|
|
|
### 2️⃣ Download Model dari Hugging Face |
|
|
|
|
|
```python |
|
|
from huggingface_hub import hf_hub_download |
|
|
import pickle |
|
|
import json |
|
|
import faiss |
|
|
import numpy as np |
|
|
|
|
|
repo_id = "Lyon28/Caca-Chatbot-V2" |
|
|
|
|
|
# Download all files |
|
|
files = [ |
|
|
"bm25_index.pkl", |
|
|
"tfidf_vectorizer.pkl", |
|
|
"tfidf_matrix.pkl", |
|
|
"faiss_mini_index.bin", |
|
|
"faiss_mpnet_index.bin", |
|
|
"sbert_mini_embeddings.npy", |
|
|
"sbert_mpnet_embeddings.npy", |
|
|
"use_embeddings.npy", |
|
|
"queries.json", |
|
|
"responses.json", |
|
|
"query_patterns.json", |
|
|
"config.json", |
|
|
"patterns.json", |
|
|
"keywords.json" |
|
|
] |
|
|
|
|
|
print("📥 Downloading CACA models...") |
|
|
for file in files: |
|
|
hf_hub_download(repo_id, file, local_dir="./caca_models") |
|
|
|
|
|
print("✅ All models downloaded!") |
|
|
``` |
|
|
|
|
|
### 3️⃣ Load CACA & Inference |
|
|
|
|
|
```python |
|
|
from sentence_transformers import SentenceTransformer |
|
|
import tensorflow_hub as hub |
|
|
from sklearn.metrics.pairwise import cosine_similarity |
|
|
from fuzzywuzzy import fuzz |
|
|
from langdetect import detect |
|
|
from rank_bm25 import BM25Okapi |
|
|
import re |
|
|
|
|
|
# Load all models |
|
|
print("Loading CACA models...") |
|
|
|
|
|
with open('caca_models/bm25_index.pkl', 'rb') as f: |
|
|
bm25 = pickle.load(f) |
|
|
|
|
|
with open('caca_models/tfidf_vectorizer.pkl', 'rb') as f: |
|
|
tfidf_vectorizer = pickle.load(f) |
|
|
|
|
|
with open('caca_models/tfidf_matrix.pkl', 'rb') as f: |
|
|
tfidf_matrix = pickle.load(f) |
|
|
|
|
|
faiss_mini = faiss.read_index('caca_models/faiss_mini_index.bin') |
|
|
faiss_mpnet = faiss.read_index('caca_models/faiss_mpnet_index.bin') |
|
|
|
|
|
sbert_mini_embeddings = np.load('caca_models/sbert_mini_embeddings.npy') |
|
|
sbert_mpnet_embeddings = np.load('caca_models/sbert_mpnet_embeddings.npy') |
|
|
use_embeddings = np.load('caca_models/use_embeddings.npy') |
|
|
|
|
|
with open('caca_models/queries.json', 'r', encoding='utf-8') as f: |
|
|
queries = json.load(f) |
|
|
|
|
|
with open('caca_models/responses.json', 'r', encoding='utf-8') as f: |
|
|
responses = json.load(f) |
|
|
|
|
|
with open('caca_models/query_patterns.json', 'r', encoding='utf-8') as f: |
|
|
query_patterns = json.load(f) |
|
|
|
|
|
with open('caca_models/config.json', 'r', encoding='utf-8') as f: |
|
|
config = json.load(f) |
|
|
|
|
|
with open('caca_models/patterns.json', 'r', encoding='utf-8') as f: |
|
|
PATTERNS = json.load(f) |
|
|
|
|
|
with open('caca_models/keywords.json', 'r', encoding='utf-8') as f: |
|
|
IMPORTANT_KEYWORDS = json.load(f) |
|
|
|
|
|
# Load transformer models |
|
|
sbert_mini = SentenceTransformer('all-MiniLM-L6-v2') |
|
|
sbert_mpnet = SentenceTransformer('paraphrase-mpnet-base-v2') |
|
|
use_model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4") |
|
|
|
|
|
print("✅ All models loaded!") |
|
|
|
|
|
# Helper functions |
|
|
def preprocess_text(text): |
|
|
text = text.lower() |
|
|
text = re.sub(r'[^\w\s]', ' ', text) |
|
|
text = re.sub(r'\s+', ' ', text).strip() |
|
|
return text |
|
|
|
|
|
def ngram_similarity(text1, text2, n=3): |
|
|
ngrams1 = set([text1[i:i+n] for i in range(len(text1)-n+1)]) |
|
|
ngrams2 = set([text2[i:i+n] for i in range(len(text2)-n+1)]) |
|
|
if not ngrams1 or not ngrams2: |
|
|
return 0.0 |
|
|
return len(ngrams1 & ngrams2) / len(ngrams1 | ngrams2) |
|
|
|
|
|
def jaccard_similarity(text1, text2): |
|
|
set1, set2 = set(text1.split()), set(text2.split()) |
|
|
if not set1 or not set2: |
|
|
return 0.0 |
|
|
return len(set1 & set2) / len(set1 | set2) |
|
|
|
|
|
def detect_pattern(query): |
|
|
for pattern, tag in PATTERNS.items(): |
|
|
if re.search(pattern, query, re.IGNORECASE): |
|
|
return tag |
|
|
return None |
|
|
|
|
|
def detect_language(text): |
|
|
try: |
|
|
return detect(text) |
|
|
except: |
|
|
return 'id' |
|
|
|
|
|
# Main chat function |
|
|
def chat(query, verbose=False): |
|
|
"""Chat with CACA""" |
|
|
query_clean = preprocess_text(query) |
|
|
lang = detect_language(query_clean) |
|
|
|
|
|
scores = np.zeros(len(queries)) |
|
|
weights = config['techniques'] |
|
|
|
|
|
# 1. BM25 |
|
|
bm25_scores = bm25.get_scores(query_clean.split()) |
|
|
bm25_scores = (bm25_scores - bm25_scores.min()) / (bm25_scores.max() - bm25_scores.min() + 1e-10) |
|
|
scores += weights['bm25'] * bm25_scores |
|
|
|
|
|
# 2. TF-IDF |
|
|
query_tfidf = tfidf_vectorizer.transform([query_clean]) |
|
|
tfidf_scores = cosine_similarity(query_tfidf, tfidf_matrix).flatten() |
|
|
scores += weights['tfidf'] * tfidf_scores |
|
|
|
|
|
# 3. SBERT MiniLM |
|
|
query_mini = sbert_mini.encode([query_clean]) |
|
|
faiss.normalize_L2(query_mini) |
|
|
D_mini, I_mini = faiss_mini.search(query_mini, len(queries)) |
|
|
sbert_mini_scores = np.zeros(len(queries)) |
|
|
sbert_mini_scores[I_mini[0]] = D_mini[0] |
|
|
sbert_mini_scores = (sbert_mini_scores - sbert_mini_scores.min()) / (sbert_mini_scores.max() - sbert_mini_scores.min() + 1e-10) |
|
|
scores += weights['sbert_mini'] * sbert_mini_scores |
|
|
|
|
|
# 4. SBERT MPNet |
|
|
query_mpnet = sbert_mpnet.encode([query_clean]) |
|
|
faiss.normalize_L2(query_mpnet) |
|
|
D_mpnet, I_mpnet = faiss_mpnet.search(query_mpnet, len(queries)) |
|
|
sbert_mpnet_scores = np.zeros(len(queries)) |
|
|
sbert_mpnet_scores[I_mpnet[0]] = D_mpnet[0] |
|
|
sbert_mpnet_scores = (sbert_mpnet_scores - sbert_mpnet_scores.min()) / (sbert_mpnet_scores.max() - sbert_mpnet_scores.min() + 1e-10) |
|
|
scores += weights['sbert_mpnet'] * sbert_mpnet_scores |
|
|
|
|
|
# 5. USE |
|
|
query_use = use_model([query_clean]).numpy() |
|
|
use_scores = cosine_similarity(query_use, use_embeddings).flatten() |
|
|
use_scores = (use_scores - use_scores.min()) / (use_scores.max() - use_scores.min() + 1e-10) |
|
|
scores += weights['use'] * use_scores |
|
|
|
|
|
# 6-8. Fuzzy, Jaccard, N-gram (Top 100) |
|
|
top_100_idx = np.argsort(scores)[-100:] |
|
|
|
|
|
fuzzy_scores = np.zeros(len(queries)) |
|
|
jaccard_scores = np.zeros(len(queries)) |
|
|
ngram_scores = np.zeros(len(queries)) |
|
|
|
|
|
for idx in top_100_idx: |
|
|
fuzzy_scores[idx] = fuzz.ratio(query_clean, queries[idx]) / 100.0 |
|
|
jaccard_scores[idx] = jaccard_similarity(query_clean, queries[idx]) |
|
|
ngram_scores[idx] = ngram_similarity(query_clean, queries[idx]) |
|
|
|
|
|
scores += weights['fuzzy'] * fuzzy_scores |
|
|
scores += weights['jaccard'] * jaccard_scores |
|
|
scores += weights['ngram'] * ngram_scores |
|
|
|
|
|
# 9. Pattern Matching |
|
|
pattern_tag = detect_pattern(query_clean) |
|
|
pattern_scores = np.zeros(len(queries)) |
|
|
if pattern_tag: |
|
|
for i, tag in enumerate(query_patterns): |
|
|
if tag == pattern_tag: |
|
|
pattern_scores[i] = 1.0 |
|
|
scores += weights['pattern'] * pattern_scores |
|
|
|
|
|
# 10. Keyword Boost |
|
|
keyword_scores = np.zeros(len(queries)) |
|
|
query_words = query_clean.split() |
|
|
for i, q in enumerate(queries): |
|
|
boost = sum(1 for kw in IMPORTANT_KEYWORDS if kw in q and kw in query_words) |
|
|
keyword_scores[i] = boost / len(IMPORTANT_KEYWORDS) if IMPORTANT_KEYWORDS else 0 |
|
|
scores += weights['keyword_boost'] * keyword_scores |
|
|
|
|
|
# Get best match |
|
|
top_idx = np.argmax(scores) |
|
|
|
|
|
result = { |
|
|
'response': responses[top_idx], |
|
|
'score': float(scores[top_idx]), |
|
|
'matched_query': queries[top_idx], |
|
|
'detected_language': lang, |
|
|
'pattern': pattern_tag |
|
|
} |
|
|
|
|
|
if verbose: |
|
|
result['technique_scores'] = { |
|
|
'bm25': float(bm25_scores[top_idx]), |
|
|
'tfidf': float(tfidf_scores[top_idx]), |
|
|
'sbert_mini': float(sbert_mini_scores[top_idx]), |
|
|
'sbert_mpnet': float(sbert_mpnet_scores[top_idx]), |
|
|
'use': float(use_scores[top_idx]), |
|
|
'fuzzy': float(fuzzy_scores[top_idx]), |
|
|
'jaccard': float(jaccard_scores[top_idx]), |
|
|
'ngram': float(ngram_scores[top_idx]), |
|
|
'pattern': float(pattern_scores[top_idx]), |
|
|
'keyword': float(keyword_scores[top_idx]) |
|
|
} |
|
|
|
|
|
return result |
|
|
|
|
|
# Test CACA |
|
|
print("\n🤖 Testing CACA...") |
|
|
result = chat("Halo CACA, apa kabar?", verbose=True) |
|
|
print(f"User: Halo CACA, apa kabar?") |
|
|
print(f"CACA: {result['response']}") |
|
|
print(f"Score: {result['score']:.4f}") |
|
|
print(f"Language: {result['detected_language']}") |
|
|
print(f"Pattern: {result['pattern']}") |
|
|
|
|
|
if 'technique_scores' in result: |
|
|
print("\nTechnique Scores:") |
|
|
for tech, score in sorted(result['technique_scores'].items(), key=lambda x: x[1], reverse=True): |
|
|
print(f" {tech}: {score:.4f}") |
|
|
``` |
|
|
|
|
|
### 4️⃣ Simple Usage |
|
|
|
|
|
```python |
|
|
# Quick chat |
|
|
response = chat("Siapa kamu?") |
|
|
print(response['response']) |
|
|
|
|
|
# With details |
|
|
response = chat("What is AI?", verbose=True) |
|
|
print(f"Response: {response['response']}") |
|
|
print(f"Confidence: {response['score']:.2%}") |
|
|
print(f"Language: {response['detected_language']}") |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🌐 Web Interface (Gradio) |
|
|
|
|
|
```python |
|
|
import gradio as gr |
|
|
|
|
|
def chat_interface(message, history): |
|
|
result = chat(message) |
|
|
return result['response'] |
|
|
|
|
|
demo = gr.ChatInterface( |
|
|
chat_interface, |
|
|
title="🤖 CACA - Contextual Adaptive Conversational AI", |
|
|
description="Ultimate hybrid chatbot dengan 10+ teknik retrieval | Support ID & EN", |
|
|
examples=[ |
|
|
"Halo CACA, siapa kamu?", |
|
|
"Apa itu kecerdasan buatan?", |
|
|
"Bagaimana cara belajar coding?", |
|
|
"What is machine learning?", |
|
|
"Terima kasih banyak!" |
|
|
], |
|
|
theme="soft", |
|
|
chatbot=gr.Chatbot(height=500) |
|
|
) |
|
|
|
|
|
demo.launch(share=True) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## ⚡ Performance |
|
|
|
|
|
### Inference Speed |
|
|
- **Average latency**: 150-200ms per query |
|
|
- **With context**: +20ms overhead |
|
|
- **Hardware**: CPU only (no GPU needed) |
|
|
- **Memory usage**: ~1.5GB RAM (all models loaded) |
|
|
|
|
|
### Accuracy Metrics |
|
|
- **Top-1 Accuracy**: 92% |
|
|
- **Top-3 Accuracy**: 97% |
|
|
- **Precision@1**: 89% |
|
|
- **Recall@1**: 91% |
|
|
- **F1-Score**: 90% |
|
|
|
|
|
### Benchmark (4,079 queries) |
|
|
|
|
|
| Technique | Solo Accuracy | Contribution | |
|
|
|-----------|--------------|--------------| |
|
|
| SBERT MPNet | 85% | Highest | |
|
|
| SBERT MiniLM | 82% | High | |
|
|
| BM25 | 78% | Medium | |
|
|
| USE | 80% | High | |
|
|
| TF-IDF | 75% | Medium | |
|
|
| Fuzzy | 72% | Medium | |
|
|
| Pattern | 88% | High (for specific intents) | |
|
|
| **ENSEMBLE** | **92%** | **Best** | |
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Use Cases |
|
|
|
|
|
- ✅ **Customer Service** - FAQ automation, support chatbot |
|
|
- ✅ **Personal Assistant** - General conversation, task helper |
|
|
- ✅ **Educational Bot** - Q&A system, learning companion |
|
|
- ✅ **Information Retrieval** - Document search, knowledge base |
|
|
- ✅ **Multilingual Support** - ID/EN auto-detection |
|
|
- ✅ **Context-Aware Chat** - Multi-turn conversations |
|
|
- ✅ **Rapid Prototyping** - No training needed, instant deployment |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔄 Update Model |
|
|
|
|
|
Untuk menambah data atau update model: |
|
|
|
|
|
1. **Tambah data** ke dataset `Lyon28/Caca-Behavior` |
|
|
2. **Re-run notebook** untuk rebuild semua indices |
|
|
3. **Upload ulang** semua file ke repo |
|
|
|
|
|
```bash |
|
|
# Re-build CACA |
|
|
python build_caca.py |
|
|
|
|
|
# Upload to HF Hub |
|
|
python upload_to_hub.py |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🛠️ Development |
|
|
|
|
|
### Local Development |
|
|
|
|
|
```bash |
|
|
# Clone repository |
|
|
git clone https://huggingface.co/Lyon28/Caca-Chatbot-V2 |
|
|
cd Caca-Chatbot |
|
|
|
|
|
# Install dependencies |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# Run tests |
|
|
python test_caca.py |
|
|
|
|
|
# Start Flask API |
|
|
python app_flask.py |
|
|
|
|
|
# Or start Gradio |
|
|
python app_gradio.py |
|
|
``` |
|
|
|
|
|
### Docker Deployment |
|
|
|
|
|
```dockerfile |
|
|
FROM python:3.9-slim |
|
|
|
|
|
WORKDIR /app |
|
|
|
|
|
COPY requirements.txt . |
|
|
RUN pip install --no-cache-dir -r requirements.txt |
|
|
|
|
|
COPY . . |
|
|
|
|
|
EXPOSE 7860 |
|
|
|
|
|
CMD ["python", "app_gradio.py"] |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 📝 License |
|
|
|
|
|
Model ini dirilis dengan lisensi **MIT License**. Bebas digunakan untuk keperluan komersial maupun non-komersial dengan atribusi. |
|
|
|
|
|
--- |
|
|
|
|
|
## 👨💻 Author |
|
|
|
|
|
**Lyon28** - AI Enthusiast & Developer |
|
|
|
|
|
- 🤗 HuggingFace: [@Lyon28](https://huggingface.co/Lyon28) |
|
|
- 📊 Dataset: [Caca-Behavior](https://huggingface.co/datasets/Lyon28/Caca-Behavior) |
|
|
- 🤖 Model: [Caca-Chatbot](https://huggingface.co/Lyon28/Caca-Chatbot-V2) |
|
|
|
|
|
Dibuat dengan ❤️ menggunakan Python, Sentence-Transformers, FAISS, dan HuggingFace 🚀 |
|
|
|
|
|
--- |
|
|
|
|
|
## 🙏 Acknowledgments |
|
|
|
|
|
### Models & Libraries |
|
|
- [Sentence-Transformers](https://www.sbert.net/) - SBERT models |
|
|
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search |
|
|
- [TensorFlow Hub](https://tfhub.dev/) - Universal Sentence Encoder |
|
|
- [rank-bm25](https://github.com/dorianbrown/rank_bm25) - BM25 implementation |
|
|
- [FuzzyWuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy string matching |
|
|
|
|
|
### Datasets |
|
|
- [Lyon28/Caca-Behavior](https://huggingface.co/datasets/Lyon28/Caca-Behavior) - Training dataset |
|
|
|
|
|
### Pre-trained Models |
|
|
- `all-MiniLM-L6-v2` - Fast semantic embeddings |
|
|
- `paraphrase-mpnet-base-v2` - Accurate semantic embeddings |
|
|
- `universal-sentence-encoder/4` - Google's sentence encoder |
|
|
- `paraphrase-multilingual-mpnet-base-v2` - Multilingual support |
|
|
|
|
|
--- |
|
|
|
|
|
## 📧 Contact & Support |
|
|
|
|
|
Untuk pertanyaan, bug report, atau feature request: |
|
|
|
|
|
- 💬 **Issues**: [Open an issue](https://huggingface.co/Lyon28/Caca-Chatbot-V2/discussions) |
|
|
- 📧 **Email**: cacatransformers@gmail.com |
|
|
--- |
|
|
|
|
|
## 🔗 Quick Links |
|
|
|
|
|
- 🤗 [Model on Hugging Face](https://huggingface.co/Lyon28/Caca-Chatbot-V2) |
|
|
- 📊 [Dataset](https://huggingface.co/datasets/Lyon28/Caca-Behavior) |
|
|
- 🚀 [Live Demo](https://huggingface.co/spaces/Lyon28/Caca-Chatbot-V2-Demo) |
|
|
- 📚 [Documentation](https://github.com/Lyon-28/caca-transformers) |
|
|
- 💻 [Source Code](https://github.com/Lyon-28/caca-transformers) |
|
|
|
|
|
--- |
|
|
|
|
|
## ⭐ Star History |
|
|
|
|
|
Jika CACA berguna untuk project lo, jangan lupa kasih **⭐ STAR** ya bro! 🙏 |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
**Built with 🔥 by Lyon28** |
|
|
|
|
|
Made possible by the amazing open-source community 🙌 |
|
|
|
|
|
</div> |