---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Rackspace Knowledge Chatbot

This chatbot answers questions about Rackspace documentation using the Groq API and enhanced retrieval-augmented generation (RAG). It is deployable on Hugging Face Spaces with Gradio.

## Features
- Enhanced retrieval with vector database
- Groq API integration
- Public Gradio interface

## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:
   ```bash
   python enhanced_vector_db.py
   ```
3. Chat with the bot!
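
The two setup steps above can be checked before the app starts. The following is a minimal sketch, not the project's actual startup code: the function name `preflight` and its return format are hypothetical, and it assumes the vector DB lives in a `vector_db/` directory as shown elsewhere in this README.

```python
import os
from pathlib import Path


def preflight(vector_db_dir="vector_db", env=None):
    """Return a list of setup problems; an empty list means ready to start.

    Hypothetical helper for illustration, not part of the repository.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set (add it under Spaces secrets).")
    db_path = Path(vector_db_dir)
    # Treat a missing or empty directory as "database not built yet".
    if not db_path.is_dir() or not any(db_path.iterdir()):
        problems.append("Vector DB not found; run: python enhanced_vector_db.py")
    return problems
```

Calling `preflight()` at the top of the app and showing any returned messages is one way to surface a missing key or database early.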

# 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py

# Then open http://localhost:8501 in your browser
```

## 📁 Enhanced Project Structure

```
chatbot-rackspace/
├── streamlit_app.py                    # Main UI application
├── enhanced_rag_chatbot.py             # Core RAG chatbot
├── enhanced_vector_db.py               # Vector database builder
├── integrate_training_data.py          # Data integration script
├── config.py                           # Configuration
├── requirements.txt                    # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json     # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json       # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl          # 1,220 training entries
│   ├── backup_20251125_113739/               # Original data backup
│   └── feedback/                             # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/         # Fine-tuned model (6h 13min)
└── vector_db/                          # ChromaDB (1,158 chunks from 507 docs)
```

## ✨ What's New - Enhanced with Training Data

**Data Integration from rackspace-rag-chatbot:**
- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x as many documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content

**Coverage Improvements:**
- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data**: 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings**: 1,158 chunks from real content only
- ✅ **Grounded Responses**: Answers use retrieved content with real URLs
- ✅ **Fine-tuned Model**: TinyLlama, trained for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses

## 📝 Documentation

- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation  


## 🌐 Deploy on Hugging Face Spaces

You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):

1. **Fork or upload this repo to Hugging Face Spaces**
	- Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
	- Upload your code and `requirements.txt`.

2. **Set your GROQ_API_KEY**
	- In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.

3. **Rebuild the Vector DB (first run only)**
	- The vector database is not included due to file size limits.
	- After deployment, open the Space terminal and run:
	  ```bash
	  python enhanced_vector_db.py
	  ```
	- This will create the required ChromaDB files in `vector_db/`.

4. **Run the Streamlit app**
	- The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.

5. **Share your Space link!**

---

## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```
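
The builder turns documents into chunks before storing them in ChromaDB (1,158 chunks from 507 docs in the current build). How `enhanced_vector_db.py` splits text is not described here, so the following is only an illustrative sketch of one common approach, word-based chunks with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` defaults are assumptions, not the script's real settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    boundary is not lost. Hypothetical helper for illustration only.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Larger chunks keep more context per retrieved passage, while smaller ones make matches more precise; the right trade-off depends on the embedding model and the documents.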

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
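
The merge and backup steps above can be sketched roughly as follows. This is not the code in `integrate_training_data.py`; it assumes each document is a dict with a `content` field (a hypothetical schema) and detects duplicates by hashing normalized text. The backup directory name follows the `backup_YYYYMMDD_HHMMSS` pattern seen in `data/`.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def content_key(text):
    """Hash of lowercased, whitespace-normalized text, used to spot duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_documents(existing, incoming):
    """Return existing docs plus incoming docs whose content is not already present.

    Assumes each doc is a dict with a "content" key (hypothetical schema).
    """
    seen = {content_key(doc["content"]) for doc in existing}
    merged = list(existing)
    for doc in incoming:
        key = content_key(doc["content"])
        if key not in seen:
            seen.add(key)
            merged.append(doc)
    return merged


def backup_file(path, now=None):
    """Copy `path` into a sibling backup_YYYYMMDD_HHMMSS/ directory and return the copy."""
    path = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    backup_dir = path.parent / f"backup_{stamp}"
    backup_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(path, backup_dir / path.name))
```

Exact-hash matching only catches documents that are identical after normalization; near-duplicates with small wording changes would need a fuzzier comparison.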

---

**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**