---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Rackspace Knowledge Chatbot

This chatbot answers questions about Rackspace documentation using the Groq API and retrieval-augmented generation (RAG). It can be deployed on Hugging Face Spaces with Gradio.

## Features
- Enhanced retrieval backed by a ChromaDB vector database
- Groq API integration
- Public Gradio interface

## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:
   ```bash
   python enhanced_vector_db.py
   ```
3. Chat with the bot!
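
Hugging Face Spaces exposes secrets to the running app as environment variables, so the key from step 1 can be read at startup. The following is a minimal sketch of that check; `load_groq_api_key` is a hypothetical helper for illustration and is not taken from this repo.

```python
import os


def load_groq_api_key(env=None):
    """Return the Groq API key from the environment, or raise a clear error.

    Hypothetical helper: the actual app may read the key differently.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it under Settings -> Secrets in your Space."
        )
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised on the first chat request.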

# 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py

# Then open http://localhost:8501 in your browser
```

## 📁 Enhanced Project Structure

```
chatbot-rackspace/
├── streamlit_app.py                    # Main UI application
├── enhanced_rag_chatbot.py             # Core RAG chatbot
├── enhanced_vector_db.py               # Vector database builder
├── integrate_training_data.py          # Data integration script
├── config.py                           # Configuration
├── requirements.txt                    # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json     # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json       # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl          # 1,220 training entries
│   ├── backup_20251125_113739/               # Original data backup
│   └── feedback/                             # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/         # Fine-tuned model (6h 13min)
└── vector_db/                          # ChromaDB (1,158 chunks from 507 docs)
```

## ✨ What's New - Enhanced with Training Data

**Data Integration from rackspace-rag-chatbot:**
- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x as many documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content
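
One common way to implement the deduplication mentioned above is to fingerprint each document's normalized text and skip anything already seen. This is a sketch under assumptions, not the project's actual logic: the `content` field name and both function names are hypothetical.

```python
import hashlib


def content_fingerprint(text):
    """Hash whitespace-normalized, lowercased text so trivial variants match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_without_duplicates(existing, incoming, text_key="content"):
    """Append incoming docs whose text is not already present.

    The `content` field name is an assumption about the JSON schema.
    """
    seen = {content_fingerprint(doc[text_key]) for doc in existing}
    merged = list(existing)
    for doc in incoming:
        fp = content_fingerprint(doc[text_key])
        if fp not in seen:
            seen.add(fp)
            merged.append(doc)
    return merged
```

Exact-match hashing only catches documents that are identical after normalization; near-duplicates would need a similarity-based check instead.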

**Coverage Improvements:**
- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data**: 507 docs with comprehensive coverage (39x as many as before)
- ✅ **Proper Embeddings**: 1,158 chunks built from real content only
- ✅ **Grounded Responses**: Answers draw on retrieved content and cite real URLs, which reduces hallucinations
- ✅ **Fine-tuned Model**: TinyLlama, trained for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses
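
With 1,158 chunks from 507 documents, each document yields roughly 2.3 chunks on average. The sketch below shows one typical way to produce such chunks, using overlapping word windows. The function name and the `chunk_size` and `overlap` defaults are illustrative assumptions, not the settings used by `enhanced_vector_db.py`.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows.

    chunk_size and overlap are illustrative values, not the project's settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk.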

## 📝 Documentation

- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation

## 🌐 Deploy on Hugging Face Spaces

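You can deploy this chatbot publicly on Hugging Face Spaces. The steps below use the Streamlit UI (`streamlit_app.py`). Note that the Space metadata at the top of this file declares `sdk: gradio` with `app_file: app.py`, so make sure the SDK and app file in that header match the UI you deploy: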

1. **Fork or upload this repo to Hugging Face Spaces**
	- Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
	- Upload your code and `requirements.txt`.

2. **Set your GROQ_API_KEY**
	- In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.

3. **Rebuild the Vector DB (first run only)**
	- The vector database is not included due to file size limits.
	- After deployment, open the Space terminal and run:
	  ```bash
	  python enhanced_vector_db.py
	  ```
	- This will create the required ChromaDB files in `vector_db/`.

4. **Run the Streamlit app**
	- The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.

5. **Share your Space link!**
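
The missing-database check described in step 4 could look like the sketch below. `vector_db_ready` is a hypothetical helper, and treating "directory exists and is non-empty" as "ready" is an assumption; the real app may validate the ChromaDB collection more thoroughly.

```python
from pathlib import Path


def vector_db_ready(db_dir="vector_db"):
    """Return True if the directory exists and contains at least one entry."""
    path = Path(db_dir)
    return path.is_dir() and any(path.iterdir())


if __name__ == "__main__":
    if not vector_db_ready():
        print("Vector DB not found. Run: python enhanced_vector_db.py")
```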

---

## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
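
Steps 1 and 4 above can be sketched as follows. This is an illustration under assumptions rather than the code in `integrate_training_data.py`: the `url`, `chunk_index`, and `text` field names and both function names are hypothetical. The backup name format is inferred from the existing `backup_20251125_113739/` directory.

```python
from collections import defaultdict
from datetime import datetime


def consolidate_chunks(chunks):
    """Group chunks by source URL and join them in index order.

    The `url`, `chunk_index`, and `text` fields are assumed names.
    """
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["url"]].append(chunk)
    documents = []
    for url, parts in grouped.items():
        parts.sort(key=lambda c: c["chunk_index"])
        documents.append({"url": url, "content": "\n".join(p["text"] for p in parts)})
    return documents


def backup_dir_name(now=None):
    """Build a name like backup_20251125_113739 from a timestamp."""
    now = now or datetime.now()
    return now.strftime("backup_%Y%m%d_%H%M%S")
```

Timestamped backup names sort chronologically and avoid overwriting an earlier backup when the integration is re-run.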

---

**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**