---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Rackspace Knowledge Chatbot
This chatbot answers questions about Rackspace documentation using the Groq API and enhanced RAG retrieval. It is deployable on Hugging Face Spaces with Gradio.
## Features
- Enhanced retrieval with vector database
- Groq API integration
- Public Gradio interface
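Under the hood, "enhanced retrieval" means embedding the query and ranking stored chunk vectors by similarity. A minimal sketch of that ranking step, assuming cosine similarity (the real app presumably delegates this to ChromaDB; the function names here are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The retrieved chunks are then inserted into the prompt sent to the Groq API, so answers stay grounded in the indexed documentation.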
## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:

   ```bash
   python enhanced_vector_db.py
   ```

3. Chat with the bot!
# 🎯 Rackspace Knowledge Chatbot - Enhanced Version
## 🚀 Quick Start
```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh
# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
# Then open a browser at http://localhost:8501
```
## πŸ“ Enhanced Project Structure
```
chatbot-rackspace/
├── streamlit_app.py             # Main UI application
├── enhanced_rag_chatbot.py      # Core RAG chatbot
├── enhanced_vector_db.py        # Vector database builder
├── integrate_training_data.py   # Data integration script
├── config.py                    # Configuration
├── requirements.txt             # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json   # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json     # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl        # 1,220 training entries
│   ├── backup_20251125_113739/             # Original data backup
│   └── feedback/                           # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/  # Fine-tuned model (6h 13min)
└── vector_db/                   # ChromaDB (1,158 chunks from 507 docs)
```
## ✨ What's New - Enhanced with Training Data
**Data Integration from rackspace-rag-chatbot:**

- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x more documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content

**Coverage Improvements:**

- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions
## 🎯 System Status
- ✅ **Enhanced Data**: 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings**: 1,158 chunks from real content only
- ✅ **No Hallucinations**: Responses use actual content with real URLs
- ✅ **Fine-tuned Model**: TinyLlama trained for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses
## πŸ“ Documentation
- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation
## 🌐 Deploy on Hugging Face Spaces
You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):
1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and `requirements.txt`.
2. **Set your GROQ_API_KEY**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. **Rebuild the Vector DB (first run only)**
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run:
     ```bash
     python enhanced_vector_db.py
     ```
   - This will create the required ChromaDB files in `vector_db/`.
4. **Run the Streamlit app**
   - The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.
5. **Share your Space link!**
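Step 4 says the app detects a missing vector DB. A minimal sketch of such a startup check (the directory name `vector_db/` comes from the project structure above; the function itself is illustrative, not taken from the app's code):

```python
from pathlib import Path

def vector_db_ready(db_dir: str = "vector_db") -> bool:
    """True if the ChromaDB directory exists and contains at least one file."""
    path = Path(db_dir)
    return path.is_dir() and any(path.iterdir())
```

The Streamlit UI could call this on startup and show a "run `python enhanced_vector_db.py`" prompt when it returns `False`.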
---
## 🔧 Rebuild Vector DB (Local or Hugging Face)
```bash
python enhanced_vector_db.py
```
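`enhanced_vector_db.py` itself is not shown here, but a typical builder splits each document into overlapping chunks before embedding them into ChromaDB (the structure above mentions 1,158 chunks from 507 docs). A sketch of such a chunker, assuming character-based chunks; the chunk size and overlap below are illustrative, not the script's actual values:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.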
## 🔄 Re-run Data Integration
If you need to re-integrate data from rackspace-rag-chatbot:
```bash
source venv/bin/activate
python integrate_training_data.py
```
This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
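The "avoiding duplicates" step can be implemented by hashing document content and skipping entries already seen. A minimal sketch of that idea (the `text` field name and the use of SHA-256 are assumptions about the script, not confirmed details):

```python
import hashlib

def merge_dedup(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge document lists, skipping entries whose text already exists."""
    seen = {hashlib.sha256(d["text"].encode()).hexdigest() for d in existing}
    merged = list(existing)
    for doc in incoming:
        h = hashlib.sha256(doc["text"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            merged.append(doc)
    return merged
```

Hashing full content catches exact duplicates cheaply; near-duplicate detection would need fuzzier matching.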
---
**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**