---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Rackspace Knowledge Chatbot

This chatbot answers questions about Rackspace documentation using the Groq API and enhanced RAG retrieval. It can be deployed on Hugging Face Spaces with Gradio.

## Features

- Enhanced retrieval with a vector database
- Groq API integration
- Public Gradio interface

## Usage

1. Set your `GROQ_API_KEY` in the Hugging Face Spaces secrets.
2. Rebuild the vector DB if it is missing:
   ```bash
   python enhanced_vector_db.py
   ```
3. Chat with the bot!

# 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py

# Then open http://localhost:8501 in your browser
```

## 📁 Enhanced Project Structure

```
chatbot-rackspace/
├── streamlit_app.py            # Main UI application
├── enhanced_rag_chatbot.py     # Core RAG chatbot
├── enhanced_vector_db.py       # Vector database builder
├── integrate_training_data.py  # Data integration script
├── config.py                   # Configuration
├── requirements.txt            # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json  # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json    # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl       # 1,220 training entries
│   ├── backup_20251125_113739/            # Original data backup
│   └── feedback/                          # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/  # Fine-tuned model (trained for 6h 13min)
└── vector_db/                   # ChromaDB (1,158 chunks from 507 docs)
```

## ✨ What's New - Enhanced with Training Data

**Data integration from rackspace-rag-chatbot:**

- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x more documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - Duplicate content is skipped during merging

**Coverage improvements:**

- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data**: 507 docs with comprehensive coverage (39x increase)
- ✅ **Proper Embeddings**: 1,158 chunks built from real content only
- ✅ **Grounded Responses**: Answers draw on retrieved documentation content and include real source URLs
- ✅ **Fine-tuned Model**: TinyLlama fine-tuned for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses

## 📝 Documentation

- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation

## 🌐 Deploy on Hugging Face Spaces

You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):

1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and `requirements.txt`.
2. **Set your GROQ_API_KEY**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. **Rebuild the vector DB (first run only)**
   - The vector database is not included because of file size limits.
   - After deployment, open the Space terminal and run:
     ```bash
     python enhanced_vector_db.py
     ```
   - This creates the required ChromaDB files in `vector_db/`.
4. **Run the Streamlit app**
   - The app starts automatically. If the vector DB is missing, it prompts you to rebuild it.
5. **Share your Space link!**

---

## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:

1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups

---

**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**
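The Usage and Deploy steps above rely on two things being in place at startup: the `GROQ_API_KEY` secret (Hugging Face Spaces exposes Space secrets to the running app as environment variables) and the ChromaDB files in `vector_db/`. A minimal pre-flight check might look like the sketch below. It assumes the database lives in `vector_db/` relative to the working directory; `check_startup` is a hypothetical helper, not a function from `app.py` or `streamlit_app.py`.

```python
import os
from pathlib import Path


def check_startup(db_dir: str = "vector_db") -> list:
    """Return a list of problems; an empty list means the app is ready.

    Hypothetical helper: the real app may perform these checks differently.
    """
    problems = []
    # Spaces secrets are injected as environment variables at runtime.
    if not os.environ.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    # Treat a missing or empty directory as "vector DB not built yet".
    db_path = Path(db_dir)
    if not db_path.is_dir() or not any(db_path.iterdir()):
        problems.append(
            f"Vector DB not found in {db_dir}/; run: python enhanced_vector_db.py"
        )
    return problems


if __name__ == "__main__":
    for problem in check_startup():
        print("Warning:", problem)
```

Returning a list instead of raising lets a UI show every missing piece at once, which matches the "prompt you to rebuild" behaviour described for the Space.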
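The README reports that `enhanced_vector_db.py` turns 507 documents into 1,158 chunks, but it does not say how the text is split. One common approach is fixed-size character windows with overlap, sketched below. The `chunk_size`/`overlap` defaults and the `url`/`content` document keys are illustrative assumptions, not values taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into overlapping character windows (illustrative defaults)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    text = text.strip()
    if not text:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end so the tail is not re-emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_documents(docs: list) -> list:
    """Turn {"url", "content"} dicts (assumed schema) into chunk records."""
    records = []
    for doc_idx, doc in enumerate(docs):
        for chunk_idx, chunk in enumerate(chunk_text(doc["content"])):
            records.append({
                "id": f"doc{doc_idx}-chunk{chunk_idx}",
                "text": chunk,
                # Keep the source URL so answers can point at the real page.
                "metadata": {"url": doc.get("url", "")},
            })
    return records
```

The overlap keeps sentences that straddle a boundary retrievable from either neighbouring chunk. To persist records like these, ChromaDB offers `chromadb.PersistentClient(path="vector_db")`, `get_or_create_collection(...)` and `collection.add(ids=..., documents=..., metadatas=...)`; whether the project's builder uses exactly these calls is not stated here.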
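Steps 3 and 4 of the data-integration list above (merging without duplicates and creating automatic backups) can be sketched as follows. This is not the contents of `integrate_training_data.py`. It assumes records are dicts and treats two records as duplicates when a chosen field (for example a hypothetical `question` key) matches after lower-casing and whitespace normalisation. Backups follow the same `backup_YYYYMMDD_HHMMSS` pattern as the existing `data/backup_20251125_113739/` folder.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def dedup_key(item: dict, field: str) -> str:
    """Hash of the lower-cased, whitespace-normalised value of `field`."""
    normalized = " ".join(str(item.get(field, "")).lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_unique(existing: list, new: list, field: str) -> list:
    """Keep every existing item and append only new items with an unseen key."""
    seen = {dedup_key(item, field) for item in existing}
    merged = list(existing)
    for item in new:
        key = dedup_key(item, field)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged


def backup_data(data_dir: str = "data", now: Optional[datetime] = None) -> Path:
    """Copy top-level .json/.jsonl files into data_dir/backup_YYYYMMDD_HHMMSS/."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    backup_dir = Path(data_dir) / f"backup_{stamp}"
    # exist_ok=False so an existing backup is never silently overwritten.
    backup_dir.mkdir(parents=True, exist_ok=False)
    for path in list(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix in {".json", ".jsonl"}:
            shutil.copy2(path, backup_dir / path.name)
    return backup_dir
```

Calling `backup_data()` before writing the merged files means a bad merge can be rolled back by copying the backup over the originals.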