---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Rackspace Knowledge Chatbot
This chatbot answers questions about Rackspace documentation using the Groq API and retrieval-augmented generation (RAG). It is deployable on Hugging Face Spaces with Gradio.
## Features
- Enhanced retrieval backed by a vector database
- Groq API integration
- Public Gradio interface
## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:
```bash
python enhanced_vector_db.py
```
3. Chat with the bot!
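
The two prerequisites above (an API key and a built vector DB) can be checked before the app starts. The following is a minimal sketch, not code from this repo: the `preflight` helper is hypothetical, and it assumes that a usable vector DB is simply a non-empty `vector_db/` directory.

```python
import os
from pathlib import Path


def preflight(env=None, vector_db_dir="vector_db"):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    # The key is read from the environment, which is where Spaces secrets land.
    if not env.get("GROQ_API_KEY", "").strip():
        problems.append("GROQ_API_KEY is not set; add it to the Space secrets.")

    # Assumption: a built vector DB is a directory containing at least one file.
    db_path = Path(vector_db_dir)
    if not db_path.is_dir() or not any(db_path.iterdir()):
        problems.append(
            f"{vector_db_dir}/ is missing or empty; run: python enhanced_vector_db.py"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Returning a list rather than raising lets the UI show every problem at once instead of stopping at the first one.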
# Rackspace Knowledge Chatbot - Enhanced Version
## Quick Start
```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh
# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
# Then open your browser at http://localhost:8501
```
## Enhanced Project Structure
```
chatbot-rackspace/
├── streamlit_app.py              # Main UI application
├── enhanced_rag_chatbot.py       # Core RAG chatbot
├── enhanced_vector_db.py         # Vector database builder
├── integrate_training_data.py    # Data integration script
├── config.py                     # Configuration
├── requirements.txt              # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json    # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json      # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl         # 1,220 training entries
│   ├── backup_20251125_113739/              # Original data backup
│   └── feedback/                            # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/   # Fine-tuned model (6h 13min)
└── vector_db/                    # ChromaDB (1,158 chunks from 507 docs)
```
## What's New - Enhanced with Training Data
**Data Integration from rackspace-rag-chatbot:**
- **494 new documents** - Comprehensive Rackspace documentation
- **1,220 training examples** - Instruction-following Q&A pairs
- **39x more documents** - From 13 to 507 documents
- **1,158 vector chunks** - Enhanced retrieval capability
- **Smart deduplication** - No duplicate content
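
The figures quoted in this README are consistent with one another, which a few lines of arithmetic confirm. This is only a sanity check on the published numbers; the variable names are illustrative and nothing here reads the actual data files.

```python
# Counts as stated in this README.
old_docs, new_docs = 13, 494
old_qa_pairs, new_qa_pairs = 4107, 1220
vector_chunks = 1158

total_docs = old_docs + new_docs              # 507 documents
total_qa_pairs = old_qa_pairs + new_qa_pairs  # 5,327 Q&A pairs
growth_factor = total_docs / old_docs         # 39.0, i.e. "39x more documents"
chunks_per_doc = vector_chunks / total_docs   # roughly 2.28 chunks per document

print(f"{total_docs} docs, {total_qa_pairs} Q&A pairs")
print(f"{growth_factor:.0f}x growth, {chunks_per_doc:.2f} chunks per doc on average")
```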
**Coverage Improvements:**
- Cloud migration services (AWS, Azure, Google Cloud)
- Managed services and platform guides
- Technical documentation and how-to guides
- Security and compliance topics
- Database and storage solutions
## System Status
- **Enhanced Data**: 507 docs, comprehensive coverage (39x increase)
- **Proper Embeddings**: 1,158 chunks from real content only
- **Grounded Responses**: Answers draw on retrieved content and link to real source URLs
- **Fine-tuned Model**: TinyLlama, trained for 6h 13min
- **Training Data**: 5,327 Q&A pairs for improved responses
## Documentation
- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation
## Deploy on Hugging Face Spaces
You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):
1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and `requirements.txt`.
2. **Set your GROQ_API_KEY**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. **Rebuild the Vector DB (first run only)**
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run:
   ```bash
   python enhanced_vector_db.py
   ```
   - This will create the required ChromaDB files in `vector_db/`.
4. **Run the Streamlit app**
   - The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.
5. **Share your Space link!**
---
## Rebuild Vector DB (Local or Hugging Face)
```bash
python enhanced_vector_db.py
```
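
For intuition about how 507 documents can become 1,158 chunks, here is a minimal sketch of overlapping character-based chunking. It is an illustration only: the `chunk_text` helper, the chunk size, and the overlap are assumptions, and `enhanced_vector_db.py` may split documents differently.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no tail is emitted twice.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Rackspace documentation text. " * 100
    pieces = chunk_text(sample, chunk_size=1000, overlap=200)
    print(len(pieces), "chunks from", len(sample), "characters")
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.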
## Re-run Data Integration
If you need to re-integrate data from rackspace-rag-chatbot:
```bash
source venv/bin/activate
python integrate_training_data.py
```
This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
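
The four steps above can be sketched roughly as follows. This is not the code in `integrate_training_data.py`: every function name and record field here (`url`, `index`, `text`, `content`, `instruction`, `output`) is a hypothetical stand-in, and hashing normalised content is just one plausible way to avoid duplicates. The backup name follows the `backup_20251125_113739` pattern seen in `data/`.

```python
import hashlib
from collections import defaultdict
from datetime import datetime


def consolidate_chunks(chunks):
    """Step 1: group chunk records by source URL and join them in chunk order."""
    by_url = defaultdict(list)
    for chunk in chunks:
        by_url[chunk["url"]].append(chunk)
    docs = []
    for url, parts in by_url.items():
        parts.sort(key=lambda c: c["index"])
        docs.append({"url": url, "content": "\n".join(p["text"] for p in parts)})
    return docs


def to_qa_pair(entry):
    """Step 2: map an instruction-style training entry to a Q&A pair."""
    return {"question": entry["instruction"], "answer": entry["output"]}


def content_key(doc):
    """Hash lower-cased, whitespace-normalised content so trivial variants match."""
    normalised = " ".join(doc["content"].lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def merge_without_duplicates(existing, incoming):
    """Step 3: append only incoming documents whose content is not already present."""
    seen = {content_key(doc) for doc in existing}
    merged = list(existing)
    for doc in incoming:
        key = content_key(doc)
        if key not in seen:
            seen.add(key)
            merged.append(doc)
    return merged


def backup_dir_name(now=None):
    """Step 4: build a timestamped backup directory name."""
    now = now or datetime.now()
    return now.strftime("backup_%Y%m%d_%H%M%S")
```

Keying on a content hash rather than on the URL means the same page reached through two different URLs is still stored only once.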
---
**Built with YOUR OWN MODEL + Enhanced Training Data!**