title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
Rackspace Knowledge Chatbot
This chatbot answers questions about Rackspace documentation using the Groq API and enhanced retrieval-augmented generation (RAG). It can be deployed on Hugging Face Spaces with Gradio.
Features
- Enhanced retrieval with vector database
- Groq API integration
- Public Gradio interface
Usage
- Set your GROQ_API_KEY in Hugging Face Spaces secrets.
- Rebuild the vector DB if missing: python enhanced_vector_db.py
- Chat with the bot!
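Hugging Face Spaces exposes secrets to the running app as environment variables, so the key can be read at startup. The following is a minimal sketch of that lookup, not the project's actual code; the helper name load_groq_api_key is hypothetical.

```python
import os


def load_groq_api_key(env=os.environ):
    """Return the Groq API key from the environment, or raise a clear error.

    `env` is any mapping; it defaults to the process environment so the
    function can be exercised with a plain dict.
    """
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it under Settings > Secrets in your Space."
        )
    return key
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authentication error raised later by the API client.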
🎯 Rackspace Knowledge Chatbot - Enhanced Version
Quick Start
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh
# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
# Then open your browser at http://localhost:8501
Enhanced Project Structure
chatbot-rackspace/
├── streamlit_app.py                 # Main UI application
├── enhanced_rag_chatbot.py          # Core RAG chatbot
├── enhanced_vector_db.py            # Vector database builder
├── integrate_training_data.py       # Data integration script
├── config.py                        # Configuration
├── requirements.txt                 # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json   # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json     # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl        # 1,220 training entries
│   ├── backup_20251125_113739/             # Original data backup
│   └── feedback/                           # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/      # Fine-tuned model (6h 13min)
└── vector_db/                       # ChromaDB (1,158 chunks from 507 docs)
✨ What's New - Enhanced with Training Data
Data Integration from rackspace-rag-chatbot:
- 494 new documents - Comprehensive Rackspace documentation
- 1,220 training examples - Instruction-following Q&A pairs
- 39x as many documents - From 13 to 507 documents
- 1,158 vector chunks - Enhanced retrieval capability
- Smart deduplication - No duplicate content
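The 1,158 chunks from 507 documents work out to a little over two chunks per document, which is what you would expect when longer documents are split before embedding. This README does not describe how enhanced_vector_db.py chunks text, so the sketch below is only an illustration of one common approach (fixed-size character windows with overlap); the function name and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once a window has reached the end of the text,
        # so no tail chunk made only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.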
Coverage Improvements:
- Cloud migration services (AWS, Azure, Google Cloud)
- Managed services and platform guides
- Technical documentation and how-to guides
- Security and compliance topics
- Database and storage solutions
🎯 System Status
- Enhanced Data: 507 docs, comprehensive coverage (39x increase)
- Proper Embeddings: 1,158 chunks from real content only
- No Hallucinations: Responses use actual content with real URLs
- Fine-tuned Model: TinyLlama trained 6h 13min
- Training Data: 5,327 Q&A pairs for improved responses
Documentation
- README.md - This file (quick start guide)
- INTEGRATION_SUMMARY.md - Detailed integration report
- FINAL_SYSTEM_STATUS.md - System documentation
Deploy on Hugging Face Spaces
You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):
1. Fork or upload this repo to Hugging Face Spaces
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and requirements.txt.
2. Set your GROQ_API_KEY
   - In your Space, go to Settings → Secrets and add GROQ_API_KEY.
3. Rebuild the Vector DB (first run only)
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run: python enhanced_vector_db.py
   - This will create the required ChromaDB files in vector_db/.
4. Run the Streamlit app
   - The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.
5. Share your Space link!
🔧 Rebuild Vector DB (Local or Hugging Face)
python enhanced_vector_db.py
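If you would rather not run the rebuild by hand, a startup check along these lines could trigger it only when needed. This is a sketch under assumptions, not the app's actual logic: it treats an empty or absent vector_db/ directory as "missing", and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def vector_db_ready(db_dir="vector_db"):
    """True if the directory exists and contains at least one entry."""
    path = Path(db_dir)
    return path.is_dir() and any(path.iterdir())


def ensure_vector_db(db_dir="vector_db", builder="enhanced_vector_db.py"):
    """Run the builder script only when the vector DB looks missing.

    Returns True if a rebuild was started, False if nothing was needed.
    """
    if vector_db_ready(db_dir):
        return False
    # Use the current interpreter so the active virtualenv is respected.
    subprocess.run([sys.executable, builder], check=True)
    return True
```

Calling ensure_vector_db() before the chatbot loads its retriever would make the first run on a fresh Space self-healing; check=True makes a failed build raise instead of leaving a half-built database unnoticed.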
Re-run Data Integration
If you need to re-integrate data from rackspace-rag-chatbot:
source venv/bin/activate
python integrate_training_data.py
This will:
- Consolidate chunks into full documents
- Convert training data to Q&A pairs
- Merge with existing data (avoiding duplicates)
- Create automatic backups
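The merge and backup steps above can be sketched as follows. This is an illustration, not the contents of integrate_training_data.py: it assumes each document is a dict with a "content" field, detects duplicates by hashing normalized content, and mirrors the backup_YYYYMMDD_HHMMSS directory naming seen in the project tree. All function names are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def content_key(doc):
    """Hash whitespace-normalized, lowercased content so trivial variants collide."""
    normalized = " ".join(doc["content"].split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_documents(existing, incoming):
    """Return existing + incoming, skipping documents whose content is already present."""
    seen = {content_key(d) for d in existing}
    merged = list(existing)
    for doc in incoming:
        key = content_key(doc)
        if key not in seen:
            seen.add(key)
            merged.append(doc)
    return merged


def backup_file(path, backup_root="data"):
    """Copy a file into a timestamped backup directory and return the copy's path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target_dir = Path(backup_root) / f"backup_{stamp}"
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(path, target_dir))
```

Taking the backup before writing the merged file means a bad integration run can be undone by copying the previous JSON back into place.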
Built with YOUR OWN MODEL + Enhanced Training Data!