---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Rackspace Knowledge Chatbot

This chatbot answers questions about Rackspace documentation using the Groq API and enhanced RAG retrieval. It is deployable on Hugging Face Spaces with Gradio.

## Features

- Enhanced retrieval with a vector database
- Groq API integration
- Public Gradio interface

## Usage

1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:

   ```bash
   python enhanced_vector_db.py
   ```

3. Chat with the bot!
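Step 1 matters because the app cannot answer anything without the key. As a minimal sketch of a startup check the app could perform (the `require_groq_key` helper is illustrative, not part of this repo):

```python
import os

def require_groq_key() -> str:
    """Fetch GROQ_API_KEY from the environment, failing fast with a clear message."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it under Settings -> Secrets "
            "in your Hugging Face Space, or export it locally."
        )
    return key
```

Failing fast at startup gives a clearer error than letting the first Groq call time out.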

## 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
```

Then open your browser at http://localhost:8501.

πŸ“ Enhanced Project Structure

chatbot-rackspace/
β”œβ”€β”€ streamlit_app.py                    # Main UI application
β”œβ”€β”€ enhanced_rag_chatbot.py             # Core RAG chatbot
β”œβ”€β”€ enhanced_vector_db.py               # Vector database builder
β”œβ”€β”€ integrate_training_data.py          # Data integration script
β”œβ”€β”€ config.py                           # Configuration
β”œβ”€β”€ requirements.txt                    # Dependencies
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ rackspace_knowledge_enhanced.json     # 507 documents (13 old + 494 new)
β”‚   β”œβ”€β”€ training_qa_pairs_enhanced.json       # 5,327 Q&A pairs (4,107 old + 1,220 new)
β”‚   β”œβ”€β”€ training_data_enhanced.jsonl          # 1,220 training entries
β”‚   β”œβ”€β”€ backup_20251125_113739/               # Original data backup
β”‚   └── feedback/                             # Feedback directory (ready for use)
β”‚
β”œβ”€β”€ models/rackspace_finetuned/         # Fine-tuned model (6h 13min)
└── vector_db/                          # ChromaDB (1,158 chunks from 507 docs)
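The `vector_db/` directory holds 1,158 chunks derived from the 507 source documents. The actual chunking logic lives in `enhanced_vector_db.py`; purely as an illustration, a fixed-size chunker with overlap might look like this (the function name and default sizes are hypothetical, not taken from this repo):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each window starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.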

## ✨ What's New - Enhanced with Training Data

**Data Integration from rackspace-rag-chatbot:**

- ✅ **494 new documents** - comprehensive Rackspace documentation
- ✅ **1,220 training examples** - instruction-following Q&A pairs
- ✅ **39x more documents** - from 13 to 507 documents
- ✅ **1,158 vector chunks** - enhanced retrieval capability
- ✅ **Smart deduplication** - no duplicate content
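One common way to implement the deduplication step above is to hash normalized content and keep first occurrences. This is an illustrative sketch, not the exact logic in `integrate_training_data.py`:

```python
import hashlib

def deduplicate(docs: list[dict]) -> list[dict]:
    """Keep the first occurrence of each document, keyed by a hash of
    its normalized content (lowercased, whitespace-collapsed)."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc["content"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```

Normalizing before hashing catches near-duplicates that differ only in casing or whitespace.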

**Coverage Improvements:**

- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data:** 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings:** 1,158 chunks from real content only
- ✅ **No Hallucinations:** responses use actual content with real URLs
- ✅ **Fine-tuned Model:** TinyLlama, trained for 6h 13min
- ✅ **Training Data:** 5,327 Q&A pairs for improved responses

πŸ“ Documentation

  • README.md - This file (quick start guide)
  • INTEGRATION_SUMMARY.md - Detailed integration report
  • FINAL_SYSTEM_STATUS.md - System documentation

## 🌐 Deploy on Hugging Face Spaces

You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):

1. Fork or upload this repo to Hugging Face Spaces.
2. Set your `GROQ_API_KEY`: in your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. Rebuild the vector DB (first run only). The vector database is not included due to file size limits. After deployment, open the Space terminal and run:

   ```bash
   python enhanced_vector_db.py
   ```

   This creates the required ChromaDB files in `vector_db/`.
4. Run the Streamlit app. It starts automatically; if the vector DB is missing, it will prompt you to rebuild.
5. Share your Space link!


## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:

1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
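Steps 3 and 4 of the integration (merge without duplicates, automatic backup) can be sketched roughly as follows; `merge_with_backup` is a hypothetical helper, not the actual code in `integrate_training_data.py`:

```python
import json
import shutil
from datetime import datetime
from pathlib import Path

def merge_with_backup(existing_path: Path, new_docs: list[dict]) -> int:
    """Back up the existing JSON knowledge file into a timestamped directory
    (like data/backup_20251125_113739/), then merge in new documents, skipping
    any whose 'content' is already present. Returns the number of docs added."""
    backup_dir = existing_path.parent / f"backup_{datetime.now():%Y%m%d_%H%M%S}"
    backup_dir.mkdir(exist_ok=True)
    shutil.copy2(existing_path, backup_dir / existing_path.name)

    docs = json.loads(existing_path.read_text())
    known = {doc["content"] for doc in docs}
    added = 0
    for doc in new_docs:
        if doc["content"] not in known:
            docs.append(doc)
            known.add(doc["content"])
            added += 1
    existing_path.write_text(json.dumps(docs, indent=2))
    return added
```

Taking the backup before touching the merged file means a failed run can always be rolled back.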

Built with YOUR OWN MODEL + Enhanced Training Data! 🚀