---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Rackspace Knowledge Chatbot
This chatbot answers questions about Rackspace documentation using the Groq API and enhanced retrieval-augmented generation (RAG). It can be deployed on Hugging Face Spaces with Gradio.
## Features
- Enhanced retrieval backed by a ChromaDB vector database
- Groq API integration
- Public Gradio interface
## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:
   ```bash
   python enhanced_vector_db.py
   ```
3. Chat with the bot!
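
Hugging Face Spaces exposes secrets to the running app as environment variables, so the key from step 1 can be read with `os.environ`. The sketch below fails fast with a readable message when the key is missing. The helper name `require_groq_api_key` is hypothetical and not taken from `app.py`.

```python
import os


def require_groq_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key from the environment, or fail with a clear hint.

    Hypothetical helper for illustration; app.py may load the key differently.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it as a secret in your Space settings, "
            f"or run `export {env_var}=...` before starting the app locally."
        )
    return key
```

Calling it once at startup reports a missing secret before the first chat request rather than during it.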
# 🎯 Rackspace Knowledge Chatbot - Enhanced Version
## Quick Start
```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh
# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
# Then open http://localhost:8501 in your browser
```
## Enhanced Project Structure
```
chatbot-rackspace/
├── streamlit_app.py                 # Main UI application
├── enhanced_rag_chatbot.py          # Core RAG chatbot
├── enhanced_vector_db.py            # Vector database builder
├── integrate_training_data.py       # Data integration script
├── config.py                        # Configuration
├── requirements.txt                 # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json   # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json     # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl        # 1,220 training entries
│   ├── backup_20251125_113739/             # Original data backup
│   └── feedback/                           # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/      # Fine-tuned model (6h 13min)
└── vector_db/                       # ChromaDB (1,158 chunks from 507 docs)
```
## ✨ What's New - Enhanced with Training Data
**Data Integration from rackspace-rag-chatbot:**
- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x more documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content

**Coverage Improvements:**
- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions
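
This README does not say how the "Smart deduplication" step decides that two documents are the same. One common approach, sketched here under that assumption, is to hash each document's normalized text and keep only the first document per hash. The `content` field name and both helper names are hypothetical.

```python
import hashlib
import re
from typing import Iterable


def content_fingerprint(text: str) -> str:
    """SHA-256 of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(docs: Iterable[dict], field: str = "content") -> list[dict]:
    """Keep the first occurrence of each distinct document body.

    Assumes each document is a dict holding its text under `field`
    (a hypothetical schema, not necessarily the one used in data/).
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for doc in docs:
        fingerprint = content_fingerprint(doc[field])
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique
```

Exact hashing only catches documents that are identical after normalization; near-duplicates would need a similarity measure instead.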
## 🎯 System Status
- ✅ **Enhanced Data**: 507 documents with broad coverage (a 39x increase from 13)
- ✅ **Proper Embeddings**: 1,158 chunks embedded from real documentation content only
- ✅ **Grounded Answers**: responses are built from retrieved documentation content and cite real source URLs
- ✅ **Fine-tuned Model**: TinyLlama fine-tuned for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses
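
Keeping answers tied to real content and URLs usually means handing the model only the retrieved chunks, each labeled with its source URL, and instructing it to answer from them. The sketch below builds such a prompt. The `RetrievedChunk` type, the `build_grounded_prompt` name, and the instruction wording are assumptions, not the code in `enhanced_rag_chatbot.py`; the resulting string would then be sent as the user message of a Groq chat-completion request.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """One search hit from the vector DB (hypothetical structure)."""
    text: str
    url: str


def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    if not chunks:
        # Nothing relevant was retrieved: ask the model to admit it instead of guessing.
        return (
            "No relevant Rackspace documentation was found. "
            f"Say you don't know.\n\nQuestion: {question}"
        )
    # Number each chunk so the model can cite it as [n] next to its URL.
    context = "\n\n".join(
        f"[{i}] {chunk.text}\nSource: {chunk.url}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "Cite sources as [n] and list their URLs. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```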
## Documentation
- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation
## Deploy on Hugging Face Spaces
You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):
1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and `requirements.txt`.
2. **Set your `GROQ_API_KEY`**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. **Rebuild the Vector DB (first run only)**
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run:
     ```bash
     python enhanced_vector_db.py
     ```
   - This will create the required ChromaDB files in `vector_db/`.
4. **Run the Streamlit app**
   - The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.
5. **Share your Space link!**
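
Step 4 relies on the app noticing at startup that the vector DB is missing. Assuming ChromaDB persists its files under `vector_db/` as the project structure shows, a minimal check is that the directory exists and is not empty. Both helper names below are hypothetical.

```python
from pathlib import Path

REBUILD_COMMAND = "python enhanced_vector_db.py"


def vector_db_ready(path: str = "vector_db") -> bool:
    """True if the persisted vector DB directory exists and contains anything."""
    db_dir = Path(path)
    return db_dir.is_dir() and any(db_dir.iterdir())


def startup_message(path: str = "vector_db") -> str:
    """Status line an app could display at launch (hypothetical wording)."""
    if vector_db_ready(path):
        return f"Vector DB found in {path}/."
    return f"Vector DB missing in {path}/. Rebuild it with: {REBUILD_COMMAND}"
```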
---
## 🔧 Rebuild Vector DB (Local or Hugging Face)
```bash
python enhanced_vector_db.py
```
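The project structure lists 1,158 chunks built from 507 documents, so the rebuild splits documents into smaller pieces before embedding them into ChromaDB. The splitting strategy and parameters in `enhanced_vector_db.py` are not documented here. The sketch below assumes a simple character-based splitter with overlap, using hypothetical defaults of 1,000 characters per chunk and 200 characters of overlap, and hypothetical `url`/`content` document keys.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks. Both defaults are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_document(doc: dict, chunk_size: int = 1000, overlap: int = 200) -> list[dict]:
    """Chunk one document, keeping its source URL on every chunk."""
    return [
        {"text": piece, "url": doc["url"], "chunk_index": i}
        for i, piece in enumerate(chunk_text(doc["content"], chunk_size, overlap))
    ]
```

Carrying the URL on every chunk is what lets a retrieved chunk be cited back to its source page.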
## Re-run Data Integration
If you need to re-integrate data from rackspace-rag-chatbot:
```bash
source venv/bin/activate
python integrate_training_data.py
```
This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
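
The four steps above can be sketched as follows. The record layouts are assumptions, not the actual format handled by `integrate_training_data.py`: source chunks are taken to be non-overlapping dicts with `url`, `chunk_index`, and `content`; JSONL training entries are taken to use `instruction`/`output` keys; Q&A pairs are `question`/`answer` dicts. The backup folder name follows the `backup_YYYYMMDD_HHMMSS` pattern visible in `data/`.

```python
import json
import shutil
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def consolidate_chunks(chunks: list[dict]) -> list[dict]:
    """Step 1: join chunks sharing a URL, in chunk order, into one document.

    Assumes the source chunks do not overlap; overlapping text would repeat.
    """
    by_url: dict[str, list[dict]] = defaultdict(list)
    for chunk in chunks:
        by_url[chunk["url"]].append(chunk)
    documents = []
    for url, parts in by_url.items():
        parts.sort(key=lambda part: part["chunk_index"])
        documents.append({"url": url, "content": "\n".join(p["content"] for p in parts)})
    return documents


def jsonl_to_qa_pairs(path: Path) -> list[dict]:
    """Step 2: turn instruction/output JSONL records into question/answer pairs."""
    pairs = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                pairs.append({"question": record["instruction"], "answer": record["output"]})
    return pairs


def merge_qa_pairs(existing: list[dict], new: list[dict]) -> list[dict]:
    """Step 3: append new pairs whose normalized question is not already present."""
    seen = {pair["question"].strip().lower() for pair in existing}
    merged = list(existing)
    for pair in new:
        key = pair["question"].strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append(pair)
    return merged


def backup_data_dir(data_dir: Path) -> Path:
    """Step 4: copy the top-level data files into data/backup_YYYYMMDD_HHMMSS/."""
    target = data_dir / datetime.now().strftime("backup_%Y%m%d_%H%M%S")
    target.mkdir()
    for item in data_dir.iterdir():
        if item.is_file():  # skips subdirectories, including earlier backups
            shutil.copy2(item, target / item.name)
    return target
```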
---
**Built with YOUR OWN MODEL + Enhanced Training Data!**