---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Rackspace Knowledge Chatbot
This chatbot answers questions about Rackspace documentation using the Groq API and enhanced RAG retrieval. It is deployable on Hugging Face Spaces with Gradio.
## Features
- Enhanced retrieval with vector database
- Groq API integration
- Public Gradio interface
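Under the hood, "enhanced retrieval" means embedding the query and ranking stored chunk vectors by similarity. A minimal sketch of that ranking step, assuming cosine similarity (the real app presumably delegates this to ChromaDB; the function names here are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The retrieved chunks are then inserted into the prompt sent to the Groq API, so answers stay grounded in the indexed documentation.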
## Usage
1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:

   ```bash
   python enhanced_vector_db.py
   ```

3. Chat with the bot!
# 🎯 Rackspace Knowledge Chatbot - Enhanced Version
## 🚀 Quick Start
```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh
# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
# Then open a browser at http://localhost:8501
```
## πŸ“ Enhanced Project Structure
```
chatbot-rackspace/
├── streamlit_app.py             # Main UI application
├── enhanced_rag_chatbot.py      # Core RAG chatbot
├── enhanced_vector_db.py        # Vector database builder
├── integrate_training_data.py   # Data integration script
├── config.py                    # Configuration
├── requirements.txt             # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json   # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json     # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl        # 1,220 training entries
│   ├── backup_20251125_113739/             # Original data backup
│   └── feedback/                           # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/  # Fine-tuned model (6h 13min)
└── vector_db/                   # ChromaDB (1,158 chunks from 507 docs)
```
## ✨ What's New - Enhanced with Training Data
**Data Integration from rackspace-rag-chatbot:**

- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x more documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content

**Coverage Improvements:**

- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions
## 🎯 System Status
- ✅ **Enhanced Data**: 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings**: 1,158 chunks from real content only
- ✅ **No Hallucinations**: Responses use actual content with real URLs
- ✅ **Fine-tuned Model**: TinyLlama trained for 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses
## πŸ“ Documentation
- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation
## 🌐 Deploy on Hugging Face Spaces
You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):
1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Streamlit type).
   - Upload your code and `requirements.txt`.
2. **Set your GROQ_API_KEY**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. **Rebuild the Vector DB (first run only)**
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run:
     ```bash
     python enhanced_vector_db.py
     ```
   - This will create the required ChromaDB files in `vector_db/`.
4. **Run the Streamlit app**
   - The app will start automatically. If the vector DB is missing, it will prompt you to rebuild.
5. **Share your Space link!**
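Step 4 says the app detects a missing vector DB. A minimal sketch of such a startup check (the directory name `vector_db/` comes from the project structure above; the function itself is illustrative, not taken from the app's code):

```python
from pathlib import Path

def vector_db_ready(db_dir: str = "vector_db") -> bool:
    """True if the ChromaDB directory exists and contains at least one file."""
    path = Path(db_dir)
    return path.is_dir() and any(path.iterdir())
```

The Streamlit UI could call this on startup and show a "run `python enhanced_vector_db.py`" prompt when it returns `False`.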
---
## 🔧 Rebuild Vector DB (Local or Hugging Face)
```bash
python enhanced_vector_db.py
```
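`enhanced_vector_db.py` itself is not shown here, but a typical builder splits each document into overlapping chunks before embedding them into ChromaDB (the structure above mentions 1,158 chunks from 507 docs). A sketch of such a chunker, assuming character-based chunks; the chunk size and overlap below are illustrative, not the script's actual values:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.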
## 🔄 Re-run Data Integration
If you need to re-integrate data from rackspace-rag-chatbot:
```bash
source venv/bin/activate
python integrate_training_data.py
```
This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
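The "avoiding duplicates" step can be implemented by hashing document content and skipping entries already seen. A minimal sketch of that idea (the `text` field name and the use of SHA-256 are assumptions about the script, not confirmed details):

```python
import hashlib

def merge_dedup(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge document lists, skipping entries whose text already exists."""
    seen = {hashlib.sha256(d["text"].encode()).hexdigest() for d in existing}
    merged = list(existing)
    for doc in incoming:
        h = hashlib.sha256(doc["text"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            merged.append(doc)
    return merged
```

Hashing full content catches exact duplicates cheaply; near-duplicate detection would need fuzzier matching.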
---
**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**