---
title: Rackspace Knowledge Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Rackspace Knowledge Chatbot

This chatbot answers questions about Rackspace documentation using the Groq API and enhanced RAG retrieval. It is deployable on Hugging Face Spaces with Gradio.

## Features

- Enhanced retrieval with a vector database
- Groq API integration
- Public Gradio interface

## Usage

1. Set your `GROQ_API_KEY` in Hugging Face Spaces secrets.
2. Rebuild the vector DB if missing:

   ```bash
   python enhanced_vector_db.py
   ```

3. Chat with the bot!
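Step 1 matters because the app cannot answer anything without the key. As a minimal sketch of a startup check the app could perform (the `require_groq_key` helper is illustrative, not part of this repo):

```python
import os

def require_groq_key() -> str:
    """Fetch GROQ_API_KEY from the environment, failing fast with a clear message."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it under Settings -> Secrets "
            "in your Hugging Face Space, or export it locally."
        )
    return key
```

Failing fast at startup gives a clearer error than letting the first Groq call time out.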

## 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py
```

Then open your browser at http://localhost:8501.

πŸ“ Enhanced Project Structure

chatbot-rackspace/
β”œβ”€β”€ streamlit_app.py                    # Main UI application
β”œβ”€β”€ enhanced_rag_chatbot.py             # Core RAG chatbot
β”œβ”€β”€ enhanced_vector_db.py               # Vector database builder
β”œβ”€β”€ integrate_training_data.py          # Data integration script
β”œβ”€β”€ config.py                           # Configuration
β”œβ”€β”€ requirements.txt                    # Dependencies
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ rackspace_knowledge_enhanced.json     # 507 documents (13 old + 494 new)
β”‚   β”œβ”€β”€ training_qa_pairs_enhanced.json       # 5,327 Q&A pairs (4,107 old + 1,220 new)
β”‚   β”œβ”€β”€ training_data_enhanced.jsonl          # 1,220 training entries
β”‚   β”œβ”€β”€ backup_20251125_113739/               # Original data backup
β”‚   └── feedback/                             # Feedback directory (ready for use)
β”‚
β”œβ”€β”€ models/rackspace_finetuned/         # Fine-tuned model (6h 13min)
└── vector_db/                          # ChromaDB (1,158 chunks from 507 docs)
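The `vector_db/` directory holds 1,158 chunks derived from the 507 source documents. The actual chunking logic lives in `enhanced_vector_db.py`; purely as an illustration, a fixed-size chunker with overlap might look like this (the function name and default sizes are hypothetical, not taken from this repo):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # each window starts `step` chars after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.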

## ✨ What's New - Enhanced with Training Data

**Data Integration from rackspace-rag-chatbot:**

- ✅ **494 new documents** - comprehensive Rackspace documentation
- ✅ **1,220 training examples** - instruction-following Q&A pairs
- ✅ **39x more documents** - from 13 to 507 documents
- ✅ **1,158 vector chunks** - enhanced retrieval capability
- ✅ **Smart deduplication** - no duplicate content
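One common way to implement the deduplication step above is to hash normalized content and keep first occurrences. This is an illustrative sketch, not the exact logic in `integrate_training_data.py`:

```python
import hashlib

def deduplicate(docs: list[dict]) -> list[dict]:
    """Keep the first occurrence of each document, keyed by a hash of
    its normalized content (lowercased, whitespace-collapsed)."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        normalized = " ".join(doc["content"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```

Normalizing before hashing catches near-duplicates that differ only in casing or whitespace.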

**Coverage Improvements:**

- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data:** 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings:** 1,158 chunks from real content only
- ✅ **No Hallucinations:** responses use actual content with real URLs
- ✅ **Fine-tuned Model:** TinyLlama, trained for 6h 13min
- ✅ **Training Data:** 5,327 Q&A pairs for improved responses

πŸ“ Documentation

  • README.md - This file (quick start guide)
  • INTEGRATION_SUMMARY.md - Detailed integration report
  • FINAL_SYSTEM_STATUS.md - System documentation

## 🌐 Deploy on Hugging Face Spaces

You can deploy this chatbot publicly using Hugging Face Spaces (Streamlit):

1. Fork or upload this repo to Hugging Face Spaces.
2. Set your `GROQ_API_KEY`: in your Space, go to Settings → Secrets and add `GROQ_API_KEY`.
3. Rebuild the vector DB (first run only). The vector database is not included due to file size limits. After deployment, open the Space terminal and run:

   ```bash
   python enhanced_vector_db.py
   ```

   This creates the required ChromaDB files in `vector_db/`.
4. Run the Streamlit app. It starts automatically; if the vector DB is missing, it will prompt you to rebuild.
5. Share your Space link!


## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:

1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
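Steps 3 and 4 of the integration (merge without duplicates, automatic backup) can be sketched roughly as follows; `merge_with_backup` is a hypothetical helper, not the actual code in `integrate_training_data.py`:

```python
import json
import shutil
from datetime import datetime
from pathlib import Path

def merge_with_backup(existing_path: Path, new_docs: list[dict]) -> int:
    """Back up the existing JSON knowledge file into a timestamped directory
    (like data/backup_20251125_113739/), then merge in new documents, skipping
    any whose 'content' is already present. Returns the number of docs added."""
    backup_dir = existing_path.parent / f"backup_{datetime.now():%Y%m%d_%H%M%S}"
    backup_dir.mkdir(exist_ok=True)
    shutil.copy2(existing_path, backup_dir / existing_path.name)

    docs = json.loads(existing_path.read_text())
    known = {doc["content"] for doc in docs}
    added = 0
    for doc in new_docs:
        if doc["content"] not in known:
            docs.append(doc)
            known.add(doc["content"])
            added += 1
    existing_path.write_text(json.dumps(docs, indent=2))
    return added
```

Taking the backup before touching the merged file means a failed run can always be rolled back.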

Built with YOUR OWN MODEL + Enhanced Training Data! 🚀