Spaces: Runtime error

Commit c0a093e
Parent(s): e84fcf2

Add all necessary files for Hugging Face Spaces Gradio chatbot deployment

- README.md +115 -10
- app.py +22 -0
- config.py +124 -0
- enhanced_rag_chatbot.py +933 -0
- enhanced_vector_db.py +239 -0
- rackspace_knowledge_clean.json +93 -0
- rackspace_knowledge_complete.json +0 -0
- rackspace_knowledge_enhanced.json +0 -0
- rackspace_knowledge_from_raw.json +0 -0
- rag_chatbot.py +279 -0
- rebuild_rag_system.py +318 -0
- requirements.txt +17 -0
- test_groq.py +56 -0
- training_data.jsonl +0 -0
- training_data_enhanced.jsonl +0 -0
- training_qa_pairs.json +0 -0
- training_qa_pairs_enhanced.json +0 -0

README.md
CHANGED
@@ -1,13 +1,118 @@
Lines 1-11 and 13 of the previous version are removed (old lines 1 and 11 were `---`).

# 🎯 Rackspace Knowledge Chatbot - Enhanced Version

## 🚀 Quick Start

```bash
# Option 1: Use the quick start script
./start_enhanced_chatbot.sh

# Option 2: Manual start
source venv/bin/activate
streamlit run streamlit_app.py

# Then open http://localhost:8501 in your browser
```

## 📁 Enhanced Project Structure

```
chatbot-rackspace/
├── streamlit_app.py               # Main UI application
├── enhanced_rag_chatbot.py        # Core RAG chatbot
├── enhanced_vector_db.py          # Vector database builder
├── integrate_training_data.py     # Data integration script
├── config.py                      # Configuration
├── requirements.txt               # Dependencies
│
├── data/
│   ├── rackspace_knowledge_enhanced.json   # 507 documents (13 old + 494 new)
│   ├── training_qa_pairs_enhanced.json     # 5,327 Q&A pairs (4,107 old + 1,220 new)
│   ├── training_data_enhanced.jsonl        # 1,220 training entries
│   ├── backup_20251125_113739/             # Original data backup
│   └── feedback/                           # Feedback directory (ready for use)
│
├── models/rackspace_finetuned/    # Fine-tuned model (6h 13min)
└── vector_db/                     # ChromaDB (1,158 chunks from 507 docs)
```

## ✨ What's New - Enhanced with Training Data

**Data Integration from rackspace-rag-chatbot:**
- ✅ **494 new documents** - Comprehensive Rackspace documentation
- ✅ **1,220 training examples** - Instruction-following Q&A pairs
- ✅ **39x more documents** - From 13 to 507 documents
- ✅ **1,158 vector chunks** - Enhanced retrieval capability
- ✅ **Smart deduplication** - No duplicate content

**Coverage Improvements:**
- ✅ Cloud migration services (AWS, Azure, Google Cloud)
- ✅ Managed services and platform guides
- ✅ Technical documentation and how-to guides
- ✅ Security and compliance topics
- ✅ Database and storage solutions

## 🎯 System Status

- ✅ **Enhanced Data**: 507 docs, comprehensive coverage (39x increase)
- ✅ **Proper Embeddings**: 1,158 chunks from real content only
- ✅ **No Hallucinations**: Responses use actual content with real URLs
- ✅ **Fine-tuned Model**: TinyLlama trained 6h 13min
- ✅ **Training Data**: 5,327 Q&A pairs for improved responses

## 📝 Documentation

- **README.md** - This file (quick start guide)
- **INTEGRATION_SUMMARY.md** - Detailed integration report
- **FINAL_SYSTEM_STATUS.md** - System documentation

## 🌐 Deploy on Hugging Face Spaces

You can deploy this chatbot publicly using Hugging Face Spaces (Gradio):

1. **Fork or upload this repo to Hugging Face Spaces**
   - Go to https://huggingface.co/spaces and create a new Space (Gradio SDK).
   - Upload your code and `requirements.txt`.

2. **Set your GROQ_API_KEY**
   - In your Space, go to Settings → Secrets and add `GROQ_API_KEY`.

3. **Rebuild the Vector DB (first run only)**
   - The vector database is not included due to file size limits.
   - After deployment, open the Space terminal and run:
     ```bash
     python enhanced_vector_db.py
     ```
   - This will create the required ChromaDB files in `vector_db/`.

4. **Run the app**
   - The Space starts `app.py` automatically. If the vector DB is missing, the chatbot tries to rebuild it by running `enhanced_vector_db.py`; if that fails, run the command from step 3 manually.

5. **Share your Space link!**

---

## 🔧 Rebuild Vector DB (Local or Hugging Face)

```bash
python enhanced_vector_db.py
```

## 🔄 Re-run Data Integration

If you need to re-integrate data from rackspace-rag-chatbot:

```bash
source venv/bin/activate
python integrate_training_data.py
```

This will:
1. Consolidate chunks into full documents
2. Convert training data to Q&A pairs
3. Merge with existing data (avoiding duplicates)
4. Create automatic backups
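
Step 3 can be illustrated with a minimal sketch. `integrate_training_data.py` is not part of this commit, so the record shape (`question`/`answer` keys) and the duplicate rule (a case- and whitespace-insensitive match on the question) are assumptions, and `merge_qa_pairs` is a hypothetical helper:

```python
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def merge_qa_pairs(existing: Iterable[Dict[str, str]],
                   new: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first pair seen for each normalized question.

    Existing pairs are visited first, so they win over new duplicates,
    and the original order is preserved.
    """
    merged: List[Dict[str, str]] = []
    seen = set()
    for pair in list(existing) + list(new):
        key = normalize(pair["question"])  # assumed key name
        if key in seen:
            continue
        seen.add(key)
        merged.append(pair)
    return merged


if __name__ == "__main__":
    old = [{"question": "What is Rackspace Spot?", "answer": "..."}]
    incoming = [
        {"question": "what is  rackspace spot?", "answer": "..."},  # duplicate
        {"question": "Does Rackspace manage Azure?", "answer": "..."},
    ]
    print(len(merge_qa_pairs(old, incoming)))
```

Keying on the normalized question keeps the merge linear in the number of pairs; a fuzzier rule (for example embedding similarity) would catch paraphrases but would need a threshold to tune.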

---

**Built with YOUR OWN MODEL + Enhanced Training Data! 🚀**

app.py
ADDED
@@ -0,0 +1,22 @@
import gradio as gr
from enhanced_rag_chatbot import get_chatbot

chatbot = get_chatbot()

# Simple chat function for Gradio

def chat_fn(message, mode="extract"):
    try:
        response = chatbot.chat(message, mode=mode)
        return response
    except Exception as e:
        return f"❌ Error: {str(e)}"

iface = gr.ChatInterface(
    fn=lambda message, history: chat_fn(message),
    title="Rackspace Knowledge Assistant",
    description="Ask questions about Rackspace documentation. Uses Groq API and enhanced RAG retrieval.",
    theme="default",
)

iface.launch()

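`app.py` calls `get_chatbot()` at import time, so a missing `GROQ_API_KEY` or an unusable vector DB raises before the Gradio UI is built. A minimal sketch of deferring construction to the first message follows. `make_chat_fn` is a hypothetical helper; it assumes only what `app.py` already relies on, namely a factory such as `get_chatbot` returning an object with `chat(message, mode=...)`. `gr.ChatInterface` passes `(message, history)` to its `fn`.

```python
from typing import Any, Callable, List, Optional


def make_chat_fn(factory: Callable[[], Any],
                 mode: str = "extract") -> Callable[[str, List], str]:
    """Build a ChatInterface-style callback (message, history) -> reply.

    The chatbot is created lazily on the first message, so app start-up
    does not depend on it; construction errors are reported in the chat.
    """
    bot: Optional[Any] = None

    def chat_fn(message: str, history: List) -> str:
        nonlocal bot
        try:
            if bot is None:
                bot = factory()  # e.g. get_chatbot (assumed factory)
            return bot.chat(message, mode=mode)
        except Exception as exc:  # show the problem instead of crashing
            return f"❌ Error: {exc}"

    return chat_fn
```

It would be wired in as `gr.ChatInterface(fn=make_chat_fn(get_chatbot), ...)`. If the factory raises, `bot` stays `None`, so the next message retries the construction.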
config.py
ADDED
@@ -0,0 +1,124 @@
"""
Configuration file for Rackspace Knowledge Chatbot
"""
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Project paths
PROJECT_ROOT = Path(__file__).parent
DATA_DIR = PROJECT_ROOT / "data"
MODELS_DIR = PROJECT_ROOT / "models"
VECTOR_DB_DIR = PROJECT_ROOT / "vector_db"
LOGS_DIR = PROJECT_ROOT / "logs"

# Create directories
for dir_path in [DATA_DIR, MODELS_DIR, VECTOR_DB_DIR, LOGS_DIR]:
    dir_path.mkdir(exist_ok=True)

# Model configuration - GROQ API
GROQ_API_KEY = os.environ.get("GROQ_API_KEY")
GROQ_MODEL = "openai/gpt-oss-120b"  # OpenAI GPT OSS 120B model via Groq
# Alternative Groq models:
# - "llama-3.3-70b-versatile"
# - "llama-3.1-70b-versatile"
# - "mixtral-8x7b-32768"
# - "gemma2-9b-it"

# Legacy local model configs (no longer used)
# BASE_MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# FINE_TUNED_MODEL_PATH = MODELS_DIR / "rackspace_finetuned"

# Embedding model for RAG
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # Fast and efficient

# Vector database configuration
VECTOR_DB_NAME = "rackspace_knowledge"
COLLECTION_NAME = "rackspace_docs"
CHUNK_SIZE = 512  # Size of text chunks for embedding
CHUNK_OVERLAP = 50  # Overlap between chunks
TOP_K_RETRIEVAL = 5  # Number of relevant documents to retrieve

# Fine-tuning configuration
LORA_R = 16
LORA_ALPHA = 32
LORA_DROPOUT = 0.05
LEARNING_RATE = 2e-4
BATCH_SIZE = 4  # Optimized for 16GB RAM
GRADIENT_ACCUMULATION_STEPS = 4
NUM_EPOCHS = 3
MAX_LENGTH = 512
WARMUP_STEPS = 100

# Generation configuration
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.7
TOP_P = 0.9
DO_SAMPLE = True

# Chat history configuration
MAX_HISTORY_LENGTH = 5  # Number of conversation turns to maintain

# Data collection URLs - Comprehensive coverage of ALL Rackspace domains
RACKSPACE_URLS = [
    # Main website - complete sections
    "https://www.rackspace.com/",
    "https://www.rackspace.com/cloud",
    "https://www.rackspace.com/cloud-services",
    "https://www.rackspace.com/managed-services",
    "https://www.rackspace.com/professional-services",
    "https://www.rackspace.com/security",
    "https://www.rackspace.com/data-services",
    "https://www.rackspace.com/solutions",
    "https://www.rackspace.com/applications",
    "https://www.rackspace.com/multicloud",
    "https://www.rackspace.com/company",
    "https://www.rackspace.com/blog",
    "https://www.rackspace.com/resources",
    "https://www.rackspace.com/industries",
    "https://www.rackspace.com/partners",

    # Documentation sites - comprehensive technical docs
    "https://docs.rackspace.com/",
    "https://docs.rackspace.com/docs",
    "https://docs-ospc.rackspace.com/",

    # Developer resources
    "https://developer.rackspace.com/",
    "https://developer.rackspace.com/docs",

    # Product-specific
    "https://www.rackspace.com/aws",
    "https://www.rackspace.com/microsoft-azure",
    "https://www.rackspace.com/google-cloud",
    "https://www.rackspace.com/vmware",
    "https://www.rackspace.com/openstack",

    # SPOT marketplace
    "https://spot.rackspace.com/",
    "https://spot.rackspace.com/innovations",
]

# Allowed domains for crawling (BFS will stay within these)
ALLOWED_DOMAINS = [
    "rackspace.com",
    "docs.rackspace.com",
    "docs-ospc.rackspace.com",
    "spot.rackspace.com",
    "www.rackspace.com",
    "developer.rackspace.com",
]

# Enhanced crawling configuration for comprehensive data collection
MAX_CRAWL_DEPTH = 4  # Go deeper for better coverage
MAX_PAGES_PER_DOMAIN = 200  # More pages per domain
CRAWL_DELAY = 0.5  # Faster crawling (still polite)
REQUEST_TIMEOUT = 20  # Longer timeout for complex pages
MIN_CONTENT_LENGTH = 200  # Minimum text length to be useful

# Device configuration (for M3 Mac)
DEVICE = "mps"  # Metal Performance Shaders for Apple Silicon
USE_MPS = True

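`CHUNK_SIZE` and `CHUNK_OVERLAP` work together: consecutive chunks start `CHUNK_SIZE - CHUNK_OVERLAP` units apart, so text near a boundary appears in both neighbors and is not cut off from its context. The chunker itself lives in `enhanced_vector_db.py`, which this excerpt does not show; the sketch below assumes the unit is characters (it could equally be tokens), and `chunk_text` is a hypothetical name:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` characters; each window starts
    `chunk_size - overlap` characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the config values, a 1,000-character page yields three chunks starting at offsets 0, 462 and 924, and each adjacent pair shares 50 characters.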
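
The comment on `ALLOWED_DOMAINS` says the BFS crawler stays within these domains, but the crawler is not in this commit. One way to implement that check, sketched here with the hypothetical helper `is_allowed`, is to compare the parsed hostname on a dot boundary rather than by substring, so a host like `evilrackspace.com` is rejected. Under this rule the `rackspace.com` entry already covers every other subdomain in the list.

```python
from typing import Iterable
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """True if the URL's host equals an allowed domain or is a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # relative or malformed URL: nothing to match
    for domain in allowed_domains:
        domain = domain.lower()
        # Match on a dot boundary so "evilrackspace.com" does not pass
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Parsing the hostname also ignores domain names that appear only in the path or query string.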
enhanced_rag_chatbot.py
ADDED
@@ -0,0 +1,933 @@
"""
Enhanced RAG Chatbot with Training Data Integration
This version:
1. Uses the enhanced vector database
2. Leverages training Q&A pairs for better responses
3. Provides accurate, context-based answers
4. No repetitive navigation text
5. Uses Groq API for fast, high-quality responses
"""

import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer
from groq import Groq
from typing import List, Dict, Tuple
import re
import subprocess
import sys

from config import (
    VECTOR_DB_DIR, EMBEDDING_MODEL, GROQ_API_KEY, GROQ_MODEL,
    TOP_K_RETRIEVAL
)


class EnhancedRAGChatbot:
    """Enhanced RAG chatbot with Groq API"""

    def __init__(self):
        print("🤖 Initializing Enhanced RAG Chatbot with Groq...")

        # Initialize Groq client
        if not GROQ_API_KEY:
            raise ValueError("GROQ_API_KEY environment variable not set!")

        self.groq_client = Groq(api_key=GROQ_API_KEY)
        self.groq_model = GROQ_MODEL
        print(f"✅ Using Groq model: {self.groq_model}")

        # Load vector database; if the collection is missing, try one rebuild
        print("📚 Loading vector database...")
        self.client = chromadb.PersistentClient(
            path=str(VECTOR_DB_DIR),
            settings=Settings(anonymized_telemetry=False)
        )
        try:
            self.collection = self.client.get_collection("rackspace_knowledge")
        except Exception:
            print("❌ Vector DB or collection missing! Attempting to rebuild...")
            try:
                # Run the builder with the current interpreter (same environment)
                subprocess.run([sys.executable, "enhanced_vector_db.py"], check=True)
                self.collection = self.client.get_collection("rackspace_knowledge")
                print("✅ Vector DB rebuilt successfully!")
            except Exception:
                print("❌ Failed to rebuild vector DB. Please run 'python enhanced_vector_db.py' manually.")
                self.collection = None

        # Load embedding model (still needed for RAG)
        print(f"🔤 Loading embedding model: {EMBEDDING_MODEL}")
        self.embedding_model = SentenceTransformer(EMBEDDING_MODEL)

        # Conversation history
        self.conversation_history: List[Dict[str, str]] = []

        print("✅ Enhanced RAG Chatbot with Groq ready!")

    # NOTE: method name and signature are assumed from the body below; the
    # commit omits this header, which leaves `query` and `top_k` undefined.
    def retrieve_context(self, query: str, top_k: int = TOP_K_RETRIEVAL) -> Tuple[str, List[Dict]]:
        """Retrieve relevant context with source information"""
        # Without a collection (failed rebuild) there is nothing to search
        if self.collection is None:
            return "", []

        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0]

        # Search vector database - get top K most relevant chunks
        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=top_k
        )

        # Process results - ONLY real documents with URLs
        context_parts = []
        sources = []
        seen_urls = set()  # Track unique URLs
        seen_content = set()  # Track unique content

        # Get distances for relevance scoring (lower distance = more relevant)
        distances = results.get('distances', [[]])[0] if 'distances' in results else []

        for idx, (doc, metadata) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
            # Skip duplicates
            doc_hash = hash(doc[:100])
            if doc_hash in seen_content:
                continue
            seen_content.add(doc_hash)

            # Add document chunk
            context_parts.append(doc)

            # Get URL and title
            url = metadata.get('url', 'N/A')
            title = metadata.get('title', 'N/A')

            # Only add source if URL is unique and valid
            if url and url != 'N/A' and url not in seen_urls:
                seen_urls.add(url)

                # Relevance score: 1 minus the distance from the query
                relevance = 1.0 - (distances[idx] if idx < len(distances) else 0.5)

                sources.append({
                    'url': url,
                    'title': title,
                    'relevance': relevance
                })

        # Sort sources by relevance (highest first)
        sources.sort(key=lambda x: x.get('relevance', 0), reverse=True)

        # Combine context
        context = '\n\n'.join(context_parts)

        return context, sources

    def build_prompt(self, query: str, context: str, history: List[Dict[str, str]]) -> str:
        """Build prompt for the model - Force it to use ONLY the context"""

        prompt = f"""<|system|>
You are a helpful assistant. Answer the question using ONLY the information in the Context below. Do not make up information. Be concise and accurate.
<|user|>
Context:
{context}

Question: {query}

Answer using ONLY the information above:
<|assistant|>
"""

        return prompt

    def generate_response(self, prompt: str) -> str:
        """Generate response using Groq API"""
        try:
            chat_completion = self.groq_client.chat.completions.create(
                messages=[
                    {
                        "role": "system",
                        "content": "You are a helpful assistant. Answer questions using ONLY the provided context. Be concise and accurate."
                    },
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                model=self.groq_model,
                temperature=0.1,
                max_tokens=256,
                top_p=0.9,
            )

            # message.content can be None, so fall back to an empty string
            response = chat_completion.choices[0].message.content or ""
            return self.clean_response(response)

        except Exception as e:
            print(f"❌ Groq API error: {e}")
            return "I'm having trouble generating a response right now. Please try again."

        # NOTE: unreachable; both branches of the try/except above return.
        # Clean up aggressive extraction
        # Remove everything before first actual sentence
        lines = response.split('\n')
        clean_lines = []

        for line in lines:
            line = line.strip()
            # Skip system-like patterns
            if any(skip in line.lower() for skip in [
                'you are', 'answer the question', 'context:', 'question:',
                'using only', 'be concise', '<|', '|>'
            ]):
                continue
            # Skip empty lines
            if not line:
                continue
            clean_lines.append(line)

        response = ' '.join(clean_lines)

        # If response is too short or still has issues, extract meaningful content
        if len(response) < 20 or 'based on actual answers' in response.lower():
            # Try to find the actual answer in the response
            sentences = response.split('.')
            good_sentences = []
            for sent in sentences:
                sent = sent.strip()
                # Skip bad patterns
                if any(bad in sent.lower() for bad in [
                    'based on actual', 'answer based', 'yes!', 'here\'s some quick facts'
                ]):
                    continue
                if sent and len(sent) > 10:
                    good_sentences.append(sent)

            if good_sentences:
                response = '. '.join(good_sentences[:3])  # Max 3 sentences
                if response and not response.endswith('.'):
                    response += '.'

        # Clean up
        response = self.clean_response(response)

        return response

    def clean_response(self, text: str) -> str:
        """Clean up the generated response"""
        # Remove any remaining system/user markers
        text = re.sub(r'<\|.*?\|>', '', text)

        # Remove repetitive sentences. Split only where sentence-ending
        # punctuation is followed by spaces, so URLs and version numbers
        # (e.g. "www.rackspace.com") are not broken apart.
        sentences = re.split(r'(?<=[.!?])[ \t]+', text.strip())
        unique_sentences = []
        seen = set()

        for sentence in sentences:
            sentence = sentence.strip()
            key = sentence.lower().rstrip('.!?')
            if sentence and key not in seen:
                unique_sentences.append(sentence)
                seen.add(key)

        text = ' '.join(unique_sentences)
        if text and not text.endswith(('.', '!', '?')):
            text += '.'

        return text.strip()

    def format_sources(self, sources: List[Dict], response: str = "") -> str:
        """Format sources for display - DISABLED until we fix the retrieval issue"""
        # Sources are showing same URLs for every question
        # Disable until we properly fix the vector DB retrieval
        return ""

    def extract_services_list(self, context: str, sources: List[Dict]) -> str:
        """
        Extract service information directly from context WITHOUT LLM generation.
        This is FULLY EXTRACTIVE - no hallucinations possible.
        """
        services = []
        seen_services = set()

        # Extract services from retrieved documents
        lines = context.split('\n')

        for line in lines:
            line_lower = line.strip().lower()

            # Look for service mentions
            if 'aws' in line_lower and 'aws' not in seen_services:
                services.append("AWS Cloud Services and Managed AWS Solutions")
                seen_services.add('aws')
            if 'azure' in line_lower and 'azure' not in seen_services:
                services.append("Microsoft Azure Cloud Managed Services")
                seen_services.add('azure')
            if 'google cloud' in line_lower and 'google' not in seen_services:
                services.append("Google Cloud Platform (GCP) Services")
                seen_services.add('google')
            if 'kubernetes' in line_lower and 'kubernetes' not in seen_services:
                services.append("Managed Kubernetes and Container Services")
                seen_services.add('kubernetes')
            if 'security' in line_lower and 'security' not in seen_services:
                services.append("Cloud Security and Cybersecurity Solutions")
                seen_services.add('security')
            if 'migration' in line_lower and 'cloud' in line_lower and 'migration' not in seen_services:
                services.append("Cloud Migration and Adoption Services")
                seen_services.add('migration')
            if (('data' in line_lower and 'analytics' in line_lower) or
                    ('ai' in line_lower and 'ml' in line_lower)) and 'data' not in seen_services:
                services.append("Data Analytics, AI and Machine Learning")
                seen_services.add('data')
            if ('multicloud' in line_lower or 'multi-cloud' in line_lower) and 'multicloud' not in seen_services:
                services.append("Multi-Cloud and Hybrid Cloud Solutions")
                seen_services.add('multicloud')
            if ('professional services' in line_lower or 'consulting' in line_lower) and 'professional' not in seen_services:
                services.append("Professional Services and Consulting")
                seen_services.add('professional')
            if ('application' in line_lower and
                    ('modernization' in line_lower or 'development' in line_lower)) and 'apps' not in seen_services:
                services.append("Application Modernization and Development")
                seen_services.add('apps')
            if ('managed hosting' in line_lower or 'dedicated hosting' in line_lower) and 'hosting' not in seen_services:
                services.append("Managed Hosting and Dedicated Infrastructure")
                seen_services.add('hosting')
            if ('compliance' in line_lower and
                    ('fedramp' in line_lower or 'government' in line_lower)) and 'compliance' not in seen_services:
                services.append("FedRAMP Compliance and Government Cloud")
                seen_services.add('compliance')

        # Also check URLs for service categories
        for source in sources[:10]:
            url = source.get('url', '').lower()
            if '/aws' in url and 'aws' not in seen_services:
                services.append("AWS Cloud Services and Managed AWS Solutions")
                seen_services.add('aws')
            if '/azure' in url and 'azure' not in seen_services:
                services.append("Microsoft Azure Cloud Managed Services")
                seen_services.add('azure')
            if '/google-cloud' in url and 'google' not in seen_services:
                services.append("Google Cloud Platform (GCP) Services")
                seen_services.add('google')
            if '/kubernetes' in url and 'kubernetes' not in seen_services:
                services.append("Managed Kubernetes and Container Services")
                seen_services.add('kubernetes')
            if '/security' in url and 'security' not in seen_services:
                services.append("Cloud Security and Cybersecurity Solutions")
                seen_services.add('security')
            if '/migration' in url and 'migration' not in seen_services:
                services.append("Cloud Migration and Adoption Services")
                seen_services.add('migration')
            if '/data' in url and 'data' not in seen_services:
                services.append("Data Analytics, AI and Machine Learning")
                seen_services.add('data')
            if '/multi-cloud' in url and 'multicloud' not in seen_services:
                services.append("Multi-Cloud and Hybrid Cloud Solutions")
                seen_services.add('multicloud')
            if '/professional-services' in url and 'professional' not in seen_services:
                services.append("Professional Services and Consulting")
                seen_services.add('professional')
            if '/applications' in url and 'apps' not in seen_services:
                services.append("Application Management and Modernization")
                seen_services.add('apps')

        if not services:
            return None

        # Format response
        response = "Based on the available documentation, Rackspace Technology offers the following services:\n\n"
        for i, service in enumerate(services, 1):
            response += f"{i}. {service}\n"

        # Add source URLs (only unique, relevant ones)
        response += "\n**Learn more at:**\n"
        unique_urls = []
        for source in sources[:5]:
            url = source.get('url', '')
            if url and url not in unique_urls:
                unique_urls.append(url)
                response += f"• {url}\n"

        return response
|
| 345 |
+
|
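The service list above is built in two passes: phrase matching on each context line, then path matching on source URLs, with a `seen` set so each category is reported only once. A minimal standalone sketch of that pattern follows; `extract_services`, `LINE_RULES` and `URL_RULES` are hypothetical names, and the rule tables hold only a few of the categories the method actually checks.

```python
from typing import Dict, List, Optional

# Hypothetical, reduced rule tables: (dedup key, phrases that trigger it, label)
LINE_RULES = [
    ('professional', ('professional services', 'consulting'), "Professional Services and Consulting"),
    ('hosting', ('managed hosting', 'dedicated hosting'), "Managed Hosting and Dedicated Infrastructure"),
]
# (dedup key, URL path fragment, label)
URL_RULES = [
    ('aws', '/aws', "AWS Cloud Services and Managed AWS Solutions"),
    ('kubernetes', '/kubernetes', "Managed Kubernetes and Container Services"),
]


def extract_services(context: str, sources: List[Dict]) -> Optional[List[str]]:
    """Return service labels in first-seen order, each at most once (None if nothing matched)."""
    services: List[str] = []
    seen = set()
    # Pass 1: phrase matching on each line of the retrieved context
    for line in context.split('\n'):
        line_lower = line.lower()
        for key, phrases, label in LINE_RULES:
            if key not in seen and any(p in line_lower for p in phrases):
                services.append(label)
                seen.add(key)
    # Pass 2: path matching on the first 10 source URLs
    for source in sources[:10]:
        url = source.get('url', '').lower()
        for key, path, label in URL_RULES:
            if key not in seen and path in url:
                services.append(label)
                seen.add(key)
    return services or None
```

Keying deduplication on a short category key rather than on the label lets the line pass and the URL pass share one `seen` set even when they phrase the same category differently.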
    def generate_summary_with_citations(self, query: str, context: str, sources: List[Dict], history: str = None) -> str:
        """
        SUMMARIZATION MODE: uses the LLM to generate concise summaries with citations.

        Generates a natural, concise summary from the context and adds citations.

        Args:
            query: User's question
            context: Retrieved context from the vector DB
            sources: Source documents with URLs
            history: Optional conversation history (for follow-up questions)

        Rules:
            - Use the LLM to synthesize information from multiple sources
            - Generate 2-4 sentence summaries (concise and readable)
            - Ask the LLM for inline citations after key facts; the real source URLs are appended afterwards
            - Avoid marketing fluff; focus on factual information
            - If the context is insufficient, acknowledge the limitation
        """
        # Build history context if provided
        history_context = f"\nPrevious conversation:\n{history}\n" if history else ""

        # Messages for the Groq chat-completions API (context capped at 1500 chars)
        system_prompt = (
            "You are a helpful assistant that provides concise summaries with citations. "
            "Summarize in 2-4 clear sentences using the context. "
            "Focus on factual, technical details. Avoid marketing language."
        )
        user_prompt = (
            f"{history_context}Context:\n{context[:1500]}\n\n"
            f"Question: {query}\n\n"
            "Provide a concise 2-4 sentence summary with inline citations:"
        )

        # Generate summary using the Groq API
        try:
            chat_completion = self.groq_client.chat.completions.create(
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                model=self.groq_model,
                temperature=0.4,
                max_tokens=256,
                top_p=0.9,
            )
            # The API may return None content; treat that as an empty summary
            summary = chat_completion.choices[0].message.content or ""
        except Exception as e:
            print(f"❌ Groq API error: {e}")
            return "I'm having trouble generating a summary right now. Please try again."

        # Clean up
        summary = self.clean_response(summary)

        # Append the actual source URLs at the end
        if sources and summary:
            summary += "\n\n**Referenced Sources:**\n"
            for source in sources[:3]:
                url = source.get('url', '')
                title = source.get('title', 'Document')
                if url:
                    summary += f"• [{title}]({url})\n"

        return summary

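Keeping prompt assembly separate from the API call makes it testable without a Groq API key. The sketch below mirrors the system and user messages built in the method; `build_summary_messages` and `max_context_chars` are hypothetical names, and the returned list is what would be passed as `messages=` to `chat.completions.create`.

```python
from typing import Dict, List, Optional

SYSTEM_PROMPT = (
    "You are a helpful assistant that provides concise summaries with citations. "
    "Summarize in 2-4 clear sentences using the context. "
    "Focus on factual, technical details. Avoid marketing language."
)


def build_summary_messages(query: str, context: str, history: Optional[str] = None,
                           max_context_chars: int = 1500) -> List[Dict[str, str]]:
    """Build the system/user message pair for a chat-completions request."""
    # History is only included when the caller passes it (follow-up questions)
    history_context = f"\nPrevious conversation:\n{history}\n" if history else ""
    user_prompt = (
        f"{history_context}Context:\n{context[:max_context_chars]}\n\n"
        f"Question: {query}\n\n"
        "Provide a concise 2-4 sentence summary with inline citations:"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```

Truncating by characters is crude but cheap; a token-aware cut would track the model's context limit more closely at the cost of a tokenizer dependency.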
    def extract_answer_from_context(self, query: str, context: str, sources: List[Dict]) -> str:
        """
        EXTRACTION MODE (STRICT RETRIEVAL): no LLM generation.

        Extracts the answer directly from the context using exact or near-exact wording.
        Does NOT generate, infer, or summarize beyond what the context states.

        Rules:
            - For HOW/WHAT questions: prioritize procedural, operational, and architectural details
            - Ignore marketing/promotional language
            - If the context doesn't contain the answer, return a message saying so
            - Behave like a retrieval engine, NOT a generative model
        """
        # Strict noise patterns: paragraphs containing any of these are skipped entirely
        noise_patterns = [
            'rackspace technology privacy notice',
            'to create a ticket', 'log into your account', 'fill out the form',
            'ready to start the conversation', 'you may withdraw your consent',
            'begin your', 'businesses today', 'journey',
            'accelerate digital transformation', 'struggling to',
            'transition to', 'move to', 'moving to',
            'ai launchpad', 'introduces new layers',  # Generic AI marketing
            'cuts through that complexity',  # Generic promises
        ]

        # Marketing phrases that indicate NON-ANSWER paragraphs (checked against the first five words)
        marketing_indicators = [
            'begin', 'start your', 'embark', 'journey', 'transformation',
            'businesses are', 'organizations are', 'companies are',
            'introducing', 'discover', 'explore', 'learn how',
        ]

        # Answer phrases: paragraphs containing these are likely ACTUAL ANSWERS
        answer_indicators = [
            'provides a', 'solves', 'by providing', 'includes',
            'comprised of', 'consists of', 'offers',
            'single pane of glass', 'curated platform',
            'specialized support', 'managed platform',
            'solution', 'features', 'capabilities',
        ]

        # Detect query type (substring checks, so "manage" also matches "management")
        query_lower = query.lower()
        is_how_question = any(w in query_lower for w in ['how', 'manage', 'manages', 'managing'])
        is_what_question = any(w in query_lower for w in ['what', 'which', 'describe'])
        is_tell_me_about = 'tell me about' in query_lower or 'tell me more about' in query_lower

        # Extract query keywords for matching; strip punctuation so "kubernetes?" matches "kubernetes"
        stopwords = ['does', 'will', 'can', 'tell', 'about']
        query_keywords = [
            w for w in (word.strip('?!.,;:"\'') for word in query_lower.split())
            if len(w) > 3 and w not in stopwords
        ]

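Two details of the detection code are easy to miss: substring checks make `'how'` match inside words such as "show", and splitting on whitespace alone leaves punctuation attached to the last word (for example `kubernetes?`). A minimal sketch of a whole-word alternative, assuming a simple regex tokenizer; `query_words`, `is_how_question` and `query_keywords` are hypothetical helpers, not part of the chatbot.

```python
import re
from typing import List

STOPWORDS = {'does', 'will', 'can', 'tell', 'about'}


def query_words(query: str) -> List[str]:
    """Lower-cased word tokens with surrounding punctuation removed."""
    return re.findall(r"[a-z0-9']+", query.lower())


def is_how_question(query: str) -> bool:
    # Whole-word match: "show" no longer counts as "how"
    return bool(set(query_words(query)) & {'how', 'manage', 'manages', 'managing'})


def query_keywords(query: str) -> List[str]:
    # Same filter as the method: words longer than 3 characters, minus a few stopwords
    return [w for w in query_words(query) if len(w) > 3 and w not in STOPWORDS]
```

The trade-off is recall: whole-word matching no longer treats "management" as a HOW signal, so the word list would need explicit variants if that behaviour is wanted.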
        # Split context into paragraphs (and long paragraphs into sentence chunks) for finer granularity
        paragraphs = []

        # First split on blank lines (paragraphs), keeping only substantial ones
        for para in context.split('\n\n'):
            para = para.strip()
            if len(para) > 50:
                paragraphs.append(para)

        # Split paragraphs longer than 800 chars into chunks of roughly 400 chars, by sentence
        expanded_paragraphs = []
        for para in paragraphs:
            if len(para) > 800:
                # Treat newlines as sentence breaks; drop trailing periods so that
                # re-joining with '. ' does not produce ".."
                sentences = [s.strip().rstrip('.') for s in para.replace('\n', '. ').split('. ')]
                current_chunk = []
                current_length = 0

                for sent in sentences:
                    if not sent:
                        continue

                    if current_length + len(sent) > 400:  # Max ~400 chars per chunk
                        if current_chunk:
                            expanded_paragraphs.append('. '.join(current_chunk) + '.')
                        current_chunk = [sent]
                        current_length = len(sent)
                    else:
                        current_chunk.append(sent)
                        current_length += len(sent)

                if current_chunk:
                    expanded_paragraphs.append('. '.join(current_chunk) + '.')
            else:
                expanded_paragraphs.append(para)

        paragraphs = expanded_paragraphs

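The long-paragraph handling is a greedy packer: sentences are appended to the current chunk until the next one would push it past the limit, at which point the chunk is flushed. A standalone sketch of the same loop, including the trailing-period normalization; `split_long_paragraph` and `max_chars` are hypothetical names.

```python
from typing import List


def split_long_paragraph(para: str, max_chars: int = 400) -> List[str]:
    """Greedily pack sentences into chunks of at most ~max_chars characters."""
    # Newlines count as sentence breaks; trailing periods are dropped before re-joining
    sentences = [s.strip().rstrip('.') for s in para.replace('\n', '. ').split('. ')]
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for sent in sentences:
        if not sent:
            continue
        # Flush the current chunk if this sentence would overflow it
        if current and length + len(sent) > max_chars:
            chunks.append('. '.join(current) + '.')
            current, length = [], 0
        current.append(sent)
        length += len(sent)
    if current:
        chunks.append('. '.join(current) + '.')
    return chunks
```

A single sentence longer than `max_chars` still becomes its own oversized chunk; the method behaves the same way, so the limit is a target rather than a hard cap.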
        # STRICT scoring: keep only paragraphs that DIRECTLY answer
        scored_paragraphs = []

        for para in paragraphs:
            para = para.strip()

            # Skip if too short
            if len(para) < 50:
                continue

            para_lower = para.lower()

            # IMMEDIATE REJECTION for noise
            if any(noise in para_lower for noise in noise_patterns):
                continue

            # IMMEDIATE REJECTION if the paragraph opens with marketing language
            first_words = ' '.join(para_lower.split()[:5])
            if any(bad in first_words for bad in marketing_indicators):
                continue

            # Detect paragraphs that are just lists (bullets or dashes);
            # lists like "- Item1 - Item2 - Item3" are not descriptive answers
            list_markers = para.count('\n-') + para.count('\n•') + para.count('\n*')
            is_just_list = list_markers > 3 or (para.count('-') > 5 and len(para) < 300)
            # For "tell me about" and WHAT queries, skip lists: we want descriptions
            if is_just_list and (is_tell_me_about or is_what_question):
                continue

            # Start with a negative score so every paragraph has to earn its place
            score = -10

            # STRONG BOOST for answer indicators (+5 each)
            score += sum(5 for indicator in answer_indicators if indicator in para_lower)

            # BOOST for containing query keywords (+3 each)
            score += sum(3 for kw in query_keywords if kw in para_lower)

            # For HOW questions: prioritize procedural/operational language (+4 each)
            if is_how_question:
                how_indicators = [
                    'by providing', 'provides a', 'solves', 'through',
                    'comprised of', 'team', 'support', 'managed',
                    'deployment', 'cluster', 'infrastructure', 'platform',
                ]
                score += sum(4 for ind in how_indicators if ind in para_lower)

            # For WHAT questions: prioritize definitions (+3 each)
            if is_what_question or is_tell_me_about:
                what_indicators = [
                    'is a', 'is the', 'powered by', 'solution',
                    'offers', 'includes', 'features', 'enables',
                    'designed to', 'helps', 'allows', 'service that',
                ]
                score += sum(3 for ind in what_indicators if ind in para_lower)

            # STRONG PENALTY for marketing fluff
            if any(bad in para_lower for bad in ['journey', 'transformation', 'accelerate', 'complexity']):
                score -= 8

            # Only keep paragraphs with a positive score
            if score > 0:
                scored_paragraphs.append((score, para))

        # Sort by score (highest first)
        scored_paragraphs.sort(reverse=True, key=lambda x: x[0])

        # Take up to the top 3 paragraphs, and only those scoring above 3
        top_paragraphs = [para for score, para in scored_paragraphs[:3] if score > 3]

        # DEBUG: log scores for troubleshooting
        if not top_paragraphs and scored_paragraphs:
            print(f"⚠️ No paragraphs scored > 3. Top scores: {[(s, p[:80] + '...') for s, p in scored_paragraphs[:3]]}")

        if not top_paragraphs:
            return "The provided context does not contain a direct answer to your question. Please try rephrasing or ask about specific Rackspace services."

        # Build answer from top-scored paragraphs
        answer = '\n\n'.join(top_paragraphs)

        # Limit length to 800 chars, preferring to cut at a sentence boundary
        if len(answer) > 800:
            truncated = answer[:800]
            last_period = truncated.rfind('.')
            if last_period > 200:  # Ensure we keep meaningful content
                answer = truncated[:last_period + 1]
            else:  # No usable sentence boundary: hard-cut so the limit still holds
                answer = truncated.rstrip() + '...'

        # Add sources
        if sources and answer:
            answer += "\n\n**Source:**\n"
            for source in sources[:2]:
                url = source.get('url', '')
                if url:
                    answer += f"• {url}\n"

        return answer

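The extraction ranking is an additive heuristic: each paragraph starts at -10, gains points for answer phrases and query keywords, and loses points for marketing vocabulary; only the top three scoring above 3 survive, and the joined answer is cut at a sentence boundary where possible. A reduced sketch, assuming much shorter indicator lists than the method uses; all names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical, heavily reduced indicator lists
ANSWER_INDICATORS = ['provides a', 'includes', 'consists of', 'offers']
FLUFF = ['journey', 'transformation', 'accelerate', 'complexity']


def score_paragraph(para: str, keywords: List[str]) -> int:
    text = para.lower()
    score = -10                                           # every paragraph must earn its place
    score += 5 * sum(ind in text for ind in ANSWER_INDICATORS)
    score += 3 * sum(kw in text for kw in keywords)
    if any(word in text for word in FLUFF):
        score -= 8
    return score


def best_paragraphs(paras: List[str], keywords: List[str], k: int = 3, threshold: int = 3) -> List[str]:
    scored: List[Tuple[int, str]] = [(score_paragraph(p, keywords), p) for p in paras]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # stable: ties keep input order
    return [p for s, p in scored[:k] if s > threshold]


def truncate_at_sentence(text: str, limit: int = 800, min_keep: int = 200) -> str:
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_period = cut.rfind('.')
    if last_period > min_keep:
        return cut[:last_period + 1]                      # clean sentence boundary
    return cut.rstrip() + '...'                           # no usable boundary: hard cut
```

Because every check is a substring test, the weights are easy to tune but also easy to game by long paragraphs that mention many indicator phrases; the top-k cap limits how much such paragraphs can crowd the answer.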
    def classify_query_type(self, query: str) -> str:
        """
        Classify a query to decide whether conversation history is needed.

        Returns:
            - "independent": new topic, no history needed
            - "follow_up": needs previous context (elaboration, clarification)
            - "recall": asks about the conversation itself
        """
        query_lower = query.lower().strip()
        # Whole-word tokens (punctuation stripped) so that "hey" does not match inside
        # "they" and "list" does not match inside "specialist"
        words = [w.strip('?!.,;:"\'') for w in query_lower.split()]

        # 1. RECALL queries (asking about the conversation)
        recall_indicators = [
            'what did i ask', 'what was my question', 'what did we talk about',
            'earlier you said', 'you mentioned', 'my previous question',
            'our conversation', 'what have we discussed', 'remind me what',
        ]

        if any(ind in query_lower for ind in recall_indicators):
            return "recall"

        # 2. INDEPENDENT queries (new topics, facts, greetings)
        greeting_words = {'hello', 'hi', 'hey'}
        list_words = {'list', 'lists'}
        independent_phrases = [
            # Greetings
            'good morning', 'good afternoon',
            # Full questions (usually new topics)
            'what is rackspace', 'what are rackspace', 'who is rackspace',
            'what services does', 'what does rackspace',
            # List/overview requests
            'show me', 'give me a list',
        ]

        # Check whether the query starts with a common question opener (likely independent)
        starts_with_wh = any(query_lower.startswith(q) for q in [
            'what is', 'what are', 'who is', 'who are',
            'when is', 'when was', 'where is', 'where are',
            'which ', 'how much', 'how many',
        ])

        # Check independent indicators
        has_independent = (
            any(w in greeting_words or w in list_words for w in words)
            or any(ind in query_lower for ind in independent_phrases)
        )

        if has_independent or (starts_with_wh and len(words) > 4):
            return "independent"

        # 3. FOLLOW-UP queries (need history)
        follow_up_indicators = [
            # Pronouns (it, that, this, them, they)
            ' it ', ' it?', ' it.', 'about it', 'with it', 'of it',
            ' that ', ' that?', ' that.', 'about that', 'with that',
            ' this ', ' this?', ' this.', 'about this', 'with this',
            ' them ', ' them?', ' they ', ' their ', 'those ',

            # Continuation words
            'more about', 'tell me more', 'elaborate', 'explain that',
            'why did you', 'how did you', 'can you explain',
            'what do you mean', 'clarify', 'expand on', 'go deeper',

            # Comparative/relational
            'compared to', 'difference between', 'versus',
            'how does that', 'why does that', 'what about that',
        ]

        # Short queries are usually follow-ups
        is_short = len(words) <= 5
        has_follow_up = any(ind in query_lower for ind in follow_up_indicators)

        if has_follow_up or (is_short and not starts_with_wh):
            return "follow_up"

        # Default: independent (new topic)
        return "independent"

|
| 689 |
+
def handle_recall(self, query: str) -> str:
|
| 690 |
+
"""Handle queries asking about conversation history"""
|
| 691 |
+
if not self.conversation_history:
|
| 692 |
+
return "This is the beginning of our conversation. You haven't asked any questions yet."
|
| 693 |
+
|
| 694 |
+
# Return formatted history
|
| 695 |
+
if len(self.conversation_history) == 1:
|
| 696 |
+
first_q = self.conversation_history[0]['user']
|
| 697 |
+
return f"You asked: '{first_q}'"
|
| 698 |
+
else:
|
| 699 |
+
response = "Here's our conversation so far:\n\n"
|
| 700 |
+
for i, exchange in enumerate(self.conversation_history, 1):
|
| 701 |
+
response += f"{i}. You asked: '{exchange['user']}'\n"
|
| 702 |
+
return response
|
| 703 |
+
|
    def extract_subject(self, question: str) -> str:
        """Extract the main subject of a question for pronoun resolution"""
        question_lower = question.lower()

        # Patterns: "tell me about X", "what is X", "how does X", ... with the max words kept for X
        patterns = [
            ('tell me about ', 4),
            ('what is ', 3),
            ('what are ', 3),
            ('how does ', 3),
            ('how do ', 3),
            ('what does ', 3),
            ('about ', 2),
        ]

        for pattern, max_words in patterns:
            if pattern in question_lower:
                subject = question_lower.split(pattern)[-1].strip()
                # Take the first few words and remove question marks
                subject = ' '.join(subject.split()[:max_words]).replace('?', '').strip()
                if subject:
                    return subject

        return ""

|
| 731 |
+
def rewrite_query_with_history(self, query: str, history: str) -> str:
|
| 732 |
+
"""
|
| 733 |
+
Rewrite follow-up query with relevant history context
|
| 734 |
+
Simple concatenation approach (no LLM needed)
|
| 735 |
+
"""
|
| 736 |
+
if not history:
|
| 737 |
+
return query
|
| 738 |
+
|
| 739 |
+
# Extract last question from history
|
| 740 |
+
history_lines = history.strip().split('\n')
|
| 741 |
+
last_question = None
|
| 742 |
+
|
| 743 |
+
for line in history_lines:
|
| 744 |
+
if line.startswith('User:'):
|
| 745 |
+
last_question = line.replace('User:', '').strip()
|
| 746 |
+
|
| 747 |
+
if not last_question:
|
| 748 |
+
return query
|
| 749 |
+
|
| 750 |
+
query_lower = query.lower()
|
| 751 |
+
|
| 752 |
+
# Resolve pronouns
|
| 753 |
+
query_resolved = query
|
| 754 |
+
|
| 755 |
+
# Replace "it" with subject from last question
|
| 756 |
+
if ' it ' in query_lower or query_lower.endswith('it?') or query_lower.startswith('it '):
|
| 757 |
+
subject = self.extract_subject(last_question)
|
| 758 |
+
if subject:
|
| 759 |
+
query_resolved = query_resolved.replace(' it ', f' {subject} ')
|
| 760 |
+
query_resolved = query_resolved.replace('it?', f'{subject}?')
|
| 761 |
+
query_resolved = query_resolved.replace('It ', f'{subject.capitalize()} ')
|
| 762 |
+
|
| 763 |
+
# Replace "that" similarly
|
| 764 |
+
if ' that ' in query_lower or query_lower.endswith('that?'):
|
| 765 |
+
subject = self.extract_subject(last_question)
|
| 766 |
+
if subject:
|
| 767 |
+
query_resolved = query_resolved.replace(' that ', f' {subject} ')
|
| 768 |
+
query_resolved = query_resolved.replace('that?', f'{subject}?')
|
| 769 |
+
|
| 770 |
+
# Replace "this" similarly
|
| 771 |
+
if ' this ' in query_lower or query_lower.endswith('this?'):
|
| 772 |
+
subject = self.extract_subject(last_question)
|
| 773 |
+
if subject:
|
| 774 |
+
query_resolved = query_resolved.replace(' this ', f' {subject} ')
|
| 775 |
+
query_resolved = query_resolved.replace('this?', f'{subject}?')
|
| 776 |
+
|
| 777 |
+
# For elaboration requests, combine with original question
|
| 778 |
+
if any(word in query_lower for word in ['more', 'elaborate', 'explain', 'why did you', 'how did you']):
|
| 779 |
+
query_resolved = f"{last_question} - {query_resolved}"
|
| 780 |
+
|
| 781 |
+
return query_resolved
|
| 782 |
+
|
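The pronoun resolution above relies on plain string replacement around spaces. Regex word boundaries express the same idea more robustly; the sketch below is an alternative, not the chatbot's code, and `resolve_pronouns` / `PRONOUN_RE` are hypothetical names. Like the method, it also replaces "that" when it is used as a conjunction, and unlike the method it does not re-capitalize a subject placed at the start of the query.

```python
import re

# Standalone "it", "that" or "this", in any letter case
PRONOUN_RE = re.compile(r"\b(it|that|this)\b", flags=re.IGNORECASE)


def resolve_pronouns(query: str, subject: str) -> str:
    """Replace standalone pronouns with the subject of the previous question."""
    if not subject:
        return query
    # A function replacement avoids treating backslashes in `subject` as escapes
    return PRONOUN_RE.sub(lambda match: subject, query)
```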
    def get_recent_history(self, n: int = 2) -> str:
        """Get the last N exchanges formatted as context"""
        if not self.conversation_history:
            return ""

        history_str = ""
        for exchange in self.conversation_history[-n:]:
            history_str += f"User: {exchange['user']}\n"
            # Truncate the assistant response to save tokens
            assistant_response = exchange['assistant'][:200]
            if len(exchange['assistant']) > 200:
                assistant_response += "..."
            history_str += f"Assistant: {assistant_response}\n\n"

        return history_str

|
| 801 |
+
def chat(self, user_message: str, mode: str = "extract") -> str:
|
| 802 |
+
"""
|
| 803 |
+
Main chat interface with DUAL MODE support
|
| 804 |
+
|
| 805 |
+
Args:
|
| 806 |
+
user_message: The user's question
|
| 807 |
+
mode: "extract" (default) or "summarize"
|
| 808 |
+
- "extract": Returns exact text from documents (STRICT RETRIEVAL)
|
| 809 |
+
- "summarize": Uses LLM to generate concise summaries with citations
|
| 810 |
+
|
| 811 |
+
Returns:
|
| 812 |
+
Response string based on selected mode
|
| 813 |
+
"""
|
| 814 |
+
print(f"\n📝 Processing: {user_message}")
|
| 815 |
+
print(f"🎯 Mode: {mode.upper()}")
|
| 816 |
+
|
| 817 |
+
# 1. CLASSIFY QUERY TYPE (intelligent context detection)
|
| 818 |
+
query_type = self.classify_query_type(user_message)
|
| 819 |
+
print(f"🔍 Query type: {query_type.upper()}")
|
| 820 |
+
|
| 821 |
+
# 2. HANDLE RECALL QUERIES (return from history directly)
|
| 822 |
+
if query_type == "recall":
|
| 823 |
+
return self.handle_recall(user_message)
|
| 824 |
+
|
| 825 |
+
# 3. REWRITE QUERY IF FOLLOW-UP (with history context)
|
| 826 |
+
if query_type == "follow_up":
|
| 827 |
+
history_context = self.get_recent_history(n=2)
|
| 828 |
+
search_query = self.rewrite_query_with_history(user_message, history_context)
|
| 829 |
+
print(f"✅ Using history - Rewritten: {search_query[:80]}...")
|
| 830 |
+
else:
|
| 831 |
+
search_query = user_message
|
| 832 |
+
print(f"🆕 New topic - Using original query")
|
| 833 |
+
|
| 834 |
+
# 4. DETECT LIST QUERIES
|
| 835 |
+
list_keywords = ['list', 'services', 'offer', 'provide', 'what does rackspace',
|
| 836 |
+
'tell me about services', 'what are the services', 'which services']
|
| 837 |
+
is_list_query = any(keyword in user_message.lower() for keyword in list_keywords)
|
| 838 |
+
|
| 839 |
+
# 5. RETRIEVE CONTEXT (use rewritten query for better results)
|
| 840 |
+
top_k = 10 if is_list_query else 5
|
| 841 |
+
context, sources = self.retrieve_context(search_query, top_k=top_k)
|
| 842 |
+
|
| 843 |
+
if not context:
|
| 844 |
+
return "I couldn't find relevant information to answer your question. Please try rephrasing or ask about Rackspace's cloud services, security, migration, or professional services."
|
| 845 |
+
|
| 846 |
+
# 6. FOR SERVICE LIST QUERIES, use service extractor (works for both modes)
|
| 847 |
+
if is_list_query:
|
| 848 |
+
print("🔍 Using extractive approach for service list")
|
| 849 |
+
extractive_response = self.extract_services_list(context, sources)
|
| 850 |
+
if extractive_response:
|
| 851 |
+
# Update history
|
| 852 |
+
self.conversation_history.append({
|
| 853 |
+
'user': user_message,
|
| 854 |
+
'assistant': extractive_response
|
| 855 |
+
})
|
| 856 |
+
if len(self.conversation_history) > 5:
|
| 857 |
+
self.conversation_history = self.conversation_history[-5:]
|
| 858 |
+
return extractive_response
|
| 859 |
+
|
| 860 |
+
# 7. GENERATE RESPONSE based on mode (with conditional history)
|
| 861 |
+
if mode == "summarize":
|
| 862 |
+
print("📝 Using SUMMARIZATION mode - LLM generates concise summary with citations")
|
| 863 |
+
# Pass history ONLY for follow-ups
|
| 864 |
+
history = self.get_recent_history(n=2) if query_type == "follow_up" else None
|
| 865 |
+
response = self.generate_summary_with_citations(user_message, context, sources, history=history)
|
| 866 |
+
else: # mode == "extract" (default)
|
| 867 |
+
print("🔍 Using EXTRACTION mode - returning exact document excerpts")
|
| 868 |
+
response = self.extract_answer_from_context(user_message, context, sources)
|
| 869 |
+
|
| 870 |
+
# 8. UPDATE HISTORY (sliding window - keep last 5)
|
| 871 |
+
self.conversation_history.append({
|
| 872 |
+
'user': user_message,
|
| 873 |
+
'assistant': response
|
| 874 |
+
})
|
| 875 |
+
if len(self.conversation_history) > 5:
|
| 876 |
+
self.conversation_history = self.conversation_history[-5:]
|
| 877 |
+
|
| 878 |
+
return response
|
| 879 |
+
|
    def reset_conversation(self):
        """Reset conversation history"""
        self.conversation_history = []


# Global chatbot instance
_chatbot_instance = None


def get_chatbot():
    """Get or create the chatbot instance"""
    global _chatbot_instance
    if _chatbot_instance is None:
        _chatbot_instance = EnhancedRAGChatbot()
    return _chatbot_instance


def chat(message: str, mode: str = "extract") -> str:
    """
    Simple chat interface

    Args:
        message: User's question
        mode: "extract" (default) or "summarize"
    """
    chatbot = get_chatbot()
    return chatbot.chat(message, mode=mode)


def reset():
    """Reset conversation"""
    chatbot = get_chatbot()
    chatbot.reset_conversation()


if __name__ == "__main__":
    # Test the chatbot
    print("\n" + "=" * 80)
    print("🧪 TESTING ENHANCED RAG CHATBOT")
    print("=" * 80)

    chatbot = get_chatbot()

    test_questions = [
        "What are Rackspace's cloud adoption and migration services?",
        "How does Rackspace help with AWS deployment?",
        "What security services does Rackspace offer?"
    ]

    for question in test_questions:
        print(f"\n❓ {question}")
        response = chatbot.chat(question)
        print(f"🤖 {response}")
        print("-" * 80)
enhanced_vector_db.py
ADDED
@@ -0,0 +1,239 @@
"""
Enhanced Vector Database Builder

This script:
1. Uses the enhanced collected data (filtered content)
2. Can load the training Q&A pairs, but indexes only the crawled documents
   (the Q&A pairs are meant for fine-tuning, not retrieval)
3. Creates a high-quality vector database
4. Adds metadata for better context
"""

import json
import re
from typing import Dict, List

import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer

from config import (
    DATA_DIR, VECTOR_DB_DIR, EMBEDDING_MODEL,
    CHUNK_SIZE, CHUNK_OVERLAP
)


class EnhancedVectorDBManager:
    """Enhanced vector database manager for the crawled Rackspace documents"""

    def __init__(self):
        print("🔧 Initializing Enhanced Vector Database Manager...")

        # Initialize a persistent ChromaDB client
        self.client = chromadb.PersistentClient(
            path=str(VECTOR_DB_DIR),
            settings=Settings(anonymized_telemetry=False)
        )

        # Initialize the embedding model
        print(f"📦 Loading embedding model: {EMBEDDING_MODEL}")
        self.embedding_model = SentenceTransformer(EMBEDDING_MODEL)

        # Start from a fresh collection: drop any existing one, then recreate it
        try:
            self.client.delete_collection("rackspace_knowledge")
            print("🗑️ Deleted old collection")
        except Exception:
            pass  # The collection did not exist yet

        self.collection = self.client.create_collection(
            name="rackspace_knowledge",
            metadata={"description": "Enhanced Rackspace knowledge base with training data"}
        )

        print("✅ Vector database initialized")

|
| 53 |
+
def clean_text(self, text: str) -> str:
|
| 54 |
+
"""Clean and normalize text"""
|
| 55 |
+
# Remove excessive whitespace
|
| 56 |
+
text = re.sub(r'\s+', ' ', text)
|
| 57 |
+
# Remove special characters but keep punctuation
|
| 58 |
+
text = re.sub(r'[^\w\s\.\,\!\?\-\:\;\(\)]', '', text)
|
| 59 |
+
return text.strip()
|
| 60 |
+
|
| 61 |
+
def chunk_text(self, text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> List[str]:
|
| 62 |
+
"""Split text into overlapping chunks"""
|
| 63 |
+
words = text.split()
|
| 64 |
+
chunks = []
|
| 65 |
+
|
| 66 |
+
for i in range(0, len(words), chunk_size - overlap):
|
| 67 |
+
chunk = ' '.join(words[i:i + chunk_size])
|
| 68 |
+
if len(chunk) > 100: # Only keep substantial chunks
|
| 69 |
+
chunks.append(chunk)
|
| 70 |
+
|
| 71 |
+
return chunks
|
| 72 |
+
|
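`chunk_text` produces windows of `chunk_size` words that start every `chunk_size - overlap` words, so consecutive chunks share `overlap` words. A standalone sketch with small numbers makes the window positions visible; `chunk_words` and `min_chars` are hypothetical names (the method hard-codes a 100-character minimum, which is what drops short tail windows).

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int, min_chars: int = 0) -> List[str]:
    """Overlapping word windows; each window starts chunk_size - overlap words after the last."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = ' '.join(words[start:start + chunk_size])
        if len(chunk) > min_chars:  # discard tiny tail windows
            chunks.append(chunk)
    return chunks


text = ' '.join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
```

With 10 words, a window of 4 and an overlap of 1, windows start at words 0, 3, 6 and 9, so the last window holds only the final word; a character minimum such as the method's `> 100` filters out such fragments.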
| 73 |
+
    def load_documents(self) -> List[Dict]:
        """Load crawled documents, preferring the enhanced file"""
        # Try the enhanced file first, then fall back to the clean and original files
        doc_file = DATA_DIR / 'rackspace_knowledge_enhanced.json'
        if not doc_file.exists():
            doc_file = DATA_DIR / 'rackspace_knowledge_clean.json'
        if not doc_file.exists():
            doc_file = DATA_DIR / 'rackspace_knowledge.json'

        if not doc_file.exists():
            print(f"❌ No document file found (last tried: {doc_file})")
            print("⚠️ Please run the data integration script first!")
            return []

        with open(doc_file, 'r', encoding='utf-8') as f:
            documents = json.load(f)

        print(f"📄 Loaded {len(documents)} documents from {doc_file.name}")
        return documents

    def load_training_data(self) -> List[Dict]:
        """Load training Q&A pairs (for fine-tuning; not indexed by build_database)"""
        # Try enhanced file first, then fall back to original
        qa_file = DATA_DIR / 'training_qa_pairs_enhanced.json'
        if not qa_file.exists():
            qa_file = DATA_DIR / 'training_qa_pairs.json'

        if not qa_file.exists():
            print(f"⚠️ Training Q&A file not found: {qa_file}")
            return []

        with open(qa_file, 'r', encoding='utf-8') as f:
            qa_pairs = json.load(f)

        print(f"📚 Loaded {len(qa_pairs)} training Q&A pairs from {qa_file.name}")
        return qa_pairs

    def build_database(self):
        """Build the vector database from crawled web documents only (no Q&A pairs)"""
        print("\n" + "="*80)
        print("🚀 BUILDING VECTOR DATABASE (RAG - Real Documents Only)")
        print("="*80)
        print("\n⚠️ NOTE: Training Q&A pairs are for fine-tuning ONLY!")
        print("⚠️ Vector DB should contain ONLY actual web content with URLs\n")

        # Load data
        documents = self.load_documents()

        if not documents:
            print("❌ No documents to index!")
            return

        all_chunks = []
        all_metadatas = []
        all_ids = []
        chunk_id = 0

        # Phase 1: index ONLY document chunks (no Q&A pairs)
        print("\n📍 Indexing real document chunks from web crawl...")
        for doc_idx, doc in enumerate(documents):
            content = doc.get('content', '')
            url = doc.get('url', 'unknown')
            title = doc.get('title', 'Untitled')

            # Skip empty or very short pages
            if not content or len(content) < 100:
                continue

            # Clean and chunk
            cleaned_content = self.clean_text(content)
            chunks = self.chunk_text(cleaned_content)

            for chunk in chunks:
                all_chunks.append(chunk)
                all_metadatas.append({
                    'source': url,  # The page URL, so answers can cite it
                    'url': url,
                    'title': title,
                    'type': 'document'
                })
                all_ids.append(f"doc_{chunk_id}")
                chunk_id += 1

            if (doc_idx + 1) % 50 == 0:
                print(f" Processed {doc_idx + 1}/{len(documents)} documents...")

        print(f"✅ Created {len(all_chunks)} chunks from {len(documents)} real documents")

        # Phase 2: Generate embeddings and add to ChromaDB
        print(f"\n📍 Generating embeddings for {len(all_chunks)} chunks...")

        # Add in batches to avoid memory issues
        batch_size = 100
        total_added = 0

        for i in range(0, len(all_chunks), batch_size):
            batch_chunks = all_chunks[i:i + batch_size]
            batch_metadatas = all_metadatas[i:i + batch_size]
            batch_ids = all_ids[i:i + batch_size]

            # Generate embeddings
            embeddings = self.embedding_model.encode(
                batch_chunks,
                show_progress_bar=False,
                convert_to_numpy=True
            )

            # Add to collection
            self.collection.add(
                embeddings=embeddings.tolist(),
                documents=batch_chunks,
                metadatas=batch_metadatas,
                ids=batch_ids
            )

            total_added += len(batch_chunks)
            print(f" Added {total_added}/{len(all_chunks)} chunks...")

        # Final statistics
        print("\n" + "="*80)
        print("✅ VECTOR DATABASE BUILD COMPLETE!")
        print("="*80)
        print(f"📊 Total chunks indexed: {len(all_chunks)}")
        # 'source' holds the page URL, so the chunk kind must be read from 'type'
        print(f" - Document chunks: {len([m for m in all_metadatas if m['type'] == 'document'])}")
        print(f" - Q&A pairs: {len([m for m in all_metadatas if m['type'] == 'training_qa'])}")
        print(f" - Training contexts: {len([m for m in all_metadatas if m['type'] == 'training_context'])}")
        print(f"💾 Database location: {VECTOR_DB_DIR}")
        print("="*80)

    def test_search(self, query: str, top_k: int = 5):
        """Test the vector database with a query"""
        print(f"\n🔍 Testing search: '{query}'")

        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0]

        # Search
        results = self.collection.query(
            query_embeddings=[query_embedding.tolist()],
            n_results=top_k
        )

        print(f"\n📋 Top {top_k} results:")
        for i, (doc, metadata) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
            print(f"\n{i+1}. Source: {metadata.get('source', 'unknown')}")
            print(f" Type: {metadata.get('type', 'unknown')}")
            if 'url' in metadata:
                print(f" URL: {metadata['url']}")
            if 'question' in metadata:
                print(f" Question: {metadata['question']}")
            print(f" Content: {doc[:200]}...")


def main():
    """Main execution"""
    manager = EnhancedVectorDBManager()
    manager.build_database()

    # Test with sample queries
    print("\n" + "="*80)
    print("🧪 TESTING DATABASE")
    print("="*80)
    manager.test_search("What are Rackspace's cloud adoption and migration services?")
    manager.test_search("How do I deploy applications on AWS with Rackspace?")


if __name__ == "__main__":
    main()
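`chunk_text` and the batching loop in `build_database` are plain list arithmetic, so their behaviour can be checked without ChromaDB or an embedding model. Below is a minimal, dependency-free sketch of that logic, under stated assumptions: `chunk_words`, `batched` and the toy sizes are hypothetical names and values chosen for illustration (the repository takes its real sizes from the `CHUNK_SIZE` and `CHUNK_OVERLAP` constants), and `min_chars` stands in for the hard-coded `len(chunk) > 100` filter.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_words(text: str, chunk_size: int, overlap: int, min_chars: int = 0) -> List[str]:
    """Hypothetical helper: split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words because each window starts
    `chunk_size - overlap` words after the previous one. Windows whose joined
    length is not greater than `min_chars` characters are dropped.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if len(chunk) > min_chars:
            chunks.append(chunk)
    return chunks


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Hypothetical helper: yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    text = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
    for chunk in chunk_words(text, chunk_size=4, overlap=1):
        print(chunk)
    # w0 w1 w2 w3
    # w3 w4 w5 w6
    # w6 w7 w8 w9
    # w9
    print([len(b) for b in batched(list(range(250)), 100)])  # [100, 100, 50]
```

Note the trailing one-word window (`w9`): a new window starts every `chunk_size - overlap` words, so the last one can be short. In `chunk_text`, such tails are kept only if they exceed 100 characters. Also note that `chunk_size` counts words while that filter counts characters. The sketch rejects `overlap >= chunk_size` up front, because a step of zero makes `range()` raise and a negative step silently produces no chunks.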
rackspace_knowledge_clean.json
ADDED
@@ -0,0 +1,93 @@
[
{
"url": "https://www.rackspace.com/cloud/cloud-migration",
"title": "Cloud Migration and Adoption Services | Rackspace Technology",
"content": "Take the next step towards cloud migration\n\nYou can take one of many paths when starting your cloud adoption journey. But to get started, you should first examine each individual workload and determine the best migration route for your organization. At Rackspace Technology, we’ve found the most effective migration plans for each workload, and we can help you build a compelling migration strategy that accelerates the return on your cloud investment.\n\nWe create customized cloud migration plans designed to support your business goals while minimizing business risks, disruption and downtime. Our cloud migration consultants will conduct a holistic assessment of your IT environment and handle the migration of your applications and workloads.\n\nDownload this white paper and discover the key migration path to success →\n\nReceive a holistic assessment of your IT environment. For a secure, compliant, and unbiased roadmap to the cloud, we’ll:\n\nCatalog your existing applications, infrastructure and network architecture to help prioritize workloads and applications.\n\nDetermine which cloud platform(s) fit your overall transformation objectives.\n\nDesign a high-level target infrastructure and cloud platform architecture that accommodates your security and risk requirements.\n\nAssess your existing workloads and predict future cloud consumption to estimate your costs.\n\nExplore your detailed deployment strategy and migration tools for all applications and future recommendations.\n\nOptimising Your AWS Cloud Migration: 4 Paths for a Successful Move to the Cloud →\n\nWe’ll help you find the right cloud for each application, then migrate it for you.\n\nParticipate in our cloud migration workshops, where our consultants work with you to identify which applications should be migrated to the cloud, determine your migration method (lift and shift, cloud native refactoring, etc.) 
and develop a high-level plan to get it there.\n\nBuild confidence in a manageable and cost-effective cloud migration by starting with a single, meaningful workload. Our experts will handle your application migration and remediation.\n\nWe’ll work with you to validate your applications, data and network accessibility, and perform cut-over and go-live procedures.\n\nSimplify your cloud migration with SDDC Flex→\n\nRethinking support models for a cloud native world\n\nSee how Rackspace Elastic Engineering breaks down the traditional build and operate barriers and opens up new possibilities to accelerate innovation.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nWe’re on your side, doing what it takes to get the job done right — from the first consultation to your daily operations. Contact us for a free quote.\n\nExplore our expert credentials and industry recognition.\n\nProviding innovation and leadership in the technology services industry for 20+ years.\n\nA leader in the 2020 Gartner Magic Quadrant for Public Cloud Infrastructure Professional and Managed Service Providers, Worldwide\n\nBest Consultancy or System Integrator 2020, The Cloud Awards\n\nOne of the leading AWS consulting partners with 14 competencies\n\nGoogle Cloud First Premier Managed Services Partner\n\nGoogle Cloud Partner of the Year\n\n2018 Google Migration Partner of the Year\n\nMicrosoft Five-time Hosting Partner of the Year\n\nAzure Expert MSP designation for 2021\n\nDell Technologies Titanium Partner - 4x Dell Global Alliances Partner of the Year\n\nDiscover related technology platforms and solutions to help you achieve smarter business outcomes.\n\nExperience how AWS’s leading-edge cloud capabilities can help you work smarter, lower costs and innovate with agility.\n\nAccelerate your business with Rackspace Services and tailored solutions on Google Cloud's 
innovative technology.\n\nSolve cloud challenges with a managed Microsoft Azure solution that helps you build new revenue streams, increase efficiency and deliver incredible experiences.\n\nAs a VMware Cloud Service Provider Pinnacle partner, Rackspace Technology delivers fully managed VMware Cloud Foundation™ solutions to help optimize performance, lower costs and ensure compliance.\n\nEnable multicloud and hybrid cloud to improve performance and optimize costs, so you can put your core business first.\n\nAchieve optimized performance, improved agility and cost savings — along with the strong security, compliance and control capabilities of private cloud.\n\nAccess expert cloud services to design, build, migrate, manage and optimize your public cloud for continual IT innovation.\n\nStop Overlooking The Cloud’s Biggest Benefit\n\nStop Overlooking The Cloud’s Biggest Benefit\n\nBeginners Guide to Cloud Migration\n\nE-Book: Charting Your Cloud Migration for Financial Services\n\nReady to start the conversation?\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to theRackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 802,
"crawled_at": "2025-11-25 09:07:21"
},
{
"url": "https://www.rackspace.com/cloud/aws",
"title": "AWS for your Cloud, Apps, Data and Security | Rackspace Cloud",
"content": "Discover how we’re rewriting the rules of enterprise modernization\n\nIn this video, you'll see how we transformed a full-stack Python application in under 10 minutes — a process that usually takes months and costs hundreds of thousands of dollars. Learn how our advanced automation and tools like Kiro deliver hyperspeed modernization, zero downtime and maximum security, all backed by the deep cloud expertise of Rackspace Technology. If you're ready to see what true transformation looks like, this is a must-watch.\n\nAccess continued support along your cloud journey.\n\nDemystify your AWS spend and maximize your ROI.\n\nAdopt cloud native technologies where it makes sense for your business.\n\nMigrate legacy media archives to AWS using AI to digitize, ingest, process, tag, store and leverage digital media content via a digital supply chain management system.\n\nExplore Rackspace Professional Services for AWS→\n\nBring your software into the future with improved scalability, reliability and performance while adding new capabilities.\n\nModernize your applications with a cloud native approach to development, data, teams and processes.\n\nEnhance customer experiences and improve operational efficiencies with intelligent applications powered byIoTcapabilities. 
Watch our webinar “Industrial IoT and Smart Manufacturing”.\n\nMigrate to AWS data and analytics services for improved efficiency.\n\nBuild a modern data platform tailored to your business needs that delivers business intelligence when and where you need it.\n\nEmpower your data to make automated recommendations, take preemptive action, and streamline decision-making with AI and machine learning.\n\nMaintain peak performance with access to a dedicated team of multi-disciplinary data experts for ongoing architecture, enablement and engineering services.\n\nAlleviate the complexity of security and compliance in your AWS environment with consultative services to help share the responsibility of defining security requirements for new AWS deployments, as well as migrations from existing vendors.\n\nModernize your approach to security with expert deployment and management of the right security technologies for your AWS environment – including AWS security tools like such as AWS Security Hub, AWS IAM Access Analyzer, Amazon GuardDuty, AWS Shield, AWS WAF, and AWS Firewall Manager.\n\nDesign and build cloud security controls to address compliance mandates, such as PCI-DSS, HIPAA, and more.\n\nImprove your cloud security posture by understanding cloud threats and vulnerabilities, with expert support to remediate settings that don’t align to industry benchmarks and best practices.\n\nReduce risk in your AWS environment with access to certified AWS security experts, including those who have been trained and are directly supported by the AWS Shield Response Team (ASRT).\n\nRackspace Government Cloud (RGC) by Rackspace Government Solutions is a NIST-based offering purpose-built with inheritable security controls for the independent software vendor and systems integrator with federal and state compliance needs.\n\nRackspace Government Solutions provides deep expertise dedicated to a FedRAMP AWS Platform-as-a-Service and a “born-in-the-cloud” mindset to help you modernize in a 
“Cloud Smart” public sector domain.\n\nAs an AWS Premier Consulting Partner with 15 consulting competencies, Rackspace Technology expedites your FedRAMP compliance journey through our JAB P-ATO Platform as a Service.\n\nRGC is purpose-built to help organizations achieve Assessment & Authorization faster and with cost savings of up to 70%. Whether you are a government agency, systems integrator, or independent software vendor, RGC is the foundation you need to rapidly deliver mission value via the cloud.\n\nExpertise across many AWS competencies including: data analytics, DevOps, higher education, financial services, healthcare, industrial software, IoT, machine learning and migration.\n\nExplore Rackspace FedRAMP Compliance→\n\nTake advantage of a Infrastructure Modernization solution across 11 services aimed at creating a secure, resilient and enterprise-grade cloud environment. Our certified experts work closely with you to align cloud architecture to meet your specific requirements and goals, empower your teams and drive rapid innovation on AWS.\n\nRackspace has an extensive standardized and purpose-built library of Reference Architectures for AWS that speeds up delivery, fully embraces cloud-native, maximizes AWS funding and reduces your costs.\n\nReduced complexity:We’ll break down gargantuan tasks into achievable milestones.\n\nMinimized risk:Weekly sprints during engagement and stakeholder alignments are key components to our delivery methodology to reduce risk.\n\nFaster time-to-value:Through our expertise, proven delivery methodology, and reference architectures (IP), we accelerate time-to-value and business outcomes.\n\nCloud Migration Services:Elevate workloads in the cloud\n\nApplications:Complete build, release and go-live support\n\nInfrastructure build-out services:Get infrastructure-as-code templates\n\nSecurity:Guidance and architecture based on platform best practices and industry standards\n\nDevOps services:CI/CD pipelines and automated 
deployments\n\nBackup and disaster recovery (DR) planning:Expert DR environment implementation and testing\n\nPerformance management for load testing and optimization.\n\nCost optimization:Tagging, billing review, usage and RIs.\n\nExploreRackspace Elastic Engineering→\n\nAlways-on managed support:Allows you to focus internal resources on core business activities and accelerate your journey through the cloud with the peace of mind that your environment is fully managed.\n\nOn-demand cloud expertise:Gain the benefits of public cloud without having to incur the challenge and expense of self-managing it.\n\nAccess to expertise:Immediate access to technical expertise and capacity to manage cloud infrastructure, tools and applications.\n\nCloud resiliency:Leverage cloud native and Rackspace Technology proprietary tooling to remove manual intervention and increase resiliency.\n\nInnovate with cloud services:Take advantage of the latest cloud features and capabilities to solve challenging business problems.\n\nExplore Rackspace Modern Operations→\n\nRethinking support models for a cloud native world\n\nSee how Rackspace Elastic Engineering breaks down the traditional build and operate barriers and opens up new possibilities to accelerate innovation.\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nExplore our expert credentials and industry recognition.\n\nProviding innovation and leadership in the technology services industry for 20+ years.\n\nMicrosoft Workloads Consulting Competency\n\nSmall and Medium Business (SMB) Competency\n\nTravel & Hospitality Competency\n\nMarketplace Skilled Consulting Partner\n\nAWS Public Sector Solution Provider\n\nDiscover related technology platforms and solutions to help you 
achieve smarter business outcomes.\n\nMaximize the value of your data and the power of AWS cloud with the experience and expertise you need from Rackspace Technology.\n\nLearn about AWS Marketplace offerings from Rackspace Technology.\n\nGet expert guidance that helps you realize the benefits of modern applications and improve your return on investment.\n\nAccelerate results with AI-driven innovation\n\nRackspace Technology helps organizations turn data into impact with production-ready AI solutions built on private, public and hybrid cloud platforms. Our expertise in generative and agentic AI empowers you to deliver secure, scalable and responsible outcomes across your enterprise.\n\nAdopt clouds and ease migration to accelerate innovation and optimize performance.\n\nTap into the full benefits of the cloud by building applications with cloud native technologies, modern architectures and automated development workflows.\n\nGain cloud transparency, reduce waste and drive savings with FinOps.\n\nMake predictive decisions that accelerate innovation and increase ROI with integrated data architectures and AI.\n\nLeverage our IoT and Edge experience to increase efficiencies and reduce time to market for your IoT adoption projects\n\nInfrastructure Modernization: Professional Services\n\nE-Book: Charting Your Cloud Migration for Financial Services\n\nFrom AI to IoT: How the Power of AWS Cloud Propels Innovation\n\nFill out the form to be connected to one of our experts.\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. 
Information collected in this form is subject to theRackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1267,
"crawled_at": "2025-11-25 09:07:24"
},
{
"url": "https://www.rackspace.com/cloud/azure",
"title": "Microsoft Azure Cloud Managed Services | Rackspace Technology",
"content": "Whether you’re looking to spur innovation and agility withAzure cloud, lower costs or build operational efficiencies, Rackspace Technology™ can help.\n\nOurMicrosoft® Azure® certified cloud experts put cutting-edge capabilities to work for your business. We apply deep expertise in cloud strategy, cloud-native development, containers, application modernization, AI/ML Ops, IoT and workload management to help you accelerate innovation withMicrosoft Azure.\n\nWherever you are in your cloud journey, whatever business outcomes you would like to solve, whichever workloads you would like to migrate, we’ll meet you there and simplify and manage your path forward with our end-to-end cloud services.\n\nWorking alongside your team, we’ll help you understand your options, identify, develop and deploy solutions that help you achieve smarter business outcomes, then migrate and manage your Azure cloud solutions so that you can focus on innovation.\n\nOur Modern Analytics MVP engagement liberates you from the challenges of your legacy systems and jumpstarts your modernization journey on your Microsoft Azure platform, putting you on the path to enable secure and intelligent decision making.\n\nGet inspired for your journey by attending our Hands on Analytics In a Day Workshops, checking out our eBook or scheduling your strategy session.\n\nArtificial Intelligence and Machine Learning\n\nWith the help of Artificial Intelligence (AI), Machine Learning (ML) can use complex algorithms to simplify tasks and parse large amounts of data. This provides major value to businesses seeking to optimize for trends and patterns within your Azure cloud instance. 
Fast track using your data to improve your business, make smarter decisions and transform customer experiences with our Ideation Workshop engagement.\n\nAlleviate the complexity of security and compliance in your Azure environment with consultative services to help share the responsibility of defining security requirements for new deployments, as well as migrations from existing vendors.\n\nModernize your approach to security with expert deployment and management of the right security technologies for your Azure environment – including but not limited to Azure Security Center and other Azure security offerings.\n\nDesign and build cloud security controls to address compliance mandates, such as PCI-DSS, HIPAA, and more.\n\nImprove your cloud security posture by understanding cloud threats and vulnerabilities, with expert support to remediate settings that don’t align to industry benchmarks and best practices.\n\nGet Offer:Hybrid Cloud Security Workshop: Get a customized threat & vulnerability analysis of your hybrid and multi-cloud environment and learn how to build and operate a more robust cloud security system.\n\nCloud Migration Services:Elevate workloads in the cloud\n\nApplications:Complete build, release and go-live support\n\nInfrastructure build-out services:Get infrastructure-as-code templates\n\nSecurity:Guidance and architecture based on platform best practices and industry standards\n\nDevOps services:CI/CD pipelines and automated deployments\n\nBackup and disaster recovery (DR) planning:Expert DR environment implementation and testing\n\nPerformance management for load testing and optimization.\n\nCost optimization:Tagging, billing review, usage and RIs.\n\nExploreRackspace Elastic Engineering→\n\nAlways-on managed support:Allows you to focus internal resources on core business activities and accelerate your journey through the cloud with the peace of mind that your environment is fully managed.\n\nOn-demand cloud expertise:Gain the benefits of public 
cloud without having to incur the challenge and expense of self-managing it.\n\nAccess to expertise:Immediate access to technical expertise and capacity to manage cloud infrastructure, tools and applications.\n\nCloud resiliency:Leverage cloud native and Rackspace Technology proprietary tooling to remove manual intervention and increase resiliency.\n\nInnovate with cloud services:Take advantage of the latest cloud features and capabilities to solve challenging business problems.\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nExplore our expert credentials and industry recognition.\n\nFor over 25 years, Rackspace and Microsoft have cultivated a global relationship focused on helping businesses make the most of Microsoft technologies. Through innovative product delivery and unmatched service and support, we work together to create solutions across the Microsoft portfolio.\n\nYou can trust Rackspace to provide the expertise you need for the Microsoft technologies your business relies on.\n\nSolution Designations & Specializations\n\n- Infrastructure: Infra and DB migration to Microsoft Azure\n\n- Modern Work – Copilot Jumpstart ‘Ready-Tier’\n\n- Data & AI – Analytics on Microsoft Azure\n\n- Data & AI – AI and Machine Learning\n\n- Digital & App Innovation – Build and Modernize AI Apps\n\n- Security – Identity & Access Management\n\n- Azure Well Architected Review\n\n- Copilot for M365 Pre-Flight Checks\n\n- Secure Power Platform Framework\n\n- Secure Multicloud Environments Workshop\n\n- Managed XDR powered by Sentinel\n\nIntellectual Property & Artifacts\n\n- Foundry for A.I. by Rackspace – ICE\n\n- Foundry for A.I. by Rackspace – RITA\n\n- Foundry for A.I. by Rackspace – FAIR A.I. 
Landing Zone\n\n- Foundry for A.I. by Rackspace – Copilot for M365 Pre-Flight Checks\n\nDiscover related technology platforms and solutions to help you achieve smarter business outcomes.\n\nMake the most of Microsoft Azure for innovation, agility, cost savings and operational efficiency.\n\nUnlock the power of intelligent CRM and ERP business applications in the cloud through connected platforms across Office 365 productivity applications and modern features that accelerate results.\n\nFind the right Microsoft 365 solutions for your specific business needs.\n\nAccelerate SAP adoption and optimization by engaging a team of certified experts and managed services across multicloud solutions that integrate across your environment.\n\nEnable multicloud and hybrid cloud to improve performance and optimize costs, so you can put your core business first.\n\nAdopt clouds and ease migration to accelerate innovation and optimize performance.\n\nGain cloud transparency, reduce waste and drive savings with FinOps.\n\nMove your cloud projects forward with expert cloud native security services\n\nMigrating Enterprise Applications to Microsoft Azure\n\nRackspace Technology is the Azure Solutions Expert\n\nThree Cost Management Stats IT Leaders Can’t Afford to Ignore\n\nThree Cost Management Stats IT Leaders Can’t Afford to Ignore\n\nFill out the form to be connected to one of our experts.\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to theRackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1073,
"crawled_at": "2025-11-25 09:07:26"
},
{
"url": "https://www.rackspace.com/cloud/google-cloud",
"title": "Google Cloud Solutions | Private & Public Google Cloud",
"content": "As Google Cloud's first managed services partner (MSP), Rackspace Technology helps organizations of all sizes get the most out of their investment in Google Cloud. We have over 25 years of managed services experience and a track record of receiving recognition from Google Cloud. Our dedicated sales and delivery teams tackle cloud complexities so our customers can focus on pushing the boundaries of what's possible for their business.\n\nWe help you get the most out of your investment in Google Cloud by:\n\nPlanning, architecting and operating projects at scale\n\nLeveraging our 500+ Google Cloud technical certifications\n\nDelivering high-quality cloud computing expertise and end-to-end support 24x7x365\n\nProviding unparalleled Google Cloud operational support while deploying Rackspace Elastic Engineering to cost-effectively modernize your cloud environment\n\nFostering application modernization, new product innovation and advanced technology adoption\n\nDiscover the limitless possibilities of your Google Cloud data by partnering with FAIR.\n\nGoogle Cloud Migration Services\n\nExecute a fit-for-purpose migration of application workloads by lifting, shifting and optimizing VM-based workloads to Google Cloud.\n\nAccess continued support along your cloud journey.\n\nLanding Zone: Deliver a fully functional, secure and optimized foundation for your first workload in Google Cloud.\n\nDiscover: Automated application discovery utilizing Stratozone.\n\nLift and shift VMs and applications utilizing Migration Center\n\nPipeline-driven Migrations utilizing Cloud Build to add modern automation to your application deployments.\n\nExplore Rackspace Professional Services→\n\nGoogle Cloud Infrastructure Services\n\nSecure your business with Google Cloud's enterprise-grade environment. 
Our experts craft your cloud journey, empower teams, and ignite innovation with pre-built solutions, minimizing complexity, risk, and time to results.\n\nArchitecture Design: Gain access to a team of experts to help you discover a solution design for your cloud computing needs.\n\nInfrastructure Landing Zone build-out: Infrastructure-as-code to automate building landing zones.\n\nNetwork Setup & Configuration: Get expert help building secure access controls and firewalls for your systems.\n\nCloud Provisioning: Utilizing CI/CD and Infrastructure-as-code to automate building out Compute Engine servers and Kubernetes.\n\nSecurity Design and Implementation: Security design and implementation based on platform and industry standards, utilizing items such as IAM, Organization Policies, Cloud KMS and Monitoring.\n\nIdentity & Access Management configuration\n\nBackup and Disaster Recovery (DR)\n\nGoogle Cloud Application Development & Solutions\n\nRun applications across any IT infrastructure with cloud-native modernization for superior scalability, performance and reliability.\n\nCloud Application Discovery & Requirements: Our team of business analysts and architects works to understand your business goals and vision and translate those into actionable requirements and user stories.\n\nUI/UX: Our experienced UI/UX team will translate requirements into an immersive, intuitive and modern user interface.\n\nCloud Application Architecture & Software Development: With user stories and user interface designs as their guide, our solutions architects and software engineers will work to iteratively design and build your applications.\n\nCloud Application Automation & 
Deployment: Our cloud engineers automate managing infrastructure and deploying applications, turning software updates from major events to daily occurrences.\n\nRead more on application development in this customer success story. Click here.\n\nAlways-on managed support: Allows you to focus internal resources on core business activities and accelerate your journey through the cloud with the peace of mind that your environment is fully managed.\n\nOn-demand cloud expertise: Gain the benefits of public cloud without having to incur the challenge and expense of self-managing it.\n\nAccess to expertise: Immediate access to technical expertise and capacity to manage cloud infrastructure, tools and applications.\n\nCloud resiliency: Leverage cloud native and Rackspace Technology proprietary tooling to remove manual intervention and increase resiliency.\n\nInnovate with cloud services: Take advantage of the latest cloud features and capabilities to solve challenging business problems.\n\nCloud Migration Services: Elevate workloads in the cloud\n\nApplications: Complete build, release and go-live support\n\nInfrastructure build-out services: Get infrastructure-as-code templates\n\nSecurity: Guidance and architecture based on platform best practices and industry standards\n\nDevOps services: CI/CD pipelines and automated deployments\n\nBackup and disaster recovery (DR) planning: Expert DR environment implementation and testing\n\nPerformance management for load testing and optimization.\n\nCost optimization: Tagging, billing review, usage and RIs.\n\nExplore Rackspace Elastic Engineering→\n\nFAIR for Google Cloud is a global practice dedicated to accelerating the responsible adoption of Artificial Intelligence (AI) solutions across industries, leveraging the cutting-edge technology of Google Cloud.\n\nOur team of certified Google Cloud experts will work closely with you to design, migrate, fortify, operate and optimize your Google Cloud data environment so you can leverage the power of 
modern solutions like generative AI. Realize the value of data-driven insights and decision-making — and its potential to help you innovate and drive your business into your desired vision of the future.\n\nDiscover the limitless possibilities of your Google Cloud data by partnering with FAIR.\n\nAccelerated Migration to Google Cloud — Powered by Rackspace Technology\n\nGet the best of both worlds: Google Cloud’s performance capabilities with the migration expertise of Rackspace Technology.\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nExplore our expert credentials and industry recognition.\n\nWe established our partnership with Google Cloud in 2017, making us one of the first Managed Service Providers for Google Cloud.\n\nArtificial Intelligence - Vertex AI\n\nSmart Analytics - Data Warehouse Modernization\n\nSmart Analytics - Asset tracking\n\nData Management - Enterprise Databases Migration\n\nData Management - MySQL/PostgreSQL migration\n\nSecurity - Identity & device management\n\nIndustrial Goods & Manufacturing\n\nApp Modernization - Custom-built app migration\n\nApp Modernization - Modernize Legacy Applications\n\nApp Modernization - Cloud Native Application Development\n\nInfrastructure Modernization - VM Migration\n\nInfrastructure Modernization - SAP on Google Cloud\n\nDiscover related technology platforms and solutions to help you achieve smarter business outcomes.\n\nMobilize the power of Google Cloud data, backed by deep experience and expertise that can help you maximize its value.\n\nEssential tools and capabilities across a global platform, backed by Rackspace Application Services.\n\nAdopt clouds and ease migration to accelerate innovation and optimize performance.\n\nGain cloud 
transparency, reduce waste and drive savings with FinOps.\n\nAccess expert cloud services to design, build, migrate, manage and optimize your public cloud for continual IT innovation.\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1137,
"crawled_at": "2025-11-25 09:07:29"
},
{
"url": "https://www.rackspace.com/cloud/multi-cloud",
"title": "Rackspace Multicloud Solutions | Optimize & Secure",
"content": "Multicloud embodies a variety of cloud types (public, private and on-premises) from a variety of cloud providers (e.g., Amazon Web Services, Google Cloud Platform and Microsoft® Azure®) and/or SaaS providers (e.g., Salesforce, SAP, Oracle) in the same environment. While most companies would agree that a multicloud approach is the future, not many realize that multicloud is already their present.\n\nWhile multicloud is on the rise, data centers are increasingly falling out of favor. According to the Multicloud Annual Research Report 2022 by Rackspace Technology, 56% of technology decision makers surveyed no longer envision plans of owning a data center in five years. The cloud is well established, and companies are investing in public or private cloud containers, serverless and edge computing technologies.\n\nRackspace Technology optimizes your multicloud environment, so you can realize its full benefits faster:\n\nTake advantage of public cloud scalability for applications with heavy or unpredictable traffic\n\nUse edge computing technologies for applications requiring low latency\n\nDeploy business-critical applications on your private cloud to ensure security and control — all within your connected, multicloud environment\n\nEstablish multicloud security for your data and applications\n\nProfessional and Managed Services\n\nDefining your seamless multicloud\n\nOur multicloud specialists evaluate your current state of application and data integration across the entire organization, including hybrid IT deployments.\n\nWe’ll deliver integration recommendations that accelerate your business goals and strategies.\n\nRackspace Technology provides professional and managed services that deliver successful hybrid multicloud environments.\n\nExplore Rackspace Technology Professional Services→\n\nRackConnect® Global provides multicloud connectivity to unify your entire cloud deployment.\n\nSecurely connect your Rackspace Technology dedicated hosting environment to 
other data center locations and the private and public clouds of your choice, including AWS, Microsoft Azure, OpenStack®, VMware® and Google Cloud Platform.\n\nRackConnect Global includes our 24x7x365 support and a 99.95% connectivity uptime guarantee.\n\nUse VMware HCX, Managed Guest OS Services and Managed External Storage for your VMware Cloud™ on AWS workloads.\n\nGet a multicloud interconnect that enables fast, simple and secure application migration and mobility between VMware-based private clouds and VMware Cloud on AWS.\n\nIncrease storage scalability without the need to purchase additional hosts.\n\nAccess 24x7x365 OS support for your VMs.\n\nAnthos gives you a consistent platform for all your application-oriented cloud deployments, from legacy, on-premises applications to cloud native, while offering a service-centric view of all your environments.\n\nRackspace Technology managed and professional services help you architect, deploy, configure and manage workloads.\n\nAWS Outposts Managed by Rackspace\n\nTake advantage of our extensive global data center footprint to place your AWS Outposts close to your workloads, while Rackspace Technology addresses your migration and facilitates cloud native application modernization.\n\nDeploy AWS Outposts on-premises, giving you the specialized, comprehensive capabilities you need to address your toughest challenges while recouping value from your data center investment.\n\nAccess our global footprint of colocation data centers to help bring your workloads as close to your data as possible.\n\nMicrosoft Azure Stack delivers agility, scalability, security and control in your own private Azure cloud.\n\nAzure Stack is as complex as it is powerful and can be challenging for organizations to deploy and run without supplementary expertise. 
With years of Azure experience, deep expertise and unmatched support, we’ll deploy and manage your Azure Stack environment.\n\nDell Technologies Cloud Platform\n\nFuture-proof your VMware SDDC with automated lifecycle management\n\nEstablish a frictionless path to public cloud\n\nA single hybrid cloud platform for modern and traditional applications\n\nConsistent operations from private cloud to public cloud and multicloud environments\n\nStore your data adjacent to the cloud — not in it. Cost-effectively make your data available to whichever cloud services you need. Eliminate public cloud egress fees and avoid cloud lock-in.\n\nRackspace Data Freedom is available globally in all Rackspace Technology data centers and colocation facilities and connects to most cloud environments, including all VMware Cloud environments, AWS, Microsoft® Azure® and Google Cloud Platform™.\n\nRethinking support models for a cloud native world\n\nSee how Rackspace Elastic Engineering breaks down the traditional build and operate barriers and opens up new possibilities to accelerate innovation.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nWe’re on your side, doing what it takes to get the job done right — from the first consultation to your daily operations. 
Contact us for a free quote.\n\nExplore our expert credentials and industry recognition.\n\nProviding innovation and leadership in the technology services industry for 25+ years.\n\nMajor Player in IDC MarketScape: Worldwide Cloud Professional Services 2024 Vendor Assessment\n\nMajor Player in IDC MarketScape: Worldwide Managed Public Cloud Services 2023 Vendor Assessment\n\nLeader in ISG Provider Lens: Private/Hybrid Cloud – Data Center Services & Solutions, 2024\n\nLeader in ISG Provider Lens: AWS Partners Ecosystem, 2024\n\nLeader in ISG Provider Lens: Microsoft Partner Ecosystem, 2024\n\nLeader in ISG Provider Lens: Google Cloud Partner Ecosystem, 2024\n\nLeader in ISG Provider Lens: Multi Public Cloud Solutions & Services, 2023\n\nNiche Player in Gartner Magic Quadrant and Critical Capabilities for Public Cloud IT Transformation Services, 2023\n\nGold Microsoft Partner Azure Expert MSP\n\nDell Technologies Global Titanium Partner\n\nBroadcom Pinnacle Partner VMware Cloud Service Provider\n\nDiscover related technology platforms and solutions to help you achieve smarter business outcomes.\n\nExperience how AWS’s leading-edge cloud capabilities can help you work smarter, lower costs and innovate with agility.\n\nManaged Hosting on bare metal delivers maximum uptime, visibility, security and control for your custom dedicated needs.\n\nMaintain ownership, choice and hands-on access to your compute and storage, without the costs of HVAC and building management.\n\nAccelerate your business with Rackspace Services and tailored solutions on Google Cloud's innovative technology.\n\nSolve cloud challenges with a managed Microsoft Azure solution that helps you build new revenue streams, increase efficiency and deliver incredible experiences.\n\nHarness the power of a fully customized private cloud with enterprise-grade security, compliance and expert management. 
Tailored for mission-critical workloads, this solution offers unrivaled flexibility and control, giving your enterprise a secure, scalable foundation to innovate and grow.\n\nAs a VMware Cloud Service Provider Pinnacle partner, Rackspace Technology delivers fully managed VMware Cloud Foundation™ solutions to help optimize performance, lower costs and ensure compliance.\n\nAccelerate results with AI-driven innovation\n\nRackspace Technology helps organizations turn data into impact with production-ready AI solutions built on private, public and hybrid cloud platforms. Our expertise in generative and agentic AI empowers you to deliver secure, scalable and responsible outcomes across your enterprise.\n\nLeverage our IoT and Edge experience to increase efficiencies and reduce time to market for your IoT adoption projects\n\nRide Rackspace’s private network to lower your connectivity costs, improve resilience and improve data privacy.\n\nSimplifying Multicloud Security in a Cloud Native World\n\nDigging Into the Data: What’s the Future of Multicloud?\n\nHow businesses can combat complexities to become multicloud masters\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1237,
"crawled_at": "2025-11-25 09:07:32"
},
{
"url": "https://www.rackspace.com/security",
"title": "Cloud Security & Cybersecurity Solution | Rackspace Technology",
"content": "Explore Cloud Security Solutions\n\nThe constant evolution of threats and the race for more sophisticated tools to combat them mean the security landscape changes quickly. The goal is continuous improvement, but these factors make it challenging to maintain a cutting-edge security posture as well as deep expertise. Many IT teams struggle with being caught in an ongoing cycle of “reactive mode,” which limits the team’s ability to look ahead in a proactive manner.\n\nNow more than ever, it is critical to understand your vulnerabilities and assemble the right solutions to strengthen and secure your environments. The security experts at Rackspace Technology™ can help you detect and proactively respond to threats, address your compliance requirements, and help minimize damage and downtime from breaches. We can augment your team to provide specialized, around-the-clock support for your security needs, so you can keep your business moving forward with confidence.\n\nProtect web applications against data breaches and security threats, bolster security and performance, and build better experiences for your end users.\n\nWe’ll ensure your data security while helping you address compliance and regulatory storage requirements.\n\nSecure your infrastructure stack from malicious intrusion and unauthorized system access.\n\nGain 24x7x365 expert SOC and endpoint security services to detect and respond to vulnerabilities and emerging threats in your multicloud environments.\n\nKeep your cloud environments in compliance and your architecture secure, giving you the confidence and peace of mind to accelerate your business.\n\nYour Elastic Engineering for Security pod works as an extension of your team, helping you meet cloud security and compliance goals.\n\nProvide secure access, anywhere, on any device\n\nRestore critical business operations faster and more securely in the event of a cyber attack.\n\nDiscover related technology platforms and solutions to help you achieve 
smarter business outcomes.\n\nMove your cloud projects forward with expert cloud native security services\n\nOn-Premises Security and Compliance Services\n\nExpert SOC services to protect your applications and data across all of your multicloud environments, data centers and branch locations.\n\nDo You Know Your Cybersecurity Risk Score?\n\nCybersecurity Risk Self-Assessment\n\nAs your business becomes more digital, your cybersecurity must keep pace. In a free 30-minute assessment, you’ll speak directly with our team to receive a personalized risk score, best-practice guidance, and tailored recommendations to strengthen your defenses and stay ahead of threats.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nElastic Engineering for Security\n\nSix security challenges — and how to overcome them\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 526,
"crawled_at": "2025-11-25 09:07:35"
},
{
"url": "https://www.rackspace.com/security/compliance",
"title": "Cloud Security & Cybersecurity Solution | Rackspace Technology",
"content": "Explore Cloud Security Solutions\n\nThe constant evolution of threats and the race for more sophisticated tools to combat them mean the security landscape changes quickly. The goal is continuous improvement, but these factors make it challenging to maintain a cutting-edge security posture as well as deep expertise. Many IT teams struggle with being caught in an ongoing cycle of “reactive mode,” which limits the team’s ability to look ahead in a proactive manner.\n\nNow more than ever, it is critical to understand your vulnerabilities and assemble the right solutions to strengthen and secure your environments. The security experts at Rackspace Technology™ can help you detect and proactively respond to threats, address your compliance requirements, and help minimize damage and downtime from breaches. We can augment your team to provide specialized, around-the-clock support for your security needs, so you can keep your business moving forward with confidence.\n\nProtect web applications against data breaches and security threats, bolster security and performance, and build better experiences for your end users.\n\nWe’ll ensure your data security while helping you address compliance and regulatory storage requirements.\n\nSecure your infrastructure stack from malicious intrusion and unauthorized system access.\n\nGain 24x7x365 expert SOC and endpoint security services to detect and respond to vulnerabilities and emerging threats in your multicloud environments.\n\nKeep your cloud environments in compliance and your architecture secure, giving you the confidence and peace of mind to accelerate your business.\n\nYour Elastic Engineering for Security pod works as an extension of your team, helping you meet cloud security and compliance goals.\n\nProvide secure access, anywhere, on any device\n\nRestore critical business operations faster and more securely in the event of a cyber attack.\n\nDiscover related technology platforms and solutions to help you achieve 
smarter business outcomes.\n\nMove your cloud projects forward with expert cloud native security services\n\nOn-Premises Security and Compliance Services\n\nExpert SOC services to protect your applications and data across all of your multicloud environments, data centers and branch locations.\n\nDo You Know Your Cybersecurity Risk Score?\n\nCybersecurity Risk Self-Assessment\n\nAs your business becomes more digital, your cybersecurity must keep pace. In a free 30-minute assessment, you’ll speak directly with our team to receive a personalized risk score, best-practice guidance, and tailored recommendations to strengthen your defenses and stay ahead of threats.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nElastic Engineering for Security\n\nSix security challenges — and how to overcome them\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 526,
"crawled_at": "2025-11-25 09:07:38"
},
{
"url": "https://www.rackspace.com/managed-kubernetes",
"title": "Kubernetes | Tap into a Unified Multi Cloud MPK Service",
"content": "Begin Your Application Modernization Journey\n\nBusinesses today are moving to containers and Kubernetes, also known as K8s, to improve delivery metrics, accelerate digital transformation and decrease time to market. But this transition to a modern application framework often leaves businesses struggling to streamline the delivery of critical applications and infrastructure while delivering meaningful results to their customers and shareholders.\n\nRackspace Managed Platform for Kubernetes (MPK), powered by Platform9’s Managed Kubernetes (PMK) solution, solves these common customer challenges by providing a:\n\nSingle pane of glass for deploying and managing clusters across private and public clouds\n\nCurated platform experience through frequently requested infrastructure services for containerized workloads and applications with a high degree of Kubernetes security\n\nSpecialized support team composed of Certified Kubernetes Administrators versed in Kubernetes, Platform9 and multicloud\n\nDownload Kubernetes-as-a-service Data Sheet→\n\nA pod includes a small group of Certified Kubernetes Administrator (CKA)-certified engineers, a lead architect and an engagement manager who work together to:\n\nDesign your MPK and Kubernetes deployment\n\nManage operations of the entire stack using infrastructure as code\n\nProvide monitoring, alerting and response for the cluster, cluster services and underlying infrastructure\n\nAssist with the platform upgrade for all Infrastructure Applications and Kubernetes features alongside supported cluster services\n\nEnsure clusters are properly configured and conform to official Kubernetes specifications\n\nOptimize performance by establishing and maintaining performance targets through routine audits, analysis and recommendations\n\nManage security through timely upgrades, sound identity posture, container image scanning, runtime security and network protection\n\nBy design, Kubernetes offers portability across clouds. 
Rackspace Managed Platform for Kubernetes enables the movement of your container-based applications across multiple clouds and environments for greater flexibility and accelerated time to market.\n\nThe Rackspace Managed Platform for Kubernetes includes support for Rackspace Hosted Kubernetes on Bare Metal, AWS EKS and support for Azure Kubernetes Service.\n\nThe Rackspace Kubernetes Stack provides a consistent set of technologies across all your clusters for repeatable and scalable deployments.\n\nAt the core of the stack, you’ll find the Infrastructure, Certified, and Catalog Application categories:\n\nInfrastructure Applications are fully integrated into MPK's software with one-click virtual machine deployments and managed upgrades.\n\nCertified Applications are deployable through MPK's software, but integration is more limited with lifecycle activities managed by Rackspace Technology and the customer.\n\nCatalog Applications are commonly open source and limited to community support. We deliberately defined this category because we realize that no two customers are alike, and it is critical that MPK allows for some level of customization.\n\nWe secure your technology stack from the infrastructure to the Kubernetes cluster, including the containers running inside the cluster and the additional services required to run applications. 
We take care of it, so you don’t have to.\n\nBased on the same technology that allows Google to run billions of containers each week, Kubernetes can easily scale to meet the needs of even the fastest-growing enterprise.\n\nKubernetes Certified Service Provider\n\n500+ Kubernetes Clusters Managed\n\n50+ Certified Kubernetes Administrators (CKA)\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nDiscover related solutions to help you achieve smarter business outcomes.\n\nExperience how AWS’s leading-edge cloud capabilities can help you work smarter, lower costs and innovate with agility.\n\nAccelerate your business with Rackspace Services and tailored solutions on Google Cloud's innovative technology.\n\nSolve cloud challenges with a managed Microsoft Azure solution that helps you build new revenue streams, increase efficiency and deliver incredible experiences.\n\nTap into the full benefits of the cloud by building applications with cloud native technologies, modern architectures and automated development workflows.\n\nAdopt clouds and ease migration to accelerate innovation and optimize performance.\n\nUnlocking Agility and Innovation in the Cloud with Containers and Serverless\n\nRackspace Technology and Platform9 partnership simplifies Kubernetes adoption\n\nKubernetes Explained for Business Leaders\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. 
Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 761,
"crawled_at": "2025-11-25 09:07:41"
},
{
"url": "https://www.rackspace.com/managed-aws",
"title": "AWS for your Cloud, Apps, Data and Security | Rackspace Cloud",
"content": "Discover how we’re rewriting the rules of enterprise modernization\n\nIn this video, you'll see how we transformed a full-stack Python application in under 10 minutes — a process that usually takes months and costs hundreds of thousands of dollars. Learn how our advanced automation and tools like Kiro deliver hyperspeed modernization, zero downtime and maximum security, all backed by the deep cloud expertise of Rackspace Technology. If you're ready to see what true transformation looks like, this is a must-watch.\n\nAccess continued support along your cloud journey.\n\nDemystify your AWS spend and maximize your ROI.\n\nAdopt cloud native technologies where it makes sense for your business.\n\nMigrate legacy media archives to AWS using AI to digitize, ingest, process, tag, store and leverage digital media content via a digital supply chain management system.\n\nExplore Rackspace Professional Services for AWS→\n\nBring your software into the future with improved scalability, reliability and performance while adding new capabilities.\n\nModernize your applications with a cloud native approach to development, data, teams and processes.\n\nEnhance customer experiences and improve operational efficiencies with intelligent applications powered by IoT capabilities. 
Watch our webinar “Industrial IoT and Smart Manufacturing”.\n\nMigrate to AWS data and analytics services for improved efficiency.\n\nBuild a modern data platform tailored to your business needs that delivers business intelligence when and where you need it.\n\nEmpower your data to make automated recommendations, take preemptive action, and streamline decision-making with AI and machine learning.\n\nMaintain peak performance with access to a dedicated team of multi-disciplinary data experts for ongoing architecture, enablement and engineering services.\n\nAlleviate the complexity of security and compliance in your AWS environment with consultative services to help share the responsibility of defining security requirements for new AWS deployments, as well as migrations from existing vendors.\n\nModernize your approach to security with expert deployment and management of the right security technologies for your AWS environment – including AWS security tools such as AWS Security Hub, AWS IAM Access Analyzer, Amazon GuardDuty, AWS Shield, AWS WAF, and AWS Firewall Manager.\n\nDesign and build cloud security controls to address compliance mandates, such as PCI-DSS, HIPAA, and more.\n\nImprove your cloud security posture by understanding cloud threats and vulnerabilities, with expert support to remediate settings that don’t align with industry benchmarks and best practices.\n\nReduce risk in your AWS environment with access to certified AWS security experts, including those who have been trained and are directly supported by the AWS Shield Response Team (ASRT).\n\nRackspace Government Cloud (RGC) by Rackspace Government Solutions is a NIST-based offering purpose-built with inheritable security controls for the independent software vendor and systems integrator with federal and state compliance needs.\n\nRackspace Government Solutions provides deep expertise dedicated to a FedRAMP AWS Platform-as-a-Service and a “born-in-the-cloud” mindset to help you modernize in a 
“Cloud Smart” public sector domain.\n\nAs an AWS Premier Consulting Partner with 15 consulting competencies, Rackspace Technology expedites your FedRAMP compliance journey through our JAB P-ATO Platform as a Service.\n\nRGC is purpose-built to help organizations achieve Assessment & Authorization faster and with cost savings of up to 70%. Whether you are a government agency, systems integrator, or independent software vendor, RGC is the foundation you need to rapidly deliver mission value via the cloud.\n\nExpertise across many AWS competencies including: data analytics, DevOps, higher education, financial services, healthcare, industrial software, IoT, machine learning and migration.\n\nExplore Rackspace FedRAMP Compliance→\n\nTake advantage of an Infrastructure Modernization solution across 11 services aimed at creating a secure, resilient and enterprise-grade cloud environment. Our certified experts work closely with you to align cloud architecture to meet your specific requirements and goals, empower your teams and drive rapid innovation on AWS.\n\nRackspace has an extensive standardized and purpose-built library of Reference Architectures for AWS that speeds up delivery, fully embraces cloud-native, maximizes AWS funding and reduces your costs.\n\nReduced complexity: We’ll break down gargantuan tasks into achievable milestones.\n\nMinimized risk: Weekly sprints during engagement and stakeholder alignments are key components to our delivery methodology to reduce risk.\n\nFaster time-to-value: Through our expertise, proven delivery methodology, and reference architectures (IP), we accelerate time-to-value and business outcomes.\n\nCloud Migration Services: Elevate workloads in the cloud\n\nApplications: Complete build, release and go-live support\n\nInfrastructure build-out services: Get infrastructure-as-code templates\n\nSecurity: Guidance and architecture based on platform best practices and industry standards\n\nDevOps services: CI/CD pipelines and automated
deployments\n\nBackup and disaster recovery (DR) planning: Expert DR environment implementation and testing\n\nPerformance management for load testing and optimization.\n\nCost optimization: Tagging, billing review, usage and RIs.\n\nExplore Rackspace Elastic Engineering→\n\nAlways-on managed support: Allows you to focus internal resources on core business activities and accelerate your journey through the cloud with the peace of mind that your environment is fully managed.\n\nOn-demand cloud expertise: Gain the benefits of public cloud without having to incur the challenge and expense of self-managing it.\n\nAccess to expertise: Immediate access to technical expertise and capacity to manage cloud infrastructure, tools and applications.\n\nCloud resiliency: Leverage cloud native and Rackspace Technology proprietary tooling to remove manual intervention and increase resiliency.\n\nInnovate with cloud services: Take advantage of the latest cloud features and capabilities to solve challenging business problems.\n\nExplore Rackspace Modern Operations→\n\nRethinking support models for a cloud native world\n\nSee how Rackspace Elastic Engineering breaks down the traditional build and operate barriers and opens up new possibilities to accelerate innovation.\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nExplore our expert credentials and industry recognition.\n\nProviding innovation and leadership in the technology services industry for 20+ years.\n\nMicrosoft Workloads Consulting Competency\n\nSmall and Medium Business (SMB) Competency\n\nTravel & Hospitality Competency\n\nMarketplace Skilled Consulting Partner\n\nAWS Public Sector Solution Provider\n\nDiscover related technology platforms and solutions to help you
achieve smarter business outcomes.\n\nMaximize the value of your data and the power of AWS cloud with the experience and expertise you need from Rackspace Technology.\n\nLearn about AWS Marketplace offerings from Rackspace Technology.\n\nGet expert guidance that helps you realize the benefits of modern applications and improve your return on investment.\n\nAccelerate results with AI-driven innovation\n\nRackspace Technology helps organizations turn data into impact with production-ready AI solutions built on private, public and hybrid cloud platforms. Our expertise in generative and agentic AI empowers you to deliver secure, scalable and responsible outcomes across your enterprise.\n\nAdopt clouds and ease migration to accelerate innovation and optimize performance.\n\nTap into the full benefits of the cloud by building applications with cloud native technologies, modern architectures and automated development workflows.\n\nGain cloud transparency, reduce waste and drive savings with FinOps.\n\nMake predictive decisions that accelerate innovation and increase ROI with integrated data architectures and AI.\n\nLeverage our IoT and Edge experience to increase efficiencies and reduce time to market for your IoT adoption projects.\n\nInfrastructure Modernization: Professional Services\n\nE-Book: Charting Your Cloud Migration for Financial Services\n\nFrom AI to IoT: How the Power of AWS Cloud Propels Innovation\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time.
Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1267,
"crawled_at": "2025-11-25 09:07:46"
},
{
"url": "https://www.rackspace.com/professional-services",
"title": "Cloud Services and Solutions | Rackspace Professional Services",
"content": "Eliminate technical debt, fix flawed security and bring disparate data together with our IT security professional services. When your technology environment is working for you, your company can focus on getting ahead rather than on what holds you back. Our experts design and build cloud solutions that move your business forward.\n\nUsing cloud-native approaches based on leading DevOps practices, we help you modernize your applications and data, build in the leading security solutions and deploy everything on the right cloud infrastructure for maximum effectiveness. We’ve assembled a professional services team that brings you expertise for leading applications, data architectures and technologies, and security best practices – across every major cloud platform.\n\nWe make keeping up with rapidly changing cloud technologies our business, so you can get the most out of yours.\n\nCloud strategy architecture roadmap: Develop a cloud strategy and architecture roadmap using a structured Cloud Adoption Framework and expert guidance.\n\nCIO advisory workshop: Complimentary advisory session to discuss how leading organizations are using the public cloud to make shifts across technology, processes and people to drive business solutions.\n\nCloud Readiness Assessment: Cloud expert-led, multi-week consultative engagement that equips you with a holistic review of your IT environment, and provides a strategic plan with actionable insights and unbiased recommendations.\n\nCloud Maturity Workshop: A complete, objective people, process and technology assessment against cloud best practices so you know what development areas your business should prioritize.\n\nFinOps Assessment Workshop: FinOps consultant-led, two-day, interactive workshop to discuss components of best-in-class cost governance practices, highlighting potential configuration gaps and building a prioritized roadmap for optimal cost savings.\n\nPlatform migrations: Execute a fit-for-purpose migration of
application workloads by lifting, shifting and optimizing VM-based workloads to the cloud.\n\nLanding Zone: Deliver a fully functional, secure and optimized foundation for your first workload in the public cloud.\n\nLift-and-shift migrations: Accelerate your time-to-market with a fast and proven application migration to the public cloud in a like-for-like manner that minimizes disruption, downtime and risk.\n\nPipeline Driven Migration: Accelerate your application-focused migration journey to the cloud and add modern automation.\n\nTake advantage of a Modern Cloud Infrastructure solution across 11 services aimed at creating a secure, resilient, and enterprise-grade cloud environment. Our certified experts work closely with you to align cloud architecture to meet your specific requirements and goals, empower your teams, and drive rapid innovation across AWS, Azure, & GCP.\n\nRackspace has an extensive standardized and purpose-built library of Reference Architectures across AWS, Azure, & GCP that speeds up delivery, fully embraces cloud-native, maximizes Hyperscaler funding, and reduces your costs.\n\nBenefits of leveraging Rackspace for your Modern Cloud Infrastructure needs:\n\nReduce complexity: We’ll break down gargantuan tasks into achievable milestones.\n\nMinimize risk: Weekly sprints during engagement and stakeholder alignments are key components to our delivery methodology to reduce risk.\n\nDeliver business objectives faster: Through our expertise, proven delivery methodology, and reference architectures (IP), we accelerate time-to-value and business outcomes.\n\nModernization & Application Development\n\nApplication modernization: Develop an application strategy and architecture roadmap using structured offerings and expert guidance.\n\nRackspace Application Modernization Workshop: Overcome the complexity of modernization and get on the fast track to modern application development.
Determine which path is right for your business — rehost, re-platform, refactor, rebuild or replace.\n\nLegacy Systems Assessment: Maximize the agility, reliability and cost-saving benefits of the cloud. We evaluate your application portfolio and provide recommendations for a pilot application to modernize, including backlog, roadmap and cost estimate.\n\nSaaS workshop: No matter what stage you’re at in your SaaS journey, we have the expertise to provide an assessment and reference points with clear recommendations for improvements.\n\nCloud Native Applications & Development: Modernization of your legacy workloads to get the benefits of cloud native platforms.\n\nModern Application Development: Don’t just run applications in the cloud; run applications made for the cloud. Leverage our cloud native development expertise to build modern infrastructure and applications.\n\nApplication Discovery and Wireframing Assessment: Our team of business analysts and architects work to understand your goals and vision, and translate them into actionable requirements and an intuitive, modern user interface.\n\nIoT design and development: Design and implement your IoT strategy leveraging our expertise. We provide everything you need to move forward quickly and efficiently with your IoT business goals.\n\nIoT Assessment & Advisory: Kickstart your IoT journey by working with our experts to define your business challenges and goals and develop a roadmap that maximizes your ROI.\n\nIoT Platform Modernization: An IoT platform is responsible for provisioning and managing your fleet of devices, as well as ingesting, processing, aggregating and exposing the data that makes IoT valuable. Through our discovery and design, you gain a cloud native, robust and scalable IoT platform that can seamlessly grow with your business.\n\nIIoT Smart Factory Accelerator: Reduce the time required to monitor and optimize IIoT.
Realize the value of predictive maintenance, real-time alerts and analytics to lower costs, mitigate potential risks and respond quickly to time-sensitive issues.\n\nData and AI strategy and roadmap: Create a holistic data and AI strategy that allows your organization to drive business value, mitigate risk and build a data-driven company.\n\nDatabase Migration and Modernization Assessment: Data experts will review modern cloud architectures, explore the available tools and make recommendations. We’ll outline where you are today in your data modernization journey — and where you want to go.\n\nData warehouse assessment: Data experts will assess and provide recommendations on the right-fit cloud data warehouse architecture for modernization to improve your data products’ efficiency, performance, scalability, security and reliability.\n\nHadoop assessment: Improve the performance, scalability, security and reliability of your Hadoop environment, while making it easier to accelerate innovation at scale.
We can assess and provide recommendations on the right-fit cloud data architecture for accelerating modernization and realizing the benefits of scale.\n\nWell Architected Review for data assessment: A high-level audit of all data-critical workloads against the Well Architected Framework.\n\nCloud data management capabilities assessment: An assessment to benchmark your current technology, processes and people against the enterprise-grade CDMC framework used by Fortune 500 data teams.\n\nData design labs: Helps you build a strategy and accelerate your path to production by providing a deeper understanding of data and analytics services, including database modernization, ETL migration, and modernization for batch streaming, modern data platforms, data lakehouse architecture and machine learning.\n\nCloud data migration: Move data across technologies and platforms, while minimizing risks, disruption and downtime.\n\nDatabase modernization: Modernize your databases to make them more efficient, scalable, reliable, secure and easier to manage.\n\nData warehouse migration: Shift data to the cloud to enable powerful reporting of critical analytics workloads.\n\nData lake migration: Migrate and store data in a flexible and cost-effective solution.\n\nBusiness intelligence modernization: Make data and analytics actionable and operational to your teams with the latest BI tools and integrations.\n\nCloud data platform: Realize value from your data faster with a cloud data platform. Gain insights and efficiencies from automated reporting and visualization.\n\nCloud data platform: Envision the possibilities with data analytics and we’ll help you develop practical solutions that focus on technology and business transformation.\n\nData-driven applications: Build a foundational platform that enables sharing siloed data across teams and other networks.\n\nOperating model and governance: Ensure data quality and define data workflows for successful data lifecycle management.
We help you define and align to industry best practices for long-term data management.\n\nAnalytics and applied AI: Enable automated insights with analytics by training data to be intelligent and actionable with AI and machine learning.\n\nCloud native analytics: Enable a data-driven organization via robust analytics capabilities designed to meet your goals.\n\nDemand forecasting: Make informed predictions about future behaviors, like customer churn, purchasing habits and inventory needs.\n\nAnomaly detection: Quickly identify data that doesn’t align with historical patterns.\n\nComputer vision: Interpret and classify information from images, videos and other content.\n\nConversational AI: Leverage and train data for the advanced speech and text capabilities used by virtual agents.\n\nIntelligent document processing: Scalable, cloud native solution for turning documents, photos and other data points into digitized, searchable content.\n\nCloud security advisory and assurance: To protect digital investments, modernize your cloud security and compliance with expert consultative services to help assess and transform cloud security and governance at scale, including security policies, architecture, assurance, governance and risk management — all while ensuring resiliency and compliance.\n\nCloud Security Posture Assessment: Understand your current cloud security posture with expert design and implementation recommendations to create a security operation that can address gaps and help reduce threat risks.\n\nRisk management and compliance: A suite of services designed to help you understand your business risks and compliance posture in the cloud, and help you build a plan to achieve and maintain compliance.\n\nSecurity architecture: Build security and privacy into your architecture, and provide technical support to teams that may not have the resources to staff senior security executive and technical roles.\n\nOffensive security: Enhance your cyber defenses by
uncovering vulnerabilities, assessing the impact of a cyberattack and providing security assurance in your digital estate.\n\nSecurity advisory, engineering and compliance services also extend into:\n\nAdvanced identity protection and governance\n\nZero Trust architecture and networking\n\nApplication security and lifecycles\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nWe’re experts on your side, doing what it takes to get the job done right — from the first consultation to your daily operations.\n\nDecades of expertise matched with thousands of experts\n\nExpert guidance so you can maximize the benefits of modern cloud.\n\nProviding innovation and leadership in the technology services industry for 20+ years.\n\nBest Consultancy or System Integrator 2020, The Cloud Awards\n\nThe Forrester Wave™: Hosted Private Cloud Services in North America and Europe, Q2 2020\n\nOne of the leading AWS consulting partners with 14 competencies\n\nDell Technologies Four-time Partner of the Year Award\n\nGoogle Cloud First Premier Managed Services Partner\n\nGoogle Cloud Partner of the Year Award Winner\n\nMicrosoft Six-time Hosting Partner of the Year\n\nMicrosoft Azure Expert MSP designation for 2021\n\nOracle Cloud Service and Resell Partner\n\nSalesforce Platinum Consulting Partner\n\nVMware Premier Service Provider\n\nDiscover related technology platforms and solutions to help you achieve smarter business outcomes.\n\nGrow revenue, increase efficiency and deliver the future with powerful cloud solutions delivered with industry-leading expertise.\n\nExplore powerful app solutions that help you gain efficiencies, improve operations and drive smarter business outcomes.\n\nHandle large volumes of data effectively and economically so you can accelerate your path to business insights.\n\nLet our certified security experts secure your environments,
helping you build efficiencies that let you deliver your future.\n\nAccelerating Enterprise IT Modernization with Hybrid Cloud\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 1826,
"crawled_at": "2025-11-25 09:07:53"
},
{
"url": "https://www.rackspace.com/data",
"title": "Data Solutions, Data Value and Insights | Rackspace Technology",
"content": "Continuous, reliable access to the right data at the right time can help you make quick, informed decisions that take you closer to reaching your business goals. But you must ensure your legacy data architecture and technology solutions can handle your ever-increasing volumes of data. Rackspace Technology® offers a broad portfolio of data analytics solutions and data services designed to help you accelerate the adoption of modern data solutions while enabling your business transformation.\n\nData Analytics and Business Insights\n\nGain real-time, actionable insights that help you assess risks, reduce costs and shape business decisions.\n\nUse your data to make data-driven decisions that will help you achieve your business goals.\n\nAccelerate results with AI-driven innovation\n\nRackspace Technology helps organizations turn data into impact with production-ready AI solutions built on private, public and hybrid cloud platforms. Our expertise in generative and agentic AI empowers you to deliver secure, scalable and responsible outcomes across your enterprise.\n\nDeploy the right database management systems to maximize the value of your data and increase agility.\n\nAccelerate data optimization while helping to ensure security, uptime and seamless access across your organization.\n\nMake predictive decisions that accelerate innovation and increase ROI with integrated data architectures and AI.\n\nMap out a winning data strategy and data governance plan to extract the most value from your data.\n\nEmbrace the transformative power of generative AI to help you leverage advanced AI/ML capabilities.\n\nAccelerate the value of your data and the cloud with leading-edge advisory, professional and managed services for leading next-gen platforms.\n\nUtilize unstructured data to gain insights that raise your revenue, drive customer experiences and power business success.\n\nDiscover related technology platforms and solutions to help you achieve smarter business 
outcomes.\n\nMaximize the value of your data and the power of AWS cloud with the experience and expertise you need from Rackspace Technology.\n\nGenerate powerful business insights from your data quickly with a highly experienced team of big data experts.\n\nGain more business insight and identify data-driven growth opportunities.\n\nModernize your data warehouse so you can accelerate the delivery of deeper insights that can help you scale and drive your business forward.\n\nWith storage costs decreasing, it’s easier to keep more of your data for its potential value — if it’s indexed, protected and accessible.\n\nMobilize the power of Google Cloud data, backed by deep experience and expertise that can help you maximize its value.\n\nMake the most of Microsoft Azure for innovation, agility, cost savings and operational efficiency.\n\nScale flexibly and easily, no matter how large your database or user base is.\n\nGet fully managed, highly available datastores that minimize downtime, scale with your needs, and reduce risk.\n\nSimplify database management with expert support from Rackspace Technology. 
Our certified DBAs provide the expertise you need to optimize and secure your database environments.\n\nExperience peace of mind that your existing databases are expertly managed and performance-optimized.\n\nHarness your data for a cloud native world\n\nRackspace Elastic Engineering for Data\n\nNo matter where you are in your data journey, Rackspace Technology Elastic Engineering for Data can help accelerate your path to data-driven innovation.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nLiberate Your IoT Data to Expand Your Business Opportunities\n\nGrow opportunities and customer connections with integrated data\n\nSimplifying the big data dilemma\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 679,
"crawled_at": "2025-11-25 09:07:58"
},
{
"url": "https://www.rackspace.com/applications",
"title": "Application Solutions & Managed Services | Rackspace Technology",
"content": "Business runs a little smoother when your applications are working their hardest for you. Rackspace Technology™ application management services help you make the most of what you use today while preparing your organization for the future.\n\nApplication Management and Operations\n\nOptimize your environment with expert administration, management and configuration so you can focus on staying ahead of the curve.\n\nGet expert guidance that helps you realize the benefits of modern applications and improve your return on investment.\n\nTap into the full benefits of the cloud by building applications with cloud native technologies, modern architectures and automated development workflows.\n\nLet our experts handle your email and productivity solutions from setup to ongoing support, so you can stay focused on your core business.\n\nCustomer Relationship Management\n\nMaximize your CRM platform so you can better serve your customers and transform your business.\n\nWe’ll ensure your applications are fast, secure and available — so you can get the most from your digital experience.\n\nYour ERP solution should work with where your business is now — and where it’s headed.\n\nLeverage our IoT and Edge experience to increase efficiencies and reduce time to market for your IoT adoption projects.\n\nOptimization experts help you realize the full savings and business advantages of well-run SaaS applications.\n\nDiscover related technologies and platforms to help you achieve smarter business outcomes.\n\nGet an AEM solution that’s built for scalability, reliability and security — supported by a team of AEM experts.\n\nUnlock the power of intelligent CRM and ERP business applications in the cloud through connected platforms across Office 365 productivity applications and modern features that accelerate results.\n\nFind the right Microsoft 365 solutions for your specific business needs.\n\nWe help you adopt new cloud-based Oracle applications and manage legacy Oracle apps
and databases faster and with less risk.\n\nAccelerate SAP adoption and optimization by engaging a team of certified experts and managed services across multicloud solutions that integrate across your environment.\n\nEssential tools and capabilities across a global platform, backed by Rackspace Application Services.\n\nDiscover how Microsoft SharePoint from Rackspace Technology can help transform the way your business gets work done.\n\nWe help you make the most of your digital experience platforms and turn more casual browsers into dedicated customers.\n\nDiscover a partnership that can help you achieve more – for your people, your business and your customers – today and into the future.\n\nConsulting and Advisory Services\n\nReady to start the conversation?\n\nFill out the form to be connected to one of our experts.\n\nYou may withdraw your consent to receive additional information from Rackspace Technology at any time. Information collected in this form is subject to the Rackspace Technology Privacy Notice.\n\nTo create a ticket or chat with a specialist regarding your account, log into your account.\n\nLegacy Datapipe One (Service Now)\n\nRackspace Sovereign Services UK\n\nRackspace Technology accelerates the value of the cloud during every phase of a customer’s digital transformation. Join us on our mission.",
"word_count": 505,
"crawled_at": "2025-11-25 09:08:01"
},
{
"url": "https://www.rackspace.com/blog/strengthening-healthcare-operations-through-cyber-resilience",
"title": "Strengthening Healthcare Operations Through Cyber Resilience",
"content": "by Rich Fletcher, Global Marketing Director – Healthcare, Rackspace Technology\n\nStrengthening Healthcare Operations Through Cyber Resilience, November 24th, 2025\n\nCyber resilience has become a defining capability in healthcare. Learn how resilient infrastructure protects data, sustains operations and preserves patient trust amid rising cyber threats.\n\nThe new foundation for healthcare stability\n\nHealthcare depends on trust. Every clinical workflow, every care decision and every patient interaction relies on systems that must stay available and secure. As your organization adopts more digital platforms, from EHRs and clinical imaging systems to connected medical devices and patient engagement tools, the stakes rise. That reality became clear in 2024 when more than 276 million patient records were exposed, showing how modern threats can overwhelm traditional security programs.\n\nYou already know cybersecurity is essential, but the challenge has shifted. Today, the question isn’t whether you prevent every attack. It’s whether your clinical operations continue when something goes wrong. Cyber resilience provides that capability.
It gives you the operational strength to anticipate disruption, withstand impact and recover with confidence while protecting patient data and maintaining trust.\n\nMoving beyond compliance to operational resilience\n\nMany healthcare security programs were designed around compliance reporting and regulatory requirements. Compliance still matters, but the threat landscape demands a broader, operational model. You need an approach that connects security, continuity and patient safety into one coordinated strategy.\n\nThat’s the purpose of our new white paper, Strengthening Healthcare Operations Through Cyber Resilience. It breaks down the four pillars that help you build a stronger foundation:\n\n1. Business impact analysis: Understand which systems and processes are essential to patient care, revenue and daily operations. When you know exactly what matters most, you can prioritize protection and recovery with greater precision.\n\n2. Enhanced business continuity planning: Move from static binders to dynamic, well-tested plans. Effective continuity work identifies how systems fail, how quickly services must return and which dependencies need isolation or backup.\n\n3. Isolated recovery environments (IREs): Recover from a secure, standalone environment designed to remain safe even if production systems are compromised. IREs help you restore critical services faster and with greater integrity.\n\n4. Infrastructure as code (IaC): Use automated, repeatable configuration definitions to rebuild environments quickly and accurately. IaC reduces guesswork and speeds up recovery after an attack.\n\nTogether, these pillars form a resilience-driven operating model that strengthens every part of your digital ecosystem.\n\nWhy cyber resilience matters now\n\nThe financial impact of a breach is significant, with the average cost in healthcare reaching $10.1 million and roughly 21 days of downtime, but the operational consequences are even more urgent. When systems fail, patient care slows.
Clinical workflows stall. Decisions take longer. That disruption can have real-world consequences.\n\nRecent studies show what you may already be seeing across the industry:\n\n70% of attacks delay patient services\n\n56% postpone diagnostic tests or procedures\n\n28% correlate with increased patient mortality\n\nThese numbers highlight the reason cyber resilience has become central to operational strategy. Downtime affects more than revenue or reputation. It affects clinical integrity, care delivery and patient outcomes. A resilient model helps you reduce that risk and strengthen your ability to operate confidently in a connected environment.\n\nBuilding a more resilient future\n\nResilience begins long before an incident. You can take meaningful steps now that build real strength across your organization:\n\nRun scenario-based simulations that mimic real-world attacks\n\nValidate continuity plans under pressure, not just on paper\n\nEstablish out-of-band communication channels that remain usable even if identity systems are compromised\n\nAutomate recovery processes to accelerate the path back to normal operations\n\nEach step builds more stability into your environment and gives your teams clarity during high-stress moments. When your security, governance and operations functions align around a resilience-focused approach, you create a health system that can protect patients, preserve uptime and sustain trust, even during disruption.\n\nCyber resilience doesn’t replace cybersecurity. It elevates it. It helps you deliver dependable care with greater confidence and positions your organization for long-term stability in a world where digital systems are inseparable from clinical operations.\n\nRead the fullwhite paperto explore how a resilience-based strategy supports secure, uninterrupted healthcare delivery.",
+    "word_count": 747,
+    "crawled_at": "2025-11-25 09:08:02"
+  }
+]
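The record above has the shape produced by `extract_clean_content()` in rebuild_rag_system.py: `url`, `title`, `content`, `word_count`, and a `crawled_at` timestamp in `%Y-%m-%d %H:%M:%S` format. Below is a minimal loader sketch that checks that shape before records are chunked and embedded. It assumes that schema for every knowledge file. `load_knowledge` and `REQUIRED_FIELDS` are hypothetical names and are not part of this commit.

```python
import json
from datetime import datetime
from typing import Dict, List

# Assumed schema: field names and types as written by extract_clean_content()
REQUIRED_FIELDS = {
    "url": str,
    "title": str,
    "content": str,
    "word_count": int,
    "crawled_at": str,
}


def load_knowledge(raw: str) -> List[Dict]:
    """Parse the JSON text of a knowledge file and validate each record."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not a JSON object")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(rec.get(field), expected_type):
                raise ValueError(f"record {i}: missing or mistyped field {field!r}")
        # strptime raises ValueError if the timestamp is not in the crawler's format
        datetime.strptime(rec["crawled_at"], "%Y-%m-%d %H:%M:%S")
    return records
```

Usage would look like `records = load_knowledge(open("rackspace_knowledge_clean.json", encoding="utf-8").read())`. Any malformed record fails loudly instead of being embedded silently.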
rackspace_knowledge_complete.json
ADDED
The diff for this file is too large to render.
rackspace_knowledge_enhanced.json
ADDED
The diff for this file is too large to render.
rackspace_knowledge_from_raw.json
ADDED
The diff for this file is too large to render.
rag_chatbot.py
ADDED
@@ -0,0 +1,279 @@
+"""
+RAG (Retrieval-Augmented Generation) pipeline with Groq API
+Combines vector database retrieval with Groq LLM generation
+"""
+from groq import Groq
+import chromadb
+from chromadb.config import Settings
+from sentence_transformers import SentenceTransformer
+from typing import List, Tuple, Dict
+import logging
+from pathlib import Path
+from config import (
+    VECTOR_DB_DIR,
+    EMBEDDING_MODEL,
+    GROQ_API_KEY,
+    GROQ_MODEL,
+    MAX_NEW_TOKENS,
+    TEMPERATURE,
+    TOP_P,
+    MAX_HISTORY_LENGTH,
+    TOP_K_RETRIEVAL
+)
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
+
+class ConversationHistory:
+    """Manages conversation history for context-aware responses"""
+
+    def __init__(self, max_length: int = MAX_HISTORY_LENGTH):
+        self.max_length = max_length
+        self.history = []
+
+    def add_turn(self, user_message: str, bot_response: str):
+        """Add a conversation turn"""
+        self.history.append({
+            'user': user_message,
+            'assistant': bot_response
+        })
+
+        # Keep only last N turns
+        if len(self.history) > self.max_length:
+            self.history = self.history[-self.max_length:]
+
+    def get_history_text(self) -> str:
+        """Get formatted history for context"""
+        if not self.history:
+            return ""
+
+        history_text = "Previous conversation:\n"
+        for i, turn in enumerate(self.history, 1):
+            history_text += f"User: {turn['user']}\n"
+            history_text += f"Assistant: {turn['assistant']}\n"
+
+        return history_text
+
+    def get_last_user_message(self, n: int = 1) -> List[str]:
+        """Get last n user messages"""
+        return [turn['user'] for turn in self.history[-n:]]
+
+    def clear(self):
+        """Clear conversation history"""
+        self.history = []
+
+    def to_dict(self) -> List[Dict]:
+        """Convert to dictionary format"""
+        return self.history.copy()
+
+
+class RAGChatbot:
+    """RAG-based chatbot with Groq API"""
+
+    def __init__(self):
+        logger.info("🤖 Initializing RAG Chatbot with Groq...")
+
+        # Initialize Groq client
+        if not GROQ_API_KEY:
+            raise ValueError("GROQ_API_KEY not found! Please set it in .env file")
+
+        self.groq_client = Groq(api_key=GROQ_API_KEY)
+        self.groq_model = GROQ_MODEL
+        logger.info(f"✅ Using Groq model: {self.groq_model}")
+
+        # Load vector database
+        logger.info("📚 Loading vector database...")
+        self.client = chromadb.PersistentClient(
+            path=str(VECTOR_DB_DIR),
+            settings=Settings(anonymized_telemetry=False)
+        )
+        self.collection = self.client.get_collection("rackspace_knowledge")
+
+        # Load embedding model
+        logger.info(f"🔤 Loading embedding model: {EMBEDDING_MODEL}")
+        self.embedding_model = SentenceTransformer(EMBEDDING_MODEL)
+
+        # Initialize conversation history
+        self.conversation = ConversationHistory()
+
+        logger.info("✅ RAG Chatbot ready!")
+
+    def retrieve_context(self, query: str, top_k: int = TOP_K_RETRIEVAL) -> str:
+        """Retrieve relevant context from vector database"""
+        # Generate query embedding
+        query_embedding = self.embedding_model.encode([query])[0]
+
+        # Search vector database
+        results = self.collection.query(
+            query_embeddings=[query_embedding.tolist()],
+            n_results=top_k
+        )
+
+        if not results or not results['documents'][0]:
+            return ""
+
+        # Combine retrieved documents
+        context_parts = []
+        for i, doc in enumerate(results['documents'][0], 1):
+            context_parts.append(f"[Source {i}]: {doc}")
+
+        context = "\n\n".join(context_parts)
+        return context
+
+    def build_prompt(self, user_message: str, context: str) -> str:
+        """Build optimized prompt with history and context for accurate responses"""
+        # Get conversation history
+        history_text = self.conversation.get_history_text()
+
+        # Enhanced prompt engineering for accuracy and user-friendliness
+        prompt = "<|system|>\n"
+        prompt += "You are a Rackspace Technology expert. Answer questions using ONLY the information provided in the context below.\n\n"
+        prompt += "CRITICAL RULES:\n"
+        prompt += "1. Use ONLY facts from the CONTEXT section below - do not make up information\n"
+        prompt += "2. If the context doesn't contain the answer, say 'I don't have specific information about that in my knowledge base'\n"
+        prompt += "3. Be direct and concise - answer in 2-4 sentences maximum\n"
+        prompt += "4. Do not repeat phrases or generate lists unless they are in the context\n"
+        prompt += "5. Quote specific facts from the context when possible\n\n"
+
+        if context:
+            prompt += f"CONTEXT (Your ONLY source of information):\n{context}\n\n"
+        else:
+            prompt += "CONTEXT: No relevant information found.\n\n"
+
+        if history_text:
+            prompt += f"PREVIOUS CONVERSATION:\n{history_text}\n"
+
+        prompt += f"USER QUESTION: {user_message}\n\n"
+        prompt += "<|assistant|>\n"
+
+        return prompt
+
+    def generate_response(self, prompt: str) -> str:
+        """Generate response using Groq API"""
+        try:
+            chat_completion = self.groq_client.chat.completions.create(
+                messages=[
+                    {
+                        "role": "system",
+                        "content": "You are a Rackspace Technology expert. Answer questions using ONLY the information provided in the context. Be direct and concise - answer in 2-4 sentences maximum."
+                    },
+                    {
+                        "role": "user",
+                        "content": prompt
+                    }
+                ],
+                model=self.groq_model,
+                temperature=0.1,
+                max_tokens=MAX_NEW_TOKENS,
+                top_p=TOP_P,
+            )
+
+            response = chat_completion.choices[0].message.content
+            return response.strip()
+
+        except Exception as e:
+            logger.error(f"Groq API error: {e}")
+            return "I'm having trouble generating a response right now. Please try again."
+
+    def chat(self, user_message: str) -> str:
+        """Main chat function with RAG and history"""
+        # Check if user is asking about conversation history
+        history_keywords = ['what did i ask', 'what was my question', 'previous question',
+                            'earlier question', 'first question', 'asked before']
+
+        if any(keyword in user_message.lower() for keyword in history_keywords):
+            # Return from history
+            if self.conversation.history:
+                last_messages = self.conversation.get_last_user_message(n=len(self.conversation.history))
+                if 'first' in user_message.lower():
+                    return f"Your first question was: {last_messages[0]}"
+                else:
+                    return f"Your previous question was: {last_messages[-1]}"
+            else:
+                return "We haven't had any previous conversation yet."
+
+        # Retrieve relevant context
+        logger.info(f"User: {user_message}")
+        context = self.retrieve_context(user_message)
+
+        # If no context found, return helpful message
+        if not context or len(context.strip()) < 50:
+            response = "I don't have specific information about that in my Rackspace knowledge base. Could you try rephrasing your question or ask about Rackspace's services, mission, or cloud platforms?"
+            self.conversation.add_turn(user_message, response)
+            logger.info(f"Assistant: {response}")
+            return response
+
+        # Extract key sentences from context (extractive approach)
+        # This is more reliable than generative for base models
+        sentences = []
+        for line in context.split('\n'):
+            line = line.strip()
+            if line and len(line) > 30 and not line.startswith('[Source'):
+                # Clean up the line
+                if ':' in line:
+                    line = line.split(':', 1)[1].strip()
+                sentences.append(line)
+
+        # Take first 2-3 most relevant sentences
+        if sentences:
+            response = ' '.join(sentences[:3])
+            # Clean up
+            if len(response) > 400:
+                response = response[:400] + '...'
+        else:
+            # Fallback to generation if extraction fails
+            prompt = self.build_prompt(user_message, context)
+            response = self.generate_response(prompt)
+
+        logger.info(f"Assistant: {response}")
+
+        # Add to conversation history
+        self.conversation.add_turn(user_message, response)
+
+        return response
+
+    def reset_conversation(self):
+        """Reset conversation history"""
+        self.conversation.clear()
+        logger.info("Conversation history cleared")
+
+    def get_conversation_history(self) -> List[Dict]:
+        """Get current conversation history"""
+        return self.conversation.to_dict()
+
+
+def main():
+    """Test the RAG chatbot"""
+    # Initialize chatbot (requires GROQ_API_KEY and an existing vector DB)
+    chatbot = RAGChatbot()
+
+    # Test conversation with history
+    test_queries = [
+        "What is Rackspace?",
+        "What is their mission?",
+        "What did I ask first?"
+    ]
+
+    print(f"\n{'='*80}")
+    print("Testing RAG Chatbot with Conversation History")
+    print(f"{'='*80}\n")
+
+    for query in test_queries:
+        print(f"User: {query}")
+        response = chatbot.chat(query)
+        print(f"Bot: {response}\n")
+        print("-" * 80 + "\n")
+
+    # Show conversation history
+    print("\nConversation History:")
+    print(f"{'='*80}")
+    history = chatbot.get_conversation_history()
+    for i, turn in enumerate(history, 1):
+        print(f"\nTurn {i}:")
+        print(f" User: {turn['user']}")
+        print(f" Assistant: {turn['assistant']}")
+
+
+if __name__ == "__main__":
+    main()
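This sketch assumes the `rackspace_knowledge` collection was built by rebuild_rag_system.py, where `create_proper_chunks()` joins words with single spaces, so each stored chunk is one line. `retrieve_context()` prefixes every chunk with `[Source i]:`, and the extractive branch of `chat()` skips any line that starts with `[Source`. On that data the branch therefore appears to find no sentences and always falls back to `generate_response()`. The sketch copies both pieces of logic into standalone functions so the behavior can be checked without ChromaDB or Groq. `build_context` and `extract_sentences` are hypothetical names.

```python
from typing import List


def build_context(docs: List[str]) -> str:
    """Format retrieved documents the same way RAGChatbot.retrieve_context() does."""
    parts = [f"[Source {i}]: {doc}" for i, doc in enumerate(docs, 1)]
    return "\n\n".join(parts)


def extract_sentences(context: str) -> List[str]:
    """Copy of the line filter in the extractive branch of RAGChatbot.chat()."""
    sentences = []
    for line in context.split('\n'):
        line = line.strip()
        if line and len(line) > 30 and not line.startswith('[Source'):
            if ':' in line:
                line = line.split(':', 1)[1].strip()
            sentences.append(line)
    return sentences


# Single-line chunks, like those create_proper_chunks() produces
single_line = build_context([
    "Rackspace Technology offers managed services for AWS, Azure and Google Cloud.",
    "Cyber resilience helps healthcare providers recover from ransomware attacks.",
])
print(extract_sentences(single_line))  # → []

# A chunk containing a newline: only text after its first line survives the filter
multi_line = build_context([
    "Overview\nRackspace provides multicloud solutions across public and private clouds."
])
print(extract_sentences(multi_line))
# → ['Rackspace provides multicloud solutions across public and private clouds.']
```

If extractive answers are intended, one option is to strip the `[Source i]:` prefix from each line instead of skipping it.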
rebuild_rag_system.py
ADDED
@@ -0,0 +1,318 @@
+"""
+COMPLETE RAG SYSTEM REBUILD
+Fixes hallucination issues by:
+1. Crawling high-quality content from key Rackspace pages
+2. Extracting ONLY main article content (no navigation/menus)
+3. Creating proper embeddings with clean chunks
+4. Building correct Vector DB for retrieval
+
+This addresses the root cause: poor data quality → bad embeddings → hallucinations
+"""
+
+import requests
+from bs4 import BeautifulSoup
+import json
+import time
+from pathlib import Path
+from typing import List, Dict
+import re
+import chromadb
+from chromadb.config import Settings
+from sentence_transformers import SentenceTransformer
+from urllib.parse import urljoin
+
+# Paths
+DATA_DIR = Path("data")
+VECTOR_DB_DIR = Path("vector_db")
+DATA_DIR.mkdir(exist_ok=True)
+
+print("="*80)
+print("REBUILDING RAG SYSTEM FROM SCRATCH")
+print("="*80)
+
+
+# Step 1: Define HIGH-VALUE URLs with actual content
+HIGH_VALUE_URLS = [
+    # Cloud Services
+    'https://www.rackspace.com/cloud/cloud-migration',
+    'https://www.rackspace.com/cloud/aws',
+    'https://www.rackspace.com/cloud/azure',
+    'https://www.rackspace.com/cloud/google-cloud',
+    'https://www.rackspace.com/cloud/multi-cloud',
+    'https://www.rackspace.com/cloud/private-cloud',
+
+    # Security
+    'https://www.rackspace.com/security',
+    'https://www.rackspace.com/security/data-security',
+    'https://www.rackspace.com/security/compliance',
+
+    # Managed Services
+    'https://www.rackspace.com/managed-hosting',
+    'https://www.rackspace.com/managed-kubernetes',
+    'https://www.rackspace.com/managed-aws',
+    'https://www.rackspace.com/managed-azure',
+
+    # Professional Services
+    'https://www.rackspace.com/professional-services',
+    'https://www.rackspace.com/data',
+    'https://www.rackspace.com/applications',
+
+    # Specific blogs (if you had URLs)
+    'https://www.rackspace.com/blog/strengthening-healthcare-operations-through-cyber-resilience',
+]
+
+
+def extract_clean_content(html: str, url: str) -> Dict:
+    """Extract ONLY main content, filter out navigation"""
+    soup = BeautifulSoup(html, 'html.parser')
+
+    # Remove noise
+    for element in soup(['script', 'style', 'nav', 'footer', 'header',
+                         'aside', 'iframe', 'form', 'button']):
+        element.decompose()
+
+    # Remove navigation classes
+    for nav_class in soup.find_all(class_=re.compile('nav|menu|sidebar|footer|header', re.I)):
+        nav_class.decompose()
+
+    # Find main content
+    main = (soup.find('main') or
+            soup.find('article') or
+            soup.find('div', class_=re.compile('content|article', re.I)) or
+            soup.body)
+
+    if not main:
+        return None
+
+    # Extract title
+    h1 = soup.find('h1')
+    title = h1.get_text(strip=True) if h1 else soup.find('title').get_text(strip=True) if soup.find('title') else url
+
+    # Extract ONLY paragraphs and headings (real content)
+    content_parts = []
+    for elem in main.find_all(['p', 'h2', 'h3', 'li']):
+        text = elem.get_text(' ', strip=True)  # separator keeps inline tags from fusing words
+        text = re.sub(r'\s+', ' ', text)
+
+        # Skip short fragments and navigation text
+        if len(text) > 30 and not any(skip in text.lower() for skip in ['click here', 'learn more', 'view all', 'home']):
+            content_parts.append(text)
+
+    if len(content_parts) < 3:  # Need at least 3 substantial paragraphs
+        return None
+
+    content = '\n\n'.join(content_parts)
+    word_count = len(content.split())
+
+    if word_count < 100:  # Too short
+        return None
+
+    return {
+        'url': url,
+        'title': title,
+        'content': content,
+        'word_count': word_count,
+        'crawled_at': time.strftime('%Y-%m-%d %H:%M:%S')
+    }
+
+
+def crawl_urls(urls: List[str]) -> List[Dict]:
+    """Crawl URLs and extract clean content"""
+    print("\n📡 STEP 1: CRAWLING HIGH-VALUE URLS")
+    print("-" * 80)
+
+    session = requests.Session()
+    session.headers.update({
+        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
+    })
+
+    data = []
+    for i, url in enumerate(urls, 1):
+        try:
+            print(f"[{i}/{len(urls)}] Crawling: {url}")
+            response = session.get(url, timeout=10)
+            response.raise_for_status()
+
+            content_data = extract_clean_content(response.text, url)
+
+            if content_data:
+                data.append(content_data)
+                print(f" ✅ Extracted {content_data['word_count']} words")
+            else:
+                print(f" ❌ No quality content found")
+
+            time.sleep(0.5)  # Be nice
+
+        except Exception as e:
+            print(f" ❌ Error: {e}")
+            continue
+
+    print(f"\n✅ Successfully crawled {len(data)} pages with quality content")
+    return data
+
+
+def save_data(data: List[Dict], filename: str = 'rackspace_knowledge_clean.json'):
+    """Save crawled data"""
+    output_path = DATA_DIR / filename
+
+    with open(output_path, 'w', encoding='utf-8') as f:
+        json.dump(data, f, indent=2, ensure_ascii=False)
+
+    total_words = sum(d['word_count'] for d in data)
+    print(f"\n💾 Saved to: {output_path}")
+    print(f"   Documents: {len(data)}")
+    print(f"   Total words: {total_words:,}")
+    print(f"   Avg words/doc: {total_words // len(data) if data else 0}")
+
+    return output_path
+
+
+def create_proper_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
+    """Create overlapping chunks from text"""
+    words = text.split()
+    chunks = []
+
+    for i in range(0, len(words), chunk_size - overlap):
+        chunk = ' '.join(words[i:i + chunk_size])
+        if len(chunk.split()) >= 50:  # Minimum 50 words per chunk
+            chunks.append(chunk)
+
+    return chunks
+
+
+def build_vector_db(data: List[Dict]):
+    """Build proper vector database with clean embeddings"""
+    print("\n🔧 STEP 2: BUILDING VECTOR DATABASE")
+    print("-" * 80)
+
+    # Initialize ChromaDB
+    client = chromadb.PersistentClient(
+        path=str(VECTOR_DB_DIR),
+        settings=Settings(anonymized_telemetry=False)
+    )
+
+    # Delete old collection
+    try:
+        client.delete_collection("rackspace_knowledge")
+        print("🗑️ Deleted old collection")
+    except Exception:  # collection does not exist yet (first run)
+        pass
+
+    # Create new collection
+    collection = client.create_collection(
+        name="rackspace_knowledge",
+        metadata={"description": "Clean Rackspace knowledge - main content only"}
+    )
+
+    # Load embedding model
+    print("📦 Loading embedding model...")
+    embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
+
+    # Process documents into chunks
+    print("✂️ Creating chunks...")
+    all_chunks = []
+    all_metadatas = []
+    all_ids = []
+
+    chunk_id = 0
+    for doc in data:
+        # Create chunks from content
+        chunks = create_proper_chunks(doc['content'])
+
+        for chunk in chunks:
+            all_chunks.append(chunk)
+            all_metadatas.append({
+                'url': doc['url'],
+                'title': doc['title'],
+                'source': 'document'
+            })
+            all_ids.append(f"chunk_{chunk_id}")
+            chunk_id += 1
+
+    print(f" Created {len(all_chunks)} chunks from {len(data)} documents")
+
+    # Generate embeddings
+    print("🧮 Generating embeddings...")
+    embeddings = embedding_model.encode(
+        all_chunks,
+        show_progress_bar=True,
+        convert_to_numpy=True
+    )
+
+    # Add to ChromaDB
+    print("💾 Adding to vector database...")
+    collection.add(
+        embeddings=embeddings.tolist(),
+        documents=all_chunks,
+        metadatas=all_metadatas,
+        ids=all_ids
+    )
+
+    print(f"\n✅ Vector DB built successfully!")
+    print(f"   Total chunks indexed: {len(all_chunks)}")
+    print(f"   Database location: {VECTOR_DB_DIR}")
+
+
+def test_retrieval():
+    """Test vector DB retrieval"""
+    print("\n🧪 STEP 3: TESTING RETRIEVAL")
+    print("-" * 80)
+
+    client = chromadb.PersistentClient(path=str(VECTOR_DB_DIR), settings=Settings(anonymized_telemetry=False))  # must match build_vector_db() settings
+    collection = client.get_collection("rackspace_knowledge")
+    embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
+
+    test_queries = [
+        "cloud migration services",
+        "AWS managed services",
+        "healthcare cyber resilience"
+    ]
+
+    for query in test_queries:
+        print(f"\n🔍 Query: '{query}'")
+        query_embedding = embedding_model.encode([query])[0]
+
+        results = collection.query(
+            query_embeddings=[query_embedding.tolist()],
+            n_results=3
+        )
+
+        for i, (doc, meta) in enumerate(zip(results['documents'][0], results['metadatas'][0]), 1):
+            print(f"\n Result {i}:")
+            print(f" URL: {meta['url']}")
+            print(f" Content: {doc[:150]}...")
+
+
+def main():
+    """Main execution"""
+    print("\n" + "="*80)
+    print("STARTING RAG SYSTEM REBUILD")
+    print("="*80)
+
+    # Step 1: Crawl clean data
+    data = crawl_urls(HIGH_VALUE_URLS)
+
+    if not data:
+        print("\n❌ No data collected! Please check URLs and network connection.")
+        return
+
+    # Save data
+    save_data(data)
+
+    # Step 2: Build vector DB
+    build_vector_db(data)
+
+    # Step 3: Test retrieval
+    test_retrieval()
+
+    print("\n" + "="*80)
+    print("✅ RAG SYSTEM REBUILD COMPLETE!")
+    print("="*80)
+    print("\nNext steps:")
+    print("1. Restart Streamlit: streamlit run streamlit_app.py")
+    print("2. Test with queries about cloud migration and healthcare")
+    print("3. Verify responses use actual content (no more hallucinations!)")
+
+
+if __name__ == "__main__":
+    main()
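`create_proper_chunks()` slides a 200-word window forward by `chunk_size - overlap` = 150 words and drops any window shorter than 50 words. The window before a dropped tail always extends to the end of the text, so no words are lost. The only input that yields no chunk at all is a document under 50 words, and `extract_clean_content()` already rejects those (it requires at least 100). The sketch below re-derives the word offsets of the kept chunks and copies the chunker verbatim for comparison. `chunk_spans` is a hypothetical helper, not part of this commit.

```python
from typing import List, Tuple


def create_proper_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Copy of the chunker in rebuild_rag_system.py."""
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        if len(chunk.split()) >= 50:
            chunks.append(chunk)
    return chunks


def chunk_spans(n_words: int, chunk_size: int = 200, overlap: int = 50,
                min_words: int = 50) -> List[Tuple[int, int]]:
    """Word offsets [start, end) of the chunks create_proper_chunks() keeps."""
    spans = []
    for start in range(0, n_words, chunk_size - overlap):  # stride of 150 words
        end = min(start + chunk_size, n_words)
        if end - start >= min_words:  # same minimum-length filter as the chunker
            spans.append((start, end))
    return spans


text = " ".join(f"w{i}" for i in range(320))  # a 320-word document
print(chunk_spans(320))                 # → [(0, 200), (150, 320)]
print(len(create_proper_chunks(text)))  # → 2
```

For 320 words, the third window (words 300 to 319) has only 20 words and is dropped, but those words are already inside the second chunk.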
requirements.txt
ADDED
@@ -0,0 +1,17 @@
+gradio>=4.7.0
+chromadb>=0.4.15
+faiss-cpu>=1.7.4
+sentence-transformers>=2.2.2
+groq>=0.4.0
+beautifulsoup4>=4.12.0
+requests>=2.31.0
+selenium>=4.15.0
+lxml>=4.9.3
+flask>=3.0.0
+numpy>=1.24.0
+pandas>=2.0.0
+tqdm>=4.66.0
+python-dotenv>=1.0.0
+nltk>=3.8.1
+langchain>=0.1.0
+langchain-community>=0.0.10
test_groq.py
ADDED
@@ -0,0 +1,56 @@
+"""
+Test script to verify Groq API integration
+"""
+import os
+from groq import Groq
+
+def test_groq_api():
+    """Test basic Groq API functionality"""
+    print("Testing Groq API connection...")
+
+    # Check for API key
+    api_key = os.environ.get("GROQ_API_KEY")
+    if not api_key:
+        print("❌ GROQ_API_KEY environment variable not set!")
+        print("\nTo set it, run:")
+        print("export GROQ_API_KEY='your_api_key_here'")
+        return False
+
+    print(f"✅ API Key found: {api_key[:10]}...")
+
+    try:
+        # Initialize client
+        client = Groq(api_key=api_key)
+        print("✅ Groq client initialized")
+
+        # Test a simple completion
+        print("\nTesting chat completion...")
+        chat_completion = client.chat.completions.create(
+            messages=[
+                {
+                    "role": "user",
+                    "content": "Say 'Hello from Groq!' if you can hear me.",
+                }
+            ],
+            model="openai/gpt-oss-120b",
+        )
+
+        response = chat_completion.choices[0].message.content
+        print(f"✅ Response received: {response}")
+
+        return True
+
+    except Exception as e:
+        print(f"❌ Error: {e}")
+        return False
+
+if __name__ == "__main__":
+    success = test_groq_api()
+    if success:
+        print("\n🎉 Groq API integration successful!")
+        print("\nYou can now use the chatbot with:")
+        print(" python enhanced_rag_chatbot.py")
+        print(" or")
+        print(" streamlit run streamlit_app.py")
+    else:
+        print("\n⚠️ Please fix the errors above and try again.")
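`test_groq.py` prints the first 10 characters of `GROQ_API_KEY`, which exposes part of the secret in any terminal or log that captures the output. Below is a minimal alternative sketch that prints a fixed-width mask instead, so most of the key stays hidden and its length is not revealed. `mask_secret` is a hypothetical helper, not part of this commit.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a fixed-width mask revealing at most `visible` leading characters."""
    if len(secret) <= visible * 2:
        # Too short to reveal any prefix without exposing most of the secret
        return "****"
    return secret[:visible] + "****"


print(mask_secret("gsk_example_not_a_real_key"))  # → gsk_****
```

In `test_groq_api()` this would replace the existing f-string with something like `print(f"✅ API Key found: {mask_secret(api_key)}")`.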
training_data.jsonl
ADDED
The diff for this file is too large to render.
training_data_enhanced.jsonl
ADDED
The diff for this file is too large to render.
training_qa_pairs.json
ADDED
The diff for this file is too large to render.
training_qa_pairs_enhanced.json
ADDED
The diff for this file is too large to render.