---
title: AB Testing RAG Agent
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.0
app_file: streamlit_app.py
pinned: false
---
# AB Testing RAG Agent
This repository contains a Retrieval Augmented Generation (RAG) agent specialized in A/B Testing that:
- Answers questions about A/B Testing using a collection of Ron Kohavi's work
- Automatically searches ArXiv for academic papers when needed for better responses
- Preserves privacy by pre-processing PDFs locally and only deploying processed data
## Features
- Interactive chat interface built with Streamlit
- Vector search using Qdrant with OpenAI embeddings
- Two-tier approach:
  - Initial RAG search for efficiency
  - Advanced agent with tools for complex questions
- Smart source handling and deduplication
- ArXiv integration
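The two-tier flow above can be sketched roughly as follows. The function names (`rag_search`, `quick_answer`, `agent_answer`) and the sufficiency check are illustrative placeholders, not the repository's actual API:

```python
def answer(question, rag_search, quick_answer, agent_answer, is_sufficient):
    """Tier 1: cheap RAG pass; Tier 2: tool-using agent fallback."""
    context = rag_search(question)           # vector search over the PDF corpus
    draft = quick_answer(question, context)  # fast, cheap model first
    if is_sufficient(draft):                 # quality gate on the draft
        return draft
    return agent_answer(question, context)   # escalate: agent with tools (e.g. ArXiv)
```

Keeping the model calls injectable like this makes the fallback logic easy to test without hitting any API.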
## Quick Start

### Local Development
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/AB_Testing_RAG_Agent.git
  cd AB_Testing_RAG_Agent
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file with your OpenAI API key:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ```

- Process your PDF files (only needed once):

  ```bash
  python scripts/preprocess_data.py
  ```

- Run the Streamlit app:

  ```bash
  streamlit run streamlit_app.py
  ```
### Docker Deployment
- Build the Docker image:

  ```bash
  docker build -t ab-testing-rag-agent .
  ```

- Run the container:

  ```bash
  docker run -p 8000:8000 -e OPENAI_API_KEY=your_openai_api_key_here ab-testing-rag-agent
  ```
## Deployment to Hugging Face
- Prepare for deployment (check that all required files are ready):

  ```bash
  python scripts/prepare_for_deployment.py
  ```

- Push to your Hugging Face Space:

  ```bash
  # Initialize git repository if not already done
  git init
  git add .
  git commit -m "Initial commit"

  # Add Hugging Face Space remote
  git remote add hf https://huggingface.co/spaces/yourusername/ab-testing-rag

  # Push to Hugging Face
  git push hf main
  ```
- Set both required environment variables in the Hugging Face Space settings:
  - `OPENAI_API_KEY`: Your OpenAI API key
  - `HF_TOKEN`: Your Hugging Face token with access to the dataset
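Since the app fails at runtime without these, a small startup check helps surface misconfiguration early. A minimal sketch (the helper name is illustrative, not code from this repository):

```python
import os

def require_env(*names: str) -> list[str]:
    """Return the values of the given env vars, failing loudly if any is unset."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[n] for n in names]

# e.g. at app startup:
# openai_key, hf_token = require_env("OPENAI_API_KEY", "HF_TOKEN")
```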
## Setting Up The PDF Dataset on Hugging Face
The deployment uses PDFs stored in a separate Hugging Face dataset repo. To set up your own:
- Create a dataset repository on Hugging Face called `yourusername/ab_testing_pdfs`
- Upload all your PDF files to this repository via the Hugging Face UI or git:

  ```bash
  git clone https://huggingface.co/datasets/yourusername/ab_testing_pdfs
  cd ab_testing_pdfs
  cp /path/to/your/pdfs/*.pdf .
  git add .
  git commit -m "Add AB Testing PDFs"
  git push
  ```

- Update the dataset name in `download_pdfs.py` if you used a different repository name
- Make sure your `HF_TOKEN` has read access to this dataset repository
## Architecture
- **Pre-processing Pipeline**: PDF files are processed locally, converted to embeddings, and stored in a vector database
- **Retrieval System**: Uses OpenAI's `text-embedding-3-small` model and Qdrant for vector search
- **Response Generation**:
  - Initial attempt with `gpt-4.1-mini` for efficiency
  - Falls back to `gpt-4.1` with tools for complex queries
- **ArXiv Integration**: Searches academic papers when necessary
## Adding Your Own PDFs
- Add PDF files to the `data/` directory
- Run the preprocessing script:

  ```bash
  python scripts/preprocess_data.py
  ```
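Under the hood, preprocessing amounts to: extract text from each PDF, split it into overlapping chunks, embed each chunk, and store the vectors. A sketch of the chunking stage (the default sizes and function name are illustrative, not the script's actual values):

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```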
## Implementation Notes
- Uses the `text-embedding-3-small` model for embeddings
- Uses `gpt-4.1-mini` for initial responses
- Uses `gpt-4.1` for agent tools and quality evaluation
- Stores preprocessed data in the `processed_data/` directory