---
title: AB Testing RAG Agent
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "3.14"
app_port: 8501
pinned: false
---
# AB Testing RAG Agent

This application is a Streamlit-based frontend for an AB Testing QA system built on a retrieval-augmented generation (RAG) pipeline with a LangGraph architecture.

## Features

- QA system specialized in AB Testing topics
- Intelligent query routing with LangGraph
- Source citations for all answers
- Streamlit interface for easy interaction
## Setup for Development

### Prerequisites

- Python 3.9+
- OpenAI API key
- Hugging Face account and token (for deployment)

### Environment Setup

1. Clone this repository
2. Create a `.env` file in the root directory with the following content:

```
OPENAI_API_KEY=your_openai_api_key_here
HF_TOKEN=your_huggingface_token_here
```
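If you'd rather not add a dependency, a `.env` file like the one above can be loaded with a few lines of standard-library Python. This is a minimal sketch of the idea; the app itself may use a package such as `python-dotenv` instead, and `load_env` is an illustrative name:

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # setdefault so real environment variables win over the file
            os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()
```

Existing environment variables take precedence over the file, which matches how most dotenv loaders behave by default.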
### Process the PDFs

Before running the app, you need to process the PDF files to create the vectorstore:

```bash
python process_data.py
```

This will:

1. Load PDFs from `notebook_version/data/`
2. Process, chunk, and embed the documents
3. Create a Qdrant vectorstore in `data/processed_data/`
### Running the App Locally

Once the data is processed, you can run the Streamlit app:

```bash
streamlit run app/app.py
```
## Deployment to Hugging Face Spaces

### Prerequisites for Deployment

1. Hugging Face account
2. Docker installed locally

### Steps to Deploy

1. Process the PDFs locally: `python process_data.py`
2. Build the Docker image: `docker build -t ab-testing-qa .`
3. Create a new Hugging Face Space (Docker-based)
4. Add your Hugging Face token and OpenAI API key as secrets in the Space
5. Push the Docker image to Hugging Face

### Hugging Face Spaces Configuration

The application is configured to use the following secrets:

- `OPENAI_API_KEY`: Your OpenAI API key
- `HF_TOKEN`: Your Hugging Face token
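Since both secrets are required at runtime, it helps to fail fast with a clear message when one is missing from the Space's settings. A minimal sketch, assuming the variable names above; `check_secrets` is an illustrative helper, not part of the app's actual API:

```python
import os

REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")

def check_secrets():
    """Return the configured secrets, raising early if any are missing."""
    missing = [name for name in REQUIRED_SECRETS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_SECRETS}
```

Calling this once at startup turns a confusing mid-request API failure into an immediate, actionable error in the Space logs.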
## System Architecture

The AB Testing QA system uses a LangGraph architecture with three main components:

1. **Initial RAG Node**: Retrieves documents and attempts to answer the query
2. **Helpfulness Judge**: Determines whether:
   - The query is related to AB Testing
   - The initial response is helpful enough
3. **Agent Node**: If needed, uses specialized tools to improve the answer:
   - Standard retrieval tool
   - Query-rephrasing retrieval tool
   - ArXiv search tool
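The control flow above can be sketched in plain Python. This is a schematic only: the real implementation wires these steps as LangGraph nodes and edges, and the function names here are illustrative:

```python
def answer_query(query, rag_node, judge, agent_node):
    """Schematic of the routing logic: try plain RAG first, and escalate to
    the tool-using agent only when the judge rejects the initial answer."""
    draft = rag_node(query)          # Initial RAG Node: retrieve + answer
    if judge(query, draft):          # Helpfulness Judge: on-topic and helpful?
        return draft                 # good enough -- return the RAG answer
    return agent_node(query, draft)  # Agent Node: retry with specialized tools
```

The benefit of this shape is cost control: the cheaper single-pass RAG path handles easy questions, and the agent with its extra retrieval and ArXiv tools only runs when needed.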
## Data Processing

The system processes the PDFs as follows:

1. Merges PDF pages while preserving page metadata
2. Splits the text with `RecursiveCharacterTextSplitter`
3. Embeds the chunks with OpenAI's `text-embedding-3-small` model
4. Stores the embeddings in a Qdrant vectorstore
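Step 2 uses LangChain's `RecursiveCharacterTextSplitter`; the core idea it implements — fixed-size chunks that overlap so content spanning a boundary is not lost — can be illustrated in plain Python. The chunk size and overlap below are placeholder values, not the parameters `process_data.py` actually uses:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks that overlap, so context crossing
    a chunk boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each chunk
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

The real splitter is smarter — it recursively prefers paragraph, sentence, and word boundaries before falling back to raw character cuts — but the overlap mechanic is the same.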