---
title: Mini RAG - Track B Assessment
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
# Mini RAG - Track B Assessment

A production-ready RAG (Retrieval-Augmented Generation) application that demonstrates text input, vector storage, retrieval with reranking, and LLM answering with inline citations.
## 🎯 Goal

Build and host a small RAG app where users input text (file upload is optional) from the frontend, store it in a cloud-hosted vector DB, retrieve the most relevant chunks with a retriever plus reranker, and answer queries via an LLM with proper citations.
## 🏗️ Architecture
| ``` | |
| โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ | |
| โ Frontend โ โ Backend โ โ External โ | |
| โ (Gradio UI) โโโโโบโ (Python) โโโโโบโ Services โ | |
| โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ | |
| โ โ โ | |
| โ โข Text Input/Upload โ โข Text Processing โ โข OpenAI API โ | |
| โ โข Query Interface โ โข Chunking Strategy โ โข Groq API โ | |
| โ โข Results Display โ โข Embedding Generation โ โข Cohere API โ | |
| โ โข Citations & Sources โ โข Vector Storage โ โข Pinecone โ | |
| โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ | |
| ``` | |
### Data Flow

1. **Ingestion**: Text → Chunking → Embedding → Pinecone Vector DB
2. **Query**: Question → Embedding → Vector Search → Top-K Retrieval
3. **Reranking**: Retrieved chunks → Cohere Reranker → Reordered results
4. **Generation**: Reranked chunks → LLM → Answer with inline citations [1], [2]
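The query path above (stages 2-4) can be sketched as a single pipeline function. The `embed`, `vector_search`, `rerank`, and `generate` callables here are illustrative placeholders standing in for the real OpenAI, Pinecone, and Cohere clients, not the app's actual API:

```python
# Illustrative sketch of the query path; the helper callables are
# placeholders for the real OpenAI/Pinecone/Cohere client calls.

def answer_query(question, embed, vector_search, rerank, generate, top_k=10):
    """Embed the question, retrieve, rerank, then generate a cited answer."""
    query_vec = embed(question)                      # Query -> embedding
    chunks = vector_search(query_vec, top_k=top_k)   # Top-K retrieval
    chunks = rerank(question, chunks)                # Reranker reorders chunks
    # Number the chunks so the LLM can cite them inline as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] {c['text']}" for i, c in enumerate(chunks))
    return generate(question, context)
```

With stub functions plugged in, the same structure runs end to end, which makes the pipeline easy to unit-test without any API keys.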
## 🚀 Features

### ✅ Requirements Met

- **Vector Database**: Pinecone cloud-hosted with serverless index
- **Embeddings & Chunking**: OpenAI embeddings with configurable chunk size (400-1200 tokens) and overlap (10-15%)
- **Retriever + Reranker**: Top-K retrieval with optional Cohere reranker
- **LLM & Answering**: OpenAI/Groq with inline citations and source mapping
- **Frontend**: Text input/upload, query interface, citations display, timing & cost estimates
- **Metadata Storage**: Source, title, section, and position tracking
### 🔧 Technical Details

- **Chunking Strategy**: 800-token chunks by default with 120-token overlap (15%)
- **Vector Dimension**: 1536 (OpenAI text-embedding-3-small)
- **Index Configuration**: Pinecone serverless, cosine similarity
- **Upsert Strategy**: Batch processing (100 chunks per batch) with metadata preservation
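The fixed-size chunking strategy can be sketched as follows. This is a minimal, dependency-free illustration: whitespace-split "tokens" stand in for a real tokenizer (e.g. tiktoken), which the actual `chunker.py` would use:

```python
# Fixed-size chunking with overlap, matching the defaults above
# (800-token chunks, 120-token overlap). Whitespace "tokens" stand in
# for a real tokenizer to keep the sketch self-contained.

def chunk_text(text, chunk_size=800, overlap=120):
    """Split text into chunks of `chunk_size` tokens sharing `overlap` tokens."""
    tokens = text.split()
    step = chunk_size - overlap  # advance 680 tokens per chunk at the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk's last 120 tokens reappear at the start of the next chunk, so sentences that straddle a boundary are retrievable from at least one chunk in full.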
## 🛠️ Setup

### Prerequisites

- Python 3.8+
- Pinecone account and API key
- OpenAI API key
- Groq API key (optional)
- Cohere API key (optional, for reranking)

### Installation

1. **Clone the repo and set up the environment**
   ```bash
   git clone <your-repo-url>
   cd mini-rag
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
   pip install -r requirements.txt
   ```
2. **Configure environment variables**
   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   ```
3. **Create the data directory**
   ```bash
   mkdir data
   ```
4. **Run the application**
   ```bash
   python app.py
   ```
### Environment Variables

```bash
# Pinecone
PINECONE_API_KEY=your_pinecone_key
PINECONE_INDEX=mini-rag-index
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1

# LLMs
OPENAI_API_KEY=your_openai_key
GROQ_API_KEY=your_groq_key

# Reranker
COHERE_API_KEY=your_cohere_key

# Models
EMBEDDING_MODEL=text-embedding-3-small
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
RERANK_PROVIDER=cohere
RERANK_MODEL=rerank-3

# Chunking
CHUNK_SIZE=800
CHUNK_OVERLAP=120
DATA_DIR=./data
```
## 📊 Evaluation

### Gold Set Q&A Pairs

1. **Q:** What is the main topic of the document?
   **Expected:** Clear identification of the document's subject
2. **Q:** What are the key findings or conclusions?
   **Expected:** Specific facts or conclusions from the text
3. **Q:** What methodology was used?
   **Expected:** Description of the approach or methods mentioned
4. **Q:** What are the limitations discussed?
   **Expected:** Any limitations or constraints mentioned
5. **Q:** What future work is suggested?
   **Expected:** Recommendations or future directions
### Success Metrics

- **Precision**: Relevance of the information in answers
- **Recall**: Coverage of the available information
- **Citation Accuracy**: Proper source attribution in the [1], [2] format
- **Response Time**: Query processing speed
- **Cost Efficiency**: Token usage and API cost estimates
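Of these metrics, citation accuracy is the easiest to check mechanically. A hypothetical helper (not part of the codebase) that verifies every `[n]` marker in an answer points at a source that was actually retrieved:

```python
import re

# Hypothetical citation check: every bracketed number in the answer
# must refer to one of the retrieved sources (numbered 1..num_sources).

def citations_valid(answer, num_sources):
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return all(1 <= n <= num_sources for n in cited)
```

Running this over the gold-set answers gives a quick pass/fail signal for the citation-accuracy metric without any manual review.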
## 🚀 Deployment

### Free Hosting Options

- **Hugging Face Spaces**: Gradio apps with a free tier
- **Render**: Free tier for Python web services
- **Railway**: Free tier for small applications
- **Vercel**: Free tier for static sites (with API routes)

### Deployment Steps

1. **Prepare for deployment**
   - Ensure all API keys are supplied as environment variables
   - Test locally with production settings
   - Add proper error handling and logging
2. **Deploy to the chosen platform**
   - Follow the platform-specific deployment guide
   - Set environment variables in the platform dashboard
   - Configure domain and SSL if needed
## 📁 Project Structure

```
mini-rag/
├── app.py               # Gradio UI and main application
├── rag_core.py          # RAG orchestration logic
├── llm.py               # LLM provider abstraction
├── pinecone_client.py   # Pinecone vector DB client
├── ingest.py            # Document ingestion pipeline
├── chunker.py           # Text chunking strategy
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variables template
├── README.md            # This file
└── data/                # Document storage directory
```
## 📖 Usage Examples

### 1. Text Input Processing

- Paste text into the "Text Input" tab
- Configure chunk size (400-1200 tokens) and overlap (10-15%)
- Click "Process & Store Text" to ingest into the vector DB

### 2. File Ingestion

- Place documents (.txt, .md, .pdf) in the `data/` directory
- Use the "File Ingestion" tab to process all files
- Monitor the chunk count and processing status

### 3. Query and Answer

- Navigate to the "Query" tab
- Enter your question
- Adjust the Top-K retrieval and reranker settings
- Get an answer with inline citations [1], [2] and source details
## 📈 Performance & Monitoring

### Metrics Tracked

- **Processing Time**: End-to-end query response time
- **Token Usage**: Query, context, and answer token counts
- **Cost Estimates**: Embedding, LLM, and reranking costs
- **Retrieval Quality**: Vector similarity scores and rerank scores

### Optimization Tips

- Adjust chunk size based on document characteristics
- Use the reranker for better relevance (adds ~100 ms but improves quality)
- Batch-process documents for efficient ingestion
- Monitor Pinecone index performance and costs
## 🚨 Error Handling

### Common Issues

- **Missing API Keys**: Check environment variables
- **Pinecone Connection**: Verify the index name and region
- **Document Processing**: Check file formats and encoding
- **Rate Limits**: Implement exponential backoff for API calls
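The exponential-backoff pattern recommended above can be sketched generically. This is an illustrative wrapper, not the app's actual retry code; in practice you would catch the specific rate-limit exception class of the API client in use rather than bare `Exception`:

```python
import random
import time

# Generic exponential backoff for flaky API calls. Catching bare
# Exception is for illustration only; narrow it to the client's
# rate-limit error class in real code.

def with_backoff(call, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # 1x, 2x, 4x, ... the base delay, plus proportional jitter
            # so parallel clients don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

For example, `with_backoff(lambda: client.embed(texts))` retries a transient failure a few times before giving up, with geometrically growing pauses in between.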
### Graceful Degradation

- Fall back to the original retrieval order if the reranker fails
- Continue processing if individual documents fail
- Provide clear error messages with troubleshooting steps
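The first degradation rule amounts to one try/except around the rerank call. A minimal sketch, where `rerank` is a placeholder for the Cohere client call rather than the app's real function:

```python
# Graceful degradation around the reranker: if the rerank call fails
# (timeout, missing key, rate limit), keep the original vector-search
# order instead of failing the whole query. `rerank` is a placeholder
# for the actual Cohere call.

def safe_rerank(query, chunks, rerank):
    try:
        return rerank(query, chunks)
    except Exception:
        return chunks  # fall back to the original retrieval order
```

The query still returns an answer when Cohere is unavailable; only the ordering quality degrades.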
## 🔮 Future Enhancements

### Planned Improvements

- **Advanced Chunking**: Semantic chunking with sentence transformers
- **Hybrid Search**: Combine vector and keyword search
- **Multi-modal Support**: Image and document processing
- **Caching Layer**: Redis for frequently accessed results
- **Analytics Dashboard**: Query performance and usage metrics

### Scalability Considerations

- **Vector DB**: Pinecone pod scaling for larger datasets
- **Embedding Models**: Local models for cost reduction
- **Load Balancing**: Multiple LLM providers for redundancy
- **CDN Integration**: Static asset optimization
## 📝 Remarks

### Trade-offs Made

- **API Dependencies**: Relies on external services for embeddings and the LLM
- **Cost vs. Quality**: OpenAI embeddings provide quality but add cost
- **Latency**: Reranking adds ~100 ms but significantly improves relevance
- **Chunking Strategy**: Fixed-size chunks chosen for simplicity over semantic chunking

### Provider Limits

- **OpenAI**: Rate limits and per-request token limits
- **Pinecone**: Free-tier index size and query limits
- **Cohere**: Reranking API rate limits
- **Groq**: Alternative LLM with a different pricing model

### What I'd Do Next

1. **Implement semantic chunking** for better document understanding
2. **Add hybrid search** combining vector and keyword approaches
3. **Build an evaluation framework** with automated testing
4. **Optimize for production** with proper logging and monitoring
5. **Add authentication** for multi-user support
## 👨‍💻 Author

**Your Name** - AI Engineer Assessment Candidate

- **GitHub**: [Your GitHub Profile]
- **LinkedIn**: [Your LinkedIn Profile]
- **Portfolio**: [Your Portfolio/Website]
## 📄 License

This project was created for the AI Engineer Assessment. Feel free to use and modify it for learning purposes.

---

**Note**: This implementation demonstrates production-ready practices, including proper error handling, environment variable management, comprehensive documentation, and scalable architecture design.