---
title: Mini RAG - Track B Assessment
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

# Mini RAG - Track B Assessment

A production-ready RAG (Retrieval-Augmented Generation) application that demonstrates text input, vector storage, retrieval + reranking, and LLM answering with inline citations.

## 🎯 Goal

Build and host a small RAG app where users input text (file upload is optional) from the frontend, store it in a cloud-hosted vector DB, retrieve the most relevant chunks with a retriever + reranker, and answer queries via an LLM with proper citations.

## 🏗️ Architecture

```
┌───────────────────────┐      ┌─────────────────────────┐      ┌───────────────┐
│       Frontend        │      │         Backend         │      │   External    │
│      (Gradio UI)      │◄────►│        (Python)         │◄────►│   Services    │
├───────────────────────┤      ├─────────────────────────┤      ├───────────────┤
│ • Text Input/Upload   │      │ • Text Processing       │      │ • OpenAI API  │
│ • Query Interface     │      │ • Chunking Strategy     │      │ • Groq API    │
│ • Results Display     │      │ • Embedding Generation  │      │ • Cohere API  │
│ • Citations & Sources │      │ • Vector Storage        │      │ • Pinecone    │
└───────────────────────┘      └─────────────────────────┘      └───────────────┘
```

### Data Flow
1. **Ingestion**: Text → Chunking → Embedding → Pinecone Vector DB
2. **Query**: Question → Embedding → Vector Search → Top-K Retrieval
3. **Reranking**: Retrieved chunks → Cohere Reranker → Reordered results
4. **Generation**: Reranked chunks → LLM → Answer with inline citations [1], [2]

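The query path above can be sketched as a tiny in-memory pipeline. Everything here is illustrative: the toy bag-of-words `embed` stands in for OpenAI embeddings, `retrieve` for the Pinecone vector search, and `answer` for the LLM call with inline citations; none of these are the actual functions in `app.py` or `rag_core.py`.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; the real app uses OpenAI text-embedding-3-small.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=2):
    # Vector-search stand-in: rank chunks by similarity to the query embedding.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

def answer(query, chunks):
    # LLM stand-in: cite each retrieved chunk inline as [1], [2], ...
    context = retrieve(query, chunks)
    citations = " ".join(f"[{i}]" for i in range(1, len(context) + 1))
    return f"Answer based on {len(context)} chunks {citations}"

chunks = ["pinecone stores vectors", "gradio renders the ui", "cohere reranks results"]
print(answer("how are vectors stored", chunks))  # Answer based on 2 chunks [1] [2]
```

In the real pipeline, a reranking step (Cohere) would reorder `retrieve`'s output before it reaches the LLM.
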
## 🚀 Features

### ✅ Requirements Met
- **Vector Database**: Pinecone cloud-hosted with serverless index
- **Embeddings & Chunking**: OpenAI embeddings with configurable chunk size (400-1200 tokens) and overlap (10-15%)
- **Retriever + Reranker**: Top-k retrieval with optional Cohere reranker
- **LLM & Answering**: OpenAI/Groq with inline citations and source mapping
- **Frontend**: Text input/upload, query interface, citations display, timing & cost estimates
- **Metadata Storage**: Source, title, section, position tracking

### 🔧 Technical Details
- **Chunking Strategy**: 800 tokens default with 120-token overlap (15%)
- **Vector Dimension**: 1536 (OpenAI text-embedding-3-small)
- **Index Configuration**: Pinecone serverless, cosine similarity
- **Upsert Strategy**: Batch processing (100 chunks) with metadata preservation

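The fixed-size chunking strategy above can be sketched in a few lines. This is a simplified stand-in for `chunker.py`: it slides a window of `size` items with `overlap` items shared between neighbouring chunks, using whitespace-split words where the real chunker counts tokens.

```python
def chunk_text(words, size=800, overlap=120):
    """Fixed-size sliding-window chunks; consecutive chunks share `overlap` items."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(2000)]
chunks = chunk_text(words)
print(len(chunks))  # 3 chunks: words[0:800], words[680:1480], words[1360:2000]
```

With the 800/120 defaults the stride is 680, so each chunk repeats the last 120 items of its predecessor, which keeps context intact across chunk boundaries.
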
## 🛠️ Setup

### Prerequisites
- Python 3.8+
- Pinecone account and API key
- OpenAI API key
- Groq API key (optional)
- Cohere API key (optional, for reranking)

### Installation

1. **Clone and set up the environment**
   ```bash
   git clone <your-repo-url>
   cd mini-rag
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .\.venv\Scripts\activate
   pip install -r requirements.txt
   ```

2. **Configure environment variables**
   ```bash
   cp .env.example .env
   # Edit .env with your API keys
   ```

3. **Create the data directory**
   ```bash
   mkdir data
   ```

4. **Run the application**
   ```bash
   python app.py
   ```

### Environment Variables
```bash
# Pinecone
PINECONE_API_KEY=your_pinecone_key
PINECONE_INDEX=mini-rag-index
PINECONE_CLOUD=aws
PINECONE_REGION=us-east-1

# LLMs
OPENAI_API_KEY=your_openai_key
GROQ_API_KEY=your_groq_key

# Reranker
COHERE_API_KEY=your_cohere_key

# Models
EMBEDDING_MODEL=text-embedding-3-small
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
RERANK_PROVIDER=cohere
RERANK_MODEL=rerank-3

# Chunking
CHUNK_SIZE=800
CHUNK_OVERLAP=120
DATA_DIR=./data
```

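A hedged sketch of how these variables might be read at startup; the helper name and defaults below are illustrative, not the app's actual loader.

```python
import os

def load_config():
    # Illustrative loader using the documented defaults as fallbacks.
    return {
        "index": os.getenv("PINECONE_INDEX", "mini-rag-index"),
        "chunk_size": int(os.getenv("CHUNK_SIZE", "800")),
        "chunk_overlap": int(os.getenv("CHUNK_OVERLAP", "120")),
        "llm_provider": os.getenv("LLM_PROVIDER", "openai"),
    }

cfg = load_config()
```

Reading everything through `os.getenv` with defaults keeps the app runnable locally while letting a hosting platform's dashboard override each value.
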
## 📊 Evaluation

### Gold Set Q&A Pairs
1. **Q:** What is the main topic of the document?
   **Expected:** Clear identification of document subject

2. **Q:** What are the key findings or conclusions?
   **Expected:** Specific facts or conclusions from the text

3. **Q:** What methodology was used?
   **Expected:** Description of approach or methods mentioned

4. **Q:** What are the limitations discussed?
   **Expected:** Any limitations or constraints mentioned

5. **Q:** What future work is suggested?
   **Expected:** Recommendations or future directions

### Success Metrics
- **Precision**: Relevant information in answers
- **Recall**: Coverage of available information
- **Citation Accuracy**: Proper source attribution with [1], [2] format
- **Response Time**: Query processing speed
- **Cost Efficiency**: Token usage and API cost estimates

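One lightweight way to score a gold set like this is keyword containment: check whether each expected fact's keywords appear in the generated answer. The harness below is a hypothetical sketch, not part of the project; `answer_fn` stands in for whatever function produces the RAG answer.

```python
# Hypothetical gold set: (question, keywords the answer should contain).
gold_set = [
    ("What is the main topic of the document?", ["topic"]),
    ("What methodology was used?", ["method"]),
]

def evaluate(answer_fn, gold):
    # Fraction of questions whose answer contains every expected keyword.
    hits = sum(all(kw in answer_fn(q).lower() for kw in kws) for q, kws in gold)
    return hits / len(gold)

score = evaluate(lambda q: f"The main topic and method were: {q}", gold_set)
print(score)  # 1.0
```

Keyword matching is crude (it misses paraphrases), so it is best treated as a smoke test alongside manual review of citation accuracy.
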
## 🚀 Deployment

### Free Hosting Options
- **Hugging Face Spaces**: Gradio apps with free tier
- **Render**: Free tier for Python web services
- **Railway**: Free tier for small applications
- **Vercel**: Free tier for static sites (with API routes)

### Deployment Steps
1. **Prepare for deployment**
   - Ensure all API keys are environment variables
   - Test locally with production settings
   - Add proper error handling and logging

2. **Deploy to chosen platform**
   - Follow platform-specific deployment guides
   - Set environment variables in platform dashboard
   - Configure domain and SSL if needed

## 📁 Project Structure
```
mini-rag/
├── app.py               # Gradio UI and main application
├── rag_core.py          # RAG orchestration logic
├── llm.py               # LLM provider abstraction
├── pinecone_client.py   # Pinecone vector DB client
├── ingest.py            # Document ingestion pipeline
├── chunker.py           # Text chunking strategy
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variables template
├── README.md            # This file
└── data/                # Document storage directory
```

## 🔍 Usage Examples

### 1. Text Input Processing
- Paste text into the "Text Input" tab
- Configure chunk size (400-1200 tokens) and overlap (10-15%)
- Click "Process & Store Text" to ingest into vector DB

### 2. File Ingestion
- Place documents (.txt, .md, .pdf) in the `data/` directory
- Use the "File Ingestion" tab to process all files
- Monitor chunk count and processing status

### 3. Query and Answer
- Navigate to the "Query" tab
- Enter your question
- Adjust Top-K retrieval and reranker settings
- Get an answer with inline citations [1], [2] and source details

## 📈 Performance & Monitoring

### Metrics Tracked
- **Processing Time**: End-to-end query response time
- **Token Usage**: Query, context, and answer token counts
- **Cost Estimates**: Embedding, LLM, and reranking costs
- **Retrieval Quality**: Vector similarity scores and rerank scores

### Optimization Tips
- Adjust chunk size based on document characteristics
- Use the reranker for better relevance (adds ~100ms but improves quality)
- Batch process documents for efficient ingestion
- Monitor Pinecone index performance and costs

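Cost estimates of this kind can be derived from token counts and per-1K-token rates. The rates below are placeholders for illustration only, not current provider pricing, and the function is a sketch rather than the app's actual estimator.

```python
# Placeholder per-1K-token rates for illustration; check current provider pricing.
RATES_PER_1K = {"embedding": 0.00002, "llm_input": 0.00015, "llm_output": 0.0006}

def estimate_cost(embed_tokens, input_tokens, output_tokens):
    # Sum the three billable components: query embedding, prompt, and completion.
    return (
        embed_tokens / 1000 * RATES_PER_1K["embedding"]
        + input_tokens / 1000 * RATES_PER_1K["llm_input"]
        + output_tokens / 1000 * RATES_PER_1K["llm_output"]
    )

cost = estimate_cost(embed_tokens=1000, input_tokens=2000, output_tokens=500)
```

Context tokens usually dominate, which is why chunk size and Top-K directly drive per-query cost.
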
## 🚨 Error Handling

### Common Issues
- **Missing API Keys**: Check environment variables
- **Pinecone Connection**: Verify index name and region
- **Document Processing**: Check file formats and encoding
- **Rate Limits**: Implement exponential backoff for API calls

### Graceful Degradation
- Fall back to the original retrieval order if the reranker fails
- Continue processing if individual documents fail
- Provide clear error messages with troubleshooting steps

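The exponential backoff suggested above can be wrapped around any API call. A minimal sketch (not the project's actual error-handling code):

```python
import random
import time

def with_backoff(fn, retries=5, base=0.5):
    """Retry fn with exponential backoff plus jitter; re-raise after the last attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            # Sleep base * 2^attempt seconds, scaled by random jitter in [1, 2).
            time.sleep(base * (2 ** attempt) * (1 + random.random()))

print(with_backoff(lambda: "ok"))  # succeeds on the first attempt, prints "ok"
```

The jitter spreads retries out so that many clients hitting the same rate limit do not retry in lockstep.
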
## 🔮 Future Enhancements

### Planned Improvements
- **Advanced Chunking**: Semantic chunking with sentence transformers
- **Hybrid Search**: Combine vector and keyword search
- **Multi-modal Support**: Image and document processing
- **Caching Layer**: Redis for frequently accessed results
- **Analytics Dashboard**: Query performance and usage metrics

### Scalability Considerations
- **Vector DB**: Pinecone pod scaling for larger datasets
- **Embedding Models**: Local models for cost reduction
- **Load Balancing**: Multiple LLM providers for redundancy
- **CDN Integration**: Static asset optimization

## 📝 Remarks

### Trade-offs Made
- **API Dependencies**: Relies on external services for embeddings and LLM
- **Cost vs Quality**: OpenAI embeddings provide quality but add cost
- **Latency**: Reranking adds ~100ms but significantly improves relevance
- **Chunking Strategy**: Fixed-size chunks for simplicity vs semantic chunking

### Provider Limits
- **OpenAI**: Rate limits and token limits per request
- **Pinecone**: Free tier index size and query limits
- **Cohere**: Reranking API rate limits
- **Groq**: Alternative LLM with different pricing model

### What I'd Do Next
1. **Implement semantic chunking** for better document understanding
2. **Add hybrid search** combining vector and keyword approaches
3. **Build evaluation framework** with automated testing
4. **Optimize for production** with proper logging and monitoring
5. **Add authentication** for multi-user support

## 👨‍💻 Author

**Your Name** - AI Engineer Assessment Candidate
- **GitHub**: [Your GitHub Profile]
- **LinkedIn**: [Your LinkedIn Profile]
- **Portfolio**: [Your Portfolio/Website]

## 📄 License

This project was created for the AI Engineer Assessment. Feel free to use and modify it for learning purposes.

---

**Note**: This implementation demonstrates production-ready practices including proper error handling, environment variable management, comprehensive documentation, and scalable architecture design.