Jatin Mehra
Add comprehensive documentation including API reference, development guide, and index
Development Guide
This guide helps developers understand the codebase and contribute to the RAG Chat Application.
Project Structure
wasserstoff-AiInternTask/
├── rag_elements/                # Core RAG Engine
│   ├── enhanced_vectordb.py     # Main RAG implementation
│   └── config.py                # Configuration management
├── backend/                     # FastAPI Production Server
│   ├── main.py                  # App entry point
│   ├── models.py                # Pydantic schemas
│   ├── utils.py                 # Utilities and state
│   └── routes/                  # API endpoints
├── frontend/                    # Web Interface
│   ├── index.html               # Main UI
│   ├── style.css                # Styling
│   └── script.js                # Frontend logic
├── tests/                       # Test Suite
└── docs/                        # Documentation
Development Setup
Prerequisites
- Python 3.8+
- Git
- Text editor/IDE (VS Code recommended)
Environment Setup
# Clone repository
git clone https://github.com/Jatin-Mehra119/wasserstoff-AiInternTask.git
cd wasserstoff-AiInternTask
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Linux/macOS
# or venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Install development dependencies
pip install -r tests/requirements-test.txt
# Set up environment variables
cp .env.example .env # If .env.example exists; otherwise create .env by hand
# Add your GROQ_API_KEY to .env
Running in Development Mode
# Start FastAPI with hot reload
cd backend
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Or run Streamlit version
streamlit run streamlit_rag_app.py
Core Components
1. RAG Engine (rag_elements/enhanced_vectordb.py)
This module is the heart of the application. Its key class and methods (signatures only, bodies elided):
class EnhancedDocumentProcessor:
    def process_files(self, file_paths): ...                  # Multi-format processing
    def create_enhanced_vector_store(self, documents): ...    # FAISS index creation
    def search_with_citations(self, query, k=5): ...          # Semantic search
    def get_chat_response(self, query): ...                   # End-to-end chat
    def save_vector_store(self, path): ...                    # Persistence
    def load_vector_store(self, path): ...                    # Restore data
2. FastAPI Backend (backend/)
Entry Point (main.py):
- FastAPI app initialization
- CORS configuration
- Route registration
Data Models (models.py):
- Pydantic schemas for API requests/responses
- Type validation and serialization
Routes (routes/):
- main_routes.py - Frontend serving, health checks
- upload_routes.py - File upload and processing
- chat_routes.py - Chat interface and AI responses
- store_routes.py - Vector store management
Utilities (utils.py):
- Global state management
- Helper functions
- Error handling utilities
3. Frontend (frontend/)
Modern web interface with:
- HTML: Semantic structure with responsive layout
- CSS: Modern styling with CSS Grid/Flexbox
- JavaScript: Async API calls, real-time updates, file handling
Data Flow
Document Processing Pipeline
- File Upload → upload_routes.py
- Text Extraction → enhanced_vectordb.py
- Chunking → LangChain text splitters
- Embeddings → Sentence Transformers
- Indexing → FAISS vector store
- Metadata Storage → JSON persistence
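The chunking and metadata steps of this pipeline can be illustrated with a small standard-library sketch. It is not the project's implementation: the guide says chunking is done by LangChain text splitters, and the function names, chunk sizes, and record fields below are assumptions chosen for illustration.

```python
import json
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Stand-in for a real text splitter; sizes are illustrative assumptions.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_metadata(source: str, chunks: List[str]) -> List[Dict[str, object]]:
    """Attach the source name and chunk position to each chunk."""
    return [
        {"source": source, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunks)
    ]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
records = build_metadata("example.txt", chunks)
metadata_json = json.dumps(records)  # what a JSON persistence step might write
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from at least one chunk, which is the usual reason splitters expose an overlap setting.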
Chat Pipeline
- User Query → chat_routes.py
- Semantic Search → FAISS similarity search
- Context Retrieval → Top-K document chunks
- AI Response → GROQ API integration
- Citation Generation → Source attribution
- Response Formatting → Markdown output
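The retrieval and citation steps of this pipeline can be sketched in plain Python. This is a toy stand-in, not the project's code: the application uses FAISS for similarity search and the GROQ API for the response, whereas the brute-force cosine search, the two-dimensional vectors, and every function name below are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, Dict[str, object]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], index: List[Dict], k: int = 2) -> List[Hit]:
    """Brute-force search: score every chunk, keep the k most similar."""
    scored = [(cosine_similarity(query_vec, item["vector"]), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, hits: List[Hit]) -> str:
    """Number each retrieved chunk so the answer can cite it as [n]."""
    context = "\n".join(
        f"[{i}] {item['text']}" for i, (_, item) in enumerate(hits, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {query}"


def format_citations(hits: List[Hit]) -> str:
    """Map each citation number back to its source document."""
    return "\n".join(
        f"[{i}] {item['source']}" for i, (_, item) in enumerate(hits, start=1)
    )


index = [
    {"source": "a.pdf", "text": "FAISS stores vectors.", "vector": [1.0, 0.0]},
    {"source": "b.txt", "text": "Streamlit renders UIs.", "vector": [0.0, 1.0]},
    {"source": "c.txt", "text": "Vectors enable search.", "vector": [0.8, 0.6]},
]
hits = top_k([1.0, 0.1], index, k=2)
prompt = build_prompt("How are vectors stored?", hits)
citations = format_citations(hits)
```

Numbering the chunks in the prompt and reusing the same numbers in the citation list is one simple way to keep source attribution aligned with the generated answer.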
Testing
Running Tests
cd tests
# Run all tests
bash run_tests.sh
# Run specific test files
python -m pytest test_endpoints_pytest.py -v
python test_api_endpoints.py
Test Structure
- test_api_endpoints.py - Basic API endpoint testing
- test_endpoints_pytest.py - Comprehensive pytest suite
- run_tests.sh - Test runner script
Writing Tests
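Follow these patterns (both assume the development server is running on port 8000):
import httpx
import pytest
import requests

BASE_URL = "http://localhost:8000"

# API endpoint test
def test_upload_endpoint():
    # The form field name "files" is an assumption; match the upload route's parameter
    files = [("files", ("sample.txt", b"hello world", "text/plain"))]
    response = requests.post(f"{BASE_URL}/upload-files", files=files)
    assert response.status_code == 200
    assert "total_files" in response.json()

# Pytest test (requires the pytest-asyncio plugin)
@pytest.mark.asyncio
async def test_chat_endpoint():
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{BASE_URL}/chat", json={"message": "test"})
    assert response.status_code == 200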
Adding New Features
Adding a New API Endpoint
- Define Pydantic Model (models.py):
from typing import Optional

from pydantic import BaseModel

class NewFeatureRequest(BaseModel):
    parameter: str
    optional_param: Optional[int] = None

class NewFeatureResponse(BaseModel):
    result: str
    success: bool
- Create Route Handler (routes/new_routes.py):
from fastapi import APIRouter, HTTPException
from ..models import NewFeatureRequest, NewFeatureResponse

router = APIRouter()

@router.post("/new-feature", response_model=NewFeatureResponse)
async def new_feature_endpoint(request: NewFeatureRequest):
    try:
        # Implementation here
        return NewFeatureResponse(result="success", success=True)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
- Register Router (main.py):
from .routes.new_routes import router as new_router

app.include_router(new_router)
- Add Frontend Integration (frontend/script.js):
async function callNewFeature(data) {
    const response = await fetch('/new-feature', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify(data)
    });
    return response.json();
}
Extending the RAG Engine
To add new document types or processing capabilities:
- Add File Type Support (enhanced_vectordb.py):
def extract_text_from_new_format(self, file_path):
    # Implement extraction logic
    return extracted_text

def process_files(self, file_paths):
    for file_path in file_paths:
        if file_path.endswith('.new_format'):
            text = self.extract_text_from_new_format(file_path)
            # Process text...
- Update Frontend File Acceptance (index.html):
<input type="file" accept=".pdf,.txt,.new_format" multiple>
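As the number of supported formats grows, the `endswith` chain in `process_files` can be replaced by a lookup table keyed on file suffix. The sketch below shows that idea with standalone functions. It is an assumption about how one might organize the code, not how `enhanced_vectordb.py` is actually written, and the `EXTRACTORS` table, `extract_text`, and the placeholder extraction logic are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_text_from_txt(file_path: str) -> str:
    """Read a plain-text file as UTF-8."""
    return Path(file_path).read_text(encoding="utf-8")


def extract_text_from_new_format(file_path: str) -> str:
    """Placeholder extractor: treats the file as UTF-8 text and trims whitespace."""
    return Path(file_path).read_text(encoding="utf-8").strip()


# Hypothetical registry: one entry per supported suffix.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".txt": extract_text_from_txt,
    ".new_format": extract_text_from_new_format,
}


def extract_text(file_path: str) -> str:
    """Pick the extractor by suffix and fail loudly for unknown types."""
    suffix = Path(file_path).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or file_path}") from None
    return extractor(file_path)
```

With this layout, supporting another format means writing one extractor and adding one dictionary entry, and unsupported uploads raise a clear error instead of being skipped silently.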
Frontend Development
Key JavaScript Functions
- uploadFiles() - Handle file uploads with progress
- sendMessage() - Send chat messages and display responses
- updateStats() - Refresh processing statistics
- displayCitations() - Show document sources
CSS Architecture
- Mobile-first responsive design
- CSS custom properties for theming
- Flexbox/Grid layouts
- Component-based styling
Adding UI Components
- Add HTML structure
- Style with CSS classes
- Add JavaScript event handlers
- Connect to backend APIs
Debugging
Common Issues
CORS Errors:
- Check main.py CORS configuration
- Ensure frontend runs on allowed origins
Import Errors:
- Verify Python path and virtual environment
- Check requirements.txt dependencies
API Key Issues:
- Confirm GROQ API key is set
- Check environment variable loading
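A quick way to rule out a missing key is to fail at startup with an explicit message rather than on the first chat request. The helper below is a minimal sketch using only the standard library. `require_env` and `MissingAPIKeyError` are hypothetical names, and it assumes the `.env` file has already been loaded into the process environment by whatever mechanism the project uses.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when a required API key is absent from the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingAPIKeyError(
            f"{name} is not set; add it to .env or export it before starting the server"
        )
    return value
```

Calling `require_env("GROQ_API_KEY")` once during application startup turns a confusing downstream failure into an immediate, readable error.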
Logging
Add logging to your code:
import logging

from fastapi import APIRouter

logger = logging.getLogger(__name__)
router = APIRouter()

@router.post("/endpoint")
async def endpoint():
    logger.info("Processing request")
    try:
        # Logic here
        logger.debug("Success")
    except Exception:
        logger.exception("Request failed")  # logs the message with the traceback
        raise
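Python's root logger defaults to the WARNING level, so `logger.info` and `logger.debug` calls like those above produce no output until logging is configured. A minimal sketch of such a configuration follows. The `configure_logging` helper and the format string are assumptions, not part of the project.

```python
import logging


def configure_logging(level: int = logging.DEBUG) -> None:
    """Send log records at or above `level` to stderr with a timestamped format."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed earlier (Python 3.8+)
    )
```

Calling `configure_logging()` once before the app starts serving is enough for development. A production setup would more likely use `logging.INFO`.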
Code Style Guidelines
Python
- Follow PEP 8
- Use type hints
- Add docstrings
- Maximum line length: 88 characters
from typing import Any, Dict

# ProcessResult and ProcessingError are project types assumed to be defined elsewhere
def process_document(file_path: str, options: Dict[str, Any]) -> ProcessResult:
    """
    Process a document and extract text content.

    Args:
        file_path: Path to the document file
        options: Processing configuration options

    Returns:
        ProcessResult containing extracted text and metadata

    Raises:
        ProcessingError: If document cannot be processed
    """
    # Implementation...
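The signature above refers to `ProcessResult` and `ProcessingError`, which this guide does not define. The sketch below shows one way they might look, together with a trivial text-file implementation that follows the same style rules. The field names, the `encoding` option, and the function body are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict


class ProcessingError(Exception):
    """Raised when a document cannot be processed."""


@dataclass
class ProcessResult:
    """Extracted text plus free-form metadata (hypothetical shape)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def process_document(file_path: str, options: Dict[str, Any]) -> ProcessResult:
    """
    Read a text file and return its content with basic metadata.

    Args:
        file_path: Path to the document file
        options: Processing options; only "encoding" is read here

    Returns:
        ProcessResult containing extracted text and metadata

    Raises:
        ProcessingError: If the file cannot be read or decoded
    """
    path = Path(file_path)
    encoding = options.get("encoding", "utf-8")  # hypothetical option name
    try:
        text = path.read_text(encoding=encoding)
    except (OSError, UnicodeDecodeError) as exc:
        raise ProcessingError(f"Cannot process {file_path}: {exc}") from exc
    metadata = {"source": path.name, "characters": len(text)}
    return ProcessResult(text=text, metadata=metadata)
```

Wrapping low-level exceptions in a single domain error keeps route handlers simple, since they only need to catch `ProcessingError`.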
JavaScript
- Use modern ES6+ syntax
- Prefer const/let over var
- Use async/await for promises
- Add JSDoc comments
/**
 * Upload files to the server
 * @param {FileList} files - Files to upload
 * @returns {Promise<Object>} Upload result
 */
async function uploadFiles(files) {
    // Implementation...
}
Deployment
Development
python -m uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
Production
python -m uvicorn backend.main:app --host 0.0.0.0 --port 8000 --workers 4
Docker (if configured)
docker build -t rag-chat-app .
docker run -p 8000:8000 -e GROQ_API_KEY=your_key rag-chat-app
Contributing
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Make changes and add tests
- Ensure tests pass: bash tests/run_tests.sh
- Commit: git commit -m 'Add amazing feature'
- Push: git push origin feature/amazing-feature
- Open Pull Request
Pull Request Checklist
- Code follows style guidelines
- Tests added for new functionality
- All tests pass
- Documentation updated
- No breaking changes (or clearly documented)