# Government Schemes RAG API Documentation (Multilingual)

## Overview

FastAPI-based REST API for querying Indian Government Schemes using Retrieval-Augmented Generation (RAG) with **support for 13+ Indian languages**.

## Base URL

```
http://127.0.0.1:8000
```

## Key Features

- ✅ Multilingual support (13+ Indian languages)
- ✅ Automatic translation (Input & Output)
- ✅ Text-to-Speech capability (optional)
- ✅ RAG-powered intelligent search
- ✅ 3400+ government schemes database

## API Endpoints

### 1. Root Endpoint

**GET /**

Returns API information, version, and supported languages.

**Response:**

```json
{
  "message": "Government Schemes RAG API with Multilingual Support",
  "version": "2.0.0",
  "supported_languages": {
    "en": "English",
    "hi": "Hindi",
    "te": "Telugu",
    "ta": "Tamil",
    "ml": "Malayalam",
    "kn": "Kannada",
    "bn": "Bengali",
    "mr": "Marathi",
    "gu": "Gujarati",
    "pa": "Punjabi",
    "ur": "Urdu",
    "or": "Odia",
    "as": "Assamese"
  },
  "endpoints": {
    "POST /query": "Query government schemes with translation support",
    "GET /states": "Get list of Indian states",
    "GET /languages": "Get list of supported languages",
    "GET /health": "Health check"
  }
}
```

---

### 2. Health Check

**GET /health**

Checks whether the API and the RAG system are running properly.

**Response:**

```json
{
  "status": "healthy",
  "rag_system": "initialized"
}
```

---

### 3. Get Supported Languages

**GET /languages**

Returns all languages supported for translation.

**Response:**

```json
{
  "languages": {
    "en": "English",
    "hi": "Hindi",
    "te": "Telugu",
    "ta": "Tamil",
    "ml": "Malayalam",
    "kn": "Kannada",
    "bn": "Bengali",
    "mr": "Marathi",
    "gu": "Gujarati",
    "pa": "Punjabi",
    "ur": "Urdu",
    "or": "Odia",
    "as": "Assamese"
  }
}
```

---

### 4. Get States

**GET /states**

Returns all Indian states and union territories.

**Response:**

```json
{
  "states": [
    "All States",
    "Andhra Pradesh",
    "Arunachal Pradesh",
    ...
  ]
}
```

---

### 5. Query Schemes (with Multilingual Support)

**POST /query**

Query government schemes in any supported language. The API automatically translates the input to English, processes it through the RAG system, and returns the answer in the requested language.

**Request Body:**

```json
{
  "question": "స్కాలర్‌షిప్ల గురించి చెప్పండి",  // Question in any language
  "state": "Telangana",                          // Optional
  "language": "te"                               // Language code (default: "en")
}
```

**Response:**

```json
{
  "answer": "తెలంగాణలో అందుబాటులో ఉన్న స్కాలర్‌షిప్‌ల గురించి...",
  "sources": [
    "Scheme Name: Pre-Matric Scholarship for Backward Class Students...",
    "Scheme Name: Post-Matric Scholarship Scheme...",
    "Scheme Name: Merit-cum-Means Scholarship..."
  ]
}
```

**Note:** Audio is NOT automatically generated. Use the `/generate-audio` endpoint when the user clicks the speaker button.

**Translation Flow:**

```
User Question (Telugu) → Translate to English → RAG Processing → English Answer → Translate to Telugu → Return to User
```

**Error Response (400 - Empty Question):**

```json
{
  "detail": "Question cannot be empty"
}
```

**Error Response (400 - Unsupported Language):**

```json
{
  "detail": "Unsupported language. Supported: ['en', 'hi', 'te', 'ta', ...]"
}
```

**Error Response (500):**

```json
{
  "detail": "Error processing query: [error message]"
}
```

---

### 6. Generate Audio (On-Demand)

**POST /generate-audio**

Generates audio from text. Call this endpoint ONLY when the user clicks the "Play Audio" or speaker button in the UI.

**Request Body:**

```json
{
  "text": "తెలంగాణలో అందుబాటులో ఉన్న స్కాలర్‌షిప్‌ల గురించి...",
  "language": "te"  // Language code (default: "en")
}
```

**Response:**

```json
{
  "audio": "base64_encoded_mp3_audio_data"
}
```

**Usage Flow:**

```
1. User submits question → Receive answer (fast, no audio)
2. User clicks speaker button → Call /generate-audio → Play audio
```

**Error Response (400 - Empty Text):**

```json
{
  "detail": "Text cannot be empty"
}
```

**Error Response (400 - Unsupported Language):**

```json
{
  "detail": "Unsupported language. Supported: ['en', 'hi', 'te', 'ta', ...]"
}
```

---

## Interactive API Documentation

FastAPI automatically generates interactive API documentation:

- **Swagger UI**: http://127.0.0.1:8000/docs
- **ReDoc**: http://127.0.0.1:8000/redoc

These interfaces allow you to:

- View all endpoints
- See request/response schemas
- Test API calls directly from the browser
- Download the OpenAPI specification

---

## Usage Examples

### Using cURL

```bash
# Health check
curl http://127.0.0.1:8000/health

# Get supported languages
curl http://127.0.0.1:8000/languages

# Get states
curl http://127.0.0.1:8000/states

# Query in English
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What scholarships are available for SC students?",
    "state": "Karnataka",
    "language": "en"
  }'

# Query in Hindi
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "छात्रवृत्ति के बारे में बताएं",
    "language": "hi"
  }'

# Query in Telugu
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "స్కాలర్‌షిప్ల గురించి చెప్పండి",
    "state": "Telangana",
    "language": "te"
  }'

# Generate audio (when user clicks speaker button)
curl -X POST http://127.0.0.1:8000/generate-audio \
  -H "Content-Type: application/json" \
  -d '{
    "text": "తెలంగాణలో అందుబాటులో ఉన్న స్కాలర్‌షిప్‌లు...",
    "language": "te"
  }'
```

### Using Python requests

```python
import requests

# Query in English
response = requests.post(
    "http://127.0.0.1:8000/query",
    json={
        "question": "My daughter is studying in 9th standard. What schemes are applicable?",
        "state": "Maharashtra",
        "language": "en"
    }
)
data = response.json()
print(data["answer"])

# Query in Hindi
response_hindi = requests.post(
    "http://127.0.0.1:8000/query",
    json={
        "question": "मुझे छात्रवृत्ति चाहिए",
        "language": "hi"
    }
)
hindi_data = response_hindi.json()
print(hindi_data["answer"])  # Answer will be in Hindi

# Generate audio on-demand (when user clicks speaker button)
audio_response = requests.post(
    "http://127.0.0.1:8000/generate-audio",
    json={
        "text": hindi_data["answer"],
        "language": "hi"
    }
)
audio_data = audio_response.json()
# audio_data["audio"] contains base64 encoded MP3
```

### Using JavaScript fetch

```javascript
// Query in English
const response = await fetch('http://127.0.0.1:8000/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    question: 'What schemes are available for girl child education?',
    state: 'All States',
    language: 'en'
  })
});
const data = await response.json();
console.log(data.answer);

// Query in Telugu
const responseTelugu = await fetch('http://127.0.0.1:8000/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    question: 'బాలికల విద్య కోసం ఏ పథకాలు ఉన్నాయి?',
    language: 'te'
  })
});
const teluguData = await responseTelugu.json();
console.log(teluguData.answer);  // Answer in Telugu

// Generate audio when user clicks speaker button
const playAudio = async (text, language) => {
  const audioResponse = await fetch('http://127.0.0.1:8000/generate-audio', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      text: text,
      language: language
    })
  });
  const audioData = await audioResponse.json();
  const audio = new Audio(`data:audio/mp3;base64,${audioData.audio}`);
  audio.play();
};

// Usage: Call when user clicks speaker button
// playAudio(teluguData.answer, 'te');
```

### Using React (Frontend Integration)

```jsx
import React, { useState } from 'react';

function SchemeQuery() {
  const [language, setLanguage] = useState('en');
  const [question, setQuestion] = useState('');
  const [answer, setAnswer] = useState('');
  const [audioLoading, setAudioLoading] = useState(false);

  const handleSubmit = async (e) => {
    e.preventDefault();
    const response = await fetch('http://127.0.0.1:8000/query', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        question: question,
        language: language
      })
    });
    const data = await response.json();
    setAnswer(data.answer);
  };

  // Called only when user clicks speaker button
  const playAudio = async () => {
    if (!answer) return;
    setAudioLoading(true);
    try {
      const response = await fetch('http://127.0.0.1:8000/generate-audio', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          text: answer,
          language: language
        })
      });
      const data = await response.json();
      const audio = new Audio(`data:audio/mp3;base64,${data.audio}`);
      audio.play();
    } catch (error) {
      console.error('Audio generation failed:', error);
    } finally {
      setAudioLoading(false);
    }
  };

  // Minimal markup; adapt the elements and labels to your UI
  return (
    <form onSubmit={handleSubmit}>
      <select value={language} onChange={(e) => setLanguage(e.target.value)}>
        <option value="en">English</option>
        <option value="hi">Hindi</option>
        <option value="te">Telugu</option>
      </select>
      <input
        value={question}
        onChange={(e) => setQuestion(e.target.value)}
        placeholder="Ask your question..."
      />
      <button type="submit">Ask</button>
      {answer && (
        <div>
          <p>{answer}</p>
          <button type="button" onClick={playAudio} disabled={audioLoading}>
            {audioLoading ? 'Loading...' : '🔊 Play Audio'}
          </button>
        </div>
      )}
    </form>
  );
}
```

### Using Postman

1. **Method**: POST
2. **URL**: `http://127.0.0.1:8000/query`
3. **Headers**:
   - `Content-Type: application/json`
4. **Body** (raw JSON):

**English:**

```json
{
  "question": "What are the schemes for construction workers?",
  "state": "Karnataka",
  "language": "en"
}
```

**Hindi:**

```json
{
  "question": "निर्माण श्रमिकों के लिए क्या योजनाएं हैं?",
  "language": "hi"
}
```

**Telugu:**

```json
{
  "question": "నిర్మాణ కార్మికులకు ఏ పథకాలు ఉన్నాయి?",
  "state": "Telangana",
  "language": "te"
}
```

---

## Running the API

### Start the Server

```bash
# Activate virtual environment (Windows; on macOS/Linux use: source .venv/bin/activate)
.venv\Scripts\activate

# Run the API
python app.py
```

The server binds to `0.0.0.0:8000`; access it locally at `http://127.0.0.1:8000`.

### Testing the API

Run the test script:

```bash
python test_api.py
```

---

## CORS Configuration

The API is configured to accept requests from any origin (`allow_origins=["*"]`).

⚠️ **For production**, update the CORS settings in `app.py`:

```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],  # Specify allowed origins
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

---

## Data Source

The API uses `updated_data.csv` containing **3400+ government schemes** across categories:

- Education & Learning
- Social Welfare & Empowerment
- Health & Wellness
- Business & Entrepreneurship
- Women and Child
- And more...

---

## Technology Stack

- **Framework**: FastAPI 0.104.1
- **LLM**: Groq API (llama-3.3-70b-versatile)
- **Embeddings**: HuggingFace sentence-transformers/all-MiniLM-L6-v2
- **Vector DB**: ChromaDB
- **RAG Framework**: LangChain 0.1.0
- **Translation**: deep-translator 1.11.4 (Google Translate)
- **Text-to-Speech**: gTTS 2.5.0 (Google Text-to-Speech)
- **Server**: Uvicorn

---

## Multilingual Features

### Translation Process

1. **Input Translation**: User's question in any Indian language → English
2. **RAG Processing**: English query → Vector search → LLM inference → English answer
3. **Output Translation**: English answer → User's selected language

### Supported Language Codes

| Code | Language | Code | Language |
|------|----------|------|----------|
| `en` | English | `ml` | Malayalam |
| `hi` | Hindi | `kn` | Kannada |
| `te` | Telugu | `bn` | Bengali |
| `ta` | Tamil | `mr` | Marathi |
| `gu` | Gujarati | `pa` | Punjabi |
| `ur` | Urdu | `or` | Odia |
| `as` | Assamese | | |

### Text-to-Speech (Optional)

By default, audio is generated only on demand through `/generate-audio`. To enable audio in responses, uncomment the following line in `app.py`:

```python
# Line ~280 in app.py
audio_base64 = TranslationService.text_to_speech(final_answer, request.language)
```

When enabled, the API will return base64-encoded MP3 audio in the `audio` field.

---

## Testing the Multilingual API

### Using the Test Script

```bash
# Make sure the server is running first
python app.py

# In another terminal
python test_translation.py
```

The test script will:

1. Verify the language endpoint
2. Test queries in English, Hindi, Telugu, Tamil, and Malayalam
3. Display translated responses

### Manual Testing Checklist

- [ ] Test each supported language
- [ ] Verify translations are accurate
- [ ] Check source citations are included
- [ ] Test with state filters
- [ ] Test error handling (empty questions, invalid languages)
- [ ] Verify CORS headers for frontend integration

---

## Rate Limits

The API itself does not currently enforce any rate limits. It relies on Groq's free tier, which has its own rate limits.
For production deployment, consider implementing:

- Request rate limiting
- Authentication/API keys
- Caching for common queries

---

## Error Handling

- **400 Bad Request**: Empty question or text, or an unsupported language code
- **500 Internal Server Error**: Processing error (check `GROQ_API_KEY`)

---

## Performance Notes

- First query may take 3-5 seconds (vector search + LLM inference)
- Subsequent queries are faster (~1-2 seconds)
- ChromaDB is persisted to disk (`./chroma_db/`) for faster restarts
- The 3400+ schemes are chunked into ~12,000-15,000 text segments

---

## Deployment

### Local Development

```bash
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

### Production (with Gunicorn)

```bash
pip install gunicorn
gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

### Docker (Optional)

Create a `Dockerfile`:

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

---

## Support

For issues or questions:

1. Check API docs at `/docs`
2. Review logs in the terminal
3. Verify the `.env` file has a valid `GROQ_API_KEY`
4. Ensure `updated_data.csv` is present

---

## Example Queries (Multilingual)

### English

- "My daughter is studying in 9th standard. What schemes are applicable to her?"
- "What scholarships are available for SC/ST students?"
- "What are the schemes for construction workers?"
- "Tell me about Beti Bachao Beti Padhao scheme"

### Hindi (हिंदी)

- "मेरी बेटी 9वीं कक्षा में पढ़ती है। उसके लिए कौन सी योजनाएं हैं?"
- "SC/ST छात्रों के लिए क्या छात्रवृत्ति उपलब्ध है?"
- "निर्माण श्रमिकों के लिए क्या योजनाएं हैं?"
- "बेटी बचाओ बेटी पढ़ाओ योजना के बारे में बताएं"

### Telugu (తెలుగు)

- "నా కూతురు 9వ తరగతి చదువుతోంది. ఆమెకు ఏ పథకాలు వర్తిస్తాయి?"
- "SC/ST విద్యార్థులకు ఏ స్కాలర్‌షిప్‌లు అందుబాటులో ఉన్నాయి?"
- "నిర్మాణ కార్మికులకు ఏ పథకాలు ఉన్నాయి?"

### Tamil (தமிழ்)

- "என் மகள் 9வது வகுப்பு படிக்கிறாள். அவளுக்கு என்ன திட்டங்கள் பொருந்தும்?"
- "SC/ST மாணவர்களுக்கு என்ன உதவித்தொகை கிடைக்கும்?"

### Malayalam (മലയാളം)

- "എന്റെ മകൾ 9-ാം ക്ലാസിൽ പഠിക്കുന്നു. അവൾക്ക് എന്തെല്ലാം പദ്ധതികൾ ബാധകമാണ്?"
- "SC/ST വിദ്യാർത്ഥികൾക്ക് എന്ത് സ്കോളർഷിപ്പുകൾ ലഭ്യമാണ്?"

---

## Frontend Integration Guide

For detailed React integration instructions, see: **MULTILINGUAL_INTEGRATION_GUIDE.md**

Key points for frontend developers:

1. Always send the `language` parameter with queries
2. Backend handles ALL translation - no frontend translation needed
3. Use Web Speech API for voice input (browser native)
4. Use Speech Synthesis API for voice output (browser native)
5. Display loading states during translation/query processing

---

## Performance & Optimization

### Response Times

- **Translation**: ~0.5-1 second per translation
- **RAG Query**: ~2-3 seconds
- **Total**: ~3-5 seconds for multilingual queries
- **English-only**: ~2-3 seconds (no translation overhead)

### Optimization Tips

1. **Cache translations** for common queries
2. **Lazy load audio** - only generate when user clicks "Play"
3. **Use connection pooling** for API calls
4. **Implement request debouncing** in frontend
5. **Add response caching** for identical queries

### Scaling Considerations

- Translation uses the free Google Translate API (via deep-translator)
- No rate limits are currently configured for the translation service
- Groq API has free tier limits (check console.groq.com)
- Consider premium APIs for production (Azure Translator, Google Cloud Translation)
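
### Example: Caching Translations (Sketch)

The caching tips under Optimization Tips (cache translations, cache identical queries) could be approached with a small in-memory LRU cache on the backend. The following is a minimal sketch, not the implementation in `app.py`: `LRUCache`, `normalize`, `cached_translate`, and the injected `translate` callable are all hypothetical names introduced for illustration. In practice, `translate` would be a thin wrapper around whatever translation call the backend already makes.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Small in-memory least-recently-used cache (hypothetical helper)."""

    def __init__(self, max_size: int = 1024) -> None:
        self.max_size = max_size
        self._store: "OrderedDict[Hashable, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], str]) -> str:
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()
        self._store[key] = value
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return value


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivially different inputs share a key."""
    return " ".join(text.split()).casefold()


# Assumed module-level cache shared by all requests in one worker process.
translation_cache = LRUCache(max_size=2048)


def cached_translate(
    text: str,
    source: str,
    target: str,
    translate: Callable[[str, str, str], str],
) -> str:
    """Translate `text`, reusing a cached result for repeated requests."""
    if source == target:
        return text  # nothing to translate (e.g. English-only queries)
    key = (normalize(text), source, target)
    return translation_cache.get_or_compute(
        key, lambda: translate(text, source, target)
    )


if __name__ == "__main__":
    def fake_translate(text: str, source: str, target: str) -> str:
        # Stand-in for a real translator call, so the sketch runs offline.
        return f"[{source}->{target}] {text}"

    print(cached_translate("What scholarships are available?", "en", "hi", fake_translate))
    # Same question with different spacing/case: served from the cache.
    print(cached_translate("what scholarships  are available?", "en", "hi", fake_translate))
    print(translation_cache.hits, translation_cache.misses)
```

Normalizing the key trades a little precision (case differences are ignored) for a higher hit rate. Note that with several Gunicorn workers each process would hold its own cache; a shared store would be needed to cache across workers.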