---
title: Mini Rag App
emoji: 📈
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/696cb435ea65e4b95276706e/yKmxaQF3FkZuUQgDM3pk-.png
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

A simple end-to-end RAG system built with FastAPI, Hugging Face models, the Pinecone vector database, and the Cohere reranker. The application lets users upload text, ask questions, and receive answers grounded in the retrieved context, with visible citations.

Chunking parameters
- Chunk size: 800
- Overlap: 80

Vector database
- Provider: Pinecone
- Index dimension: 384
- Top-k retrieval: k = 10
- Similarity metric: cosine similarity

Reranking
- Provider: Cohere
- Top-N kept after reranking: 5

LLM
- Provider: Hugging Face (HF)
- Model: google/flan-t5-small

User interface
- Built using HTML served from FastAPI

Remark: Initially, OpenAI models were used as the LLM for answer generation. However, once the free-tier credits were exhausted and API rate limits were hit, the OpenAI models were dropped and the system was migrated to a free Hugging Face LLM (google/flan-t5-base).

Tradeoffs observed:
- Reduced answer fluency and coherence
- Occasionally shorter or less precise responses
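
The chunking and retrieval parameters listed above (chunk size 800, overlap 80, k = 10, cosine similarity) can be illustrated with a small standalone sketch. This is not the code in `app.py`. It assumes the chunk size and overlap are measured in characters, which this README does not state, and it computes cosine similarity locally rather than calling Pinecone. The function names `chunk_text`, `cosine_similarity`, and `top_k` are hypothetical.

```python
import math

# Values taken from the parameters above. Treating the unit as characters
# is an assumption; the app may count tokens instead.
CHUNK_SIZE = 800
OVERLAP = 80
TOP_K = 10


def chunk_text(text, chunk_size=CHUNK_SIZE, overlap=OVERLAP):
    """Split text into windows of chunk_size characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=TOP_K):
    """Return (index, score) pairs for the k vectors most similar to the query, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With these settings each new chunk starts 720 characters after the previous one, so consecutive chunks repeat 80 characters of context. In the described pipeline, the 10 matches returned by the vector search would then be passed to the reranker, which keeps the best 5.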