---
title: Mini Rag App
emoji: 📈
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/696cb435ea65e4b95276706e/yKmxaQF3FkZuUQgDM3pk-.png
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


A simple end-to-end retrieval-augmented generation (RAG) system built with FastAPI, Hugging Face models, the Pinecone vector database, and the Cohere reranker.
Users can upload text, ask questions, and receive answers grounded in the retrieved context, with visible citations.
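
One way to get answers with visible citations is to number the retrieved passages in the prompt and ask the model to cite those numbers. The sketch below is illustrative only: the function name `build_prompt` and the prompt wording are assumptions, not the app's actual code.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Build a prompt whose sources are numbered so answers can cite them.

    Hypothetical helper: the real app's prompt format is not documented here.
    """
    # Number each passage as [1], [2], ... so the citation markers are visible.
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What stores the vectors?", ["Pinecone stores the vectors."]))
```

The same numbered passages can be shown next to the answer in the UI so that each citation marker maps to a visible source.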

Chunking Parameters
Chunk size = 800
Overlap = 80
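
A minimal sketch of fixed-size chunking with overlap, using the values above. It assumes the sizes are measured in characters, which this README does not state; `chunk_text` is a hypothetical name, not the app's actual function.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Split text into windows of chunk_size that share `overlap` characters.

    Assumption: sizes are in characters (the unit is not specified in the README).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts 720 characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    pieces = chunk_text("x" * 2000)
    print([len(p) for p in pieces])
```

With these defaults, consecutive chunks share their last and first 80 characters, so a sentence cut at a chunk boundary still appears whole in one of the two chunks when it is short enough.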


Vector Database
Provider: Pinecone
Index dimension: 384

Top-k retrieval: k = 10
Similarity metric: cosine similarity
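
To illustrate what top-k retrieval by cosine similarity computes, here is a small in-memory sketch. In the app this search is performed by Pinecone over 384-dimensional vectors; the dictionary-based index and the names `cosine_similarity` and `top_k` below are stand-ins chosen for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 10):
    """Return the k (chunk_id, score) pairs most similar to the query vector.

    `index` is a plain dict standing in for the vector database.
    """
    scored = [(chunk_id, cosine_similarity(query_vec, vec))
              for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], toy_index, k=2))
```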

Reranking
Provider: Cohere
Top-N results kept after reranking: N = 5
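
The retrieve-then-rerank step can be sketched as follows: the 10 retrieved candidates are rescored against the query and only the best 5 are kept. The scoring function here is a toy word-overlap measure standing in for the Cohere reranker, and both `rerank` and `word_overlap` are hypothetical names.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score_fn: Callable[[str, str], float], top_n: int = 5) -> list[str]:
    """Rescore candidates against the query and keep the top_n best."""
    scored = [(score_fn(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [passage for _, passage in scored[:top_n]]


def word_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage.

    Stand-in only; a real reranker model scores relevance very differently.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    docs = [f"filler passage number {i}" for i in range(9)] + ["pinecone stores vectors"]
    print(rerank("what stores vectors", docs, word_overlap))
```

Passing the scorer as a parameter keeps the selection logic separate from the provider, so the stand-in can be replaced by a call to a hosted reranker.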

LLM
Provider: Hugging Face (HF)
Model: google/flan-t5-small

User Interface
Built using HTML inside FastAPI


Remark:
OpenAI models were initially used as the LLM for answer generation. They were dropped after the free-tier credits ran out and API rate limits were hit.
The system was migrated to a free Hugging Face LLM (google/flan-t5-base).
Tradeoffs observed:
Reduced answer fluency and coherence
Occasionally shorter or less precise responses