title: Mini Rag App
emoji: 📈
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.3.0
app_file: app.py
pinned: false
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/696cb435ea65e4b95276706e/yKmxaQF3FkZuUQgDM3pk-.png
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
A simple end-to-end RAG system built using FastAPI, Hugging Face models, Pinecone vector database, and Cohere reranker. The application allows users to upload text, ask questions, and receive answers grounded in retrieved context with visible citations.
Chunking parameters: chunk size = 800, overlap = 80
Vector database provider: Pinecone; index dimension: 384
Top-k retrieval: k = 10, with cosine similarity used for matching
Reranking provider: Cohere; top-N results kept after reranking = 5
LLM provider: Hugging Face (HF); model: google/flan-t5-small
User interface: built using HTML inside FastAPI
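The chunking parameters listed above can be illustrated with a small sliding-window splitter. This is a minimal sketch, not the app's actual code: the function name `chunk_text` is hypothetical, and it assumes the chunk size and overlap are measured in characters, which this README does not specify.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share `overlap` characters.

    Assumption: sizes are in characters (the README does not say characters or tokens).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # 720 with the default values
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With these defaults each new chunk starts 720 characters after the previous one, so a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk.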
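The two-stage retrieval described in the parameters (cosine-similarity top-k with k = 10, then reranking down to the top 5) can be sketched in memory as follows. In the app itself Pinecone performs the similarity search and Cohere supplies the relevance scores; here both are replaced by plain Python stand-ins, and the names `cosine_similarity`, `top_k`, and `rerank` are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 10) -> list[int]:
    """Return the indices of the k stored vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def rerank(candidates: Sequence[int], score: Callable[[int], float], top_n: int = 5) -> list[int]:
    """Reorder candidates by a relevance score (a stand-in for a reranker) and keep top_n."""
    return sorted(candidates, key=score, reverse=True)[:top_n]
```

The first stage is cheap and recall-oriented, so it returns more candidates than are needed; the second stage applies a more careful relevance score to that short list and keeps only the chunks passed to the LLM as context.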
Remark: Initially, OpenAI models were used as the LLM for answer generation. However, due to free-tier credit exhaustion and API rate limits, they were discontinued, and the system was migrated to a free Hugging Face LLM (google/flan-t5-base). Tradeoffs observed: reduced answer fluency and coherence, and occasionally shorter or less precise responses.