title: Rag
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
Agentic PDF RAG System
This is a Streamlit-based Retrieval-Augmented Generation (RAG) system that allows users to upload PDF documents, build a knowledge base, and ask questions to retrieve AI-powered answers using Google Gemini and ChromaDB. The system supports document ingestion, querying with source references, and exporting query history as JSON or CSV.
Features
PDF Upload and Ingestion: Upload multiple PDF files, extract text, and store embeddings in a ChromaDB vector store.
Question Answering: Query the knowledge base with natural language questions and get answers powered by Google Gemini 2.5 Flash.
Source Attribution: Optionally display source document snippets for answers.
Query History: View past queries and export them as JSON or CSV.
Knowledge Base Management: Clear the knowledge base and view statistics (e.g., number of documents and chunks).
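The history export can be sketched with the standard library alone. This is a minimal illustration, not the code in export_utils.py: the function names and the record fields (timestamp, question, answer) are assumptions, and the real application may store different fields.

```python
import csv
import io
import json


def history_to_json(history):
    """Serialize the query history (a list of dicts) as pretty-printed JSON."""
    return json.dumps(history, indent=2, ensure_ascii=False)


def history_to_csv(history, fields=("timestamp", "question", "answer")):
    """Serialize the query history as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for record in history:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


# Hypothetical history record, for illustration only.
sample_history = [
    {
        "timestamp": "2024-01-01T12:00:00",
        "question": "What is the report about?",
        "answer": "It summarizes quarterly results.",
    }
]
json_text = history_to_json(sample_history)
csv_text = history_to_csv(sample_history)
```

In a Streamlit app, strings like json_text and csv_text can be handed to st.download_button so the user can save them as files.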
Prerequisites
A Hugging Face account to deploy the application on Hugging Face Spaces.
A Google Gemini API key for embeddings and language model inference. Obtain one from Google AI Studio.
Python 3.8+ installed locally for testing (optional).
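One common way to supply the Gemini API key is through an environment variable; Hugging Face Spaces exposes secrets configured in the Space settings to the running app this way. The sketch below assumes the variable is called GOOGLE_API_KEY and uses a hypothetical helper name, load_gemini_api_key; check the application code for the name it actually reads.

```python
import os


def load_gemini_api_key(env=os.environ, var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    The variable name is an assumption; adjust it to match the app.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it as a secret in the Space settings "
            "or export it in your shell before running the app."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later, during the first embedding or model call.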
Setup Instructions for Hugging Face Spaces
- Create a New Hugging Face Space
Log in to Hugging Face and navigate to Spaces.
Click Create new Space.
Choose a name for your Space (e.g., agentic-pdf-rag).
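Select Streamlit as the SDK. If Streamlit is offered only as a Docker template, choose that; it matches the sdk: docker and app_port: 8501 settings in this Space's metadata.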
Set visibility (e.g., Public or Private).
Create the Space.
- Clone or Upload the Repository
Clone this repository or upload the following files to your Hugging Face Space:
main.py: The main Streamlit application.
rag_system.py: The RAG system implementation.
pdf_processor.py: PDF text extraction and metadata creation.
export_utils.py: Placeholder for export utilities.
requirements.txt: Dependency list.
README.md: This file. On Hugging Face Spaces it also holds the configuration block at the top (sdk, app_port, and so on), so keep it in the Space.
Alternatively, fork this repository, or add the files one at a time through the Files tab of your Space.
- File Contents
Ensure the following files are in the root directory of your Space:
main.py
The main Streamlit application. It provides the user interface for uploading PDFs, querying the knowledge base, and viewing query history.
rag_system.py
The RAG system implementation. It handles document ingestion, embedding, and querying using Google Gemini and ChromaDB.
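The ingest, retrieve, and answer flow described above can be illustrated with a toy, dependency-free sketch. It is not the code in rag_system.py: a bag-of-words vector stands in for Gemini embeddings, an in-memory list stands in for the ChromaDB collection, and every name (embed, ToyVectorStore, build_prompt) and sample chunk is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term counts (a stand-in for Gemini embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for the ChromaDB collection."""

    def __init__(self):
        self.entries = []  # tuples of (vector, chunk text, metadata)

    def add(self, chunk, metadata):
        self.entries.append((embed(chunk), chunk, metadata))

    def query(self, question, k=2):
        """Return the k chunks most similar to the question, with metadata."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(chunk, meta) for _, chunk, meta in ranked[:k]]


def build_prompt(question, hits):
    """Assemble the retrieved chunks and their sources into a prompt."""
    context = "\n".join(f"[{meta['source']}] {chunk}" for chunk, meta in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add("Invoices are due within thirty days of receipt.", {"source": "billing.pdf"})
store.add("The warranty covers parts and labor for two years.", {"source": "warranty.pdf"})
hits = store.query("How long does the warranty last?", k=1)
prompt = build_prompt("How long does the warranty last?", hits)
```

In the real system the prompt would be sent to the Gemini model, and the metadata kept alongside each chunk is what makes it possible to show source snippets next to an answer.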
pdf_processor.py