metadata
title: Rag
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Streamlit template space

Agentic PDF RAG System

This is a Streamlit-based Retrieval-Augmented Generation (RAG) system. Users upload PDF documents to build a knowledge base in ChromaDB, then ask questions that Google Gemini answers from the retrieved passages. The system supports document ingestion, querying with source references, and exporting the query history as JSON or CSV.

Features

PDF Upload and Ingestion: Upload multiple PDF files, extract text, and store embeddings in a ChromaDB vector store.

Question Answering: Query the knowledge base with natural language questions and get answers powered by Google Gemini 2.5 Flash.

Source Attribution: Optionally display source document snippets for answers.

Query History: View past queries and export them as JSON or CSV.

Knowledge Base Management: Clear the knowledge base and view statistics (e.g., number of documents and chunks).
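The query history export listed above can be illustrated with the standard library alone. This is a minimal sketch, not the code in export_utils.py: the record fields (timestamp, question, answer, sources) and the function names are assumptions made for the example.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical record layout; the application's real history fields may differ.
FIELDS = ["timestamp", "question", "answer", "sources"]


def make_record(question, answer, sources):
    """Build one history entry with a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "sources": list(sources),
    }


def history_to_json(history):
    """Serialize the whole history as a pretty-printed JSON array."""
    return json.dumps(history, indent=2, ensure_ascii=False)


def history_to_csv(history):
    """Serialize the history as CSV, joining the source list into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in history:
        row = dict(record)
        row["sources"] = "; ".join(record["sources"])
        writer.writerow(row)
    return buffer.getvalue()
```

Both functions return strings, so the result can be handed to whatever download mechanism the user interface provides.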

Prerequisites

A Hugging Face account to deploy the application on Hugging Face Spaces.

A Google Gemini API key for embeddings and language model inference. Obtain one from Google AI Studio.

Python 3.8+ installed locally for testing (optional).
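Before running the app locally, the prerequisites above can be checked with a short script. This is a sketch under one loud assumption: the API key is read from an environment variable named GOOGLE_API_KEY, a name this README does not specify. Hugging Face Spaces exposes secrets to the app as environment variables, so the same check can work there once the variable name matches what the application expects.

```python
import os
import sys

# Assumption: the variable name is illustrative; use the name the app actually reads.
API_KEY_VAR = "GOOGLE_API_KEY"
MIN_PYTHON = (3, 8)


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            "Python %d.%d+ is required, found %d.%d" % (MIN_PYTHON + version)
        )
    if not env.get(API_KEY_VAR, "").strip():
        problems.append("Environment variable %s is not set" % API_KEY_VAR)
    return problems
```

Calling check_prerequisites() with no arguments inspects the current interpreter and environment; the optional parameters exist only to make the function easy to test.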

Setup Instructions for Hugging Face Spaces

  1. Create a New Hugging Face Space

Log in to Hugging Face and navigate to Spaces.

Click Create new Space.

Choose a name for your Space (e.g., agentic-pdf-rag).

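Select Streamlit as the framework. If Streamlit is offered only as a Docker template, choose the Docker SDK with the Streamlit template; this Space's own metadata uses sdk: docker with app_port: 8501.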

Set visibility (e.g., Public or Private).

Create the Space.

  2. Clone or Upload the Repository

Clone this repository or upload the following files to your Hugging Face Space:

main.py: The main Streamlit application.

rag_system.py: The RAG system implementation.

pdf_processor.py: PDF text extraction and metadata creation.

export_utils.py: Placeholder for export utilities.

requirements.txt: Dependency list.

README.md: This file (optional for documentation).

Alternatively, fork this repository or upload files manually via the Hugging Face Spaces interface.
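After cloning or uploading, a quick check can confirm that the files listed above are present. The sketch below assumes they all live in one directory; the helper name missing_files is hypothetical, and README.md is left out because it is optional.

```python
from pathlib import Path

# File names taken from the list above; README.md is optional and not checked.
REQUIRED_FILES = [
    "main.py",
    "rag_system.py",
    "pdf_processor.py",
    "export_utils.py",
    "requirements.txt",
]


def missing_files(root="."):
    """Return the required file names that are not present directly under root."""
    root = Path(root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Running missing_files() from the repository root returns an empty list when everything is in place.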

  3. File Contents

Ensure the following files are in the root directory of your Space:

main.py

The main Streamlit application. It provides the user interface for uploading PDFs, querying, and viewing history.

rag_system.py

The RAG system implementation. It handles document ingestion, embedding, and querying using Google Gemini and ChromaDB.
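To make the ingest, embed, and query flow concrete, here is a toy, dependency-free sketch. It is not the code in rag_system.py: a bag-of-words count vector stands in for Gemini embeddings, an in-memory list stands in for ChromaDB, and every name (ToyKnowledgeBase, chunk_text, embed, the chunk size and overlap) is hypothetical. In a real system the retrieved chunks would be passed to the language model as context for the answer.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Stand-in for a real embedding model: word counts over lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyKnowledgeBase:
    """In-memory stand-in for a vector store; holds (source, chunk, vector) tuples."""

    def __init__(self):
        self.chunks = []

    def ingest(self, source, text):
        """Chunk a document and store each non-empty chunk with its vector."""
        for piece in chunk_text(text):
            if piece.strip():
                self.chunks.append((source, piece, embed(piece)))

    def query(self, question, k=3):
        """Return the k chunks most similar to the question, with their sources."""
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[2]), reverse=True)
        return [(source, piece) for source, piece, _ in ranked[:k]]

    def stats(self):
        """Count distinct source documents and stored chunks."""
        return {
            "documents": len({source for source, _, _ in self.chunks}),
            "chunks": len(self.chunks),
        }

    def clear(self):
        """Remove everything from the knowledge base."""
        self.chunks.clear()
```

Overlapping windows are used so that a sentence split at a chunk boundary still appears whole in at least one chunk; the sizes here are arbitrary illustration values.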

pdf_processor.py