andrewammann committed on
Commit d53b6eb · verified · 1 parent: d9cb38e

Update README.md

Files changed (1): README.md +117 -4
@@ -11,9 +11,122 @@ pinned: false
 short_description: Streamlit template space
 ---

-# Welcome to Streamlit!
-
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).

# Agentic PDF RAG System

This is a Streamlit-based Retrieval-Augmented Generation (RAG) system. Users upload PDF documents to build a knowledge base and then ask questions about them. Relevant passages are retrieved from ChromaDB, and Google Gemini generates the answers. The system supports document ingestion, querying with source references, and exporting the query history as JSON or CSV.
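Building the knowledge base usually starts by splitting each PDF's extracted text into overlapping chunks and tagging every chunk with metadata before it is embedded. The following is a minimal sketch of that step in plain Python. The function name `chunk_text`, the default chunk size and overlap, and the metadata keys (`source`, `chunk_index`) are assumptions for illustration. They are not taken from the project's `pdf_processor.py` or `rag_system.py`.

```python
from __future__ import annotations  # allows list[...] annotations on Python 3.8

from dataclasses import dataclass


@dataclass
class Chunk:
    """One piece of a document, ready to be embedded and stored."""
    text: str
    metadata: dict


def chunk_text(text: str, source: str, chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: chunk_size and overlap are illustrative defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    chunks: list[Chunk] = []
    step = chunk_size - overlap  # how far each new window advances
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(text=piece, metadata={"source": source, "chunk_index": index}))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 3  # 30 characters standing in for extracted PDF text
    for chunk in chunk_text(sample, source="report.pdf", chunk_size=12, overlap=4):
        print(chunk.metadata, chunk.text)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. The source metadata is what later lets the app show where an answer came from.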

## Features

- **PDF Upload and Ingestion**: Upload multiple PDF files, extract their text, and store embeddings in a ChromaDB vector store.
- **Question Answering**: Query the knowledge base with natural language questions and get answers powered by Google Gemini 2.5 Flash.
- **Source Attribution**: Optionally display source document snippets for answers.
- **Query History**: View past queries and export them as JSON or CSV.
- **Knowledge Base Management**: Clear the knowledge base and view statistics (e.g., number of documents and chunks).
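Exporting the query history as JSON or CSV requires only the standard library. The sketch below assumes each history entry is a dict with `timestamp`, `question`, and `answer` keys. Those field names and the helper names are hypothetical, because the real structure lives in `main.py` and `export_utils.py`.

```python
import csv
import io
import json

# Assumed columns for the CSV export; the real app may record more fields.
FIELDS = ["timestamp", "question", "answer"]


def history_to_json(history: list) -> str:
    """Serialize the whole query history as pretty-printed JSON."""
    return json.dumps(history, indent=2, ensure_ascii=False)


def history_to_csv(history: list) -> str:
    """Serialize the query history as CSV with a header row.

    Keys not listed in FIELDS are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(history)
    return buffer.getvalue()
```

In a Streamlit app, either string could be passed as the `data` argument of `st.download_button`. JSON keeps nested fields such as lists of sources intact. CSV flattens the history into one row per query, which suits spreadsheets.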

## Prerequisites

- A Hugging Face account to deploy the application on Hugging Face Spaces.
- A Google Gemini API key for embeddings and language model inference. Obtain one from Google AI Studio.
- Python 3.8+ installed locally for testing (optional).
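On Hugging Face Spaces, secrets added in the Space settings are exposed to the running app as environment variables. This means the Gemini key can be read at startup instead of being hard-coded. The variable name `GOOGLE_API_KEY` and the helper below are assumptions; use whatever name the application code actually reads.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when no Gemini API key is configured."""


def get_gemini_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the Gemini API key from the environment.

    The default variable name is an assumption; adjust it to match the app.
    Surrounding whitespace (a common copy-paste slip) is stripped.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise MissingApiKeyError(
            f"Set the {var_name} secret in the Space settings "
            "(or export it locally) before starting the app."
        )
    return key
```

Failing early with a clear message is easier to debug than an authentication error surfacing at the first embedding call.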

## Setup Instructions for Hugging Face Spaces

### 1. Create a New Hugging Face Space

- Log in to Hugging Face and navigate to Spaces.
- Click **Create new Space**.
- Choose a name for your Space (e.g., `agentic-pdf-rag`).
- Select Streamlit as the framework.
- Set the visibility (e.g., Public or Private).
- Create the Space.

### 2. Clone or Upload the Repository

Clone this repository or upload the following files to your Hugging Face Space:

- `main.py`: The main Streamlit application.
- `rag_system.py`: The RAG system implementation.
- `pdf_processor.py`: PDF text extraction and metadata creation.
- `export_utils.py`: Placeholder for export utilities.
- `requirements.txt`: Dependency list.
- `README.md`: This file (optional, for documentation).

Alternatively, fork this repository or upload the files manually through the Hugging Face Spaces web interface.
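Before pushing, a short script can confirm that the files listed above are present in the Space's root directory. This is a hypothetical convenience helper and is not part of the project. `README.md` is left out of the required set because the list above marks it as optional.

```python
import sys
from pathlib import Path

# Required files, taken from the list above (README.md is optional there).
REQUIRED_FILES = ["main.py", "rag_system.py", "pdf_processor.py", "export_utils.py", "requirements.txt"]


def missing_files(space_dir: str) -> list:
    """Return the required files that are absent from space_dir, in list order."""
    root = Path(space_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Usage: python check_files.py [path-to-space]  (script name is illustrative)
    missing = missing_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing required files:", ", ".join(missing))
        sys.exit(1)
    print("All required files are present.")
```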

### 3. File Contents

Ensure the following files are in the root directory of your Space:

#### `main.py`

The main Streamlit application. It provides the user interface for uploading PDFs, querying the knowledge base, and viewing the query history.

#### `rag_system.py`

The RAG system implementation. It handles document ingestion, embedding, and querying using Google Gemini and ChromaDB.
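At query time, the vector store returns the stored chunks whose embeddings are closest to the question's embedding. Those chunks are then placed in the prompt together with their sources. The sketch below shows that retrieval-and-prompt step using plain cosine similarity over small hand-written vectors. In the real system, ChromaDB performs the search, possibly with a different configured distance metric, and Gemini produces the embeddings. The vectors, function names, and prompt wording here are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredChunk:
    text: str
    source: str       # e.g. the PDF file name, used for source attribution
    embedding: list   # vector from an embedding model (hand-written in the example)


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def top_k(query_embedding: list, chunks: list, k: int = 3) -> list:
    """Return the k chunks most similar to the query, best match first."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: list) -> str:
    """Number each retrieved chunk so the model can cite its sources."""
    context = "\n\n".join(
        f"[{i}] ({chunk.source}) {chunk.text}" for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the snippets in the prompt is a simple way to tie each part of the answer back to a document, which is what the optional source display relies on.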

#### `pdf_processor.py`