---
title: PDF Q&A (Gemini RAG)
emoji: 🧠
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
pinned: false
---



# PDF Q&A (RAG) with Gemini 2.5 Flash — Hugging Face Space

This Space lets you upload PDFs and ask questions about their content. It uses:
- **LangChain** text splitters (document-specific splitting for Markdown/Python/JS, plus a generic recursive splitter).
- **FAISS** for vector search.
- **Gemini** for **embeddings** (`text-embedding-004`) and **generation** (`gemini-2.5-flash`).
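
To make the retrieval step concrete, here is a minimal sketch of what vector search does conceptually. It is not the app's code: a brute-force cosine ranking in plain Python stands in for FAISS, and the three-dimensional vectors are made-up stand-ins for real `text-embedding-004` embeddings. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        enumerate(chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in ranked[:k]]


# Toy data: in the real app the vectors would come from the embedding model.
chunks = [
    "Invoices are due in 30 days.",
    "The warranty lasts two years.",
    "Returns need a receipt.",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.2, 0.3, 0.9]]
query_vec = [0.0, 1.0, 0.2]  # pretend embedding of "How long is the warranty?"

best = top_k(query_vec, chunk_vecs, k=2)
context = [chunks[i] for i in best]  # chunks that would be passed to the generator
```

The selected chunks are what a RAG pipeline would place in the prompt before asking the generation model to answer.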

## Quick start (on Hugging Face)
1. Create a new **Space** → **Gradio (Python)**.
2. Add these files: `app.py`, `requirements.txt`, and this `README.md`.
3. In the Space, go to **Settings → Variables and secrets** and add:
   - Key: `GEMINI_API_KEY`
   - Value: *your Gemini API key* (do **not** commit it in code).
4. Click **Restart** to build & launch the Space.

## Local dev
```bash
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
export GEMINI_API_KEY="YOUR_KEY_HERE"  # on Windows (PowerShell): $env:GEMINI_API_KEY="YOUR_KEY_HERE"
python app.py
```

## Notes
- The app will try the new `from google import genai` client first, then fall back to the legacy `google-generativeai` package.
- The document splitting logic is heuristic-based:
  - Markdown style content → `MarkdownTextSplitter`
  - Python-like content → `PythonCodeTextSplitter`
  - JavaScript-like content → `RecursiveCharacterTextSplitter.from_language(Language.JS, ...)`
  - Otherwise → `RecursiveCharacterTextSplitter`
- If an answer is not in the context, the model is instructed to say it doesn't know.
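
The splitter selection described above could look roughly like the sketch below. The README does not show the actual heuristics, so the regular expressions, the `min_hits` threshold, and the function name `guess_splitter` are all assumptions made for illustration. The function only returns the name of the splitter listed above, so it runs without LangChain installed.

```python
import re

# Hypothetical signals; the app's real heuristics may differ.
MARKDOWN_HINTS = re.compile(r"^(#{1,6} |[-*] |```)", re.MULTILINE)
PYTHON_HINTS = re.compile(r"^\s*(def |class |import |from \S+ import )", re.MULTILINE)
JS_HINTS = re.compile(r"^\s*(function |const |let |var )|=>", re.MULTILINE)

# Maps each content kind to the splitter named in the notes above.
SPLITTER_FOR_KIND = {
    "markdown": "MarkdownTextSplitter",
    "python": "PythonCodeTextSplitter",
    "js": "RecursiveCharacterTextSplitter.from_language(Language.JS, ...)",
    "generic": "RecursiveCharacterTextSplitter",
}


def guess_kind(text: str, min_hits: int = 2) -> str:
    """Count pattern hits per kind and pick the strongest, else 'generic'."""
    scores = {
        "markdown": len(MARKDOWN_HINTS.findall(text)),
        "python": len(PYTHON_HINTS.findall(text)),
        "js": len(JS_HINTS.findall(text)),
    }
    kind, hits = max(scores.items(), key=lambda item: item[1])
    return kind if hits >= min_hits else "generic"


def guess_splitter(text: str) -> str:
    """Return the name of the splitter that would be chosen for this text."""
    return SPLITTER_FOR_KIND[guess_kind(text)]
```

A heuristic like this can misfire, for example when a Python file with many `# comment` lines looks like Markdown headings, which is one reason to keep the generic recursive splitter as the fallback.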