---
title: PDF Q&A (Gemini RAG)
emoji: 🧠
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: "5.49.1"
app_file: app.py
pinned: false
---

# PDF Q&A (RAG) with Gemini 2.5 Flash — Hugging Face Space

This Space lets you upload PDFs and ask questions. It uses:

- **LangChain** text splitters (document-specific splitting for Markdown/Python/JS, plus a generic recursive splitter).
- **FAISS** for vector search.
- **Gemini** for **embeddings** (`text-embedding-004`) and **generation** (`gemini-2.5-flash`).

## Quick start (on Hugging Face)

1. Create a new **Space** → **Gradio (Python)**.
2. Add these files: `app.py`, `requirements.txt`, and this `README.md`.
3. In the Space, go to **Settings → Variables and secrets** and add:
   - Key: `GEMINI_API_KEY`
   - Value: *your Gemini API key* (do **not** commit it in code).
4. Click **Restart** to build & launch the Space.

## Local dev

```bash
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
export GEMINI_API_KEY="YOUR_KEY_HERE"
python app.py
```

## Notes

- The app will try the new `from google import genai` client first, then fall back to the legacy `google-generativeai` package.
- The document splitting logic is heuristic-based:
  - Markdown style content → `MarkdownTextSplitter`
  - Python-like content → `PythonCodeTextSplitter`
  - JavaScript-like content → `RecursiveCharacterTextSplitter.from_language(Language.JS, ...)`
  - Otherwise → `RecursiveCharacterTextSplitter`
- If an answer is not in the context, the model is instructed to say it doesn't know.
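
As a rough illustration of the heuristic splitter selection described in the Notes, the sketch below guesses a content kind from simple regex hits and maps it to the splitter name listed above. The patterns, the `guess_content_kind` function, the `min_hits` threshold, and the `SPLITTER_FOR_KIND` table are all hypothetical; the actual logic in `app.py` may use different signals. The sketch only returns names as strings, so it runs without LangChain installed.

```python
import re

# Hypothetical hint patterns; app.py may detect content types differently.
MARKDOWN_HINTS = re.compile(r"^(?:#{1,6} |[-*] |```)", re.MULTILINE)
PYTHON_HINTS = re.compile(
    r"^\s*(?:def \w+\(|class \w+|from [\w.]+ import )", re.MULTILINE
)
JS_HINTS = re.compile(
    r"^\s*(?:function \w*\(|const \w+ =|let \w+ =)|=>", re.MULTILINE
)

# Maps each guessed kind to the splitter named in the Notes (names only).
SPLITTER_FOR_KIND = {
    "markdown": "MarkdownTextSplitter",
    "python": "PythonCodeTextSplitter",
    "js": "RecursiveCharacterTextSplitter.from_language(Language.JS, ...)",
    "generic": "RecursiveCharacterTextSplitter",
}


def guess_content_kind(text: str, min_hits: int = 2) -> str:
    """Return 'python', 'js', 'markdown', or 'generic' based on pattern counts."""
    scores = {
        "python": len(PYTHON_HINTS.findall(text)),
        "js": len(JS_HINTS.findall(text)),
        "markdown": len(MARKDOWN_HINTS.findall(text)),
    }
    # Pick the kind with the most hits; on a tie the earlier key wins.
    kind, hits = max(scores.items(), key=lambda kv: kv[1])
    # Too few hits means the text is treated as ordinary prose.
    return kind if hits >= min_hits else "generic"


if __name__ == "__main__":
    sample = "# Title\n\n## Section\n\n- item one\n- item two\n"
    kind = guess_content_kind(sample)
    print(kind, "->", SPLITTER_FOR_KIND[kind])
```

A count-based guess like this is cheap but imperfect: for example, Python comments starting with `# ` also look like Markdown headings, so a real implementation would likely need extra checks or a different tie-breaking rule.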