---
title: Agentic Document Intelligence
emoji: 📄
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 📄 Agentic Document Intelligence
### PDF RAG with Together.ai
This Hugging Face Space demonstrates a **Retrieval-Augmented Generation (RAG)** system that allows users to upload a PDF and ask questions that are **strictly grounded in the document content**.
The Space serves as a **foundational Agentic Document Intelligence component**, designed to be simple, transparent, and extensible.
---
## 🚀 What This Space Does
- Upload a PDF document
- Build a semantic index using embeddings + FAISS
- Ask natural-language questions
- Receive answers grounded only in the uploaded document
- View retrieved source passages for transparency
---
## 🧠 Architecture Overview
1. **PDF Ingestion**
   - Extracts text from the uploaded PDF
   - Cleans and normalizes the content
2. **Chunking**
   - Splits text into overlapping semantic chunks
   - Ensures contextual continuity across chunk boundaries
3. **Vector Indexing**
   - Generates embeddings using Sentence Transformers
   - Indexes vectors using FAISS (cosine similarity)
4. **Retrieval**
   - Retrieves the top-K most relevant chunks for each query
5. **Generation (RAG)**
   - Injects the retrieved context into the LLM prompt
   - Uses Together.ai (Mixtral) for answer generation
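
The chunking and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`: the function names, the character-based windows, and the default `chunk_size` and `overlap` values are all assumptions, and plain lists of floats stand in for Sentence Transformers embeddings. The scoring mirrors a common way to get cosine similarity (normalize every vector to unit length, then rank by inner product), here without FAISS itself.

```python
import math


def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into character windows that overlap by `overlap` characters.

    Hypothetical helper: the Space's real chunk size and splitting rule
    are not documented in this README.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec, chunk_vecs, k=3):
    """Return up to k (chunk_index, cosine_score) pairs, best match first."""
    q = normalize(query_vec)
    scored = []
    for index, vec in enumerate(chunk_vecs):
        unit = normalize(vec)
        # Inner product of two unit vectors equals their cosine similarity.
        score = sum(a * b for a, b in zip(q, unit))
        scored.append((score, index))
    scored.sort(reverse=True)
    return [(index, score) for score, index in scored[:k]]
```

With `chunk_size=4` and `overlap=2`, each window starts two characters after the previous one, so neighbouring chunks share half their content. A larger overlap preserves more context across boundaries at the cost of a bigger index.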
---
## ▶️ How to Use This Space (End-to-End)
### **Step 1: Upload a PDF**
- Click **“Upload PDF”**
- Select a text-based PDF file
> ⚠️ Note: Scanned PDFs without an embedded text layer will not work unless OCR is applied first.
---
### **Step 2: Wait for Indexing**
- The system will:
  - extract text
  - split it into chunks
  - build a FAISS vector index
- You will see a confirmation message once indexing completes.
---
### **Step 3: Ask a Question**
- Type a natural-language question related to the document
Examples:
- *“Summarize the document”*
- *“What is the main contribution?”*
- *“Explain the methodology section”*
---
### **Step 4: Receive the Answer**
You will get:
- ✅ A generated answer based **only on document context**
- 📌 Retrieved source passages with similarity scores
- 🚫 No hallucinated or external information
If the answer is not present in the document, the system will respond with a message saying so.
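
One common way to get this behaviour is to place the retrieved passages in the prompt and instruct the model to decline when they do not contain the answer. The sketch below assumes a hypothetical `build_prompt` helper and a placeholder `FALLBACK_MESSAGE`. Neither the prompt wording nor the fallback text is taken from this Space's code.

```python
# Placeholder wording only: the Space's actual fallback message is not
# reproduced in this README.
FALLBACK_MESSAGE = "The document does not contain this information."


def build_prompt(question, passages):
    """Build a grounded prompt from (text, similarity_score) pairs.

    Hypothetical helper for illustration; passages are expected to be
    ordered best match first.
    """
    context = "\n\n".join(
        f"[{number}] (score={score:.2f}) {text}"
        for number, (text, score) in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f"If the context does not contain the answer, reply exactly: {FALLBACK_MESSAGE}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the passages and keeping their scores in the prompt makes it straightforward to show the same passages to the user as sources next to the answer.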
---
## 🤖 Models Used
### **Language Model**
- **Provider:** Together.ai
- **Model:** `mistralai/Mixtral-8x7B-Instruct-v0.1`
### **Embedding Model**
- `sentence-transformers/all-MiniLM-L6-v2`
---
## 🧰 Tech Stack
- Python
- Gradio (UI)
- FAISS (vector search)
- Sentence Transformers (embeddings)
- Together.ai (LLM)
- Hugging Face Spaces
---
## 🔐 Environment Configuration (For Developers)
### **Secrets**
- `TOGETHER_API_KEY` → Together.ai API key
- `OPENAI_API_KEY` → Same value as `TOGETHER_API_KEY` (for compatibility with the OpenAI client)
### **Variables**
- `TOGETHER_MODEL` → `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `TOGETHER_BASE_URL` → `https://api.together.xyz/v1`
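
As a rough sketch of how an app might read these settings at startup, the snippet below collects them into a dictionary. The `load_llm_config` name is hypothetical, and falling back to the documented values when a variable is unset is an assumption, not necessarily what `app.py` does.

```python
import os


def load_llm_config(env=None):
    """Collect LLM settings from environment variables.

    Hypothetical helper. `env` defaults to os.environ; passing a dict
    makes the function easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    # Either secret may hold the Together.ai key (both carry the same value).
    api_key = env.get("TOGETHER_API_KEY") or env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set TOGETHER_API_KEY (or OPENAI_API_KEY) as a Space secret")
    return {
        "api_key": api_key,
        # Assumed defaults: the values listed under Variables above.
        "model": env.get("TOGETHER_MODEL", "mistralai/Mixtral-8x7B-Instruct-v0.1"),
        "base_url": env.get("TOGETHER_BASE_URL", "https://api.together.xyz/v1"),
    }
```

The resulting `api_key` and `base_url` could then be passed to an OpenAI-compatible client, for example `OpenAI(api_key=..., base_url=...)` from the `openai` package, with `model` supplied on each chat completion request.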
---
## 🧩 Intended Use Cases
- Research paper Q&A
- Technical documentation assistants
- Internal knowledge bases
- RAG pipeline reference implementation
- Agentic AI system foundations
---
## 🔮 Future Enhancements
- Multi-PDF support
- Chat memory
- Streaming responses
- Agent routing & tool usage
- Evaluation and scoring agents
---
## 🙌 Author
Built by **Abhishek Prithvi Teja**
Focused on **Agentic AI, RAG systems, and applied LLM engineering**
---
## 🏷️ Tags
`rag` · `agentic-ai` · `document-qa` · `faiss` · `together-ai` · `huggingface-spaces`