---
title: Agentic Document Intelligence
emoji: 📄
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 📄 Agentic Document Intelligence
### PDF RAG with Together.ai
This Hugging Face Space demonstrates a **Retrieval-Augmented Generation (RAG)** system that allows users to upload a PDF and ask questions that are **strictly grounded in the document content**.
The Space serves as a **foundational Agentic Document Intelligence component**, designed to be simple, transparent, and extensible.
---
## 🚀 What This Space Does
- Upload a PDF document
- Build a semantic index using embeddings + FAISS
- Ask natural-language questions
- Receive answers grounded only in the uploaded document
- View retrieved source passages for transparency
---
## 🧠 Architecture Overview
1. **PDF Ingestion**
- Extracts text from uploaded PDF
- Cleans and normalizes content
2. **Chunking**
- Splits text into overlapping semantic chunks
- Ensures contextual continuity
3. **Vector Indexing**
- Generates embeddings using Sentence Transformers
- Indexes vectors using FAISS (cosine similarity)
4. **Retrieval**
- Retrieves top-K relevant chunks for each query
5. **Generation (RAG)**
- Injects retrieved context into LLM prompt
- Uses Together.ai (Mixtral) for answer generation
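Steps 2–4 above can be sketched in dependency-free Python. This is an illustrative stand-in, not the Space's actual code: the real pipeline uses Sentence Transformers for embeddings and FAISS for search, and the function names and parameters here (`chunk_text`, `top_k`, `chunk_size=500`, `overlap=100`) are assumptions for the sketch.

```python
import math

def chunk_text(text, chunk_size=500, overlap=100):
    """Step 2: split text into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across chunk boundaries
    return chunks

def cosine(a, b):
    """The similarity metric the FAISS index approximates."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=3):
    """Step 4: indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

In the Space itself, `cosine`/`top_k` are replaced by a FAISS index over the Sentence Transformer embeddings; the retrieval logic is the same shape.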
---
## ▶️ How to Use This Space (End-to-End)
### **Step 1: Upload a PDF**
- Click **“Upload PDF”**
- Select a text-based PDF file
> ⚠️ Note: Scanned PDFs without text extraction will not work unless OCR is applied.
---
### **Step 2: Wait for Indexing**
- The system will:
- extract text
- split it into chunks
- build a FAISS vector index
- A confirmation message appears once the index is ready
---
### **Step 3: Ask a Question**
- Type a natural-language question related to the document
Examples:
- *“Summarize the document”*
- *“What is the main contribution?”*
- *“Explain the methodology section”*
---
### **Step 4: Receive the Answer**
You will get:
- ✅ A generated answer based **only on document context**
- 📌 Retrieved source passages with similarity scores
- 🚫 No hallucinated or external information
If the answer is not present in the document, the system states that the information is not in the document rather than guessing.
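This grounding behavior comes from the prompt: retrieved passages are injected as context along with an instruction to refuse out-of-context answers. A minimal sketch, assuming a hypothetical `build_rag_prompt` helper (the exact wording in the Space may differ):

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt from retrieved passages (illustrative sketch)."""
    # Number each passage so the answer can be traced back to its source.
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using ONLY the context below.\n"
        "If the answer is not in the context, reply that the document "
        "does not contain this information.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is sent to the LLM as-is; the refusal instruction is what prevents the model from falling back on external knowledge.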
---
## 🤖 Models Used
### **Language Model**
- **Provider:** Together.ai
- **Model:** `mistralai/Mixtral-8x7B-Instruct-v0.1`
### **Embedding Model**
- `sentence-transformers/all-MiniLM-L6-v2`
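A note on the "cosine similarity" mentioned above: if embeddings are L2-normalized before indexing, FAISS inner-product search (`IndexFlatIP`) is exactly cosine similarity. A quick stdlib check of that equivalence:

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm = 1)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = normalize([3.0, 4.0]), normalize([4.0, 3.0])
inner = sum(x * y for x, y in zip(a, b))  # inner product of unit vectors
# equals the cosine similarity of the originals: (3*4 + 4*3) / (5 * 5) = 0.96
```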
---
## 🧰 Tech Stack
- Python
- Gradio (UI)
- FAISS (vector search)
- Sentence Transformers (embeddings)
- Together.ai (LLM)
- Hugging Face Spaces
---
## 🔐 Environment Configuration (For Developers)
### **Secrets**
- `TOGETHER_API_KEY` → Together.ai API key
- `OPENAI_API_KEY` → Same value (compatibility with OpenAI client)
### **Variables**
- `TOGETHER_MODEL` → `mistralai/Mixtral-8x7B-Instruct-v0.1`
- `TOGETHER_BASE_URL` → `https://api.together.xyz/v1`
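Reading this configuration in the app might look like the following sketch (the fallback defaults mirror the values above; the variable names on the left are illustrative):

```python
import os

# Model and endpoint, with the documented values as fallbacks.
TOGETHER_MODEL = os.environ.get(
    "TOGETHER_MODEL", "mistralai/Mixtral-8x7B-Instruct-v0.1"
)
TOGETHER_BASE_URL = os.environ.get(
    "TOGETHER_BASE_URL", "https://api.together.xyz/v1"
)
# Either secret works, since both hold the same Together.ai key.
API_KEY = os.environ.get("TOGETHER_API_KEY") or os.environ.get("OPENAI_API_KEY")

# An OpenAI-compatible client can then be pointed at Together's endpoint:
# from openai import OpenAI
# client = OpenAI(api_key=API_KEY, base_url=TOGETHER_BASE_URL)
```

Pointing the standard OpenAI client at `TOGETHER_BASE_URL` is what makes the `OPENAI_API_KEY` compatibility secret useful.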
---
## 🧩 Intended Use Cases
- Research paper Q&A
- Technical documentation assistants
- Internal knowledge bases
- RAG pipeline reference implementation
- Agentic AI system foundations
---
## 🔮 Future Enhancements
- Multi-PDF support
- Chat memory
- Streaming responses
- Agent routing & tool usage
- Evaluation and scoring agents
---
## 🙌 Author
Built by **Abhishek Prithvi Teja**
Focused on **Agentic AI, RAG systems, and applied LLM engineering**
---
## 🏷️ Tags
`rag` · `agentic-ai` · `document-qa` · `faiss` · `together-ai` · `huggingface-spaces`