Spaces:
Runtime error
Runtime error
Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
|
@@ -1,12 +1,83 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: gradio
|
| 7 |
-
sdk_version:
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
|
|
|
| 10 |
---
|
| 11 |
|
| 12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
title: RAG Document Q&A System
|
| 3 |
+
emoji: π
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: purple
|
| 6 |
sdk: gradio
|
| 7 |
+
sdk_version: 4.44.0
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
+
license: mit
|
| 11 |
---
|
| 12 |
|
| 13 |
+
# π RAG Document Q&A System
|
| 14 |
+
|
| 15 |
+
A Retrieval-Augmented Generation (RAG) system that answers questions about uploaded PDF documents.
|
| 16 |
+
|
| 17 |
+
## π― What This Does
|
| 18 |
+
|
| 19 |
+
1. **Upload** a PDF document
|
| 20 |
+
2. **Process** the document (chunks it and creates embeddings)
|
| 21 |
+
3. **Ask** questions about the document
|
| 22 |
+
4. **Get** accurate answers with source citations
|
| 23 |
+
|
| 24 |
+
## ποΈ Architecture
|
| 25 |
+
```
|
| 26 |
+
User Question β Embedding β Vector Search β Retrieved Chunks β LLM β Answer
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
| Component | Technology |
|
| 30 |
+
|-----------|------------|
|
| 31 |
+
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (384 dimensions) |
|
| 32 |
+
| Vector Store | FAISS (Facebook AI Similarity Search) |
|
| 33 |
+
| Text Splitter | RecursiveCharacterTextSplitter (1000 chars, 200 overlap) |
|
| 34 |
+
| LLM | HuggingFaceH4/zephyr-7b-beta via Inference API |
|
| 35 |
+
| Framework | LangChain + Gradio |
|
| 36 |
+
|
| 37 |
+
## π οΈ Development Challenges
|
| 38 |
+
|
| 39 |
+
This project encountered several technical challenges during development:
|
| 40 |
+
|
| 41 |
+
### Challenge 1: LangChain API Changes
|
| 42 |
+
**Problem:** Import errors due to LangChain's package restructuring.
|
| 43 |
+
```python
|
| 44 |
+
# Old (broken)
|
| 45 |
+
from langchain.document_loaders import PyPDFLoader
|
| 46 |
+
from langchain.chains import RetrievalQA
|
| 47 |
+
|
| 48 |
+
# New (working)
|
| 49 |
+
from langchain_community.document_loaders import PyPDFLoader
|
| 50 |
+
# RetrievalQA deprecated β use LCEL chains instead
|
| 51 |
+
```
|
| 52 |
+
**Lesson:** Fast-evolving libraries require checking current documentation.
|
| 53 |
+
|
| 54 |
+
### Challenge 2: PDF Download Issues
|
| 55 |
+
**Problem:** `PdfStreamError: Stream has ended unexpectedly`
|
| 56 |
+
**Cause:** Incomplete download due to missing User-Agent header.
|
| 57 |
+
**Solution:** Added proper headers to HTTP request.
|
| 58 |
+
|
| 59 |
+
### Challenge 3: LLM Response Quality
|
| 60 |
+
**Problem:** FLAN-T5-Large produced fragment-like responses instead of complete answers.
|
| 61 |
+
**Attempted Solutions:**
|
| 62 |
+
1. Adjusted generation parameters β minimal improvement
|
| 63 |
+
2. Modified prompt format β slight improvement
|
| 64 |
+
3. Switched to FLAN-T5-XL β OOM error
|
| 65 |
+
|
| 66 |
+
**Final Solution:** Switched to Zephyr-7B-beta, which produces comprehensive answers.
|
| 67 |
+
|
| 68 |
+
## π Limitations
|
| 69 |
+
|
| 70 |
+
- Only processes PDF documents
|
| 71 |
+
- English language only
|
| 72 |
+
- Free Inference API has rate limits
|
| 73 |
+
|
| 74 |
+
## π€ Author
|
| 75 |
+
|
| 76 |
+
[Nav772](https://huggingface.co/Nav772) - Built as part of AI Engineering portfolio
|
| 77 |
+
|
| 78 |
+
## π Related Projects
|
| 79 |
+
|
| 80 |
+
- [Movie Sentiment Analyzer](https://huggingface.co/spaces/Nav772/movie-sentiment-analyzer)
|
| 81 |
+
- [Amazon Review Rating Predictor](https://huggingface.co/spaces/Nav772/amazon-review-rating-predictor)
|
| 82 |
+
- [Food Image Classifier](https://huggingface.co/spaces/Nav772/food-image-classifier)
|
| 83 |
+
- [Sentiment Model Comparison](https://huggingface.co/spaces/Nav772/sentiment-model-comparison)
|