Hamid Omarov
committed on
Commit · 22f4b8b
1 Parent(s): 0ebeee2
Add Day 3 README
README.md
CHANGED
@@ -61,6 +61,24 @@ Check commits and folders daily to follow the sprint. Each folder corresponds to

 > 📣 One day down, 29 to go. Keep shipping.

+## Day 3: First RAG System ✅
+
+### What I Built
+- PDF processing pipeline (loader + optimal chunker)
+- Compared 3 chunking strategies (fixed, recursive, token)
+- ChromaDB vector storage (persistent)
+- SentenceTransformer embeddings (MiniLM)
+- Gradio chat interface (upload PDF → ask)
+- Deployment on Hugging Face Spaces
+
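The three chunking strategies named in the list can be compared with a minimal sketch like the one below. It is not the code from this commit: the function names, default sizes, and separator order are assumptions, and the token variant uses whitespace-separated words as a stand-in for a real tokenizer.

```python
from typing import List, Sequence


def fixed_chunks(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Fixed strategy: slide a window of `size` characters, stepping by size - overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def recursive_chunks(
    text: str,
    size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Recursive strategy: split on the coarsest separator first and only
    fall back to finer separators for pieces that are still too long."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-cut the text into fixed windows.
        return fixed_chunks(text, size, overlap=0)
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        candidate = piece if not current else current + head + piece
        if len(candidate) <= size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, size, rest))
    if current:
        chunks.append(current)
    return chunks


def token_chunks(text: str, max_tokens: int = 50) -> List[str]:
    """Token strategy: cap each chunk by token count.
    Whitespace tokens are an assumption standing in for a model tokenizer."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]
```

The fixed variant can cut words and sentences in half, the recursive variant tries to keep paragraphs and sentences together, and the token variant bounds chunks by the unit an embedding model actually counts.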
+### Key Learnings
+- Fixed vs Recursive vs Token-based chunking trade-offs
+- Embedding format must be list[list[float]] for Chroma
+- New Chroma API uses `PersistentClient`
+- Prompt design: extractive answers + fallback
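The two Chroma points above can be illustrated with a short sketch. It assumes `chromadb` is installed. The helper names, storage path, and collection name are hypothetical and not taken from this commit. `SentenceTransformer.encode` returns a NumPy array by default, so the first helper converts any iterable of vectors into plain nested lists of floats before they are stored.

```python
from typing import Iterable, List, Sequence


def to_chroma_embeddings(vectors: Iterable[Sequence[float]]) -> List[List[float]]:
    """Convert vectors (e.g. rows of a NumPy array) to list[list[float]]."""
    rows = [[float(x) for x in vec] for vec in vectors]
    if not rows:
        raise ValueError("no embeddings to store")
    dim = len(rows[0])
    if dim == 0 or any(len(row) != dim for row in rows):
        raise ValueError("embeddings must be non-empty and share one dimension")
    return rows


def store_chunks(chunks: List[str], vectors, path: str = "./chroma_db",
                 name: str = "pdf_chunks"):
    """Persist chunks with their embeddings.
    `path` and `name` are hypothetical defaults chosen for illustration."""
    import chromadb  # imported lazily so the helper above works without it

    client = chromadb.PersistentClient(path=path)  # on-disk client
    collection = client.get_or_create_collection(name=name)
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=to_chroma_embeddings(vectors),
    )
    return collection
```

Calling `store_chunks(chunks, model.encode(chunks))` would then write to `./chroma_db`, where `model` is assumed to be a loaded `SentenceTransformer`.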
+
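One way to read "extractive answers + fallback" is a prompt that restricts the model to the retrieved passages and gives it a fixed reply for when they do not contain the answer. The sketch below is only an illustration of that idea: the wording, the `FALLBACK` string, and the function name are assumptions, not the prompt used in this project.

```python
from typing import List

# Hypothetical fallback sentence; the project's actual wording is not shown in the diff.
FALLBACK = "I could not find that in the document."


def build_prompt(question: str, passages: List[str], fallback: str = FALLBACK) -> str:
    """Build an extractive prompt: answer only from `passages`, else reply with `fallback`."""
    numbered = "\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below, quoting them where possible.\n"
        f'If the passages do not contain the answer, reply exactly: "{fallback}"\n\n'
        f"Passages:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Numbering the passages makes it easier to check which retrieved chunk an answer was drawn from.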
+### Live Demo
+[HuggingFace Space Link](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)

 ## 📬 Contact
