Merge remote-tracking branch 'hf/main'
- README.md +12 -83
- requirements.txt +0 -6
README.md CHANGED

```diff
@@ -1,86 +1,15 @@
-# RAG 30 Days Sprint
 
-
 
-
 
-
-|-----|--------|------------------------|--------|
-| 1 | day1 | Hello world test file | ✅ |
-| 2 | day2 | TBD | ⏳ |
-| ... | ... | ... | ... |
-
-## Folder Structure
-
-rag-30-days/
-│
-├── day1/
-│   └── hello_ai.py
-│
-└── README.md
-
-## Goal
-
-To build a production-ready RAG pipeline in 30 days and land a remote AI job by the end of the sprint.
-
-## 🛠️ Tools
-
-- Python
-- LangChain
-- ChromaDB / Weaviate / FAISS
-- OpenAI API
-- Streamlit (optional UI)
-- Git & GitHub
-
-## Progress
-
-Check commits and folders daily to follow the sprint. Each folder corresponds to 1 day of learning and building.
-
-## 📅 Day 1 – Getting Started with Python & Flask
-
-### ✅ What I Learned
-
-- Refreshed core **Python basics** (variables, functions, classes, etc.)
-- Built my first **Flask API** with real-world JSON responses
-- Practiced structured coding with **Copilot assistance**
-
-### 🛠️ What I Built
-
-- `hello_ai.py`: A minimal Python script to print a welcome message
-- `api.py`: A Flask application with 3 endpoints:
-  - `/hello`: greeting message
-  - `/calculate`: accepts 2 numbers (POST) and returns their sum
-  - `/ai-ready`: motivational message for AI learning
-
-### 🔮 Tomorrow's Plan
-
-- Begin **LangChain** setup and environment configuration
-- Start working on **RAG-based document processing**
-- Set up folder structure and `day2` workflow
-
-> 📣 One day down, 29 to go. Keep shipping.
-
-## Day 3: First RAG System ✅
-
-### What I Built
-
-- PDF processing pipeline (loader + optimal chunker)
-- Compared 3 chunking strategies (fixed, recursive, token)
-- ChromaDB vector storage (persistent)
-- SentenceTransformer embeddings (MiniLM)
-- Gradio chat interface (upload PDF → ask)
-- Deployment on Hugging Face Spaces
-
-### Key Learnings
-
-- Fixed vs Recursive vs Token-based chunking trade-offs
-- Embedding format must be list[list[float]] for Chroma
-- New Chroma API uses `PersistentClient`
-- Prompt design: extractive answers + fallback
-
-### Live Demo
-
-[HuggingFace Space Link](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)
-
-## Contact
-
-Made by [Hamid Omarov](https://www.linkedin.com/in/hamidomarov)
-Check out my portfolio: [Notion Page](https://www.notion.so/AI-Content-Factory-Operations-2400a72a724c8050b5c6ddc0e6a0a77d)
```
```diff
 
+---
+title: PDF RAG (Chroma + Groq)
+emoji:
+colorFrom: indigo
+colorTo: green
+sdk: gradio
+sdk_version: "4.44.0"
+app_file: app.py
+pinned: false
+---
 
+# PDF RAG (Chroma + Groq)
 
+Upload a PDF and ask questions. Uses ChromaDB for retrieval and Groq LLM for answers.
```
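The deleted Day 3 notes say three chunking strategies (fixed, recursive, token) were compared, but the diff contains no code for them. A minimal plain-Python sketch of the first two follows, assuming sizes are measured in characters; the function names, default sizes, overlap, and separator order are illustrative assumptions, not code from this repository. Token-based chunking is left out because it depends on a specific tokenizer.

```python
# Hypothetical helpers for illustration; not taken from this repository.


def fixed_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Cut text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop before a window would consist only of the previous window's overlap.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[i:i + size] for i in starts]


def recursive_chunks(text: str, size: int = 200,
                     separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first; use finer ones only for pieces that are still too long."""
    if size <= 0:
        raise ValueError("size must be positive")
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard character cuts.
        return [text[i:i + size] for i in range(0, len(text), size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # greedily merge pieces while they fit
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "one two three\n\nfour five six seven eight\n\nnine"
    print(fixed_chunks(sample, size=15, overlap=3))
    print(recursive_chunks(sample, size=15))
```

Fixed windows are predictable but can cut a sentence in half; the recursive version keeps paragraph and line boundaries whenever a piece fits, which is usually why it is preferred for prose extracted from PDFs.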
requirements.txt DELETED

```diff
@@ -1,6 +0,0 @@
-gradio
-chromadb
-sentence-transformers
-langchain-groq
-pypdf
-python-dotenv
```
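The new README only says that ChromaDB handles retrieval, and the deleted notes add that Chroma expects embeddings as `list[list[float]]`. To show what such a query conceptually returns, here is a brute-force top-k search by cosine similarity over that same list-of-lists shape. This is a swapped-in illustration, not Chroma's API or index; `cosine`, `top_k`, and the toy vectors are hypothetical stand-ins for real MiniLM embeddings, which have far more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], embeddings: list[list[float]],
          documents: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query and keep the best k (no index, linear scan)."""
    if len(embeddings) != len(documents):
        raise ValueError("need exactly one embedding per document")
    scored = [(doc, cosine(query, emb)) for doc, emb in zip(documents, embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors, one per chunk, in the list[list[float]] shape.
    docs = ["chunk about Flask", "chunk about Chroma", "chunk about Gradio"]
    embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
    for doc, score in top_k([0.0, 1.0, 0.0], embs, docs):
        print(f"{score:.2f}  {doc}")
```

A linear scan is fine for a handful of chunks from one PDF; a vector store such as Chroma adds persistence and an approximate-nearest-neighbour index so the same kind of query stays fast as the collection grows.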
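The deleted notes describe the prompt design only as "extractive answers + fallback", and the new README says a Groq LLM writes the answers. One way such a prompt could be assembled is sketched below; the instruction wording, the numbered-context layout, the `FALLBACK` sentence, and the helper names are all hypothetical, and the call that sends the prompt to Groq is not shown.

```python
# Hypothetical fallback wording; the app's actual text is not part of this diff.
FALLBACK = "I couldn't find that in the uploaded PDF."


def build_prompt(question: str, chunks: list[str], fallback: str = FALLBACK) -> str:
    """Build an extractive-QA prompt: answer only from the numbered chunks, else reply with the fallback."""
    # Number each retrieved chunk so the model can cite where an answer came from.
    context = "\n\n".join(f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context below.\n"
        "Copy or closely paraphrase the relevant sentences and cite their numbers.\n"
        f'If the context does not contain the answer, reply exactly: "{fallback}"\n\n'
        f"Context:\n{context or '(no context retrieved)'}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def is_fallback(reply: str, fallback: str = FALLBACK) -> bool:
    """True when the model's reply is the fallback sentence, ignoring surrounding whitespace and quotes."""
    return reply.strip().strip('"') == fallback


if __name__ == "__main__":
    print(build_prompt("Which vector store is used?",
                       ["ChromaDB stores the chunks.", "Groq generates answers."]))
```

In an app like the one described, the returned string would be passed to the chat model, for example through `ChatGroq` from the `langchain-groq` package in the deleted requirements list, and `is_fallback` lets the UI handle a no-answer reply separately from a grounded one.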