Upload folder using huggingface_hub
README.md
CHANGED
@@ -119,6 +119,57 @@ Everything starts with **one command**:
bash start.sh

## 🧬 Design Choices

### 1️⃣ **Microservices Instead of a Monolith**
- Real-world ML systems typically separate **indexing, embedding, routing, and inference** into distinct services.
- Enables **independent scaling**, easier debugging, and service-level isolation.
- A good architecture for demonstrating **system design skills** in interviews.

---

### 2️⃣ **MiniLM Embeddings**
- ⚡ **Fast on CPU** (optimized for lightweight inference)
- 🎯 **High semantic quality** for short and long text
- 🪶 **Small model** → ideal for search engines, mobile, and Spaces deployments

---

### 3️⃣ **FAISS L2 on Normalized Embeddings**
L2 distance is used instead of cosine because:

- 🚀 **FAISS FlatL2 is fast** and well optimized for exact search
- ✨ When vectors are normalized, squared L2 distance and cosine similarity are monotonically related, so both produce the same ranking:
  `||a - b||^2 = 2 - 2 * cos(a, b)`
- 🧩 Avoids a separate cosine-similarity computation at query time
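
For unit-length vectors, squared L2 distance equals `2 - 2 * cosine similarity`, so sorting by ascending L2 distance and sorting by descending cosine similarity return the same order. The sketch below checks this in plain Python on made-up toy vectors. The helper names are illustrative and are not taken from this project. In the real service the same comparison would be done by FAISS, whose `IndexFlatL2` reports squared distances.

```python
import math

def normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy "embeddings" (made-up values, for illustration only).
query = normalize([1.0, 2.0, 0.5])
docs = [normalize(v) for v in ([0.9, 2.1, 0.4], [-1.0, 0.3, 2.0], [2.0, 0.1, 0.1])]

# The identity holds for every normalized pair.
for d in docs:
    assert math.isclose(squared_l2(query, d), 2 - 2 * cosine(query, d), abs_tol=1e-12)

# Ascending L2 distance and descending cosine similarity give the same ranking.
by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
by_cos = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
assert by_l2 == by_cos
```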

---

### 4️⃣ **Local Embedding Cache**
- Reduces startup time from **~5 seconds → <1 second**
- Prevents **re-embedding identical documents**
- Stores:
  - `embed_meta.json` → filename → hash → index
  - `embeddings.npy` → matrix of stored embeddings
- Saves compute and makes repeated searches much faster
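
A minimal sketch of how such a cache lookup could work, assuming the metadata maps each filename to a content hash and a row index into the embedding matrix. The schema (`{"hash": ..., "index": ...}`), the choice of SHA-256, and the function names are assumptions made for illustration. The project's actual `embed_meta.json` layout may differ.

```python
import hashlib
import json

def content_hash(text: str) -> str:
    """Hash of the document text; SHA-256 is an assumed choice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_embedding_work(docs: dict, meta: dict) -> tuple:
    """Split filenames into (cached, stale) by comparing stored and current hashes."""
    cached, stale = [], []
    for name, text in docs.items():
        entry = meta.get(name)
        if entry is not None and entry["hash"] == content_hash(text):
            cached.append(name)   # embedding row can be reused
        else:
            stale.append(name)    # new or edited file: must be embedded again
    return cached, stale

def update_meta(meta: dict, name: str, text: str, index: int) -> None:
    """Record the hash and the (hypothetical) row index in the embedding matrix."""
    meta[name] = {"hash": content_hash(text), "index": index}

docs = {"a.txt": "hello world", "b.txt": "semantic search"}
meta = {}

# First run: nothing is cached, so every file is stale.
cached, stale = plan_embedding_work(docs, meta)
for i, name in enumerate(stale):
    update_meta(meta, name, docs[name], i)

# Round-trip through JSON, standing in for writing and re-reading embed_meta.json.
meta = json.loads(json.dumps(meta))

# Second run after editing one file: only that file needs re-embedding.
docs["b.txt"] = "semantic search, edited"
cached, stale = plan_embedding_work(docs, meta)
```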

---

### 5️⃣ **LLM-Driven Explainability**
- Generates **human-friendly reasoning**
- Explains **why a document matched your query**
- Combines:
  - Top semantic-matching sentences
  - Keyword overlap
  - Gemini’s natural-language reasoning
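
One way the first two signals could be gathered into a prompt for the LLM is sketched below. The stopword list, the function names, and the prompt wording are all hypothetical. The top-matching sentences are passed in as an argument because selecting them depends on the embedding model, and the Gemini call itself is deliberately left out.

```python
import re

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and", "is", "how"}

def keywords(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def keyword_overlap(query, document):
    """Sorted list of keywords shared by the query and the document."""
    return sorted(keywords(query) & keywords(document))

def build_explanation_prompt(query, top_sentences, overlap):
    """Assemble the evidence into a plain-text prompt for the LLM."""
    lines = [
        "Explain briefly why this document matches the query.",
        f"Query: {query}",
        "Most similar sentences:",
        *[f"- {s}" for s in top_sentences],
        "Shared keywords: " + (", ".join(overlap) or "none"),
    ]
    return "\n".join(lines)

query = "how to index embeddings for fast search"
document = "FAISS builds an index over embeddings. Search is fast on CPU."
overlap = keyword_overlap(query, document)
prompt = build_explanation_prompt(
    query, ["FAISS builds an index over embeddings."], overlap
)
```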

---

### 6️⃣ **Streamlit for Fast UI**
- ⚡ Instant reload during development
- 🎨 Clean layout for Gemini-style cards
- 🧱 Easy to extend (evaluation panel, metrics, expanders)

## 🏗️ Architecture Overview