---
title: Api Embedding
emoji: 🐠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# 🧠 Unified Embedding API

> 🧩 Unified API for all your embedding, sparse, and reranking models: plug and play with any model from Hugging Face or your own fine-tuned versions.

---

## 🚀 Overview

**Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, **sparse**, and **reranking** models.

It's designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines**, all controlled from a single `src/config/models.yaml` file.

⚠️ **Note:** This is a development API.  
For production deployment, host it on cloud platforms such as **Hugging Face TEI**, **AWS**, **GCP**, or any cloud provider of your choice.

---

## 🧩 Features

- 🧠 **Unified Interface**: One API to handle dense, sparse, and reranking models.
- ⚡ **Batch Processing**: Handles single inputs and batches automatically.
- 🔧 **Flexible Parameters**: Full control via kwargs and options.
- 🔍 **Vector DB Ready**: Integrates easily with FAISS, Chroma, Qdrant, Milvus, etc.
- 📈 **RAG Support**: A solid base for Retrieval-Augmented Generation systems.
- ⚡ **Fast & Lightweight**: Powered by FastAPI and optimized with async processing.
- 🧰 **Extendable**: Switch models instantly via `src/config/models.yaml` and add your own models or pipelines.
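
To make the "Vector DB Ready" point concrete, here is a minimal sketch of what a vector store does with dense embeddings: rank documents by cosine similarity to a query vector. The helper names `cosine_similarity` and `top_k` are illustrative and not part of this project, and the three-dimensional vectors are toy stand-ins for real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings returned by the API.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.1, 0.0]
print(top_k(query, docs, k=2))
```

A real vector database replaces this brute-force scan with an index, but the ranking principle is the same.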

---

## 📁 Project Structure

```
unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py   # dense & sparse endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py       # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py        # dense model
│   │   │   ├── sparse.py       # sparse model
│   │   │   └── rank.py         # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml         # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md
```
---
## 🧩 Model Selection

The default configuration is optimized for a CPU instance with **2 vCPUs / 16 GB RAM**. See the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory usage.

**Add More Models:** Edit `src/config/models.yaml`

```yaml
models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
```
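
Before pushing a changed `models.yaml`, it can help to sanity-check the entries. The sketch below is not the project's own validator; it only assumes the structure shown in the snippet above (a `models` mapping whose entries carry `name` and `type`) and the three type values named in the comment. The function name `validate_models` is hypothetical, and the project's loader may accept additional keys.

```python
from typing import Any, Dict, List

# The three model types named in the YAML comment above.
ALLOWED_TYPES = {"embeddings", "sparse-embeddings", "rerank"}


def validate_models(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the config looks fine."""
    models = config.get("models")
    if not isinstance(models, dict) or not models:
        return ["'models' must be a non-empty mapping"]
    problems: List[str] = []
    for model_id, entry in models.items():
        if not isinstance(entry, dict):
            problems.append(f"{model_id}: entry must be a mapping")
            continue
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"{model_id}: missing 'name'")
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"{model_id}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    return problems


# This dict mirrors what a YAML loader would produce for the snippet above.
config = {"models": {"your-model-name": {"name": "org/model-name", "type": "embeddings"}}}
print(validate_models(config))
```

In practice you would load the file with a YAML parser first and pass the resulting dict to the function.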

⚠️ If you plan to use larger models like `Qwen2-embedding-8B`, please upgrade your Space.

---

## ☁️ How to Deploy (Free πŸš€)

Deploy your **Custom Embedding API** on **Hugging Face Spaces** β€” free, fast, and serverless.

### **1️⃣ Deploy on Hugging Face Spaces (Free!)**

1. **Duplicate this Space:**  
   πŸ‘‰ [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)  
   Click **β‹―** (three dots) β†’ **Duplicate this Space**

2. **Add the `HF_TOKEN` environment variable** and make sure your Space is public.

3. **Clone your Space locally:**  
   Click **⋯** → **Clone repository**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
   cd api-embedding
   ```

4. **Edit `src/config/models.yaml`** to customize models:
   ```yaml
   models:
     your-model:
       name: "org/model-name"
       type: "embeddings"  # or "sparse-embeddings" or "rerank"
   ```

5. **Commit and push changes:**
   ```bash
   git add src/config/models.yaml
   git commit -m "Update models configuration"
   git push
   ```

6. **Access your API:**
   Click **⋯** → **Embed this Space** → copy the **Direct URL**
   ```
   https://YOUR_USERNAME-api-embedding.hf.space
   https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
   ```

That's it! You now have a live embedding API endpoint powered by your models.
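
As a quick smoke test after deployment, the sketch below builds the Space URL from the pattern shown in step 6 and polls the `/health` endpoint. The function names are illustrative. The Direct URL copied from the Space UI remains the authoritative value, since Hugging Face may normalize characters in the subdomain.

```python
import urllib.error
import urllib.request


def space_base_url(username: str, space_name: str) -> str:
    """Follow the https://USERNAME-SPACENAME.hf.space pattern shown in step 6."""
    return f"https://{username}-{space_name}.hf.space"


def is_healthy(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on network or HTTP errors."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


base = space_base_url("YOUR_USERNAME", "api-embedding")
print(base)             # https://YOUR_USERNAME-api-embedding.hf.space
print(f"{base}/docs")   # interactive docs

# Once the Space has finished building:
# print(is_healthy(base))
```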

### **2️⃣ Run Locally (NOT RECOMMENDED)**

```bash
# Clone repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run server
python app.py
```

API available at: `http://localhost:7860`

### **3️⃣ Run with Docker**

```bash
# Build and run
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api
```

## 📖 Usage Examples

### **Python**

```python
import requests

url = "http://localhost:7860/api/v1/embeddings/embed"

# Single embedding
response = requests.post(url, json={
    "texts": ["What is artificial intelligence?"],
    "model_id": "qwen3-0.6b"
})
print(response.json())

# Batch embeddings
response = requests.post(url, json={
    "texts": [
        "First document",
        "Second document", 
        "Third document"
    ],
    "model_id": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
embeddings = response.json()["embeddings"]
```
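
For large corpora you may prefer to split the texts across several requests rather than send them all at once. The source does not state a maximum request size, so the chunk size here is an assumption, and `batched` and `build_payloads` are hypothetical helper names. The payload fields match the example above.

```python
from typing import Any, Dict, Iterator, List


def batched(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def build_payloads(texts: List[str], model_id: str, batch_size: int = 32) -> List[Dict[str, Any]]:
    """Build one request body per chunk, using the same fields as the example above."""
    return [{"texts": chunk, "model_id": model_id} for chunk in batched(texts, batch_size)]


corpus = [f"Document {i}" for i in range(5)]
payloads = build_payloads(corpus, "qwen3-0.6b", batch_size=2)
print([len(p["texts"]) for p in payloads])  # [2, 2, 1]
```

Each payload can then be sent with `requests.post(url, json=payload)` as in the example above, and the returned embeddings concatenated in order.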

### **cURL**

```bash
# Single embedding (Dense)
curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Hello world"],
    "prompt": "add instructions here",
    "model_id": "qwen3-0.6b"
  }'

# Batch embeddings (Sparse)
curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["First doc", "Second doc", "Third doc"],
    "model_id": "splade-pp-v2"
  }'

# Reranking
curl -X POST "http://localhost:7860/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      "Python is a popular language for data science due to its extensive libraries.",
      "R is widely used in statistical computing and data analysis.",
      "Java is a versatile language used in various applications, including data science.",
      "SQL is essential for managing and querying relational databases.",
      "Julia is a high-performance language gaining popularity for numerical computing and data science."
    ],
    "model_id": "bge-v2-m3",
    "query": "Python best programming languages for data science",
    "top_k": 3
  }'

# Query embedding with options
curl -X POST "http://localhost:7860/api/v1/embeddings/query" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is machine learning?"],
    "model_id": "qwen3-0.6b",
    "options": {
      "normalize_embeddings": true,
      "batch_size": 32
    }
  }'
```
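
The reranking call can also be made from Python using only the standard library. This is a sketch under stated assumptions: the four request fields are taken from the cURL example above, while the client-side checks and the helper names `build_rerank_payload` and `post_json` are additions for illustration. The response shape is not documented here, so the code returns the decoded JSON as-is.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_rerank_payload(query: str, documents: List[str], model_id: str, top_k: int) -> Dict[str, Any]:
    """Assemble a JSON body with the same four fields as the cURL reranking call."""
    # These guards are client-side assumptions, not documented server rules.
    if not documents:
        raise ValueError("documents must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {"query": query, "documents": documents, "model_id": model_id, "top_k": top_k}


def post_json(url: str, payload: Dict[str, Any], timeout: float = 30.0) -> Any:
    """POST a JSON body and decode the JSON reply (requires a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_rerank_payload(
    query="Python best programming languages for data science",
    documents=[
        "Python is a popular language for data science due to its extensive libraries.",
        "R is widely used in statistical computing and data analysis.",
    ],
    model_id="bge-v2-m3",
    top_k=2,
)
print(json.dumps(payload, indent=2))

# With the server running locally:
# result = post_json("http://localhost:7860/api/v1/rerank", payload)
```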

### **JavaScript/TypeScript**

```typescript
const url = "http://localhost:7860/api/v1/embeddings/embed";

const response = await fetch(url, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    texts: ["Hello world"],
    model_id: "qwen3-0.6b",
  }),
});

const data = await response.json();
console.log(data.embeddings); // same response key as in the Python example
```

---

## 📊 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/embeddings/embed` | POST | Generate document embeddings (single/batch) |
| `/api/v1/embeddings/query` | POST | Generate query embeddings (single/batch) |
| `/api/v1/rerank` | POST | Rerank documents based on a query |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
| `/docs` | GET | Interactive API documentation |
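
The table above can be mirrored in a small lookup helper so client code does not hard-code paths in several places. The paths and methods are copied from the table. The dictionary keys and the `endpoint` function are illustrative names, not part of the API.

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Paths and methods copied from the endpoint table; the keys are illustrative names.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "embed": ("POST", "/api/v1/embeddings/embed"),
    "query": ("POST", "/api/v1/embeddings/query"),
    "rerank": ("POST", "/api/v1/rerank"),
    "models": ("GET", "/api/v1/models"),
    "model_info": ("GET", "/api/v1/models/{model_id}"),
    "health": ("GET", "/health"),
}


def endpoint(base_url: str, name: str, **path_params: str) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for a named endpoint, URL-escaping path parameters."""
    method, template = ENDPOINTS[name]
    escaped = {key: quote(value, safe="") for key, value in path_params.items()}
    return method, base_url.rstrip("/") + template.format(**escaped)


print(endpoint("http://localhost:7860", "model_info", model_id="qwen3-0.6b"))
print(endpoint("http://localhost:7860/", "rerank"))
```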


## 🤝 Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

**Development Setup:**

```bash
git clone https://github.com/fahmiaziz/unified-embedding-api.git
cd unified-embedding-api
pip install -r requirements-dev.txt
pre-commit install  # (optional)
```

---

## 📚 Resources

- [API Documentation](API.md)
- [Sentence Transformers](https://www.sbert.net/)
- [FastAPI Docs](https://fastapi.tiangolo.com/)
- [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
- [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
- [Deploy Applications on Hugging Face Spaces (community blog post by HemanthSai7)](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
- [How to Sync Hugging Face Spaces with a GitHub Repository (by ruslanmv)](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository?tab=readme-ov-file)
- [Duplicate and clone a Space to your local machine](https://huggingface.co/docs/hub/spaces-overview#duplicating-a-space)

---

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- **Sentence Transformers** for the embedding models
- **FastAPI** for the excellent web framework
- **Hugging Face** for model hosting and Spaces
- **Open Source Community** for inspiration and support

---

## 📞 Support

- **Issues:** [GitHub Issues](https://github.com/fahmiaziz/unified-embedding-api/issues)
- **Discussions:** [GitHub Discussions](https://github.com/fahmiaziz/unified-embedding-api/discussions)
- **Hugging Face Space:** [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)

---

> ✨ β€œUnify your embeddings. Simplify your AI stack.”

<div align="center">

**⭐ Star this repo if you find it useful!**

Made with ❀️ by the Open-Source Community

</div>