fahmiaziz98 committed on
Commit
9847166
·
1 Parent(s): 3b88f19

init README

Files changed (8)
  1. API.md +729 -0
  2. README.md +285 -30
  3. core/__init__.py +0 -3
  4. core/embedding.py +0 -81
  5. core/model_manager.py +0 -229
  6. core/sparse.py +0 -123
  7. models/__init__.py +0 -20
  8. models/model.py +0 -110
API.md ADDED
@@ -0,0 +1,729 @@
1
+ # 📖 Unified Embedding API Documentation
2
+
3
+ Complete API reference for the Unified Embedding API v3.0.0.
4
+
5
+ **Features:** Dense Embeddings, Sparse Embeddings, and Document Reranking
6
+
7
+ ---
8
+
9
+ ## 🌐 Base URL
10
+
11
+ ```
12
+ https://fahmiaziz-api-embedding.hf.space
13
+ ```
14
+
15
+ For local development:
16
+ ```
17
+ http://localhost:7860
18
+ ```
19
+
20
+ ---
21
+
22
+ ## 🔑 Authentication
23
+
24
+ **Currently no authentication required.**
25
+
26
+ ---
27
+
28
+ ## 📊 Endpoints Overview
29
+
30
+ | Endpoint | Method | Description |
31
+ |----------|--------|-------------|
32
+ | `/api/v1/embeddings/embed` | POST | Generate document embeddings |
33
+ | `/api/v1/embeddings/query` | POST | Generate query embeddings |
34
+ | `/api/v1/rerank` | POST | Rerank documents by relevance |
35
+ | `/api/v1/models` | GET | List available models |
36
+ | `/api/v1/models/{model_id}` | GET | Get model information |
37
+ | `/health` | GET | Health check |
38
+ | `/` | GET | API information |
39
+
40
+ ---
41
+
42
+ ## 🚀 Embedding Endpoints
43
+
44
+ ### 1. Generate Document Embeddings
45
+
46
+ **`POST /api/v1/embeddings/embed`**
47
+
48
+ Generate embeddings for document texts. Supports both single and batch processing.
49
+
50
+ #### Request Body
51
+
52
+ ```json
53
+ {
54
+ "texts": ["string"], // Required: List of texts (1-100 items)
55
+ "model_id": "string", // Required: Model identifier
56
+ "prompt": "string", // Optional: Instruction prompt
57
+ "options": { // Optional: Embedding parameters
58
+ "normalize_embeddings": true,
59
+ "batch_size": 32,
60
+ "max_length": 512,
61
+ "show_progress_bar": false
62
+ }
63
+ }
64
+ ```
65
+
66
+ #### Parameters
67
+
68
+ | Field | Type | Required | Description |
69
+ |-------|------|----------|-------------|
70
+ | `texts` | array[string] | ✅ Yes | List of texts to embed (min: 1, max: 100) |
71
+ | `model_id` | string | ✅ Yes | Model identifier (e.g., "qwen3-0.6b") |
72
+ | `prompt` | string | ❌ No | Instruction prompt for the model |
73
+ | `options` | object | ❌ No | Additional embedding parameters |
74
+
75
+ #### Options Parameters
76
+
77
+ | Field | Type | Default | Description |
78
+ |-------|------|---------|-------------|
79
+ | `normalize_embeddings` | boolean | false | L2 normalize output embeddings |
80
+ | `batch_size` | integer | 32 | Processing batch size (1-256) |
81
+ | `max_length` | integer | 512 | Maximum sequence length (1-8192) |
82
+ | `show_progress_bar` | boolean | false | Display progress during encoding |
83
+ | `precision` | string | float32 | Precision ("float32", "int8", "binary") |
84
+
85
+ #### Response - Single Text (Dense)
86
+
87
+ ```json
88
+ {
89
+ "embedding": [0.123, -0.456, 0.789, ...],
90
+ "dimension": 1024,
91
+ "model_id": "qwen3-0.6b",
92
+ "processing_time": 0.0523
93
+ }
94
+ ```
95
+
96
+ #### Response - Batch (Dense)
97
+
98
+ ```json
99
+ {
100
+ "embeddings": [
101
+ [0.123, -0.456, ...],
102
+ [0.234, 0.567, ...],
103
+ [0.345, -0.678, ...]
104
+ ],
105
+ "dimension": 1024,
106
+ "count": 3,
107
+ "model_id": "qwen3-0.6b",
108
+ "processing_time": 0.1245
109
+ }
110
+ ```
111
+
112
+ #### Response - Single Text (Sparse)
113
+
114
+ ```json
115
+ {
116
+ "sparse_embedding": {
117
+ "text": "Hello world",
118
+ "indices": [10, 25, 42, 100],
119
+ "values": [0.85, 0.62, 0.91, 0.73]
120
+ },
121
+ "model_id": "splade-pp-v2",
122
+ "processing_time": 0.0421
123
+ }
124
+ ```
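A sparse embedding stores only its non-zero dimensions, as parallel `indices` and `values` arrays. One common way to compare two such vectors is a dot product over the indices they share. The sketch below assumes the field names shown in the response above; `sparse_dot` is a hypothetical client-side helper, not part of the API, and the numbers are made up.

```python
from typing import Any


def sparse_dot(a: dict[str, Any], b: dict[str, Any]) -> float:
    """Dot product of two sparse embeddings given as indices/values pairs."""
    # Map each active dimension of `a` to its weight for O(1) lookup.
    weights = dict(zip(a["indices"], a["values"]))
    # Only dimensions present in both vectors contribute to the sum.
    return sum(weights.get(i, 0.0) * v for i, v in zip(b["indices"], b["values"]))


doc = {"indices": [10, 25, 42], "values": [0.5, 0.25, 1.0]}
query = {"indices": [25, 42, 99], "values": [2.0, 0.5, 3.0]}

print(sparse_dot(doc, query))  # 0.25 * 2.0 + 1.0 * 0.5 = 1.0
```

The extra `text` field in the response is ignored here, since only `indices` and `values` take part in the score.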
125
+
126
+ #### Response - Batch (Sparse)
127
+
128
+ ```json
129
+ {
130
+ "embeddings": [
131
+ {
132
+ "text": "First doc",
133
+ "indices": [10, 25, 42],
134
+ "values": [0.85, 0.62, 0.91]
135
+ },
136
+ {
137
+ "text": "Second doc",
138
+ "indices": [15, 30, 50],
139
+ "values": [0.73, 0.88, 0.65]
140
+ }
141
+ ],
142
+ "count": 2,
143
+ "model_id": "splade-pp-v2",
144
+ "processing_time": 0.0892
145
+ }
146
+ ```
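The four response shapes above differ in their top-level key (`embedding`, `embeddings`, or `sparse_embedding`). A client that handles both single and batch calls may find it convenient to flatten them into one list. This is a minimal sketch that assumes exactly the field names in the examples above; `as_list` is a hypothetical helper name.

```python
from typing import Any


def as_list(data: dict[str, Any]) -> list[Any]:
    """Return the embeddings of a response as a list, whatever its shape.

    Assumes the field names from the documented examples:
    - batch responses carry "embeddings" (dense vectors or sparse dicts)
    - single dense responses carry "embedding"
    - single sparse responses carry "sparse_embedding"
    """
    if "embeddings" in data:
        return list(data["embeddings"])
    if "embedding" in data:
        return [data["embedding"]]
    if "sparse_embedding" in data:
        return [data["sparse_embedding"]]
    raise KeyError("response contains no embedding field")


single_dense = {"embedding": [0.1, -0.2], "model_id": "qwen3-0.6b"}
batch_sparse = {
    "embeddings": [{"text": "First doc", "indices": [10], "values": [0.85]}],
    "count": 1,
}

print(len(as_list(single_dense)))  # 1
print(len(as_list(batch_sparse)))  # 1
```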
147
+
148
+ #### Examples
149
+
150
+ **Single Text (Dense Model):**
151
+ ```bash
152
+ curl -X 'POST' \
153
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
154
+ -H 'accept: application/json' \
155
+ -H 'Content-Type: application/json' \
156
+ -d '{
157
+ "texts": ["What is artificial intelligence?"],
158
+ "model_id": "qwen3-0.6b"
159
+ }'
160
+ ```
161
+
162
+ **Single Text (Sparse Model):**
163
+ ```bash
164
+ curl -X 'POST' \
165
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
166
+ -H 'accept: application/json' \
167
+ -H 'Content-Type: application/json' \
168
+ -d '{
169
+ "texts": ["Hello world"],
170
+ "model_id": "splade-pp-v2"
171
+ }'
172
+ ```
173
+
174
+ **Batch (with Options):**
175
+ ```bash
176
+ curl -X 'POST' \
177
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
178
+ -H 'accept: application/json' \
179
+ -H 'Content-Type: application/json' \
180
+ -d '{
181
+ "texts": [
182
+ "First document to embed",
183
+ "Second document to embed",
184
+ "Third document to embed"
185
+ ],
186
+ "model_id": "qwen3-0.6b",
187
+ "options": {
188
+ "normalize_embeddings": true,
189
+ "batch_size": 32
190
+ }
191
+ }'
192
+ ```
193
+
194
+ **Python Example:**
195
+ ```python
196
+ import requests
197
+
198
+ url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed"
199
+
200
+ payload = {
201
+ "texts": ["Hello world"],
202
+ "model_id": "qwen3-0.6b"
203
+ }
204
+
205
+ response = requests.post(url, json=payload)
206
+ data = response.json()
207
+
208
+ print(f"Embedding dimension: {data['dimension']}")
209
+ print(f"Processing time: {data['processing_time']:.3f}s")
210
+ ```
211
+
212
+ ---
213
+
214
+ ### 2. Generate Query Embeddings
215
+
216
+ **`POST /api/v1/embeddings/query`**
217
+
218
+ Generate embeddings optimized for search queries. Some models differentiate between query and document embeddings.
219
+
220
+ #### Request Body
221
+
222
+ Same as `/embed` endpoint.
223
+
224
+ ```json
225
+ {
226
+ "texts": ["string"],
227
+ "model_id": "string",
228
+ "prompt": "string",
229
+ "options": {}
230
+ }
231
+ ```
232
+
233
+ #### Response
234
+
235
+ Same format as `/embed` endpoint.
236
+
237
+ #### Examples
238
+
239
+ **Single Query:**
240
+ ```bash
241
+ curl -X 'POST' \
242
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
243
+ -H 'accept: application/json' \
244
+ -H 'Content-Type: application/json' \
245
+ -d '{
246
+ "texts": ["What is machine learning?"],
247
+ "model_id": "qwen3-0.6b",
248
+ "prompt": "Represent this query for retrieval",
249
+ "options": {
250
+ "normalize_embeddings": true
251
+ }
252
+ }'
253
+ ```
254
+
255
+ **Batch Queries:**
256
+ ```bash
257
+ curl -X 'POST' \
258
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
259
+ -H 'accept: application/json' \
260
+ -H 'Content-Type: application/json' \
261
+ -d '{
262
+ "texts": [
263
+ "First query",
264
+ "Second query",
265
+ "Third query"
266
+ ],
267
+ "model_id": "qwen3-0.6b"
268
+ }'
269
+ ```
270
+
271
+ **Python Example:**
272
+ ```python
273
+ import requests
274
+
275
+ url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query"
276
+
277
+ payload = {
278
+ "texts": ["What is AI?"],
279
+ "model_id": "qwen3-0.6b",
280
+ "options": {
281
+ "normalize_embeddings": True
282
+ }
283
+ }
284
+
285
+ response = requests.post(url, json=payload)
286
+ embedding = response.json()["embedding"]
287
+ ```
288
+
289
+ ---
290
+
291
+ ### 3. Rerank Documents
292
+
293
+ **`POST /api/v1/rerank`**
294
+
295
+ Rerank documents based on their relevance to a query using CrossEncoder models.
296
+
297
+ #### Request Body
298
+
299
+ ```json
300
+ {
301
+ "query": "string", // Required: Search query
302
+ "documents": ["string"], // Required: List of documents (min: 1)
303
+ "model_id": "string", // Required: Reranking model identifier
304
+ "top_k": integer // Required: Number of top results to return
305
+ }
306
+ ```
307
+
308
+ #### Parameters
309
+
310
+ | Field | Type | Required | Description |
311
+ |-------|------|----------|-------------|
312
+ | `query` | string | ✅ Yes | Search query text |
313
+ | `documents` | array[string] | ✅ Yes | List of documents to rerank (min: 1) |
314
+ | `model_id` | string | ✅ Yes | Reranking model identifier |
315
+ | `top_k` | integer | ✅ Yes | Maximum number of results to return |
316
+
317
+ #### Response
318
+
319
+ ```json
320
+ {
321
+ "model_id": "jina-reranker-v3",
322
+ "processing_time": 0.56,
323
+ "query": "Python for data science",
324
+ "results": [
325
+ {
326
+ "index": 0,
327
+ "score": 0.95,
328
+ "text": "Python is excellent for data science"
329
+ },
330
+ {
331
+ "index": 2,
332
+ "score": 0.73,
333
+ "text": "R is also used in data science"
334
+ }
335
+ ]
336
+ }
337
+ ```
338
+
339
+ #### Response Fields
340
+
341
+ | Field | Type | Description |
342
+ |-------|------|-------------|
343
+ | `model_id` | string | Model identifier used |
344
+ | `processing_time` | float | Processing time in seconds |
345
+ | `query` | string | Original search query |
346
+ | `results` | array | Reranked documents with scores |
347
+ | `results[].index` | integer | Original index in input documents |
348
+ | `results[].score` | float | Relevance score (0-1, normalized) |
349
+ | `results[].text` | string | Document text |
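
Because every result carries the `index` of its document in the original request, results can be joined back to metadata that the client holds. A small sketch, assuming the response shape shown above; `attach_metadata` and the `metadata` list are hypothetical and exist only for illustration.

```python
from typing import Any


def attach_metadata(
    results: list[dict[str, Any]], metadata: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge each reranked result with the metadata of its source document.

    `metadata[i]` must describe the i-th document sent in the request,
    because `results[].index` refers to positions in that input list.
    """
    return [{**metadata[r["index"]], **r} for r in results]


results = [
    {"index": 0, "score": 0.95, "text": "Python is excellent for data science"},
    {"index": 2, "score": 0.73, "text": "R is also used in data science"},
]
metadata = [{"doc_id": "a"}, {"doc_id": "b"}, {"doc_id": "c"}]

for row in attach_metadata(results, metadata):
    print(row["doc_id"], row["score"])
```

The order of `results` is preserved, so the first merged row is still the top-ranked document.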
350
+
351
+ #### Examples
352
+
353
+ **Basic Reranking:**
354
+ ```bash
355
+ curl -X 'POST' \
356
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/rerank' \
357
+ -H 'Content-Type: application/json' \
358
+ -d '{
359
+ "query": "Python for data science",
360
+ "documents": [
361
+ "Python is great for data science",
362
+ "Java is used for enterprise applications",
363
+ "R is also used in data science",
364
+ "JavaScript is for web development"
365
+ ],
366
+ "model_id": "jina-reranker-v3",
367
+ "top_k": 2
368
+ }'
369
+ ```
370
+
371
+
372
+ **Python Example:**
373
+ ```python
374
+ import requests
375
+
376
+ url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank"
377
+
378
+ payload = {
379
+ "query": "best programming language for beginners",
380
+ "documents": [
381
+ "Python is beginner-friendly with simple syntax",
382
+ "C++ is powerful but complex for beginners",
383
+ "JavaScript is essential for web development",
384
+ "Rust offers memory safety but steep learning curve"
385
+ ],
386
+ "model_id": "jina-reranker-v3",
387
+ "top_k": 2
388
+ }
389
+
390
+ response = requests.post(url, json=payload)
391
+ data = response.json()
392
+
393
+ print(f"Top result: {data['results'][0]['text']}")
394
+ print(f"Score: {data['results'][0]['score']:.3f}")
395
+ ```
396
+
397
+ **JavaScript Example:**
398
+ ```javascript
399
+ const url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank";
400
+
401
+ const response = await fetch(url, {
402
+ method: "POST",
403
+ headers: { "Content-Type": "application/json" },
404
+ body: JSON.stringify({
405
+ query: "AI applications",
406
+ documents: [
407
+ "Computer vision for image recognition",
408
+ "Recipe for chocolate cake",
409
+ "Natural language processing for chatbots",
410
+ "Travel guide to Paris"
411
+ ],
412
+ model_id: "jina-reranker-v3",
413
+ top_k: 2
414
+ })
415
+ });
416
+
417
+ const { results } = await response.json();
418
+ console.log("Top results:", results);
419
+ ```
420
+
421
+ ---
422
+
423
+ ## 🤖 Model Management
424
+
425
+ ### 4. List Available Models
426
+
427
+ **`GET /api/v1/models`**
428
+
429
+ Get a list of all available embedding models.
430
+
431
+ #### Response
432
+
433
+ ```json
434
+ {
435
+ "models": [
436
+ {
437
+ "id": "qwen3-0.6b",
438
+ "name": "Qwen/Qwen3-Embedding-0.6B",
439
+ "type": "embeddings",
440
+ "loaded": true,
441
+ "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
442
+ },
443
+ {
444
+ "id": "splade-pp-v2",
445
+ "name": "prithivida/Splade_PP_en_v2",
446
+ "type": "sparse-embeddings",
447
+ "loaded": true,
448
+ "repository": "https://huggingface.co/prithivida/Splade_PP_en_v2"
449
+ }
450
+ ],
451
+ "total": 2
452
+ }
453
+ ```
454
+
455
+ #### Example
456
+
457
+ ```bash
458
+ curl -X 'GET' \
459
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/models' \
460
+ -H 'accept: application/json'
461
+ ```
462
+
463
+ ---
464
+
465
+ ### 5. Get Model Information
466
+
467
+ **`GET /api/v1/models/{model_id}`**
468
+
469
+ Get detailed information about a specific model.
470
+
471
+ #### Parameters
472
+
473
+ | Parameter | Type | Required | Description |
474
+ |-----------|------|----------|-------------|
475
+ | `model_id` | string | ✅ Yes | Model identifier |
476
+
477
+ #### Response
478
+
479
+ ```json
480
+ {
481
+ "id": "qwen3-0.6b",
482
+ "name": "Qwen/Qwen3-Embedding-0.6B",
483
+ "type": "embeddings",
484
+ "loaded": true,
485
+ "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
486
+ }
487
+ ```
488
+
489
+ #### Example
490
+
491
+ ```bash
492
+ curl -X 'GET' \
493
+ 'https://fahmiaziz-api-embedding.hf.space/api/v1/models/qwen3-0.6b' \
494
+ -H 'accept: application/json'
495
+ ```
496
+
497
+ ---
498
+
499
+ ## 🏥 System Endpoints
500
+
501
+ ### 6. Health Check
502
+
503
+ **`GET /health`**
504
+
505
+ Check API health status.
506
+
507
+ #### Response
508
+
509
+ ```json
510
+ {
511
+ "status": "ok",
512
+ "total_models": 2,
513
+ "loaded_models": 2,
514
+ "startup_complete": true
515
+ }
516
+ ```
517
+
518
+ #### Example
519
+
520
+ ```bash
521
+ curl -X 'GET' \
522
+ 'https://fahmiaziz-api-embedding.hf.space/health' \
523
+ -H 'accept: application/json'
524
+ ```
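A client may want to wait until `startup_complete` is true before sending traffic, for example right after a Space restarts. The sketch below uses only the standard library and assumes the `/health` response shape shown above. `fetch_health`, `wait_until_ready`, and their parameters are illustrative names, not part of the API.

```python
import json
import time
import urllib.request
from typing import Any, Callable


def fetch_health(base_url: str) -> dict[str, Any]:
    """GET {base_url}/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
        return json.load(resp)


def wait_until_ready(
    base_url: str,
    attempts: int = 10,
    delay: float = 2.0,
    fetch: Callable[[str], dict[str, Any]] = fetch_health,
) -> bool:
    """Poll /health until `startup_complete` is true or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("startup_complete"):
                return True
        except OSError:
            # Connection failures and HTTP errors raised by urllib are
            # subclasses of OSError; treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_ready("http://localhost:7860")`. The `fetch` argument exists so the polling logic can be exercised without a running server.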
525
+
526
+ ---
527
+
528
+ ### 7. API Information
529
+
530
+ **`GET /`**
531
+
532
+ Get basic API information.
533
+
534
+ #### Response
535
+
536
+ ```json
537
+ {
538
+ "message": "Unified Embedding API - Dense & Sparse Embeddings",
539
+ "version": "3.0.0",
540
+ "docs_url": "/docs"
541
+ }
542
+ ```
543
+
544
+ ---
545
+
546
+ ## ❌ Error Responses
547
+
548
+ All errors follow this format:
549
+
550
+ ```json
551
+ {
552
+ "detail": "Error message description"
553
+ }
554
+ ```
555
+
556
+ ### HTTP Status Codes
557
+
558
+ | Code | Description |
559
+ |------|-------------|
560
+ | 200 | Success |
561
+ | 400 | Bad Request - Invalid input |
562
+ | 404 | Not Found - Model not found |
563
+ | 422 | Unprocessable Entity - Validation error |
564
+ | 500 | Internal Server Error |
565
+ | 503 | Service Unavailable - Server not ready |
566
+
567
+ ### Common Errors
568
+
569
+ **Model Not Found (404):**
570
+ ```json
571
+ {
572
+ "detail": "Model 'unknown-model' not found in configuration"
573
+ }
574
+ ```
575
+
576
+ **Validation Error (422):**
577
+ ```json
578
+ {
579
+ "detail": [
580
+ {
581
+ "loc": ["body", "texts"],
582
+ "msg": "texts list cannot be empty",
583
+ "type": "value_error"
584
+ }
585
+ ]
586
+ }
587
+ ```
588
+
589
+ **Batch Too Large (422):**
590
+ ```json
591
+ {
592
+ "detail": "Batch size (150) exceeds maximum (100)"
593
+ }
594
+ ```
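As the examples show, `detail` is either a plain string or a list of validation entries with `loc` and `msg`. The sketch below renders both as one readable message. It assumes only those two shapes occur, and `format_detail` is a hypothetical helper name.

```python
from typing import Any


def format_detail(body: dict[str, Any]) -> str:
    """Render the `detail` field of an error response as a single string."""
    detail = body.get("detail", "")
    if isinstance(detail, str):
        return detail
    # Validation errors: one entry per failing field.
    parts = []
    for item in detail:
        location = ".".join(str(p) for p in item.get("loc", []))
        parts.append(f"{location}: {item.get('msg', '')}")
    return "; ".join(parts)


not_found = {"detail": "Model 'unknown-model' not found in configuration"}
invalid = {
    "detail": [
        {"loc": ["body", "texts"], "msg": "texts list cannot be empty", "type": "value_error"}
    ]
}

print(format_detail(not_found))
print(format_detail(invalid))  # body.texts: texts list cannot be empty
```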
595
+
596
+ ---
597
+
598
+ ## 📦 Available Models
599
+
600
+ ### Dense Embedding Models
601
+
602
+ | Model ID | Name | Dimension | Description |
603
+ |----------|------|-----------|-------------|
604
+ | `qwen3-0.6b` | Qwen/Qwen3-Embedding-0.6B | 1024 | Efficient multilingual embeddings |
605
+
606
+ ### Sparse Embedding Models
607
+
608
+ | Model ID | Name | Type | Description |
609
+ |----------|------|------|-------------|
610
+ | `splade-pp-v2` | prithivida/Splade_PP_en_v2 | Sparse | SPLADE++ English v2 |
611
+
612
+ ### Reranking Models
613
+
614
+ | Model ID | Name | Type | Description |
615
+ |----------|------|------|-------------|
616
+ | `jina-reranker-v3` | jinaai/jina-reranker-v3-base-en | CrossEncoder | High-quality reranking (English) |
617
+ | `bge-v2-m3` | BAAI/bge-reranker-v2-m3 | CrossEncoder | Multilingual reranking |
618
+
619
+ ---
620
+
621
+ ## 🔧 Rate Limits
622
+
623
+ **Current Limits:**
624
+ - Max text length: 8,192 characters
625
+ - Max batch size: 100 texts per request
626
+ - No rate limiting (subject to server resources)
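
With a limit of 100 texts per request, larger collections have to be split on the client side. A minimal sketch: `chunked` is a hypothetical helper, and the limit is the one stated in the list above.

```python
from typing import Iterator, Sequence

MAX_BATCH = 100  # documented maximum number of texts per request


def chunked(texts: Sequence[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


texts = [f"document {i}" for i in range(250)]
print([len(batch) for batch in chunked(texts)])  # [100, 100, 50]
```

Each batch would then be sent as the `texts` field of its own request, and the returned embeddings concatenated in order.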
627
+
628
+ ---
629
+
630
+ ## 💡 Best Practices
631
+
632
+ ### 1. Batch Processing
633
+ Always batch multiple texts together for better performance:
634
+ ```python
635
+ # ❌ Bad - Multiple requests
636
+ for text in texts:
637
+     response = requests.post(url, json={"texts": [text], ...})
638
+
639
+ # ✅ Good - Single batch request
640
+ response = requests.post(url, json={"texts": texts, ...})
641
+ ```
642
+
643
+ ### 2. Normalize Embeddings for Similarity
644
+ For cosine similarity, always normalize:
645
+ ```python
646
+ payload = {
647
+ "texts": ["text"],
648
+ "model_id": "qwen3-0.6b",
649
+ "options": {"normalize_embeddings": True}
650
+ }
651
+ ```
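The reason normalization helps: for unit-length vectors, the dot product equals the cosine similarity, so a vector store can use the cheaper operation. The standard-library illustration below uses made-up two-dimensional vectors, not real model output, and the helper names are hypothetical.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed from raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Both routes give the same similarity, 24 / 25 = 0.96.
print(math.isclose(cosine(a, b), dot(l2_normalize(a), l2_normalize(b))))  # True
```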
652
+
653
+ ### 3. Model Selection
654
+ - **Dense models** (qwen3-0.6b): Best for semantic similarity
655
+ - **Sparse models** (splade-pp-v2): Best for keyword matching + semantic
656
+ - **Rerank models** (jina-reranker-v3): Best for re-scoring top candidates
657
+
658
+ ### 4. Two-Stage Retrieval (Recommended for RAG)
659
+ ```python
660
+ # Stage 1: Fast retrieval with embeddings (top 100)
661
+ query_embedding = embed_query(query)
662
+ candidates = vector_search(query_embedding, top_k=100)
663
+
664
+ # Stage 2: Precise reranking (top 10)
665
+ reranked = rerank(
666
+ query=query,
667
+ documents=[c["text"] for c in candidates],
668
+ model_id="jina-reranker-v3",
669
+ top_k=10
670
+ )
671
+ ```
672
+
673
+ ### 5. Error Handling
674
+ Always handle errors gracefully:
675
+ ```python
676
+ try:
677
+     response = requests.post(url, json=payload)
678
+     response.raise_for_status()
679
+     data = response.json()
680
+ except requests.exceptions.HTTPError as e:
681
+     print(f"HTTP error: {e}")
682
+ except requests.exceptions.RequestException as e:
683
+     print(f"Request failed: {e}")
684
+ ```
685
+
686
+ ---
687
+
688
+ ## 🐛 Troubleshooting
689
+
690
+ ### Empty Response
691
+ - Check that the `texts` field is not empty
692
+ - Verify that the `model_id` exists (see `GET /api/v1/models`)
693
+
694
+ ### Slow Performance
695
+ - Use batch requests instead of multiple single requests
696
+ - Reduce `batch_size` in `options` if the server runs into memory issues
697
+ - Check that the model is preloaded (the first request to a model is slower)
698
+
699
+ ### Connection Errors
700
+ - Verify base URL is correct
701
+ - Check network connectivity
702
+ - Ensure server is running (`/health` endpoint)
703
+
704
+ ---
705
+
706
+ ## 📞 Support
707
+
708
+ - **Documentation**: [GitHub README](https://github.com/fahmiaziz/unified-embedding-api)
709
+ - **Issues**: [GitHub Issues](https://github.com/fahmiaziz/unified-embedding-api/issues)
710
+ - **Hugging Face Space**: [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
711
+
712
+ ---
713
+
714
+ ## 🔄 Changelog
715
+
716
+ ### v3.0.0 (Current)
717
+ - ✨ Added reranking endpoint (`/api/v1/rerank`)
718
+ - ✨ Support for CrossEncoder models
719
+ - ✨ Unified batch-only response format
720
+ - ✨ Flexible kwargs support
721
+ - ✨ In-memory caching
722
+ - ✨ Improved error handling
723
+ - ✨ Comprehensive documentation
724
+ - 🐛 Fixed type hint errors in RerankModel
725
+ - 🐛 Fixed duplicate parameter errors in rerank endpoint
726
+
727
+ ---
728
+
729
+ **Last Updated**: 2025-11-02
README.md CHANGED
@@ -11,54 +11,85 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
11
 
12
  # 🧠 Unified Embedding API
13
 
14
- > 🧩 Unified API for all your Embedding & Sparse needs — plug and play with any model from Hugging Face or your own fine-tuned versions. This official repository from huggingface space
15
 
16
  ---
17
 
18
  ## 🚀 Overview
19
 
20
- **Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, and **sparse** models.
21
 
22
  It’s designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines** — all controlled from a single `config.yaml` file.
23
 
24
  ⚠️ **Note:** This is a development API.
25
- For production deployment, host it on cloud platforms such as **Hugging Face TGI**, **AWS**, or **GCP**.
26
 
27
  ---
28
 
29
  ## 🧩 Features
30
 
31
  - 🧠 **Unified Interface** — One API to handle dense, sparse, and reranking models.
32
- - ⚙️ **Configurable** — Switch models instantly via `config.yaml`.
 
33
  - 🔍 **Vector DB Ready** — Easily integrates with FAISS, Chroma, Qdrant, Milvus, etc.
34
  - 📈 **RAG Support** — Perfect base for Retrieval-Augmented Generation systems.
35
  - ⚡ **Fast & Lightweight** — Powered by FastAPI and optimized with async processing.
36
- - 🧰 **Extendable** — Add your own models or pipelines effortlessly.
37
 
38
  ---
39
 
40
  ## 📁 Project Structure
41
 
42
  ```
43
-
44
  unified-embedding-api/
45
 
46
- ├── core/
47
- │ ├── embedding.py
48
- │ └── model_manager.py
49
- ├── models/
50
- | └──model.py
51
- ├── app.py # Entry point (FastAPI server)
52
- |── config.yaml # Model + system configuration
53
- ├── Dockerfile
54
  ├── requirements.txt
 
 
55
  └── README.md
56
-
57
  ```
58
  ---
59
  ## 🧩 Model Selection
60
 
61
- Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for memory usage reference.
62
 
63
  ⚠️ If you plan to use larger models like `Qwen2-embedding-8B`, please upgrade your Space.
64
 
@@ -66,37 +97,261 @@ Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leade
66
 
67
  ## ☁️ How to Deploy (Free 🚀)
68
 
69
- Deploy your **custom Embedding API** on **Hugging Face Spaces** — free, fast, and serverless.
70
 
71
- ### 🔧 Steps:
72
 
73
- 1. **Clone this Space Template:**
74
- 👉 [Hugging Face Space — fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
75
- 2. **Edit `config.yaml`** to set your own model names and backend preferences.
76
- 3. **Push your code** — Spaces will automatically rebuild and host your API.
77
 
78
- That’s it! You now have a live embedding API endpoint powered by your models.
79
 
80
- 📘 **Tutorial Reference:**
81
  - [Deploy Applications on Hugging Face Spaces (Official Guide)](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
82
  - [How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository by Ruslanmv](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository?tab=readme-ov-file)
 
 
83
 
84
  ---
85
 
 
 
 
 
 
86
 
87
- ## 🧑‍💻 Contributing
88
 
89
- Contributions are welcome!
90
- Please open an issue or submit a pull request to discuss changes.
 
 
91
 
92
  ---
93
 
94
- ## ⚠️ License
95
 
96
- MIT License © 2025
97
- Developed with ❤️ by the Open-Source Community.
 
98
 
99
  ---
100
 
101
  > ✨ “Unify your embeddings. Simplify your AI stack.”
102
 
11
 
12
  # 🧠 Unified Embedding API
13
 
14
+ > 🧩 Unified API for all your Embedding, Sparse & Reranking Models — plug and play with any model from Hugging Face or your own fine-tuned versions.
15
 
16
  ---
17
 
18
  ## 🚀 Overview
19
 
20
+ **Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, **sparse**, and **reranking** models.
21
 
22
  It’s designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines** — all controlled from a single `config.yaml` file.
23
 
24
  ⚠️ **Note:** This is a development API.
25
+ For production deployment, host it on cloud platforms such as **Hugging Face TEI**, **AWS**, **GCP**, or any cloud provider of your choice.
26
 
27
  ---
28
 
29
  ## 🧩 Features
30
 
31
  - 🧠 **Unified Interface** — One API to handle dense, sparse, and reranking models.
32
+ - **Batch Processing** — Automatic single/batch.
33
+ - 🔧 **Flexible Parameters** — Full control via kwargs and options
34
  - 🔍 **Vector DB Ready** — Easily integrates with FAISS, Chroma, Qdrant, Milvus, etc.
35
  - 📈 **RAG Support** — Perfect base for Retrieval-Augmented Generation systems.
36
  - ⚡ **Fast & Lightweight** — Powered by FastAPI and optimized with async processing.
37
+ - 🧰 **Extendable** — Switch models instantly via `config.yaml` and add your own models or pipelines effortlessly.
38
 
39
  ---
40
 
41
  ## 📁 Project Structure
42
 
43
  ```
 
44
  unified-embedding-api/
45
+ ├── src/
46
+ │ ├── api/
47
+ │ │ ├── dependencies.py
48
+ │ │ └── routes/
49
+ │ │ ├── embeddings.py # endpoint sparse & dense
50
+ │ │ ├── models.py
51
+ │ │ ├── health.py
52
+ │ │ └── rerank.py # endpoint reranking
53
+ │ ├── core/
54
+ │ │ ├── base.py
55
+ │ │ ├── config.py
56
+ │ │ ├── exceptions.py
57
+ │ │ └── manager.py
58
+ │ ├── models/
59
+ │ │ ├── embeddings/
60
+ │ │ │ ├── dense.py # dense model
61
+ │ │ │ ├── sparse.py # sparse model
62
+ │ │ │ └── rank.py # reranking model
63
+ │ │ └── schemas/
64
+ │ │ ├── common.py
65
+ │ │ ├── requests.py
66
+ │ │ └── responses.py
67
+ │ ├── config/
68
+ │ │ ├── settings.py
69
+ │ │ └── models.yaml # add/change models here
70
+ │ └── utils/
71
+ │ ├── logger.py
72
+ │ └── validators.py
73
 
74
+ ├── app.py
75
  ├── requirements.txt
76
+ ├── LICENSE
77
+ ├── Dockerfile
78
  └── README.md
 
79
  ```
80
  ---
81
  ## 🧩 Model Selection
82
 
83
+ Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory usage reference.
84
+
85
+ **Add More Models:** Edit `src/config/models.yaml`
86
+
87
+ ```yaml
88
+ models:
89
+ your-model-name:
90
+ name: "org/model-name"
91
+ type: "embeddings" # or "sparse-embeddings" or "rerank"
92
+ ```
93
 
94
  ⚠️ If you plan to use larger models like `Qwen2-embedding-8B`, please upgrade your Space.
95
 
 
97
 
98
  ## ☁️ How to Deploy (Free 🚀)
99
 
100
+ Deploy your **Custom Embedding API** on **Hugging Face Spaces** — free, fast, and serverless.
101
+
102
+ ### **1️⃣ Deploy on Hugging Face Spaces (Free!)**
103
+
104
+ 1. **Duplicate this Space:**
105
+ 👉 [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
106
+ Click **⋯** (three dots) → **Duplicate this Space**
107
+
108
+ 2. **Add the `HF_TOKEN` environment variable.** Make sure your Space is public.
109
+
110
+ 3. **Clone your Space locally:**
111
+ Click **⋯** → **Clone repository**
112
+ ```bash
113
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
114
+ cd api-embedding
115
+ ```
116
+
117
+ 4. **Edit `src/config/models.yaml`** to customize models:
118
+ ```yaml
119
+ models:
120
+ your-model:
121
+ name: "org/model-name"
122
+ type: "embeddings" # or "sparse-embeddings" or "rerank"
123
+ ```
124
+
125
+ 5. **Commit and push changes:**
126
+ ```bash
127
+ git add src/config/models.yaml
128
+ git commit -m "Update models configuration"
129
+ git push
130
+ ```
131
+
132
+ 6. **Access your API:**
133
+ Click **⋯** → **Embed this Space** → copy the **Direct URL**
134
+ ```
135
+ https://YOUR_USERNAME-api-embedding.hf.space
136
+ https://YOUR_USERNAME-api-embedding.hf.space/docs # Interactive docs
137
+ ```
138
 
139
+ That’s it! You now have a live embedding API endpoint powered by your models.
140
 
141
+ ### **2️⃣ Run Locally (NOT RECOMMENDED)**
 
 
 
142
 
143
+ ```bash
144
+ # Clone repository
145
+ git clone https://github.com/fahmiaziz98/unified-embedding-api.git
146
+ cd unified-embedding-api
147
+
148
+ # Create virtual environment
149
+ python -m venv venv
150
+ source venv/bin/activate
151
+
152
+ # Install dependencies
153
+ pip install -r requirements.txt
154
+
155
+ # Run server
156
+ python app.py
157
+ ```
158
+
159
+ API available at: `http://localhost:7860`
160
+
161
+ ### **3️⃣ Run with Docker**
162
+
163
+ ```bash
164
+ # Build and run
165
+ docker-compose up --build
166
+
167
+ # Or with Docker only
168
+ docker build -t embedding-api .
169
+ docker run -p 7860:7860 embedding-api
170
+ ```
171
+
172
+ ## 📖 Usage Examples
173
+
174
+ ### **Python**
175
+
176
+ ```python
177
+ import requests
178
+
179
+ url = "http://localhost:7860/api/v1/embeddings/embed"
180
+
181
+ # Single embedding
182
+ response = requests.post(url, json={
183
+ "texts": ["What is artificial intelligence?"],
184
+ "model_id": "qwen3-0.6b"
185
+ })
186
+ print(response.json())
187
+
188
+ # Batch embeddings
189
+ response = requests.post(url, json={
190
+ "texts": [
191
+ "First document",
192
+ "Second document",
193
+ "Third document"
194
+ ],
195
+ "model_id": "qwen3-0.6b",
196
+ "options": {
197
+ "normalize_embeddings": True
198
+ }
199
+ })
200
+ embeddings = response.json()["embeddings"]
201
+ ```
202
+
203
+ ### **cURL**
204
+
205
+ ```bash
206
+ # Single embedding (Dense)
207
+ curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
208
+ -H "Content-Type: application/json" \
209
+ -d '{
210
+ "texts": ["Hello world"],
211
+ "prompt": "add instructions here",
212
+ "model_id": "qwen3-0.6b"
213
+ }'
214
+
215
+ # Batch embeddings (Sparse)
216
+ curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
217
+ -H "Content-Type: application/json" \
218
+ -d '{
219
+ "texts": ["First doc", "Second doc", "Third doc"],
220
+ "model_id": "splade-pp-v2"
221
+ }'
222
+
223
+ # Reranking
224
+ curl -X POST "http://localhost:7860/api/v1/rerank" \
225
+ -H "Content-Type: application/json" \
226
+ -d '{
227
+ "documents": [
228
+ "Python is a popular language for data science due to its extensive libraries.",
229
+ "R is widely used in statistical computing and data analysis.",
230
+ "Java is a versatile language used in various applications, including data science.",
231
+ "SQL is essential for managing and querying relational databases.",
232
+ "Julia is a high-performance language gaining popularity for numerical computing and data science."
233
+ ],
234
+ "model_id": "bge-v2-m3",
235
+ "query": "Python best programming languages for data science",
236
+ "top_k": 3
237
+ }'
238
+
239
+ # Query embedding with options
240
+ curl -X POST "http://localhost:7860/api/v1/embeddings/query" \
241
+ -H "Content-Type: application/json" \
242
+ -d '{
243
+ "texts": ["What is machine learning?"],
244
+ "model_id": "qwen3-0.6b",
245
+ "options": {
246
+ "normalize_embeddings": true,
247
+ "batch_size": 32
248
+ }
249
+ }'
250
+ ```
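The reranking request can also be assembled from Python. This is a minimal sketch that only builds and validates the JSON body; the field names (`query`, `documents`, `model_id`, `top_k`) are taken from the cURL example above, while the helper `build_rerank_payload` and the choice to clamp `top_k` to the number of documents are assumptions made for illustration. The response schema is not shown here, so the sketch does not parse it.

```python
import json
from typing import Any, Dict, Sequence

RERANK_URL = "http://localhost:7860/api/v1/rerank"


def build_rerank_payload(
    query: str,
    documents: Sequence[str],
    model_id: str = "bge-v2-m3",
    top_k: int = 3,
) -> Dict[str, Any]:
    """Build the JSON body for a rerank request (hypothetical helper)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if not documents:
        raise ValueError("documents must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "query": query,
        "documents": list(documents),
        "model_id": model_id,
        # Assumption: asking for more results than documents is pointless,
        # so the value is clamped client-side.
        "top_k": min(top_k, len(documents)),
    }


payload = build_rerank_payload(
    "best programming languages for data science",
    ["Python is popular for data science.", "SQL is used for querying databases."],
)
body = json.dumps(payload)

# To send it (requires the `requests` package and a running server):
#   requests.post(RERANK_URL, json=payload, timeout=30)
```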
251
+
252
+ ### **JavaScript/TypeScript**
253
+
254
+ ```typescript
255
+ const url = "http://localhost:7860/api/v1/embeddings/embed";
256
+
257
+ const response = await fetch(url, {
258
+ method: "POST",
259
+ headers: {
260
+ "Content-Type": "application/json",
261
+ },
262
+ body: JSON.stringify({
263
+ texts: ["Hello world"],
264
+ model_id: "qwen3-0.6b",
265
+ }),
266
+ });
267
+
268
+ const data = await response.json();
269
+ console.log(data.embeddings);
270
+ ```
271
 
272
+ ---
273
+
274
+ ## 📊 API Endpoints
275
+
276
+ | Endpoint | Method | Description |
277
+ |----------|--------|-------------|
278
+ | `/api/v1/embeddings/embed` | POST | Generate document embeddings (single/batch) |
279
+ | `/api/v1/embeddings/query` | POST | Generate query embeddings (single/batch) |
280
+ | `/api/v1/rerank` | POST | Rerank documents based on a query |
281
+ | `/api/v1/models` | GET | List available models |
282
+ | `/api/v1/models/{model_id}` | GET | Get model information |
283
+ | `/health` | GET | Health check |
284
+ | `/` | GET | API information |
285
+ | `/docs` | GET | Interactive API documentation |
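For client code, the table above can be mirrored in a small lookup so that paths are written in one place. This is a sketch, not part of the project: `ENDPOINTS`, `endpoint`, and `BASE_URL` are hypothetical names, and the base URL assumes the local server used in the usage examples.

```python
from typing import Dict, Tuple
from urllib.parse import quote

BASE_URL = "http://localhost:7860"  # assumption: local server from the examples

# (method, path template) pairs copied from the endpoint table
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "embed": ("POST", "/api/v1/embeddings/embed"),
    "query": ("POST", "/api/v1/embeddings/query"),
    "rerank": ("POST", "/api/v1/rerank"),
    "models": ("GET", "/api/v1/models"),
    "model_info": ("GET", "/api/v1/models/{model_id}"),
    "health": ("GET", "/health"),
}


def endpoint(name: str, base_url: str = BASE_URL, **path_params: str) -> Tuple[str, str]:
    """Return (HTTP method, full URL) for a named endpoint."""
    method, template = ENDPOINTS[name]
    # Percent-encode path parameters so unusual model IDs stay in one segment.
    encoded = {key: quote(str(value), safe="") for key, value in path_params.items()}
    return method, base_url.rstrip("/") + template.format(**encoded)


method, url = endpoint("model_info", model_id="qwen3-0.6b")
```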
286
+
287
+
288
+ ## 🤝 Contributing
289
+
290
+ Contributions are welcome! Please:
291
+
292
+ 1. Fork the repository
293
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
294
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
295
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
296
+ 5. Open a Pull Request
297
+
298
+ **Development Setup:**
299
+
300
+ ```bash
301
+ git clone https://github.com/fahmiaziz/unified-embedding-api.git
302
+ cd unified-embedding-api
303
+ pip install -r requirements-dev.txt
304
+ pre-commit install # (optional)
305
+ ```
306
+
307
+ ---
308
+
309
+ ## 📚 Resources
310
+
311
+ - [API Documentation](API.md)
312
+ - [Sentence Transformers](https://www.sbert.net/)
313
+ - [FastAPI Docs](https://fastapi.tiangolo.com/)
314
+ - [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
315
+ - [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
316
  - [Deploy Applications on Hugging Face Spaces (Official Guide)](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
317
  - [How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository by Ruslanmv](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository?tab=readme-ov-file)
318
+ - [Duplicate and clone a Space to your local machine](https://huggingface.co/docs/hub/spaces-overview#duplicating-a-space)
319
+ ---
320
 
321
  ---
322
 
323
+ ## 📝 License
324
+
325
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
326
+
327
+ ---
328
 
329
+ ## 🙏 Acknowledgments
330
 
331
+ - **Sentence Transformers** for the embedding models
332
+ - **FastAPI** for the excellent web framework
333
+ - **Hugging Face** for model hosting and Spaces
334
+ - **Open Source Community** for inspiration and support
335
 
336
  ---
337
 
338
+ ## 📞 Support
339
 
340
+ - **Issues:** [GitHub Issues](https://github.com/fahmiaziz/unified-embedding-api/issues)
341
+ - **Discussions:** [GitHub Discussions](https://github.com/fahmiaziz/unified-embedding-api/discussions)
342
+ - **Hugging Face Space:** [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
343
 
344
  ---
345
 
346
  > ✨ “Unify your embeddings. Simplify your AI stack.”
347
 
348
+ <div align="center">
349
+
350
+ **⭐ Star this repo if you find it useful!**
351
+
352
+ Made with ❤️ by the Open-Source Community
353
+
354
+ </div>
355
+
356
+
357
+
core/__init__.py DELETED
@@ -1,3 +0,0 @@
1
- from .model_manager import ModelManager
2
-
3
- __all__ = ["ModelManager"]
core/embedding.py DELETED
@@ -1,81 +0,0 @@
1
- from typing import List, Optional
2
- from sentence_transformers import SentenceTransformer
3
- from loguru import logger
4
-
5
- from ..src.core.config import ModelConfig
6
-
7
-
8
- class EmbeddingModel:
9
- """
10
- Embedding model wrapper for dense embeddings.
11
-
12
- Attributes:
13
- config: ModelConfig instance
14
- model: SentenceTransformer instance
15
- _loaded: Flag indicating if the model is loaded
16
- """
17
-
18
- def __init__(self, config: ModelConfig):
19
- self.config = config
20
- self.model: Optional[SentenceTransformer] = None
21
- self._loaded = False
22
-
23
- def load(self) -> None:
24
- """Load the embedding model."""
25
- if self._loaded:
26
- return
27
-
28
- logger.info(f"Loading embedding model: {self.config.name}")
29
- try:
30
- self.model = SentenceTransformer(
31
- self.config.name, device="cpu", trust_remote_code=True
32
- )
33
- self._loaded = True
34
- logger.success(f"Loaded embedding model: {self.config.id}")
35
- except Exception as e:
36
- logger.error(f"Failed to load embedding model {self.config.id}: {e}")
37
- raise
38
-
39
- def query_embed(self, text: List[str], prompt: Optional[str] = None) -> List[List[float]]:
40
- """
41
- Generate embeddings for one or more query texts.
42
-
43
- Args:
44
- text: List of query texts
45
- prompt: Optional prompt for instruction-based models
46
-
47
- Returns:
48
- List of embedding vectors, one per input text
49
- """
50
- if not self._loaded:
51
- self.load()
52
-
53
- try:
54
- embeddings = self.model.encode_query(text, prompt=prompt)
55
- return [embedding.tolist() for embedding in embeddings]
56
- except Exception as e:
57
- logger.error(f"Embedding generation failed: {e}")
58
- raise
59
-
60
- def embed_documents(
61
- self, texts: List[str], prompt: Optional[str] = None
62
- ) -> List[List[float]]:
63
- """
64
- method to generate embeddings for a list of texts.
65
-
66
- Args:
67
- texts: List of input texts
68
- prompt: Optional prompt for instruction-based models
69
-
70
- Returns:
71
- List of embedding vectors
72
- """
73
- if not self._loaded:
74
- self.load()
75
-
76
- try:
77
- embeddings = self.model.encode_document(texts, prompt=prompt)
78
- return [embedding.tolist() for embedding in embeddings]
79
- except Exception as e:
80
- logger.error(f"Embedding generation failed: {e}")
81
- raise
core/model_manager.py DELETED
@@ -1,229 +0,0 @@
1
- import yaml
2
- from pathlib import Path
3
- from loguru import logger
4
- from typing import Dict, List, Any, Union
5
- from threading import Lock
6
- from .embedding import EmbeddingModel
7
- from .sparse import SparseEmbeddingModel
8
- from ..src.core.config import ModelConfig
9
-
10
-
11
- class ModelManager:
12
- """
13
- Manages multiple embedding models based on a configuration file.
14
-
15
- Attributes:
16
- models: Dictionary mapping model IDs to their instances.
17
- model_configs: Dictionary mapping model IDs to their configurations.
18
- default_model_id: The default model ID to use if none is specified.
19
- _lock: A threading lock for thread-safe operations.
20
- _preload_complete: Flag indicating if all models have been preloaded.
21
- """
22
-
23
- def __init__(self, config_path: str = "config.yaml"):
24
- self.models: Dict[str, Union[EmbeddingModel, SparseEmbeddingModel]] = {}
25
- self.model_configs: Dict[str, ModelConfig] = {}
26
- self._lock = Lock() # For thread safety
27
- self._preload_complete = False
28
-
29
- self._load_config(config_path)
30
-
31
- def _load_config(self, config_path: str) -> None:
32
- """Load model configurations from a YAML file."""
33
-
34
- config_file = Path(config_path)
35
- if not config_file.exists():
36
- raise FileNotFoundError(f"Configuration file not found: {config_path}")
37
-
38
- try:
39
- with open(config_file, "r", encoding="utf-8") as f:
40
- config = yaml.safe_load(f)
41
-
42
- for model_id, model_cfg in config["models"].items():
43
- self.model_configs[model_id] = ModelConfig(model_id, model_cfg)
44
-
45
- logger.info(f"Loaded {len(self.model_configs)} model configurations")
46
-
47
- except Exception as e:
48
- raise ValueError(f"Failed to load configuration: {e}")
49
-
50
- def _create_model(
51
- self, config: ModelConfig
52
- ) -> Union[EmbeddingModel, SparseEmbeddingModel]:
53
- """
54
- Factory method to create model instances based on type.
55
-
56
- Args:
57
- config: The ModelConfig instance.
58
-
59
- Returns:
60
- The created model instance.
61
- """
62
- if config.type == "sparse-embeddings":
63
- return SparseEmbeddingModel(config)
64
- else:
65
- return EmbeddingModel(config)
66
-
67
- def preload_all_models(self) -> None:
68
- """
69
- Preload all models defined in the configuration.
70
- returns: None
71
- """
72
-
73
- if self._preload_complete:
74
- logger.info("Models already preloaded")
75
- return
76
-
77
- logger.info(f"Preloading {len(self.model_configs)} models...")
78
-
79
- successful_loads = 0
80
- for model_id, config in self.model_configs.items():
81
- try:
82
- with self._lock:
83
- if model_id not in self.models:
84
- model = self._create_model(config)
85
- model.load()
86
- self.models[model_id] = model
87
- successful_loads += 1
88
- logger.debug(f"Preloaded: {model_id}")
89
-
90
- except Exception as e:
91
- logger.error(f"Failed to preload {model_id}: {e}")
92
-
93
- self._preload_complete = True
94
- logger.success(f"Preloaded {successful_loads}/{len(self.model_configs)} models")
95
-
96
- def get_model(self, model_id: str) -> Union[EmbeddingModel, SparseEmbeddingModel]:
97
- """
98
- Retrieve a model instance by its ID, loading it on-demand if necessary.
99
-
100
- Args:
101
- model_id: The ID of the model to retrieve.
102
-
103
- Returns:
104
- The model instance.
105
- """
106
- if model_id not in self.model_configs:
107
- raise ValueError(f"Model '{model_id}' not found in configuration")
108
-
109
- with self._lock:
110
- if model_id in self.models:
111
- return self.models[model_id]
112
-
113
- logger.info(f"🔄 Loading model on-demand: {model_id}")
114
- try:
115
- config = self.model_configs[model_id]
116
- model = self._create_model(config)
117
- model.load()
118
- self.models[model_id] = model
119
- logger.success(f"Loaded: {model_id}")
120
- return model
121
- except Exception as e:
122
- raise RuntimeError(f"Failed to load model {model_id}: {e}")
123
-
124
- def get_model_info(self, model_id: str) -> Dict[str, Any]:
125
- """
126
- Get detailed information about a specific model.
127
-
128
- Args:
129
- model_id: The ID of the model.
130
-
131
- Returns:
132
- A dictionary with model details and load status.
133
- """
134
- if model_id not in self.model_configs:
135
- return {}
136
-
137
- config = self.model_configs[model_id]
138
- is_loaded = model_id in self.models and self.models[model_id]._loaded
139
-
140
- return {
141
- "id": config.id,
142
- "name": config.name,
143
- "type": config.type,
144
- "loaded": is_loaded,
145
- "repository": config.repository,
146
- }
147
-
148
- def generate_api_description(self) -> str:
149
- """Generate a dynamic API description based on available models."""
150
-
151
- dense_models = []
152
- sparse_models = []
153
-
154
- for model_id, config in self.model_configs.items():
155
- if config.type == "sparse-embeddings":
156
- sparse_models.append(f"**{config.name}**")
157
- else:
158
- dense_models.append(f"**{config.name}**")
159
-
160
- description = """
161
- High-performance API for generating text embeddings using multiple model architectures.
162
-
163
- """
164
- if dense_models:
165
- description += "✅ **Dense Embedding Models:**\n"
166
- for model in dense_models:
167
- description += f"- {model}\n"
168
- description += "\n"
169
-
170
- if sparse_models:
171
- description += "🔤 **Sparse Embedding Models:**\n"
172
- for model in sparse_models:
173
- description += f"- {model}\n"
174
- description += "\n"
175
-
176
- # Add features section
177
- description += """
178
- 🚀 **Features:**
179
- - Single text embedding generation
180
- - Batch text embedding processing
181
- - Both dense and sparse vector outputs
182
- - Automatic model type detection
183
- - List all available models with status
184
- - Fast response times with preloading
185
-
186
- 📊 **Statistics:**
187
- """
188
- description += f"- Total configured models: **{len(self.model_configs)}**\n"
189
- description += f"- Dense embedding models: **{len(dense_models)}**\n"
190
- description += f"- Sparse embedding models: **{len(sparse_models)}**\n"
191
- description += """
192
-
193
- ⚠️ Note: This is a development API. For production use, deploy on a cloud platform such as Hugging Face TGI, AWS, or GCP.
194
- """
195
- return description.strip()
196
-
197
- def list_models(self) -> List[Dict[str, Any]]:
198
- """List all available models with their configurations and load status."""
199
- return [self.get_model_info(model_id) for model_id in self.model_configs.keys()]
200
-
201
- def get_memory_usage(self) -> Dict[str, Any]:
202
- """Get memory usage statistics for loaded models."""
203
- loaded_models = []
204
- for model_id, model in self.models.items():
205
- if model._loaded:
206
- loaded_models.append(
207
- {
208
- "id": model_id,
209
- "type": self.model_configs[model_id].type,
210
- "name": model.config.name,
211
- }
212
- )
213
-
214
- return {
215
- "total_available": len(self.model_configs),
216
- "loaded_count": len(loaded_models),
217
- "loaded_models": loaded_models,
218
- "preload_complete": self._preload_complete,
219
- }
220
-
221
- def unload_all_models(self) -> None:
222
- """Unload all models and clear the model cache."""
223
- with self._lock:
224
- count = len(self.models)
225
- for model in self.models.values():
226
- model.unload()
227
- self.models.clear()
228
- self._preload_complete = False
229
- logger.info(f"Unloaded {count} models")
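The core pattern in `get_model` above is load-on-first-use guarded by a lock, so that concurrent requests for the same model trigger only one load. The following is a stripped-down sketch of that idea, not the removed implementation: `LazyRegistry` and `_load_demo` are hypothetical names, and a plain string stands in for a real model object.

```python
from threading import Lock, Thread
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class LazyRegistry(Generic[T]):
    """Create each registered object at most once, on first request."""

    def __init__(self, factories: Dict[str, Callable[[], T]]):
        self._factories = factories
        self._instances: Dict[str, T] = {}
        self._lock = Lock()

    def get(self, key: str) -> T:
        if key not in self._factories:
            raise ValueError(f"'{key}' is not registered")
        # Check and create under the same lock so two threads cannot
        # both see "missing" and both run the factory.
        with self._lock:
            if key not in self._instances:
                self._instances[key] = self._factories[key]()
            return self._instances[key]


load_count = {"demo": 0}


def _load_demo() -> str:
    load_count["demo"] += 1  # only ever runs while the registry lock is held
    return "demo-model"


registry = LazyRegistry({"demo": _load_demo})
threads = [Thread(target=registry.get, args=("demo",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding a single lock during loading keeps the logic simple, at the cost of blocking requests for other keys while one slow load is in progress; a per-key lock would be one way around that.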
core/sparse.py DELETED
@@ -1,123 +0,0 @@
1
- from typing import Any, Dict, List, Optional
2
- from sentence_transformers import SparseEncoder
3
- from loguru import logger
4
-
5
- from ..src.core.config import ModelConfig
6
-
7
-
8
- class SparseEmbeddingModel:
9
- """
10
- Sparse embedding model wrapper.
11
-
12
- Attributes:
13
- config: ModelConfig instance
14
- model: SparseEncoder instance
15
- _loaded: Flag indicating if the model is loaded
16
- """
17
-
18
- def __init__(self, config: ModelConfig):
19
- self.config = config
20
- self.model: Optional[SparseEncoder] = None
21
- self._loaded = False
22
-
23
- def load(self) -> None:
24
- """Load the sparse embedding model."""
25
- if self._loaded:
26
- return
27
-
28
- logger.info(f"Loading sparse model: {self.config.name}")
29
- try:
30
- self.model = SparseEncoder(self.config.name)
31
- self._loaded = True
32
- logger.success(f"Loaded sparse model: {self.config.id}")
33
- except Exception as e:
34
- logger.error(f"Failed to load sparse model {self.config.id}: {e}")
35
- raise
36
-
37
- def query_embed(
38
- self, text: List[str], prompt: Optional[str] = None
39
- ) -> Dict[Any, Any]:
40
- """
41
- Generate a sparse embedding for a single text.
42
-
43
- Args:
44
- text: Input text
45
- prompt: Optional prompt for instruction-based models
46
- Returns:
47
- Sparse embedding as a dictionary with 'indices' and 'values' keys.
48
- """
49
- if not self._loaded:
50
- self.load()
51
-
52
- try:
53
- tensor = self.model.encode_query(text)
54
-
55
- values = tensor[0].coalesce().values().tolist()
56
- indices = tensor[0].coalesce().indices()[0].tolist()
57
-
58
- return {"indices": indices, "values": values}
59
- except Exception as e:
60
- logger.error(f"Embedding error: {e}")
61
- raise
62
-
63
- def embed_documents(
64
- self, text: List[str], prompt: Optional[str] = None
65
- ) -> Dict[Any, Any]:
66
- """
67
- Generate a sparse embedding for a single text.
68
-
69
- Args:
70
- text: Input text
71
- prompt: Optional prompt for instruction-based models
72
-
73
- Returns:
74
- Sparse embedding as a dictionary with 'indices' and 'values' keys.
75
- """
76
-
77
- try:
78
- tensor = self.model.encode(text)
79
-
80
- values = tensor[0].coalesce().values().tolist()
81
- indices = tensor[0].coalesce().indices()[0].tolist()
82
-
83
- return {"indices": indices, "values": values}
84
-
85
- except Exception as e:
86
- logger.error(f"Embedding error: {e}")
87
- raise
88
-
89
- def embed_batch(
90
- self, texts: List[str], prompt: Optional[str] = None
91
- ) -> List[Dict[str, Any]]:
92
- """
93
- Generate sparse embeddings for a batch of texts.
94
-
95
- Args:
96
- texts: List of input texts
97
- prompt: Optional prompt for instruction-based models
98
-
99
- Returns:
100
- List of sparse embeddings as dictionaries with 'text' and 'sparse_embedding' keys.
101
- """
102
- if not self._loaded:
103
- self.load()
104
-
105
- try:
106
- tensors = self.model.encode(texts)
107
- results = []
108
-
109
- for i, tensor in enumerate(tensors):
110
- values = tensor.coalesce().values().tolist()
111
- indices = tensor.coalesce().indices()[0].tolist()
112
-
113
- results.append(
114
- {
115
- "text": texts[i],
116
- "sparse_embedding": {"indices": indices, "values": values},
117
- }
118
- )
119
-
120
- return results
121
- except Exception as e:
122
- logger.error(f"Sparse embedding generation failed: {e}")
123
- raise
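The sparse wrapper above returns each embedding as a dictionary with `indices` and `values` keys. As a rough illustration of how such output might be consumed, the sketch below scores two sparse vectors with a dot product over their shared indices. The function name `sparse_dot` and the sample numbers are made up for the example, and it assumes each index appears at most once per vector.

```python
from typing import Dict, List, Union

SparseVector = Dict[str, Union[List[int], List[float]]]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors in {"indices", "values"} form."""
    # Map index -> weight for the first vector, then look up matches.
    weights = dict(zip(a["indices"], a["values"]))
    return sum(
        weights.get(index, 0.0) * value
        for index, value in zip(b["indices"], b["values"])
    )


query_vec = {"indices": [1, 5, 9], "values": [0.5, 1.0, 2.0]}
doc_vec = {"indices": [5, 9, 12], "values": [2.0, 0.5, 3.0]}

# Shared indices are 5 and 9: 1.0 * 2.0 + 2.0 * 0.5
score = sparse_dot(query_vec, doc_vec)
```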
models/__init__.py DELETED
@@ -1,20 +0,0 @@
1
- # app/models/__init__.py
2
- from .model import (
3
- BatchEmbedRequest,
4
- BatchEmbedResponse,
5
- EmbedRequest,
6
- EmbedResponse,
7
- SparseEmbedResponse,
8
- SparseEmbedding,
9
- BatchSparseEmbedResponse,
10
- )
11
-
12
- __all__ = [
13
- "EmbedRequest",
14
- "EmbedResponse",
15
- "BatchEmbedRequest",
16
- "BatchEmbedResponse",
17
- "SparseEmbedding",
18
- "SparseEmbedResponse",
19
- "BatchSparseEmbedResponse",
20
- ]
models/model.py DELETED
@@ -1,110 +0,0 @@
1
- from typing import List, Optional
2
- from pydantic import BaseModel
3
-
4
-
5
- class EmbedRequest(BaseModel):
6
- """
7
- Request model for single text embedding.
8
-
9
- Attributes:
10
- text: The input text to embed
11
- model_id: Identifier of the model to use
12
- prompt: Optional prompt for instruction-based models
13
- """
14
-
15
- text: str
16
- model_id: str
17
- prompt: Optional[str] = None
18
-
19
-
20
- class BatchEmbedRequest(BaseModel):
21
- """
22
- Request model for batch text embedding.
23
-
24
- Attributes:
25
- texts: List of input texts to embed
26
- model_id: Identifier of the model to use
27
- prompt: Optional prompt for instruction-based models
28
- """
29
-
30
- texts: List[str]
31
- model_id: str
32
- prompt: Optional[str] = None
33
-
34
-
35
- class EmbedResponse(BaseModel):
36
- """
37
- Response model for single text embedding.
38
-
39
- Attributes:
40
- embedding: The generated embedding vector
41
- dimension: Dimensionality of the embedding
42
- model_id: Identifier of the model used
43
- processing_time: Time taken to process the request
44
- """
45
-
46
- embedding: List[float]
47
- dimension: int
48
- model_id: str
49
- processing_time: float
50
-
51
-
52
- class BatchEmbedResponse(BaseModel):
53
- """
54
- Response model for batch text embedding.
55
-
56
- Attributes:
57
- embeddings: List of generated embedding vectors
58
- dimension: Dimensionality of the embeddings
59
- model_id: Identifier of the model used
60
- processing_time: Time taken to process the request
61
- """
62
-
63
- embeddings: List[List[float]]
64
- dimension: int
65
- model_id: str
66
- processing_time: float
67
-
68
-
69
- class SparseEmbedding(BaseModel):
70
- """
71
- Sparse embedding model.
72
-
73
- Attributes:
74
- text: The input text that was embedded
75
- indices: Indices of non-zero elements in the sparse vector
76
- values: Values corresponding to the indices
77
- """
78
-
79
- text: Optional[str] = None
80
- indices: List[int]
81
- values: List[float]
82
-
83
-
84
- class SparseEmbedResponse(BaseModel):
85
- """
86
- Sparse embedding response model.
87
-
88
- Attributes:
89
- sparse_embedding: The generated sparse embedding
90
- model_id: Identifier of the model used
91
- processing_time: Time taken to process the request
92
- """
93
-
94
- sparse_embedding: SparseEmbedding
95
- model_id: str
96
- processing_time: float
97
-
98
-
99
- class BatchSparseEmbedResponse(BaseModel):
100
- """
101
- Batch sparse embedding response model.
102
-
103
- Attributes:
104
- embeddings: List of generated sparse embeddings
105
- model_id: Identifier of the model used
106
- """
107
-
108
- embeddings: List[SparseEmbedding]
109
- model_id: str
110
- processing_time: float