Spaces: fahmiaziz/api-embedding

fahmiaziz98 committed on Nov 4

Commit

9847166

1 Parent(s): 3b88f19

init README

Files changed (8)

API.md +729 -0
README.md +285 -30
core/__init__.py +0 -3
core/embedding.py +0 -81
core/model_manager.py +0 -229
core/sparse.py +0 -123
models/__init__.py +0 -20
models/model.py +0 -110

API.md ADDED

	@@ -0,0 +1,729 @@

+# 📖 Unified Embedding API Documentation
+Complete API reference for the Unified Embedding API v3.0.0.
+**Features:** Dense Embeddings, Sparse Embeddings, and Document Reranking
+---
+## 🌐 Base URL
+```
+https://fahmiaziz-api-embedding.hf.space
+```
+For local development:
+```
+http://localhost:7860
+```
+---
+## 🔑 Authentication
+**Currently no authentication required.**
+---
+## 📊 Endpoints Overview
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/v1/embeddings/embed` | POST | Generate document embeddings |
+| `/api/v1/embeddings/query` | POST | Generate query embeddings |
+| `/api/v1/rerank` | POST | Rerank documents by relevance |
+| `/api/v1/models` | GET | List available models |
+| `/api/v1/models/{model_id}` | GET | Get model information |
+| `/health` | GET | Health check |
+| `/` | GET | API information |
+---
+## 🚀 Embedding Endpoints
+### 1. Generate Document Embeddings
+**`POST /api/v1/embeddings/embed`**
+Generate embeddings for document texts. Supports both single and batch processing.
+#### Request Body
+```json
+{
+  "texts": ["string"],           // Required: List of texts (1-100 items)
+  "model_id": "string",          // Required: Model identifier
+  "prompt": "string",            // Optional: Instruction prompt
+  "options": {                   // Optional: Embedding parameters
+    "normalize_embeddings": true,
+    "batch_size": 32,
+    "max_length": 512,
+    "show_progress_bar": false
+  }
+}
+```
+#### Parameters
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `texts` | array[string] | ✅ Yes | List of texts to embed (min: 1, max: 100) |
+| `model_id` | string | ✅ Yes | Model identifier (e.g., "qwen3-0.6b") |
+| `prompt` | string | ❌ No | Instruction prompt for the model |
+| `options` | object | ❌ No | Additional embedding parameters |
+#### Options Parameters
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `normalize_embeddings` | boolean | false | L2 normalize output embeddings |
+| `batch_size` | integer | 32 | Processing batch size (1-256) |
+| `max_length` | integer | 512 | Maximum sequence length (1-8192) |
+| `show_progress_bar` | boolean | false | Display progress during encoding |
+| `precision` | string | float32 | Precision ("float32", "int8", "binary") |
+#### Response - Single Text (Dense)
+```json
+{
+  "embedding": [0.123, -0.456, 0.789, ...],
+  "dimension": 1024,
+  "model_id": "qwen3-0.6b",
+  "processing_time": 0.0523
+}
+```
+#### Response - Batch (Dense)
+```json
+{
+  "embeddings": [
+    [0.123, -0.456, ...],
+    [0.234, 0.567, ...],
+    [0.345, -0.678, ...]
+  ],
+  "dimension": 1024,
+  "count": 3,
+  "model_id": "qwen3-0.6b",
+  "processing_time": 0.1245
+}
+```
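The dense response uses the key `embedding` for one text and `embeddings` for several, so client code has to accept both shapes. The following is a minimal sketch that assumes only the field names shown in the two responses above; `dense_vectors` is a hypothetical helper name, not part of the API.

```python
from typing import Any, Dict, List


def dense_vectors(response: Dict[str, Any]) -> List[List[float]]:
    """Return a list of vectors for either dense response shape."""
    if "embeddings" in response:  # batch shape
        return response["embeddings"]
    if "embedding" in response:  # single-text shape
        return [response["embedding"]]
    raise KeyError("response has neither 'embedding' nor 'embeddings'")


# Hand-written sample payloads shaped like the documented responses.
single = {"embedding": [0.1, 0.2], "dimension": 2, "model_id": "qwen3-0.6b"}
batch = {"embeddings": [[0.1, 0.2], [0.3, 0.4]], "dimension": 2, "count": 2}

print(len(dense_vectors(single)), len(dense_vectors(batch)))
```

Wrapping the single vector in a list lets downstream code iterate without checking how many texts were sent.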
+#### Response - Single Text (Sparse)
+```json
+{
+  "sparse_embedding": {
+    "text": "Hello world",
+    "indices": [10, 25, 42, 100],
+    "values": [0.85, 0.62, 0.91, 0.73]
+  },
+  "model_id": "splade-pp-v2",
+  "processing_time": 0.0421
+}
+```
+#### Response - Batch (Sparse)
+```json
+{
+  "embeddings": [
+    {
+      "text": "First doc",
+      "indices": [10, 25, 42],
+      "values": [0.85, 0.62, 0.91]
+    },
+    {
+      "text": "Second doc",
+      "indices": [15, 30, 50],
+      "values": [0.73, 0.88, 0.65]
+    }
+  ],
+  "count": 2,
+  "model_id": "splade-pp-v2",
+  "processing_time": 0.0892
+}
+```
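A sparse embedding is a pair of parallel arrays, so scoring two of them against each other comes down to a dot product over the indices they share. The sketch below assumes that both embeddings come from the same model (so indices refer to the same vocabulary) and that each index appears at most once per embedding; `sparse_dot` is a hypothetical helper, not something the API provides.

```python
from typing import Any, Dict


def sparse_dot(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Dot product of two sparse embeddings given as indices/values arrays."""
    weights = dict(zip(a["indices"], a["values"]))
    return sum(
        weights.get(index, 0.0) * value
        for index, value in zip(b["indices"], b["values"])
    )


# Hand-written sample embeddings in the documented shape.
first = {"text": "First doc", "indices": [10, 25, 42], "values": [0.85, 0.62, 0.91]}
second = {"text": "Second doc", "indices": [15, 25, 42], "values": [0.73, 0.5, 1.0]}

score = sparse_dot(first, second)
print(f"overlap score: {score:.2f}")
```

Only indices 25 and 42 occur in both samples, so only those two terms contribute to the score.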
+#### Examples
+**Single Text (Dense Model):**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "texts": ["What is artificial intelligence?"],
+  "model_id": "qwen3-0.6b"
+}'
+```
+**Single Text (Sparse Model):**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "texts": ["Hello world"],
+  "model_id": "splade-pp-v2"
+}'
+```
+**Batch (with Options):**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "texts": [
+    "First document to embed",
+    "Second document to embed",
+    "Third document to embed"
+  ],
+  "model_id": "qwen3-0.6b",
+  "options": {
+    "normalize_embeddings": true,
+    "batch_size": 32
+  }
+}'
+```
+**Python Example:**
+```python
+import requests
+url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed"
+payload = {
+    "texts": ["Hello world"],
+    "model_id": "qwen3-0.6b"
+}
+response = requests.post(url, json=payload)
+data = response.json()
+print(f"Embedding dimension: {data['dimension']}")
+print(f"Processing time: {data['processing_time']:.3f}s")
+```
+---
+### 2. Generate Query Embeddings
+**`POST /api/v1/embeddings/query`**
+Generate embeddings optimized for search queries. Some models differentiate between query and document embeddings.
+#### Request Body
+Same as `/embed` endpoint.
+```json
+{
+  "texts": ["string"],
+  "model_id": "string",
+  "prompt": "string",
+  "options": {}
+}
+```
+#### Response
+Same format as `/embed` endpoint.
+#### Examples
+**Single Query:**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "texts": ["What is machine learning?"],
+  "model_id": "qwen3-0.6b",
+  "prompt": "Represent this query for retrieval",
+  "options": {
+    "normalize_embeddings": true
+  }
+}'
+```
+**Batch Queries:**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
+  -H 'accept: application/json' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "texts": [
+    "First query",
+    "Second query",
+    "Third query"
+  ],
+  "model_id": "qwen3-0.6b"
+}'
+```
+**Python Example:**
+```python
+import requests
+url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query"
+payload = {
+    "texts": ["What is AI?"],
+    "model_id": "qwen3-0.6b",
+    "options": {
+        "normalize_embeddings": True
+    }
+}
+response = requests.post(url, json=payload)
+embedding = response.json()["embedding"]
+```
+---
+### 3. Rerank Documents
+**`POST /api/v1/rerank`**
+Rerank documents based on their relevance to a query using CrossEncoder models.
+#### Request Body
+```json
+{
+  "query": "string",             // Required: Search query
+  "documents": ["string"],       // Required: List of documents (min: 1)
+  "model_id": "string",          // Required: Reranking model identifier
+  "top_k": integer               // Required: Number of top results to return
+}
+```
+#### Parameters
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `query` | string | ✅ Yes | Search query text |
+| `documents` | array[string] | ✅ Yes | List of documents to rerank (min: 1) |
+| `model_id` | string | ✅ Yes | Reranking model identifier |
+| `top_k` | integer | ✅ Yes | Maximum number of results to return |
+#### Response
+```json
+{
+  "model_id": "jina-reranker-v3",
+  "processing_time": 0.56,
+  "query": "Python for data science",
+  "results": [
+    {
+      "index": 0,
+      "score": 0.95,
+      "text": "Python is excellent for data science"
+    },
+    {
+      "index": 2,
+      "score": 0.73,
+      "text": "R is also used in data science"
+    }
+  ]
+}
+```
+#### Response Fields
+| Field | Type | Description |
+|-------|------|-------------|
+| `model_id` | string | Model identifier used |
+| `processing_time` | float | Processing time in seconds |
+| `query` | string | Original search query |
+| `results` | array | Reranked documents with scores |
+| `results[].index` | integer | Original index in input documents |
+| `results[].score` | float | Relevance score (0-1, normalized) |
+| `results[].text` | string | Document text |
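Because `results[].index` refers back to the position in the submitted `documents` array, it can be used to recover anything the client keeps alongside each document. A minimal sketch, assuming the client holds a `metadata` list in the same order as `documents`; `attach_metadata` and the `doc_id` field are hypothetical.

```python
from typing import Any, Dict, List


def attach_metadata(
    results: List[Dict[str, Any]], metadata: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Join reranked results with client-side metadata via the original index."""
    return [{**metadata[r["index"]], "score": r["score"]} for r in results]


# metadata[i] describes documents[i] from the request.
metadata = [{"doc_id": "a1"}, {"doc_id": "b2"}, {"doc_id": "c3"}]
results = [
    {"index": 0, "score": 0.95, "text": "Python is excellent for data science"},
    {"index": 2, "score": 0.73, "text": "R is also used in data science"},
]

ranked = attach_metadata(results, metadata)
print([item["doc_id"] for item in ranked])  # ['a1', 'c3']
```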
+#### Examples
+**Basic Reranking:**
+```bash
+curl -X 'POST' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/rerank' \
+  -H 'Content-Type: application/json' \
+  -d '{
+  "query": "Python for data science",
+  "documents": [
+    "Python is great for data science",
+    "Java is used for enterprise applications",
+    "R is also used in data science",
+    "JavaScript is for web development"
+  ],
+  "model_id": "jina-reranker-v3",
+  "top_k": 2
+}'
+```
+**Python Example:**
+```python
+import requests
+url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank"
+payload = {
+    "query": "best programming language for beginners",
+    "documents": [
+        "Python is beginner-friendly with simple syntax",
+        "C++ is powerful but complex for beginners",
+        "JavaScript is essential for web development",
+        "Rust offers memory safety but steep learning curve"
+    ],
+    "model_id": "jina-reranker-v3",
+    "top_k": 2
+}
+response = requests.post(url, json=payload)
+data = response.json()
+print(f"Top result: {data['results'][0]['text']}")
+print(f"Score: {data['results'][0]['score']:.3f}")
+```
+**JavaScript Example:**
+```javascript
+const url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank";
+const response = await fetch(url, {
+  method: "POST",
+  headers: { "Content-Type": "application/json" },
+  body: JSON.stringify({
+    query: "AI applications",
+    documents: [
+      "Computer vision for image recognition",
+      "Recipe for chocolate cake",
+      "Natural language processing for chatbots",
+      "Travel guide to Paris"
+    ],
+    model_id: "jina-reranker-v3",
+    top_k: 2
+  })
+});
+const { results } = await response.json();
+console.log("Top results:", results);
+```
+---
+## 🤖 Model Management
+### 4. List Available Models
+**`GET /api/v1/models`**
+Get a list of all available embedding models.
+#### Response
+```json
+{
+  "models": [
+    {
+      "id": "qwen3-0.6b",
+      "name": "Qwen/Qwen3-Embedding-0.6B",
+      "type": "embeddings",
+      "loaded": true,
+      "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
+    },
+    {
+      "id": "splade-pp-v2",
+      "name": "prithivida/Splade_PP_en_v2",
+      "type": "sparse-embeddings",
+      "loaded": true,
+      "repository": "https://huggingface.co/prithivida/Splade_PP_en_v2"
+    }
+  ],
+  "total": 2
+}
+```
+#### Example
+```bash
+curl -X 'GET' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/models' \
+  -H 'accept: application/json'
+```
+---
+### 5. Get Model Information
+**`GET /api/v1/models/{model_id}`**
+Get detailed information about a specific model.
+#### Parameters
+| Parameter | Type | Required | Description |
+|-----------|------|----------|-------------|
+| `model_id` | string | ✅ Yes | Model identifier |
+#### Response
+```json
+{
+  "id": "qwen3-0.6b",
+  "name": "Qwen/Qwen3-Embedding-0.6B",
+  "type": "embeddings",
+  "loaded": true,
+  "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
+}
+```
+#### Example
+```bash
+curl -X 'GET' \
+  'https://fahmiaziz-api-embedding.hf.space/api/v1/models/qwen3-0.6b' \
+  -H 'accept: application/json'
+```
+---
+## 🏥 System Endpoints
+### 6. Health Check
+**`GET /health`**
+Check API health status.
+#### Response
+```json
+{
+  "status": "ok",
+  "total_models": 2,
+  "loaded_models": 2,
+  "startup_complete": true
+}
+```
+#### Example
+```bash
+curl -X 'GET' \
+  'https://fahmiaziz-api-embedding.hf.space/health' \
+  -H 'accept: application/json'
+```
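A client can use the health payload to decide whether to start sending requests. The sketch below treats the service as ready only when every configured model is loaded; that interpretation of the fields is an assumption, since their exact semantics are not defined here, and `is_ready` is a hypothetical helper.

```python
from typing import Any, Dict


def is_ready(health: Dict[str, Any]) -> bool:
    """Interpret a /health payload; True only if all models are loaded."""
    return (
        health.get("status") == "ok"
        and health.get("startup_complete") is True
        and health.get("loaded_models") == health.get("total_models")
    )


ready = {"status": "ok", "total_models": 2, "loaded_models": 2, "startup_complete": True}
warming = {"status": "ok", "total_models": 2, "loaded_models": 1, "startup_complete": False}

print(is_ready(ready), is_ready(warming))  # True False
```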
+---
+### 7. API Information
+**`GET /`**
+Get basic API information.
+#### Response
+```json
+{
+  "message": "Unified Embedding API - Dense & Sparse Embeddings",
+  "version": "3.0.0",
+  "docs_url": "/docs"
+}
+```
+---
+## ❌ Error Responses
+All errors follow this format:
+```json
+{
+  "detail": "Error message description"
+}
+```
+### HTTP Status Codes
+| Code | Description |
+|------|-------------|
+| 200 | Success |
+| 400 | Bad Request - Invalid input |
+| 404 | Not Found - Model not found |
+| 422 | Unprocessable Entity - Validation error |
+| 500 | Internal Server Error |
+| 503 | Service Unavailable - Server not ready |
+### Common Errors
+**Model Not Found (404):**
+```json
+{
+  "detail": "Model 'unknown-model' not found in configuration"
+}
+```
+**Validation Error (422):**
+```json
+{
+  "detail": [
+    {
+      "loc": ["body", "texts"],
+      "msg": "texts list cannot be empty",
+      "type": "value_error"
+    }
+  ]
+}
+```
+**Batch Too Large (422):**
+```json
+{
+  "detail": "Batch size (150) exceeds maximum (100)"
+}
+```
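The examples above show `detail` as either a plain string or a list of validation entries, so a client that logs errors needs to handle both. A minimal sketch, assuming only the `loc` and `msg` keys shown above; `error_message` is a hypothetical helper.

```python
from typing import Any, Dict


def error_message(body: Dict[str, Any]) -> str:
    """Flatten an error body into one human-readable line."""
    detail = body.get("detail", "")
    if isinstance(detail, list):  # validation errors: one entry per field
        parts = []
        for item in detail:
            location = ".".join(str(part) for part in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', '')}")
        return "; ".join(parts)
    return str(detail)


not_found = {"detail": "Model 'unknown-model' not found in configuration"}
invalid = {
    "detail": [
        {"loc": ["body", "texts"], "msg": "texts list cannot be empty", "type": "value_error"}
    ]
}

print(error_message(not_found))
print(error_message(invalid))
```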
+---
+## 📦 Available Models
+### Dense Embedding Models
+| Model ID | Name | Dimension | Description |
+|----------|------|-----------|-------------|
+| `qwen3-0.6b` | Qwen/Qwen3-Embedding-0.6B | 1024 | Efficient multilingual embeddings |
+### Sparse Embedding Models
+| Model ID | Name | Type | Description |
+|----------|------|------|-------------|
+| `splade-pp-v2` | prithivida/Splade_PP_en_v2 | Sparse | SPLADE++ English v2 |
+### Reranking Models
+| Model ID | Name | Type | Description |
+|----------|------|------|-------------|
+| `jina-reranker-v3` | jinaai/jina-reranker-v3-base-en | CrossEncoder | High-quality reranking (English) |
+| `bge-v2-m3` | BAAI/bge-reranker-v2-m3 | CrossEncoder | Multilingual reranking |
+---
+## 🔧 Rate Limits
+**Current Limits:**
+- Max text length: 8,192 characters
+- Max batch size: 100 texts per request
+- No rate limiting (subject to server resources)
+---
+## 💡 Best Practices
+### 1. Batch Processing
+Always batch multiple texts together for better performance:
+```python
+# ❌ Bad - one request per text
+for text in texts:
+    response = requests.post(url, json={"texts": [text], "model_id": "qwen3-0.6b"})
+# ✅ Good - a single batch request (up to 100 texts)
+response = requests.post(url, json={"texts": texts, "model_id": "qwen3-0.6b"})
+```
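When a corpus is larger than the documented limit of 100 texts per request, batching means splitting the input into chunks of at most 100. A minimal sketch of that splitting step; `chunked` is a hypothetical helper, and each chunk would then be sent as the `texts` field of one request.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"document {i}" for i in range(250)]
print([len(chunk) for chunk in chunked(texts)])  # [100, 100, 50]
```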
+### 2. Normalize Embeddings for Similarity
+With `normalize_embeddings` enabled, vectors have unit length, so a plain dot product equals cosine similarity:
+```python
+payload = {
+    "texts": ["text"],
+    "model_id": "qwen3-0.6b",
+    "options": {"normalize_embeddings": True}
+}
+```
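The reason normalization matters is that the dot product of two L2-normalized vectors is exactly their cosine similarity, which lets a vector store use the cheaper dot product. The sketch below checks this on two small hand-picked vectors using only the standard library; the helper names are illustrative.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
print(math.isclose(normalized_dot, cosine(a, b)))  # True
```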
+### 3. Model Selection
+- **Dense models** (qwen3-0.6b): Best for semantic similarity
+- **Sparse models** (splade-pp-v2): Best for keyword matching combined with semantic term expansion
+- **Rerank models** (jina-reranker-v3): Best for re-scoring top candidates
+### 4. Two-Stage Retrieval (Recommended for RAG)
+```python
+# Stage 1: Fast retrieval with embeddings (top 100)
+query_embedding = embed_query(query)
+candidates = vector_search(query_embedding, top_k=100)
+# Stage 2: Precise reranking (top 10)
+reranked = rerank(
+    query=query,
+    documents=[c["text"] for c in candidates],
+    model_id="jina-reranker-v3",
+    top_k=10
+)
+```
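The snippet above leaves `embed_query`, `vector_search` and `rerank` undefined. The following sketch makes the control flow concrete under stated assumptions: the vector store is a plain in-memory list searched by brute-force dot product, and the reranker is passed in as a callable that returns items with an `index` field, like the `/api/v1/rerank` response. The word-overlap reranker is a stand-in so the example runs offline; in practice that callable would call the rerank endpoint.

```python
from typing import Any, Callable, Dict, List, Sequence

Reranker = Callable[[str, List[str], int], List[Dict[str, Any]]]


def vector_search(
    query_vec: Sequence[float], index: List[Dict[str, Any]], top_k: int
) -> List[Dict[str, Any]]:
    """Stage 1: brute-force dot-product search over an in-memory index."""
    def score(item: Dict[str, Any]) -> float:
        return sum(q * d for q, d in zip(query_vec, item["vector"]))
    return sorted(index, key=score, reverse=True)[:top_k]


def two_stage(
    query: str,
    query_vec: Sequence[float],
    index: List[Dict[str, Any]],
    rerank: Reranker,
    recall_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    """Retrieve recall_k candidates, then keep the reranker's best final_k."""
    texts = [c["text"] for c in vector_search(query_vec, index, recall_k)]
    results = rerank(query, texts, min(final_k, len(texts)))
    return [texts[r["index"]] for r in results]


def overlap_rerank(query: str, documents: List[str], top_k: int) -> List[Dict[str, Any]]:
    """Offline stand-in for the rerank endpoint: score by shared words."""
    words = set(query.lower().split())
    scored = [
        {"index": i, "score": len(words & set(doc.lower().split()))}
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


index = [
    {"text": "Python is great for data science", "vector": [0.9, 0.1]},
    {"text": "Java is used for enterprise applications", "vector": [0.2, 0.8]},
    {"text": "R is also used in data science", "vector": [0.8, 0.3]},
]
top = two_stage("data science with Python", [1.0, 0.0], index, overlap_rerank,
                recall_k=2, final_k=1)
print(top)
```

Passing the reranker in as a callable keeps the two stages independent, so the stand-in can be swapped for a real HTTP call without touching the retrieval code.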
+### 5. Error Handling
+Always handle errors gracefully:
+```python
+try:
+    response = requests.post(url, json=payload)
+    response.raise_for_status()
+    data = response.json()
+except requests.exceptions.HTTPError as e:
+    print(f"HTTP error: {e}")
+except requests.exceptions.RequestException as e:
+    print(f"Request failed: {e}")
+```
+---
+## 🐛 Troubleshooting
+### Empty Response
+- Check that the `texts` field is not empty
+- Verify that `model_id` exists (see `GET /api/v1/models`)
+### Slow Performance
+- Use batch requests instead of multiple single requests
+- Reduce `batch_size` in `options` if you run into memory issues
+- Check that the model is preloaded (the first request is slower)
+### Connection Errors
+- Verify that the base URL is correct
+- Check network connectivity
+- Ensure the server is running (check the `/health` endpoint)
+---
+## 📞 Support
+- **Documentation**: [GitHub README](https://github.com/fahmiaziz/unified-embedding-api)
+- **Issues**: [GitHub Issues](https://github.com/fahmiaziz/unified-embedding-api/issues)
+- **Hugging Face Space**: [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
+---
+## 🔄 Changelog
+### v3.0.0 (Current)
+- ✨ Added reranking endpoint (`/api/v1/rerank`)
+- ✨ Support for CrossEncoder models
+- ✨ Unified batch-only response format
+- ✨ Flexible kwargs support
+- ✨ In-memory caching
+- ✨ Improved error handling
+- ✨ Comprehensive documentation
+- 🐛 Fixed type hint errors in RerankModel
+- 🐛 Fixed duplicate parameter errors in rerank endpoint
+---
+**Last Updated**: 2025-11-02

README.md CHANGED

@@ -11,54 +11,85 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 # 🧠 Unified Embedding API
-> 🧩 Unified API for all your Embedding & Sparse needs — plug and play with any model from Hugging Face or your own fine-tuned versions. This official repository from huggingface space
 ---
 ## 🚀 Overview
-**Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, and **sparse** models.
 It’s designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines** — all controlled from a single `config.yaml` file.
 ⚠️ **Note:** This is a development API.
-For production deployment, host it on cloud platforms such as **Hugging Face TGI**, **AWS**, or **GCP**.
 ---
 ## 🧩 Features
 - 🧠 **Unified Interface** — One API to handle dense, sparse, and reranking models.
-- ⚙️ **Configurable** — Switch models instantly via `config.yaml`.
 - 🔍 **Vector DB Ready** — Easily integrates with FAISS, Chroma, Qdrant, Milvus, etc.
 - 📈 **RAG Support** — Perfect base for Retrieval-Augmented Generation systems.
 - ⚡ **Fast & Lightweight** — Powered by FastAPI and optimized with async processing.
-- 🧰 **Extendable** — Add your own models or pipelines effortlessly.
 ---
 ## 📁 Project Structure
 ```
 unified-embedding-api/
 │
-├── core/
-│   ├── embedding.py
-│   └── model_manager.py
-├── models/
-|   └──model.py
-├── app.py                   # Entry point (FastAPI server)
-|── config.yaml              # Model + system configuration
-├── Dockerfile
 ├── requirements.txt
 └── README.md
 ```
 ---
 ## 🧩 Model Selection
-Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for memory usage reference.
 ⚠️ If you plan to use larger models like `Qwen2-embedding-8B`, please upgrade your Space.
@@ -66,37 +97,261 @@ Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leade
 ## ☁️ How to Deploy (Free 🚀)
-Deploy your **custom Embedding API** on **Hugging Face Spaces** — free, fast, and serverless.
-### 🔧 Steps:
-1. **Clone this Space Template:**
-   👉 [Hugging Face Space — fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
-2. **Edit `config.yaml`** to set your own model names and backend preferences.
-3. **Push your code** — Spaces will automatically rebuild and host your API.
-That’s it! You now have a live embedding API endpoint powered by your models.
-📘 **Tutorial Reference:**
 - [Deploy Applications on Hugging Face Spaces (Official Guide)](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
 - [How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository by Ruslanmv](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository?tab=readme-ov-file)
 ---
-## 🧑‍💻 Contributing
-Contributions are welcome!
-Please open an issue or submit a pull request to discuss changes.
 ---
-## ⚠️ License
-MIT License © 2025
-Developed with ❤️ by the Open-Source Community.
 ---
 > ✨ “Unify your embeddings. Simplify your AI stack.”

 # 🧠 Unified Embedding API
+> 🧩 Unified API for all your Embedding, Sparse & Reranking Models — plug and play with any model from Hugging Face or your own fine-tuned versions.
 ---
 ## 🚀 Overview
+**Unified Embedding API** is a modular and open-source **RAG-ready API** built for developers who want a simple, unified way to access **dense**, **sparse**, and **reranking** models.
 It’s designed for **vector search**, **semantic retrieval**, and **AI-powered pipelines** — all controlled from a single `config.yaml` file.
 ⚠️ **Note:** This is a development API.
+For production deployment, host it on cloud platforms such as **Hugging Face TEI**, **AWS**, **GCP**, or any cloud provider of your choice.
 ---
 ## 🧩 Features
 - 🧠 **Unified Interface** — One API to handle dense, sparse, and reranking models.
+- ⚡ **Batch Processing** — Automatic single/batch.
+- 🔧 **Flexible Parameters** — Full control via kwargs and options
 - 🔍 **Vector DB Ready** — Easily integrates with FAISS, Chroma, Qdrant, Milvus, etc.
 - 📈 **RAG Support** — Perfect base for Retrieval-Augmented Generation systems.
 - ⚡ **Fast & Lightweight** — Powered by FastAPI and optimized with async processing.
+- 🧰 **Extendable** —  Switch models instantly via `config.yaml` and add your own models or pipelines effortlessly.
 ---
 ## 📁 Project Structure
 ```
 unified-embedding-api/
+├── src/
+│   ├── api/
+│   │   ├── dependencies.py
+│   │   └── routes/
+│   │       ├── embeddings.py  # endpoint sparse & dense
+│   │       ├── models.py
+│   │       ├── health.py
+│   │       └── rerank.py       # endpoint reranking
+│   ├── core/
+│   │   ├── base.py
+│   │   ├── config.py
+│   │   ├── exceptions.py
+│   │   └── manager.py
+│   ├── models/
+│   │   ├── embeddings/
+│   │   │   ├── dense.py        # dense model
+│   │   │   ├── sparse.py       # sparse model
+│   │   │   └── rank.py         # reranking model
+│   │   └── schemas/
+│   │       ├── common.py
+│   │       ├── requests.py
+│   │       └── responses.py
+│   ├── config/
+│   │   ├── settings.py
+│   │   └── models.yaml         # add/change models here
+│   └── utils/
+│       ├── logger.py
+│       └── validators.py
 │
+├── app.py
 ├── requirements.txt
+├── LICENSE
+├── Dockerfile
 └── README.md
 ```
 ---
 ## 🧩 Model Selection
+Default configuration is optimized for **CPU 2vCPU / 16GB RAM**. See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for model recommendations and memory usage reference.
+**Add More Models:** Edit `src/config/models.yaml`
+```yaml
+models:
+  your-model-name:
+    name: "org/model-name"
+    type: "embeddings"  # or "sparse-embeddings" or "rerank"
+```
 ⚠️ If you plan to use larger models like `Qwen2-embedding-8B`, please upgrade your Space.
 ## ☁️ How to Deploy (Free 🚀)
+Deploy your **Custom Embedding API** on **Hugging Face Spaces** — free, fast, and serverless.
+### **1️⃣ Deploy on Hugging Face Spaces (Free!)**
+1. **Duplicate this Space:**
+   👉 [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
+   Click **⋯** (three dots) → **Duplicate this Space**
+2. **Add an `HF_TOKEN` environment variable** and make sure your Space is public.
+3. **Clone your Space locally:**
+   Click **⋯** → **Clone repository**
+   ```bash
+   git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
+   cd api-embedding
+   ```
+4. **Edit `src/config/models.yaml`** to customize models:
+   ```yaml
+   models:
+     your-model:
+       name: "org/model-name"
+       type: "embeddings"  # or "sparse-embeddings" or "rerank"
+   ```
+5. **Commit and push changes:**
+   ```bash
+   git add src/config/models.yaml
+   git commit -m "Update models configuration"
+   git push
+   ```
+6. **Access your API:**
+   Click **⋯** → **Embed this Space** → copy the **Direct URL**
+   ```
+   https://YOUR_USERNAME-api-embedding.hf.space
+   https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
+   ```
+That’s it! You now have a live embedding API endpoint powered by your models.
+### **2️⃣ Run Locally (NOT RECOMMENDED)**
+```bash
+# Clone repository
+git clone https://github.com/fahmiaziz98/unified-embedding-api.git
+cd unified-embedding-api
+# Create virtual environment
+python -m venv venv
+source venv/bin/activate
+# Install dependencies
+pip install -r requirements.txt
+# Run server
+python app.py
+```
+API available at: `http://localhost:7860`
+### **3️⃣ Run with Docker**
+```bash
+# Build and run
+docker-compose up --build
+# Or with Docker only
+docker build -t embedding-api .
+docker run -p 7860:7860 embedding-api
+```
+## 📖 Usage Examples
+### **Python**
+```python
+import requests
+url = "http://localhost:7860/api/v1/embeddings/embed"
+# Single embedding
+response = requests.post(url, json={
+    "texts": ["What is artificial intelligence?"],
+    "model_id": "qwen3-0.6b"
+})
+print(response.json())
+# Batch embeddings
+response = requests.post(url, json={
+    "texts": [
+        "First document",
+        "Second document",
+        "Third document"
+    ],
+    "model_id": "qwen3-0.6b",
+    "options": {
+        "normalize_embeddings": True
+    }
+})
+embeddings = response.json()["embeddings"]
+```
+### **cURL**
+```bash
+# Single embedding (Dense)
+curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["Hello world"],
+    "prompt": "add instructions here",
+    "model_id": "qwen3-0.6b"
+  }'
+# Batch embeddings (Sparse)
+curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["First doc", "Second doc", "Third doc"],
+    "model_id": "splade-pp-v2"
+  }'
+# Reranking
+curl -X POST "http://localhost:7860/api/v1/rerank" \
+  -H "Content-Type: application/json" \
+  -d '{
+  "documents": [
+    "Python is a popular language for data science due to its extensive libraries.",
+    "R is widely used in statistical computing and data analysis.",
+    "Java is a versatile language used in various applications, including data science.",
+    "SQL is essential for managing and querying relational databases.",
+    "Julia is a high-performance language gaining popularity for numerical computing and data science."
+  ],
+  "model_id": "bge-v2-m3",
+  "query": "Python best programming languages for data science",
+  "top_k": 3
+}'
+# Query embedding with options
+curl -X POST "http://localhost:7860/api/v1/embeddings/query" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["What is machine learning?"],
+    "model_id": "qwen3-0.6b",
+    "options": {
+      "normalize_embeddings": true,
+      "batch_size": 32
+    }
+  }'
+```
+### **JavaScript/TypeScript**
+```typescript
+const url = "http://localhost:7860/api/v1/embeddings/embed";
+const response = await fetch(url, {
+  method: "POST",
+  headers: {
+    "Content-Type": "application/json",
+  },
+  body: JSON.stringify({
+    texts: ["Hello world"],
+    model_id: "qwen3-0.6b",
+  }),
+});
+const data = await response.json();
+console.log(data.embedding);
+```
+---
+## 📊 API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/v1/embeddings/embed` | POST | Generate document embeddings (single/batch) |
+| `/api/v1/embeddings/query` | POST | Generate query embeddings (single/batch) |
+| `/api/v1/rerank` | POST | Rerank documents based on a query |
+| `/api/v1/models` | GET | List available models |
+| `/api/v1/models/{model_id}` | GET | Get model information |
+| `/health` | GET | Health check |
+| `/` | GET | API information |
+| `/docs` | GET | Interactive API documentation |
+## 🤝 Contributing
+Contributions are welcome! Please:
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+3. Commit your changes (`git commit -m 'Add amazing feature'`)
+4. Push to the branch (`git push origin feature/amazing-feature`)
+5. Open a Pull Request
+**Development Setup:**
+```bash
+git clone https://github.com/fahmiaziz/unified-embedding-api.git
+cd unified-embedding-api
+pip install -r requirements-dev.txt
+pre-commit install  # (optional)
+```
+---
+## 📚 Resources
+- [API Documentation](API.md)
+- [Sentence Transformers](https://www.sbert.net/)
+- [FastAPI Docs](https://fastapi.tiangolo.com/)
+- [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
+- [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces)
 - [Deploy Applications on Hugging Face Spaces (Official Guide)](https://huggingface.co/blog/HemanthSai7/deploy-applications-on-huggingface-spaces)
 - [How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository by Ruslanmv](https://github.com/ruslanmv/How-to-Sync-Hugging-Face-Spaces-with-a-GitHub-Repository?tab=readme-ov-file)
+- [Duplicate & Clone space to local machine](https://huggingface.co/docs/hub/spaces-overview#duplicating-a-space)
+---
 ---
+## 📝 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+---
+## 🙏 Acknowledgments
+- **Sentence Transformers** for the embedding models
+- **FastAPI** for the excellent web framework
+- **Hugging Face** for model hosting and Spaces
+- **Open Source Community** for inspiration and support
 ---
+## 📞 Support
+- **Issues:** [GitHub Issues](https://github.com/fahmiaziz/unified-embedding-api/issues)
+- **Discussions:** [GitHub Discussions](https://github.com/fahmiaziz/unified-embedding-api/discussions)
+- **Hugging Face Space:** [fahmiaziz/api-embedding](https://huggingface.co/spaces/fahmiaziz/api-embedding)
 ---
 > ✨ “Unify your embeddings. Simplify your AI stack.”
+<div align="center">
+**⭐ Star this repo if you find it useful!**
+Made with ❤️ by the Open-Source Community
+</div>

core/__init__.py DELETED

@@ -1,3 +0,0 @@
-from .model_manager import ModelManager
-__all__ = ["ModelManager"]

core/embedding.py DELETED

@@ -1,81 +0,0 @@
-from typing import List, Optional
-from sentence_transformers import SentenceTransformer
-from loguru import logger
-from ..src.core.config import ModelConfig
-class EmbeddingModel:
-    """
-    Embedding model wrapper for dense embeddings.
-    attributes:
-        config: ModelConfig instance
-        model: SentenceTransformer instance
-        _loaded: Flag indicating if the model is loaded
-    """
-    def __init__(self, config: ModelConfig):
-        self.config = config
-        self.model: Optional[SentenceTransformer] = None
-        self._loaded = False
-    def load(self) -> None:
-        """Load the embedding model."""
-        if self._loaded:
-            return
-        logger.info(f"Loading embedding model: {self.config.name}")
-        try:
-            self.model = SentenceTransformer(
-                self.config.name, device="cpu", trust_remote_code=True
-            )
-            self._loaded = True
-            logger.success(f"Loaded embedding model: {self.config.id}")
-        except Exception as e:
-            logger.error(f"Failed to load embedding model {self.config.id}: {e}")
-            raise
-    def query_embed(self, text: List[str], prompt: Optional[str] = None) -> List[float]:
-        """
-        method to generate embedding for a single text.
-        Args:
-            text: Input text
-            prompt: Optional prompt for instruction-based models
-        Returns:
-            Embedding vector
-        """
-        if not self._loaded:
-            self.load()
-        try:
-            embeddings = self.model.encode_query(text, prompt=prompt)
-            return [embedding.tolist() for embedding in embeddings]
-        except Exception as e:
-            logger.error(f"Embedding generation failed: {e}")
-            raise
-    def embed_documents(
-        self, texts: List[str], prompt: Optional[str] = None
-    ) -> List[List[float]]:
-        """
-        method to generate embeddings for a list of texts.
-        Args:
-            texts: List of input texts
-            prompt: Optional prompt for instruction-based models
-        Returns:
-        List of embedding vectors
-        """
-        if not self._loaded:
-            self.load()
-        try:
-            embeddings = self.model.encode_document(texts, prompt=prompt)
-            return [embedding.tolist() for embedding in embeddings]
-        except Exception as e:
-            logger.error(f"Embedding generation failed: {e}")
-            raise
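
The wrapper above defers loading until the first call and converts each returned vector to a plain list. A minimal sketch of that pattern follows. `FakeEncoder` and `LazyEmbedder` are hypothetical stand-ins, not part of sentence-transformers, so the example runs without downloading a model:

```python
from typing import List, Optional


class FakeEncoder:
    """Hypothetical stand-in for SentenceTransformer; returns toy vectors."""

    def encode(self, texts: List[str]) -> List[List[float]]:
        # One 2-dimensional vector per text: [character count, word count].
        return [[float(len(t)), float(len(t.split()))] for t in texts]


class LazyEmbedder:
    """Loads the encoder on first use, mirroring the load()/_loaded pattern."""

    def __init__(self) -> None:
        self.model: Optional[FakeEncoder] = None
        self._loaded = False

    def load(self) -> None:
        # Loading twice is a no-op.
        if self._loaded:
            return
        self.model = FakeEncoder()
        self._loaded = True

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        if not self._loaded:
            self.load()
        assert self.model is not None
        # Always return one plain list per input text.
        return [list(vec) for vec in self.model.encode(texts)]


embedder = LazyEmbedder()
vectors = embedder.embed_documents(["hello world", "hi"])
```

Because both the query and the document path return one vector per input, a `List[List[float]]` annotation fits both methods.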

core/model_manager.py DELETED

@@ -1,229 +0,0 @@
-import yaml
-from pathlib import Path
-from loguru import logger
-from typing import Dict, List, Any, Union
-from threading import Lock
-from .embedding import EmbeddingModel
-from .sparse import SparseEmbeddingModel
-from ..src.core.config import ModelConfig
-class ModelManager:
-    """
-    Manages multiple embedding models based on a configuration file.
-    Attributes:
-        models: Dictionary mapping model IDs to their instances.
-        model_configs: Dictionary mapping model IDs to their configurations.
-        _lock: A threading lock for thread-safe operations.
-        _preload_complete: Flag indicating if all models have been preloaded.
-    """
-    def __init__(self, config_path: str = "config.yaml"):
-        self.models: Dict[str, Union[EmbeddingModel, SparseEmbeddingModel]] = {}
-        self.model_configs: Dict[str, ModelConfig] = {}
-        self._lock = Lock()  # For thread safety
-        self._preload_complete = False
-        self._load_config(config_path)
-    def _load_config(self, config_path: str) -> None:
-        """Load model configurations from a YAML file."""
-        config_file = Path(config_path)
-        if not config_file.exists():
-            raise FileNotFoundError(f"Configuration file not found: {config_path}")
-        try:
-            with open(config_file, "r", encoding="utf-8") as f:
-                config = yaml.safe_load(f)
-            for model_id, model_cfg in config["models"].items():
-                self.model_configs[model_id] = ModelConfig(model_id, model_cfg)
-            logger.info(f"Loaded {len(self.model_configs)} model configurations")
-        except Exception as e:
-            raise ValueError(f"Failed to load configuration: {e}") from e
-    def _create_model(
-        self, config: ModelConfig
-    ) -> Union[EmbeddingModel, SparseEmbeddingModel]:
-        """
-        Factory method to create model instances based on type.
-        Args:
-            config: The ModelConfig instance.
-        Returns:
-            The created model instance.
-        """
-        if config.type == "sparse-embeddings":
-            return SparseEmbeddingModel(config)
-        else:
-            return EmbeddingModel(config)
-    def preload_all_models(self) -> None:
-        """
-        Preload all models defined in the configuration.
-        Models that fail to load are logged and skipped.
-        Returns:
-            None
-        """
-        if self._preload_complete:
-            logger.info("Models already preloaded")
-            return
-        logger.info(f"Preloading {len(self.model_configs)} models...")
-        successful_loads = 0
-        for model_id, config in self.model_configs.items():
-            try:
-                with self._lock:
-                    if model_id not in self.models:
-                        model = self._create_model(config)
-                        model.load()
-                        self.models[model_id] = model
-                        successful_loads += 1
-                        logger.debug(f"Preloaded: {model_id}")
-            except Exception as e:
-                logger.error(f"Failed to preload {model_id}: {e}")
-        self._preload_complete = True
-        logger.success(f"Preloaded {successful_loads}/{len(self.model_configs)} models")
-    def get_model(self, model_id: str) -> Union[EmbeddingModel, SparseEmbeddingModel]:
-        """
-        Retrieve a model instance by its ID, loading it on-demand if necessary.
-        Args:
-            model_id: The ID of the model to retrieve.
-        Returns:
-            The model instance.
-        """
-        if model_id not in self.model_configs:
-            raise ValueError(f"Model '{model_id}' not found in configuration")
-        with self._lock:
-            if model_id in self.models:
-                return self.models[model_id]
-            logger.info(f"🔄 Loading model on-demand: {model_id}")
-            try:
-                config = self.model_configs[model_id]
-                model = self._create_model(config)
-                model.load()
-                self.models[model_id] = model
-                logger.success(f"Loaded: {model_id}")
-                return model
-            except Exception as e:
-                raise RuntimeError(f"Failed to load model {model_id}: {e}") from e
-    def get_model_info(self, model_id: str) -> Dict[str, Any]:
-        """
-        Get detailed information about a specific model.
-        Args:
-            model_id: The ID of the model.
-        Returns:
-            A dictionary with model details and load status.
-        """
-        if model_id not in self.model_configs:
-            return {}
-        config = self.model_configs[model_id]
-        is_loaded = model_id in self.models and self.models[model_id]._loaded
-        return {
-            "id": config.id,
-            "name": config.name,
-            "type": config.type,
-            "loaded": is_loaded,
-            "repository": config.repository,
-        }
-    def generate_api_description(self) -> str:
-        """Generate a dynamic API description based on available models."""
-        dense_models = []
-        sparse_models = []
-        for model_id, config in self.model_configs.items():
-            if config.type == "sparse-embeddings":
-                sparse_models.append(f"**{config.name}**")
-            else:
-                dense_models.append(f"**{config.name}**")
-        description = """
-High-performance API for generating text embeddings using multiple model architectures.
-"""
-        if dense_models:
-            description += "✅ **Dense Embedding Models:**\n"
-            for model in dense_models:
-                description += f"- {model}\n"
-            description += "\n"
-        if sparse_models:
-            description += "🔤 **Sparse Embedding Models:**\n"
-            for model in sparse_models:
-                description += f"- {model}\n"
-            description += "\n"
-        # Add features section
-        description += """
-🚀 **Features:**
-- Single text embedding generation
-- Batch text embedding processing
-- Both dense and sparse vector outputs
-- Automatic model type detection
-- List all available models with status
-- Fast response times with preloading
-📊 **Statistics:**
-"""
-        description += f"- Total configured models: **{len(self.model_configs)}**\n"
-        description += f"- Dense embedding models: **{len(dense_models)}**\n"
-        description += f"- Sparse embedding models: **{len(sparse_models)}**\n"
-        description += """
-⚠️ Note: This is a development API. For production use, deploy on a cloud platform such as Hugging Face TGI, AWS, or GCP.
-        """
-        return description.strip()
-    def list_models(self) -> List[Dict[str, Any]]:
-        """List all available models with their configurations and load status."""
-        return [self.get_model_info(model_id) for model_id in self.model_configs.keys()]
-    def get_memory_usage(self) -> Dict[str, Any]:
-        """Get memory usage statistics for loaded models."""
-        loaded_models = []
-        for model_id, model in self.models.items():
-            if model._loaded:
-                loaded_models.append(
-                    {
-                        "id": model_id,
-                        "type": self.model_configs[model_id].type,
-                        "name": model.config.name,
-                    }
-                )
-        return {
-            "total_available": len(self.model_configs),
-            "loaded_count": len(loaded_models),
-            "loaded_models": loaded_models,
-            "preload_complete": self._preload_complete,
-        }
-    def unload_all_models(self) -> None:
-        """Unload all models and clear the model cache."""
-        with self._lock:
-            count = len(self.models)
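-            for model in self.models.values():
-                # Neither wrapper class defines unload(); drop the references instead.
-                model.model = None
-                model._loaded = False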
-            self.models.clear()
-            self._preload_complete = False
-            logger.info(f"Unloaded {count} models")
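
`get_model` checks the cache and loads the model while holding a single lock, so concurrent requests for the same ID trigger only one load. A minimal sketch of that check-then-load pattern, using only the standard library; `LazyCache` and `fake_loader` are hypothetical names introduced for illustration:

```python
import threading
from typing import Callable, Dict, List


class LazyCache:
    """Hypothetical cache that builds each entry once, under a lock."""

    def __init__(self, factory: Callable[[str], object]) -> None:
        self._factory = factory
        self._items: Dict[str, object] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> object:
        # The membership check and the load happen under the same lock,
        # so two threads cannot both decide that the entry is missing.
        with self._lock:
            if key not in self._items:
                self._items[key] = self._factory(key)
            return self._items[key]


load_calls: List[str] = []


def fake_loader(model_id: str) -> str:
    # Stand-in for an expensive model load; records every invocation.
    load_calls.append(model_id)
    return f"model:{model_id}"


cache = LazyCache(fake_loader)
threads = [threading.Thread(target=cache.get, args=("dense-a",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off of a single lock is that loading one model also blocks requests for models that are already cached. A per-model lock would avoid this at the cost of extra bookkeeping.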

core/sparse.py DELETED

@@ -1,123 +0,0 @@
-from typing import Any, Dict, List, Optional
-from sentence_transformers import SparseEncoder
-from loguru import logger
-from ..src.core.config import ModelConfig
-class SparseEmbeddingModel:
-    """
-    Sparse embedding model wrapper.
-    Attributes:
-        config: ModelConfig instance
-        model: SparseEncoder instance
-        _loaded: Flag indicating if the model is loaded
-    """
-    def __init__(self, config: ModelConfig):
-        self.config = config
-        self.model: Optional[SparseEncoder] = None
-        self._loaded = False
-    def load(self) -> None:
-        """Load the sparse embedding model."""
-        if self._loaded:
-            return
-        logger.info(f"Loading sparse model: {self.config.name}")
-        try:
-            self.model = SparseEncoder(self.config.name)
-            self._loaded = True
-            logger.success(f"Loaded sparse model: {self.config.id}")
-        except Exception as e:
-            logger.error(f"Failed to load sparse model {self.config.id}: {e}")
-            raise
-    def query_embed(
-        self, text: List[str], prompt: Optional[str] = None
-    ) -> Dict[Any, Any]:
-        """
-        Generate a sparse query embedding for the first text in the list.
-        Args:
-            text: List of input texts; only the first entry's embedding is returned
-            prompt: Optional prompt for instruction-based models (currently unused)
-        Returns:
-            Sparse embedding as a dictionary with 'indices' and 'values' keys.
-        """
-        if not self._loaded:
-            self.load()
-        try:
-            tensor = self.model.encode_query(text)
-            # Coalesce once and reuse the result for both values and indices.
-            sparse = tensor[0].coalesce()
-            values = sparse.values().tolist()
-            indices = sparse.indices()[0].tolist()
-            return {"indices": indices, "values": values}
-        except Exception as e:
-            logger.error(f"Embedding error: {e}")
-            raise
-    def embed_documents(
-        self, text: List[str], prompt: Optional[str] = None
-    ) -> Dict[Any, Any]:
-        """
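-        Generate a sparse document embedding for the first text in the list.
-        Args:
-            text: List of input texts; only the first entry's embedding is returned
-            prompt: Optional prompt for instruction-based models (currently unused)
-        Returns:
-            Sparse embedding as a dictionary with 'indices' and 'values' keys.
-        """
-        # Load on demand, as the other embedding methods do.
-        if not self._loaded:
-            self.load()
-        try:
-            tensor = self.model.encode(text)
-            # Coalesce once and reuse the result for both values and indices.
-            sparse = tensor[0].coalesce()
-            values = sparse.values().tolist()
-            indices = sparse.indices()[0].tolist()
-            return {"indices": indices, "values": values}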
-        except Exception as e:
-            logger.error(f"Embedding error: {e}")
-            raise
-    def embed_batch(
-        self, texts: List[str], prompt: Optional[str] = None
-    ) -> List[Dict[str, Any]]:
-        """
-        Generate sparse embeddings for a batch of texts.
-        Args:
-            texts: List of input texts
-            prompt: Optional prompt for instruction-based models
-        Returns:
-            List of sparse embeddings as dictionaries with 'text' and 'sparse_embedding' keys.
-        """
-        if not self._loaded:
-            self.load()
-        try:
-            tensors = self.model.encode(texts)
-            results = []
-            for i, tensor in enumerate(tensors):
-                sparse = tensor.coalesce()
-                values = sparse.values().tolist()
-                indices = sparse.indices()[0].tolist()
-                results.append(
-                    {
-                        "text": texts[i],
-                        "sparse_embedding": {"indices": indices, "values": values},
-                    }
-                )
-            return results
-        except Exception as e:
-            logger.error(f"Sparse embedding generation failed: {e}")
-            raise
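
The sparse wrapper returns each embedding as parallel `indices` and `values` lists. The sketch below shows that representation in plain Python, without torch. `to_sparse` and `sparse_dot` are hypothetical helpers introduced for illustration; they are not part of the deleted module or of sentence-transformers:

```python
from typing import Dict, List


def to_sparse(vector: List[float]) -> Dict[str, list]:
    """Keep only the non-zero entries of a dense vector."""
    indices = [i for i, v in enumerate(vector) if v != 0.0]
    values = [vector[i] for i in indices]
    return {"indices": indices, "values": values}


def sparse_dot(a: Dict[str, list], b: Dict[str, list]) -> float:
    """Dot product of two sparse vectors in indices/values form."""
    lookup = dict(zip(b["indices"], b["values"]))
    # Only positions present in both vectors contribute to the score.
    return sum(v * lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


query = to_sparse([0.0, 1.5, 0.0, 2.0])
document = to_sparse([1.0, 2.0, 0.0, 0.0])
score = sparse_dot(query, document)
```

Storing only the non-zero positions keeps payloads small when the vector spans a large vocabulary but activates few terms.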

models/__init__.py DELETED

@@ -1,20 +0,0 @@
-# app/models/__init__.py
-from .model import (
-    BatchEmbedRequest,
-    BatchEmbedResponse,
-    EmbedRequest,
-    EmbedResponse,
-    SparseEmbedResponse,
-    SparseEmbedding,
-    BatchSparseEmbedResponse,
-)
-__all__ = [
-    "EmbedRequest",
-    "EmbedResponse",
-    "BatchEmbedRequest",
-    "BatchEmbedResponse",
-    "SparseEmbedding",
-    "SparseEmbedResponse",
-    "BatchSparseEmbedResponse",
-]

models/model.py DELETED

@@ -1,110 +0,0 @@
-from typing import List, Optional
-from pydantic import BaseModel
-class EmbedRequest(BaseModel):
-    """
-    Request model for single text embedding.
-    Attributes:
-        text: The input text to embed
-        model_id: Identifier of the model to use
-        prompt: Optional prompt for instruction-based models
-    """
-    text: str
-    model_id: str
-    prompt: Optional[str] = None
-class BatchEmbedRequest(BaseModel):
-    """
-    Request model for batch text embedding.
-    Attributes:
-        texts: List of input texts to embed
-        model_id: Identifier of the model to use
-        prompt: Optional prompt for instruction-based models
-    """
-    texts: List[str]
-    model_id: str
-    prompt: Optional[str] = None
-class EmbedResponse(BaseModel):
-    """
-    Response model for single text embedding.
-    Attributes:
-        embedding: The generated embedding vector
-        dimension: Dimensionality of the embedding
-        model_id: Identifier of the model used
-        processing_time: Time taken to process the request
-    """
-    embedding: List[float]
-    dimension: int
-    model_id: str
-    processing_time: float
-class BatchEmbedResponse(BaseModel):
-    """
-    Response model for batch text embedding.
-    Attributes:
-        embeddings: List of generated embedding vectors
-        dimension: Dimensionality of the embeddings
-        model_id: Identifier of the model used
-        processing_time: Time taken to process the request
-    """
-    embeddings: List[List[float]]
-    dimension: int
-    model_id: str
-    processing_time: float
-class SparseEmbedding(BaseModel):
-    """
-    Sparse embedding model.
-    Attributes:
-        text: The input text that was embedded
-        indices: Indices of non-zero elements in the sparse vector
-        values: Values corresponding to the indices
-    """
-    text: Optional[str] = None
-    indices: List[int]
-    values: List[float]
-class SparseEmbedResponse(BaseModel):
-    """
-    Sparse embedding response model.
-    Attributes:
-        sparse_embedding: The generated sparse embedding
-        model_id: Identifier of the model used
-        processing_time: Time taken to process the request
-    """
-    sparse_embedding: SparseEmbedding
-    model_id: str
-    processing_time: float
-class BatchSparseEmbedResponse(BaseModel):
-    """
-    Batch sparse embedding response model.
-    Attributes:
-        embeddings: List of generated sparse embeddings
-        model_id: Identifier of the model used
-        processing_time: Time taken to process the request
-    """
-    embeddings: List[SparseEmbedding]
-    model_id: str
-    processing_time: float
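
The response models above pair the embeddings with the model ID and a measured processing time. A rough sketch of assembling such a payload follows. It uses standard-library dataclasses as a stand-in for the pydantic models, so no validation is performed, and the model ID `demo-sparse` is hypothetical:

```python
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SparseEmbedding:
    indices: List[int]
    values: List[float]
    text: Optional[str] = None


@dataclass
class BatchSparseEmbedResponse:
    embeddings: List[SparseEmbedding]
    model_id: str
    processing_time: float


start = time.perf_counter()
# In the real service the embeddings would come from the sparse model.
items = [SparseEmbedding(indices=[1, 3], values=[1.5, 2.0], text="example")]
response = BatchSparseEmbedResponse(
    embeddings=items,
    model_id="demo-sparse",  # hypothetical model ID
    processing_time=time.perf_counter() - start,
)
payload = asdict(response)  # nested dict, ready for JSON serialization
```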