title: Api Embedding
emoji: 🐠
colorFrom: green
colorTo: purple
sdk: docker
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

🧠 Unified Embedding API

🧩 Unified API for all your Embedding, Sparse & Reranking Models: plug and play with any model from Hugging Face or your own fine-tuned versions.


🚀 Overview

Unified Embedding API is a modular and open-source RAG-ready API built for developers who want a simple, unified way to access dense, sparse, and reranking models.

It's designed for vector search, semantic retrieval, and AI-powered pipelines, all controlled from a single YAML configuration file (src/config/models.yaml).

⚠️ Note: This is a development API.
For production deployment, use a dedicated serving stack such as Hugging Face TEI (Text Embeddings Inference), or host it on AWS, GCP, or any cloud provider of your choice.


🧩 Features

  • 🧠 Unified Interface: One API to handle dense, sparse, and reranking models.
  • ⚡ Batch Processing: Handles single and batch inputs automatically.
  • 🔧 Flexible Parameters: Full control via kwargs and options.
  • 🔍 Vector DB Ready: Easily integrates with FAISS, Chroma, Qdrant, Milvus, etc.
  • 📈 RAG Support: A solid base for Retrieval-Augmented Generation systems.
  • ⚡ Fast & Lightweight: Powered by FastAPI and optimized with async processing.
  • 🧰 Extendable: Switch models via src/config/models.yaml and add your own models or pipelines.

📁 Project Structure

unified-embedding-api/
├── src/
│   ├── api/
│   │   ├── dependencies.py
│   │   └── routes/
│   │       ├── embeddings.py   # sparse & dense endpoints
│   │       ├── models.py
│   │       ├── health.py
│   │       └── rerank.py       # reranking endpoint
│   ├── core/
│   │   ├── base.py
│   │   ├── config.py
│   │   ├── exceptions.py
│   │   └── manager.py
│   ├── models/
│   │   ├── embeddings/
│   │   │   ├── dense.py        # dense model
│   │   │   ├── sparse.py       # sparse model
│   │   │   └── rank.py         # reranking model
│   │   └── schemas/
│   │       ├── common.py
│   │       ├── requests.py
│   │       └── responses.py
│   ├── config/
│   │   ├── settings.py
│   │   └── models.yaml         # add/change models here
│   └── utils/
│       ├── logger.py
│       └── validators.py
│
├── app.py
├── requirements.txt
├── LICENSE
├── Dockerfile
└── README.md

🧩 Model Selection

The default configuration is optimized for a CPU-only instance with 2 vCPUs and 16 GB RAM. See the MTEB Leaderboard for model recommendations and memory usage.

Add More Models: Edit src/config/models.yaml

models:
  your-model-name:
    name: "org/model-name"
    type: "embeddings"  # or "sparse-embeddings" or "rerank"
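
A typo in the `type` field is easy to miss, so a quick check before pushing can help. The sketch below assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that only the three `type` values shown above are accepted. The helper name `validate_models` is hypothetical and not part of this project.

```python
# Hypothetical helper: sanity-check a parsed models.yaml before pushing.
# Assumption: the three values below are the only accepted model types.
ALLOWED_TYPES = {"embeddings", "sparse-embeddings", "rerank"}


def validate_models(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    models = config.get("models") if isinstance(config, dict) else None
    if not isinstance(models, dict) or not models:
        return ["top-level 'models' mapping is missing or empty"]

    problems = []
    for model_id, entry in models.items():
        if not isinstance(entry, dict):
            problems.append(f"{model_id}: entry must be a mapping")
            continue
        name = entry.get("name")
        if not isinstance(name, str) or not name.strip():
            problems.append(f"{model_id}: 'name' must be a non-empty string")
        if entry.get("type") not in ALLOWED_TYPES:
            problems.append(f"{model_id}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    return problems


# Example input: the second entry uses an unsupported type on purpose.
example = {
    "models": {
        "your-model-name": {"name": "org/model-name", "type": "embeddings"},
        "broken-model": {"name": "org/other-model", "type": "reranker"},
    }
}
for problem in validate_models(example):
    print(problem)
```

An empty result only means the structure looks plausible; it does not confirm that the model exists on the Hub or fits in memory.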

⚠️ If you plan to use larger models like Qwen3-Embedding-8B, please upgrade your Space.


☁️ How to Deploy (Free 🚀)

Deploy your Custom Embedding API on Hugging Face Spaces: free, fast, and serverless.

1️⃣ Deploy on Hugging Face Spaces (Free!)

  1. Duplicate this Space:
    👉 fahmiaziz/api-embedding
    Click ⋯ (three dots) → Duplicate this Space

  2. Add the HF_TOKEN environment variable in your Space settings, and make sure your Space is public.

  3. Clone your Space locally:
    Click ⋯ → Clone repository

    git clone https://huggingface.co/spaces/YOUR_USERNAME/api-embedding
    cd api-embedding
    
  4. Edit src/config/models.yaml to customize models:

    models:
      your-model:
        name: "org/model-name"
        type: "embeddings"  # or "sparse-embeddings" or "rerank"
    
  5. Commit and push changes:

    git add src/config/models.yaml
    git commit -m "Update models configuration"
    git push
    
  6. Access your API: Click ⋯ → Embed this Space → copy the Direct URL

    https://YOUR_USERNAME-api-embedding.hf.space
    https://YOUR_USERNAME-api-embedding.hf.space/docs  # Interactive docs
    

That's it! You now have a live embedding API endpoint powered by your models.
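
To confirm the Space is actually up after a push, a small health check can be handy. This is a minimal sketch using only the standard library. It assumes the `USERNAME-SPACE.hf.space` URL pattern shown above and the `/health` endpoint listed in the API table. The function names are hypothetical, and Hugging Face may normalize special characters in the subdomain, so treat the Direct URL copied from the Space menu as authoritative.

```python
import urllib.request


def space_base_url(username: str, space_name: str) -> str:
    """Build the direct URL following the USERNAME-SPACE.hf.space pattern."""
    # Assumption: no extra normalization is applied here; if your username
    # contains special characters, copy the Direct URL from the Space instead.
    return f"https://{username}-{space_name}.hf.space"


def check_health(base_url: str, timeout: float = 10.0):
    """Call GET /health and return (HTTP status code, raw response body)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Only builds the URL; no network request is made at this point.
base = space_base_url("YOUR_USERNAME", "api-embedding")
print(base)
```

Calling `check_health(base)` against a running Space should return a 200 status. A Space that is still building or sleeping will typically raise an `urllib.error.URLError` or `HTTPError` instead.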

2️⃣ Run Locally (NOT RECOMMENDED)

# Clone repository
git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run server
python app.py

API available at: http://localhost:7860

3️⃣ Run with Docker

# Build and run
docker-compose up --build

# Or with Docker only
docker build -t embedding-api .
docker run -p 7860:7860 embedding-api

📖 Usage Examples

Python

import requests

url = "http://localhost:7860/api/v1/embeddings/embed"

# Single embedding
response = requests.post(url, json={
    "texts": ["What is artificial intelligence?"],
    "model_id": "qwen3-0.6b"
})
print(response.json())

# Batch embeddings
response = requests.post(url, json={
    "texts": [
        "First document",
        "Second document", 
        "Third document"
    ],
    "model_id": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
})
embeddings = response.json()["embeddings"]
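
Once the vectors are back, a common next step in semantic retrieval is ranking documents against a query by cosine similarity. The sketch below is one way to do that in plain Python. It assumes a dense model and that `response.json()["embeddings"]` is a list of equal-length float lists; the toy vectors stand in for real API output, and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors standing in for response.json()["embeddings"].
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vector = [1.0, 0.1]

for index, score in rank_by_similarity(query_vector, doc_vectors):
    print(index, round(score, 3))
```

For more than a few thousand documents, a vector database such as the ones listed under Features is likely a better fit than this linear scan.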

cURL

# Single embedding (Dense)
curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Hello world"],
    "prompt": "add instructions here",
    "model_id": "qwen3-0.6b"
  }'

# Batch embeddings (Sparse)
curl -X POST "http://localhost:7860/api/v1/embeddings/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["First doc", "Second doc", "Third doc"],
    "model_id": "splade-pp-v2"
  }'

# Reranking
curl -X POST "http://localhost:7860/api/v1/rerank" \
  -H "Content-Type: application/json" \
  -d '{
  "documents": [
    "Python is a popular language for data science due to its extensive libraries.",
    "R is widely used in statistical computing and data analysis.",
    "Java is a versatile language used in various applications, including data science.",
    "SQL is essential for managing and querying relational databases.",
    "Julia is a high-performance language gaining popularity for numerical computing and data science."
  ],
  "model_id": "bge-v2-m3",
  "query": "Python best programming languages for data science",
  "top_k": 3
}'

# Query embedding with options
curl -X POST "http://localhost:7860/api/v1/embeddings/query" \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is machine learning?"],
    "model_id": "qwen3-0.6b",
    "options": {
      "normalize_embeddings": true,
      "batch_size": 32
    }
  }'

JavaScript/TypeScript

const url = "http://localhost:7860/api/v1/embeddings/embed";

const response = await fetch(url, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    texts: ["Hello world"],
    model_id: "qwen3-0.6b",
  }),
});

const data = await response.json();
console.log(data.embeddings);

📊 API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/embeddings/embed | POST | Generate document embeddings (single/batch) |
| /api/v1/embeddings/query | POST | Generate query embeddings (single/batch) |
| /api/v1/rerank | POST | Rerank documents based on a query |
| /api/v1/models | GET | List available models |
| /api/v1/models/{model_id} | GET | Get model information |
| /health | GET | Health check |
| / | GET | API information |
| /docs | GET | Interactive API documentation |
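
The endpoints above can be wrapped in a thin client so that paths and methods live in one place. The following is a sketch only, using the standard library. `ENDPOINTS`, `build_request`, and `call` are hypothetical names, and it assumes every endpoint returns a JSON body, which the source does not spell out.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:7860"

# (method, path) pairs copied from the endpoint list in this README.
ENDPOINTS = {
    "embed": ("POST", "/api/v1/embeddings/embed"),
    "query": ("POST", "/api/v1/embeddings/query"),
    "rerank": ("POST", "/api/v1/rerank"),
    "models": ("GET", "/api/v1/models"),
    "health": ("GET", "/health"),
}


def build_request(name: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a urllib Request for a named endpoint without sending it."""
    method, path = ENDPOINTS[name]
    data = None
    headers = {}
    if method == "POST":
        # POST endpoints take a JSON body, as in the cURL examples.
        data = json.dumps(payload or {}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def call(name: str, payload: Optional[dict] = None,
         base_url: str = BASE_URL, timeout: float = 30.0):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(name, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building a request does not touch the network.
rerank_request = build_request("rerank", {
    "query": "Python best programming languages for data science",
    "documents": ["Python is popular for data science.", "SQL queries databases."],
    "model_id": "bge-v2-m3",
    "top_k": 1,
})
print(rerank_request.get_method(), rerank_request.full_url)
```

With the server running, `call("models")` would list the configured models and `call("embed", {"texts": ["Hello world"], "model_id": "qwen3-0.6b"})` would mirror the cURL example.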

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Setup:

git clone https://github.com/fahmiaziz98/unified-embedding-api.git
cd unified-embedding-api
pip install -r requirements-dev.txt
pre-commit install  # (optional)

📚 Resources



📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Sentence Transformers for the embedding models
  • FastAPI for the excellent web framework
  • Hugging Face for model hosting and Spaces
  • Open Source Community for inspiration and support

📞 Support


✨ "Unify your embeddings. Simplify your AI stack."

⭐ Star this repo if you find it useful!

Made with ❤️ by the Open-Source Community