Unified Embedding API Documentation
Complete API reference for the Unified Embedding API v3.0.0.
Features: Dense Embeddings, Sparse Embeddings, and Document Reranking
Base URL

```
https://fahmiaziz-api-embedding.hf.space
```

For local development:

```
http://localhost:7860
```
Authentication
No authentication is currently required.
Endpoints Overview

| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/embeddings/embed` | POST | Generate document embeddings |
| `/api/v1/embeddings/query` | POST | Generate query embeddings |
| `/api/v1/rerank` | POST | Rerank documents by relevance |
| `/api/v1/models` | GET | List available models |
| `/api/v1/models/{model_id}` | GET | Get model information |
| `/health` | GET | Health check |
| `/` | GET | API information |
Embedding Endpoints
1. Generate Document Embeddings
POST /api/v1/embeddings/embed
Generate embeddings for document texts. Supports both single and batch processing.
Request Body
```jsonc
{
  "texts": ["string"],      // Required: list of texts (1-100 items)
  "model_id": "string",     // Required: model identifier
  "prompt": "string",       // Optional: instruction prompt
  "options": {              // Optional: embedding parameters
    "normalize_embeddings": true,
    "batch_size": 32,
    "max_length": 512,
    "show_progress_bar": false
  }
}
```
Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `texts` | array[string] | Yes | List of texts to embed (min: 1, max: 100) |
| `model_id` | string | Yes | Model identifier (e.g., `"qwen3-0.6b"`) |
| `prompt` | string | No | Instruction prompt for the model |
| `options` | object | No | Additional embedding parameters |
Options Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| `normalize_embeddings` | boolean | `false` | L2-normalize output embeddings |
| `batch_size` | integer | `32` | Processing batch size (1-256) |
| `max_length` | integer | `512` | Maximum sequence length (1-8192) |
| `show_progress_bar` | boolean | `false` | Display progress during encoding |
| `precision` | string | `"float32"` | Precision (`"float32"`, `"int8"`, `"binary"`) |
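The limits in the two tables above can be checked on the client before a request is sent, so mistakes surface before the server answers with a 422. The sketch below is a hypothetical helper (`build_embed_payload` is not part of the API); it assumes the documented ranges are inclusive and simply raises `ValueError`.

```python
from typing import Any, Dict, List, Optional

ALLOWED_PRECISIONS = {"float32", "int8", "binary"}


def build_embed_payload(
    texts: List[str],
    model_id: str,
    prompt: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: build a request body for /embed or /query,
    checking the documented limits locally (ranges assumed inclusive)."""
    if not 1 <= len(texts) <= 100:
        raise ValueError(f"texts must contain 1-100 items, got {len(texts)}")
    if "batch_size" in options and not 1 <= options["batch_size"] <= 256:
        raise ValueError("batch_size must be between 1 and 256")
    if "max_length" in options and not 1 <= options["max_length"] <= 8192:
        raise ValueError("max_length must be between 1 and 8192")
    if "precision" in options and options["precision"] not in ALLOWED_PRECISIONS:
        raise ValueError(f"precision must be one of {sorted(ALLOWED_PRECISIONS)}")

    payload: Dict[str, Any] = {"texts": texts, "model_id": model_id}
    if prompt is not None:
        payload["prompt"] = prompt  # optional field, omitted when unset
    if options:
        payload["options"] = options  # only send the options the caller set
    return payload


payload = build_embed_payload(
    ["First document", "Second document"],
    "qwen3-0.6b",
    normalize_embeddings=True,
    batch_size=32,
)
print(payload)
```

Omitting unset fields keeps the server-side defaults from the table in charge rather than hard-coding them in the client.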
Response - Single Text (Dense)
```json
{
  "embedding": [0.123, -0.456, 0.789, ...],
  "dimension": 768,
  "model_id": "qwen3-0.6b",
  "processing_time": 0.0523
}
```

Response - Batch (Dense)

```json
{
  "embeddings": [
    [0.123, -0.456, ...],
    [0.234, 0.567, ...],
    [0.345, -0.678, ...]
  ],
  "dimension": 768,
  "count": 3,
  "model_id": "qwen3-0.6b",
  "processing_time": 0.1245
}
```

Response - Single Text (Sparse)

```json
{
  "sparse_embedding": {
    "text": "Hello world",
    "indices": [10, 25, 42, 100],
    "values": [0.85, 0.62, 0.91, 0.73]
  },
  "model_id": "splade-pp-v2",
  "processing_time": 0.0421
}
```

Response - Batch (Sparse)

```json
{
  "embeddings": [
    {
      "text": "First doc",
      "indices": [10, 25, 42],
      "values": [0.85, 0.62, 0.91]
    },
    {
      "text": "Second doc",
      "indices": [15, 30, 50],
      "values": [0.73, 0.88, 0.65]
    }
  ],
  "count": 2,
  "model_id": "splade-pp-v2",
  "processing_time": 0.0892
}
```
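Sparse vectors come back as parallel `indices` and `values` arrays. A common way to score two of them is a dot product over the indices they share; the API itself only returns the vectors, so this scoring step is an assumption on the client side. The function names below are illustrative and use only the standard library.

```python
from typing import Dict, List


def to_dict(sparse: Dict[str, List]) -> Dict[int, float]:
    """Convert {"indices": [...], "values": [...]} into {index: value}."""
    return dict(zip(sparse["indices"], sparse["values"]))


def sparse_dot(a: Dict[str, List], b: Dict[str, List]) -> float:
    """Dot product of two sparse vectors: sum over shared indices only."""
    da, db = to_dict(a), to_dict(b)
    if len(da) > len(db):
        da, db = db, da  # iterate over the smaller vector
    return sum(value * db[idx] for idx, value in da.items() if idx in db)


# Vectors shaped like the batch response above (illustrative values)
first = {"text": "First doc", "indices": [10, 25, 42], "values": [0.85, 0.62, 0.91]}
query = {"text": "query", "indices": [25, 42, 99], "values": [0.5, 1.0, 0.3]}
print(round(sparse_dot(first, query), 4))  # 1.22 (shared indices 25 and 42)
```

Only non-zero entries are stored, so the cost scales with the number of active terms rather than the vocabulary size.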
Examples
Single Text (Dense Model):

```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": ["What is artificial intelligence?"],
  "model_id": "qwen3-0.6b"
}'
```

Single Text (Sparse Model):

```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": ["Hello world"],
  "model_id": "splade-pp-v2"
}'
```

Batch (with Options):

```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": [
    "First document to embed",
    "Second document to embed",
    "Third document to embed"
  ],
  "model_id": "qwen3-0.6b",
  "options": {
    "normalize_embeddings": true,
    "batch_size": 32
  }
}'
```
Python Example:

```python
import requests

url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/embed"
payload = {
    "texts": ["Hello world"],
    "model_id": "qwen3-0.6b"
}

response = requests.post(url, json=payload)
data = response.json()
print(f"Embedding dimension: {data['dimension']}")
print(f"Processing time: {data['processing_time']:.3f}s")
```
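A single text returns its vector under `embedding`, while a batch returns `embeddings`, so client code that accepts either can normalize the two shapes first. A minimal sketch, assuming the dense response formats shown above (sparse responses use different keys); `as_vectors` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def as_vectors(data: Dict[str, Any]) -> List[List[float]]:
    """Return dense vectors as a list, whether the response used the
    single-text key ("embedding") or the batch key ("embeddings")."""
    if "embeddings" in data:
        return data["embeddings"]
    if "embedding" in data:
        return [data["embedding"]]
    raise KeyError("response contains neither 'embedding' nor 'embeddings'")


single = {"embedding": [0.1, 0.2], "dimension": 2, "model_id": "qwen3-0.6b"}
batch = {"embeddings": [[0.1, 0.2], [0.3, 0.4]], "dimension": 2, "count": 2}
print(len(as_vectors(single)), len(as_vectors(batch)))  # 1 2
```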
2. Generate Query Embeddings
POST /api/v1/embeddings/query
Generate embeddings optimized for search queries. Some models differentiate between query and document embeddings.
Request Body
Same as the `/embed` endpoint:

```json
{
  "texts": ["string"],
  "model_id": "string",
  "prompt": "string",
  "options": {}
}
```

Response
Same format as the `/embed` endpoint.
Examples
Single Query:

```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": ["What is machine learning?"],
  "model_id": "qwen3-0.6b",
  "prompt": "Represent this query for retrieval",
  "options": {
    "normalize_embeddings": true
  }
}'
```

Batch Queries:

```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "texts": [
    "First query",
    "Second query",
    "Third query"
  ],
  "model_id": "qwen3-0.6b"
}'
```
Python Example:

```python
import requests

url = "https://fahmiaziz-api-embedding.hf.space/api/v1/embeddings/query"
payload = {
    "texts": ["What is AI?"],
    "model_id": "qwen3-0.6b",
    "options": {
        "normalize_embeddings": True
    }
}

response = requests.post(url, json=payload)
embedding = response.json()["embedding"]
```
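A query embedding is usually compared against document embeddings produced by `/embed`. The sketch below ranks documents by cosine similarity in pure Python. It assumes both sets of vectors came from the same model, so their dimensions match; `rank_by_cosine` is an illustrative name, not part of the API, and the toy vectors stand in for real responses.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query_vec: List[float], doc_vecs: List[List[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real API output
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.1, 0.0]
print(rank_by_cosine(query, docs, top_k=2))
```

For large collections, a vector database or a NumPy matrix product replaces this loop, but the scoring is the same.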
3. Rerank Documents
POST /api/v1/rerank
Rerank documents based on their relevance to a query using CrossEncoder models.
Request Body
```jsonc
{
  "query": "string",        // Required: search query
  "documents": ["string"],  // Required: list of documents (min: 1)
  "model_id": "string",     // Required: reranking model identifier
  "top_k": 3                // Required: number of top results to return (integer)
}
```
Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Search query text |
| `documents` | array[string] | Yes | List of documents to rerank (min: 1) |
| `model_id` | string | Yes | Reranking model identifier |
| `top_k` | integer | Yes | Maximum number of results to return |
Response
```json
{
  "model_id": "jina-reranker-v3",
  "processing_time": 0.56,
  "query": "Python for data science",
  "results": [
    {
      "index": 0,
      "score": 0.95,
      "text": "Python is excellent for data science"
    },
    {
      "index": 2,
      "score": 0.73,
      "text": "R is also used in data science"
    }
  ]
}
```
Response Fields

| Field | Type | Description |
|---|---|---|
| `model_id` | string | Model identifier used |
| `processing_time` | float | Processing time in seconds |
| `query` | string | Original search query |
| `results` | array | Reranked documents with scores |
| `results[].index` | integer | Original index in input documents |
| `results[].score` | float | Relevance score (0-1, normalized) |
| `results[].text` | string | Document text |
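Because `results[].index` points back into the request's `documents` array, the reranked order can be applied to richer records, such as IDs or metadata, that were sent only as plain text. A minimal sketch, assuming you keep the original records in the same order as `documents`. The record fields and the `attach_scores` name are hypothetical.

```python
from typing import Any, Dict, List


def attach_scores(
    records: List[Dict[str, Any]], rerank_response: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return the original records in reranked order, each with its score.
    Assumes records[i] corresponds to documents[i] in the rerank request."""
    ranked = []
    for result in rerank_response["results"]:
        record = dict(records[result["index"]])  # copy; don't mutate the input
        record["score"] = result["score"]
        ranked.append(record)
    return ranked


records = [
    {"id": "doc-a", "text": "Python is excellent for data science"},
    {"id": "doc-b", "text": "Java is used for enterprise applications"},
    {"id": "doc-c", "text": "R is also used in data science"},
]
# Shaped like the response example above
response = {
    "results": [
        {"index": 0, "score": 0.95, "text": "Python is excellent for data science"},
        {"index": 2, "score": 0.73, "text": "R is also used in data science"},
    ]
}
print([r["id"] for r in attach_scores(records, response)])  # ['doc-a', 'doc-c']
```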
Examples
Basic Reranking:
```bash
curl -X 'POST' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/rerank' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "Python for data science",
  "documents": [
    "Python is great for data science",
    "Java is used for enterprise applications",
    "R is also used in data science",
    "JavaScript is for web development"
  ],
  "model_id": "jina-reranker-v3",
  "top_k": 2
}'
```
Python Example:

```python
import requests

url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank"
payload = {
    "query": "best programming language for beginners",
    "documents": [
        "Python is beginner-friendly with simple syntax",
        "C++ is powerful but complex for beginners",
        "JavaScript is essential for web development",
        "Rust offers memory safety but steep learning curve"
    ],
    "model_id": "jina-reranker-v3",
    "top_k": 2
}

response = requests.post(url, json=payload)
data = response.json()
print(f"Top result: {data['results'][0]['text']}")
print(f"Score: {data['results'][0]['score']:.3f}")
```
JavaScript Example:

```javascript
// Top-level await requires an ES module; otherwise wrap this in an async function
const url = "https://fahmiaziz-api-embedding.hf.space/api/v1/rerank";

const response = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: "AI applications",
    documents: [
      "Computer vision for image recognition",
      "Recipe for chocolate cake",
      "Natural language processing for chatbots",
      "Travel guide to Paris"
    ],
    model_id: "jina-reranker-v3",
    top_k: 2
  })
});

const { results } = await response.json();
console.log("Top results:", results);
```
Model Management
4. List Available Models
GET /api/v1/models
Get a list of all available embedding models.
Response

```json
{
  "models": [
    {
      "id": "qwen3-0.6b",
      "name": "Qwen/Qwen3-Embedding-0.6B",
      "type": "embeddings",
      "loaded": true,
      "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
    },
    {
      "id": "splade-pp-v2",
      "name": "prithivida/Splade_PP_en_v2",
      "type": "sparse-embeddings",
      "loaded": true,
      "repository": "https://huggingface.co/prithivida/Splade_PP_en_v2"
    }
  ],
  "total": 2
}
```

Example

```bash
curl -X 'GET' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/models' \
  -H 'accept: application/json'
```

5. Get Model Information
GET /api/v1/models/{model_id}
Get detailed information about a specific model.
Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model_id` | string | Yes | Model identifier (path parameter) |

Response

```json
{
  "id": "qwen3-0.6b",
  "name": "Qwen/Qwen3-Embedding-0.6B",
  "type": "embeddings",
  "loaded": true,
  "repository": "https://huggingface.co/Qwen/Qwen3-Embedding-0.6B"
}
```

Example

```bash
curl -X 'GET' \
  'https://fahmiaziz-api-embedding.hf.space/api/v1/models/qwen3-0.6b' \
  -H 'accept: application/json'
```

System Endpoints
6. Health Check
GET /health
Check API health status.
Response

```json
{
  "status": "ok",
  "total_models": 2,
  "loaded_models": 2,
  "startup_complete": true
}
```

Example

```bash
curl -X 'GET' \
  'https://fahmiaziz-api-embedding.hf.space/health' \
  -H 'accept: application/json'
```

7. API Information
GET /
Get basic API information.
Response

```json
{
  "message": "Unified Embedding API - Dense & Sparse Embeddings",
  "version": "3.0.0",
  "docs_url": "/docs"
}
```
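Before sending traffic, a client can combine `/health` and `/api/v1/models` to confirm that startup has finished and to pick a loaded model of the required type. The sketch below works on already-parsed JSON, leaving out the HTTP calls so it runs offline. The readiness rule it applies (status `"ok"`, `startup_complete`, and every configured model loaded) is an assumption built from the documented fields, not a rule the API defines; `is_ready` and `pick_model` are hypothetical names.

```python
from typing import Any, Dict, Optional


def is_ready(health: Dict[str, Any]) -> bool:
    """Assumed readiness rule based on the fields of the /health response."""
    return (
        health.get("status") == "ok"
        and health.get("startup_complete") is True
        and health.get("loaded_models") == health.get("total_models")
    )


def pick_model(models_response: Dict[str, Any], model_type: str) -> Optional[str]:
    """Return the id of the first loaded model of the given type, if any."""
    for model in models_response.get("models", []):
        if model.get("type") == model_type and model.get("loaded"):
            return model["id"]
    return None


health = {"status": "ok", "total_models": 2, "loaded_models": 2, "startup_complete": True}
models = {
    "models": [
        {"id": "qwen3-0.6b", "type": "embeddings", "loaded": True},
        {"id": "splade-pp-v2", "type": "sparse-embeddings", "loaded": True},
    ],
    "total": 2,
}
print(is_ready(health), pick_model(models, "sparse-embeddings"))  # True splade-pp-v2
```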
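Error Responses
All errors return a JSON object with a `detail` field. It is usually a message string, but request validation errors can instead return a list of error objects (see the 422 example below):

```json
{
  "detail": "Error message description"
}
```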
HTTP Status Codes
| Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid input |
| 404 | Not Found - Model not found |
| 422 | Unprocessable Entity - Validation error |
| 500 | Internal Server Error |
| 503 | Service Unavailable - Server not ready |
Common Errors
Model Not Found (404):
```json
{
  "detail": "Model 'unknown-model' not found in configuration"
}
```

Validation Error (422):

```json
{
  "detail": [
    {
      "loc": ["body", "texts"],
      "msg": "texts list cannot be empty",
      "type": "value_error"
    }
  ]
}
```

Batch Too Large (422):

```json
{
  "detail": "Batch size (150) exceeds maximum (100)"
}
```
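Since `detail` is a plain string for most errors but a list of objects for validation errors, a small helper can turn either shape into a readable message. This is a hypothetical sketch based only on the two shapes shown above.

```python
from typing import Any, Dict


def error_message(body: Dict[str, Any]) -> str:
    """Flatten an error body's "detail" (string or list of errors) to text."""
    detail = body.get("detail", "")
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        parts = []
        for err in detail:
            # "loc" is a path such as ["body", "texts"]
            location = ".".join(str(p) for p in err.get("loc", []))
            parts.append(f"{location}: {err.get('msg', '')}")
        return "; ".join(parts)
    return str(detail)


validation = {
    "detail": [
        {"loc": ["body", "texts"], "msg": "texts list cannot be empty", "type": "value_error"}
    ]
}
print(error_message(validation))  # body.texts: texts list cannot be empty
print(error_message({"detail": "Batch size (150) exceeds maximum (100)"}))
```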
Available Models
Dense Embedding Models

| Model ID | Name | Dimension | Description |
|---|---|---|---|
| `qwen3-0.6b` | Qwen/Qwen3-Embedding-0.6B | 768 | Efficient multilingual embeddings |

Sparse Embedding Models

| Model ID | Name | Type | Description |
|---|---|---|---|
| `splade-pp-v2` | prithivida/Splade_PP_en_v2 | Sparse | SPLADE++ English v2 |

Reranking Models

| Model ID | Name | Type | Description |
|---|---|---|---|
| `jina-reranker-v3` | jinaai/jina-reranker-v3-base-en | CrossEncoder | High-quality reranking (English) |
| `bge-v2-m3` | BAAI/bge-reranker-v2-m3 | CrossEncoder | Multilingual reranking |
Rate Limits
Current Limits:
- Max text length: 8,192 characters
- Max batch size: 100 texts per request
- No rate limiting (subject to server resources)
Best Practices
1. Batch Processing
Always batch multiple texts together for better performance:

```python
# Bad: one request per text
for text in texts:
    response = requests.post(url, json={"texts": [text], "model_id": "qwen3-0.6b"})

# Good: a single batch request (up to 100 texts)
response = requests.post(url, json={"texts": texts, "model_id": "qwen3-0.6b"})
```
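Because a single request accepts at most 100 texts, larger collections still have to be split into several batches. The sketch below assumes that limit and the documented response keys. `post_json` is a hypothetical stand-in for whatever HTTP call you use (for example `requests.post(url, json=payload).json()`), which lets the batching logic run and be tested without network access.

```python
from typing import Any, Callable, Dict, Iterator, List

MAX_BATCH = 100  # documented maximum number of texts per request


def chunks(items: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: List[str],
    model_id: str,
    post_json: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> List[List[float]]:
    """Embed any number of texts in batches, keeping the input order."""
    vectors: List[List[float]] = []
    for batch in chunks(texts):
        data = post_json({"texts": batch, "model_id": model_id})
        # Batch responses use "embeddings"; a single-text response uses "embedding"
        vectors.extend(data["embeddings"] if "embeddings" in data else [data["embedding"]])
    return vectors


# Offline stand-in for the API: returns one 1-dimensional vector per text
calls: List[int] = []


def fake_post(payload: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(len(payload["texts"]))
    return {"embeddings": [[float(len(t))] for t in payload["texts"]]}


result = embed_all([f"text {i}" for i in range(250)], "qwen3-0.6b", fake_post)
print(len(result), calls)  # 250 [100, 100, 50]
```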
2. Normalize Embeddings for Similarity
For cosine similarity, enable `normalize_embeddings`. On unit-length vectors the dot product equals cosine similarity:

```python
payload = {
    "texts": ["text"],
    "model_id": "qwen3-0.6b",
    "options": {"normalize_embeddings": True}
}
```
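With `normalize_embeddings` enabled, each returned vector is L2-normalized, so a plain dot product already gives cosine similarity. The sketch below uses toy vectors rather than API output and normalizes locally to show the equivalence. Local normalization like this is only needed if you requested unnormalized vectors.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(cosine, dot(l2_normalize(a), l2_normalize(b)))  # both 0.96, up to rounding
```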
3. Model Selection
- Dense models (qwen3-0.6b): Best for semantic similarity
- Sparse models (splade-pp-v2): Best for keyword matching + semantic
- Rerank models (jina-reranker-v3): Best for re-scoring top candidates
4. Two-Stage Retrieval (Recommended for RAG)

```python
# embed_query, vector_search and rerank are placeholders for your own client code

# Stage 1: fast retrieval with embeddings (top 100 candidates)
query_embedding = embed_query(query)
candidates = vector_search(query_embedding, top_k=100)

# Stage 2: precise reranking (top 10)
reranked = rerank(
    query=query,
    documents=[c["text"] for c in candidates],
    model_id="jina-reranker-v3",
    top_k=10,
)
```
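The two stages can be wired together as a single function. This is a sketch with several assumptions. `embed_query`, `vector_search` and `rerank` are hypothetical callables you supply, for example an API call and a vector database query. Candidates are dicts with a `"text"` key. `rerank` returns the API's `results` list, whose `index` refers back into the candidate list.

```python
from typing import Any, Callable, Dict, List

Candidate = Dict[str, Any]


def two_stage_search(
    query: str,
    embed_query: Callable[[str], List[float]],
    vector_search: Callable[[List[float], int], List[Candidate]],
    rerank: Callable[[str, List[str], int], List[Dict[str, Any]]],
    retrieve_k: int = 100,
    final_k: int = 10,
) -> List[Candidate]:
    # Stage 1: cheap, recall-oriented retrieval with embeddings
    candidates = vector_search(embed_query(query), retrieve_k)
    if not candidates:
        return []
    # Stage 2: precise scoring of the candidates with a reranker
    results = rerank(query, [c["text"] for c in candidates], final_k)
    # Map each result's index back to the full candidate record
    return [{**candidates[r["index"]], "score": r["score"]} for r in results]


# Offline stand-ins for the real services
corpus = [{"id": i, "text": t} for i, t in enumerate(["cats", "python data", "r stats"])]


def fake_embed(q: str) -> List[float]:
    return [1.0]


def fake_search(vec: List[float], k: int) -> List[Candidate]:
    return corpus[:k]


def fake_rerank(q: str, docs: List[str], k: int) -> List[Dict[str, Any]]:
    # Score 1.0 if the document shares a word with the query, else 0.0
    words = set(q.split())
    scored = [(i, float(bool(words & set(d.split())))) for i, d in enumerate(docs)]
    scored.sort(key=lambda p: p[1], reverse=True)
    return [{"index": i, "score": s} for i, s in scored[:k]]


top = two_stage_search("data science", fake_embed, fake_search, fake_rerank, final_k=2)
print(top)
```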
5. Error Handling
Always handle errors gracefully:
try:
response = requests.post(url, json=payload)
response.raise_for_status()
data = response.json()
except requests.exceptions.HTTPError as e:
print(f"HTTP error: {e}")
except requests.exceptions.RequestException as e:
print(f"Request failed: {e}")
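The API answers 503 while the server is not ready yet, so a client may want to retry those responses with a short backoff. This is a minimal sketch under an assumed retry policy: the attempt count and delays are arbitrary choices. The request is passed in as a callable so the logic runs offline; with `requests` it could be `lambda: requests.post(url, json=payload, timeout=30)`. `post_with_retry` is a hypothetical name.

```python
import time
from typing import Callable, List, Protocol


class HasStatus(Protocol):
    status_code: int


def post_with_retry(
    send: Callable[[], HasStatus],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> HasStatus:
    """Call `send`, retrying only on 503 (server not ready) and doubling the
    wait each time. Other status codes are returned to the caller as-is."""
    response = send()
    for attempt in range(retries):
        if response.status_code != 503:
            break
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
        response = send()
    return response


# Offline demo: two 503 responses, then success
class FakeResponse:
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


statuses = iter([503, 503, 200])
waits: List[float] = []
result = post_with_retry(lambda: FakeResponse(next(statuses)), sleep=waits.append)
print(result.status_code, waits)  # 200 [1.0, 2.0]
```

Retrying only 503 avoids resending requests that failed validation (400/422), which would fail again.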
Troubleshooting
Empty Response
- Check that the `texts` field is not empty
- Verify that the `model_id` exists (see `GET /api/v1/models`)

Slow Performance
- Use batch requests instead of multiple single requests
- Reduce `batch_size` in `options` if you run into memory issues
- Check that the model is preloaded (the first request is slower)

Connection Errors
- Verify the base URL is correct
- Check network connectivity
- Ensure the server is running (check the `/health` endpoint)
Support
- Documentation: GitHub README
- Issues: GitHub Issues
- Hugging Face Space: fahmiaziz/api-embedding
Changelog
v3.0.0 (Current)
- Added reranking endpoint (`/api/v1/rerank`)
- Support for CrossEncoder models
- Unified batch-only response format
- Flexible kwargs support
- In-memory caching
- Improved error handling
- Comprehensive documentation
- Fixed type hint errors in `RerankModel`
- Fixed duplicate parameter errors in rerank endpoint
Last Updated: 2025-11-02