PansGPT Qwen3 Embedding API
A stable, Docker-based API for generating text embeddings using the Qwen3-Embedding-0.6B model. This space provides a reliable service for the PansGPT application.
Features
- Single Text Embedding: Generate embeddings for individual texts
- Batch Processing: Process multiple texts efficiently
- Similarity Calculation: Compute cosine similarity between embeddings
- Docker-based: Stable deployment with containerization
- Health Monitoring: Built-in health check endpoints
- Fallback Support: Automatic fallback to sentence-transformers if needed
API Endpoints
1. Single Text Embedding
```
POST /api/predict
Content-Type: application/json

{
  "data": ["Your text here"]
}
```
2. Batch Text Embedding
```
POST /api/predict
Content-Type: application/json

{
  "data": [["Text 1", "Text 2", "Text 3"]]
}
```
3. Health Check
```
GET /health
```
Usage Examples
Python

```python
import requests

# Single text embedding
response = requests.post(
    "https://ojochegbeng-pansgpt.hf.space/api/predict",
    json={"data": ["Hello, world!"]},
)
embedding = response.json()["data"][0]

# Batch embedding
response = requests.post(
    "https://ojochegbeng-pansgpt.hf.space/api/predict",
    json={"data": [["Text 1", "Text 2", "Text 3"]]},
)
embeddings = response.json()["data"][0]
```
JavaScript

```javascript
// Single text embedding
const response = await fetch("https://ojochegbeng-pansgpt.hf.space/api/predict", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ data: ["Hello, world!"] })
});
const embedding = (await response.json()).data[0];

// Batch embedding (separate variable to avoid redeclaring `response`)
const batchResponse = await fetch("https://ojochegbeng-pansgpt.hf.space/api/predict", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ data: [["Text 1", "Text 2", "Text 3"]] })
});
const embeddings = (await batchResponse.json()).data[0];
```
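The Similarity Calculation feature listed above operates on the raw vectors the API returns; the same cosine similarity can also be computed client-side with no extra dependencies. A minimal sketch (the short example vectors are placeholders, not real API output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors; in practice these come from /api/predict responses
v1 = [0.1, 0.3, 0.5]
v2 = [0.2, 0.1, 0.4]
similarity = cosine_similarity(v1, v2)
```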
Model Information
- Base Model: Qwen3-Embedding-0.6B
- Embedding Dimension: 1024 (Qwen3) or 384 (fallback)
- Max Input Length: 512 tokens
- Device: Auto-detects CUDA/CPU
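Because the service falls back to sentence-transformers when needed, the vector length tells a client which backend produced an embedding. A small sketch using the dimensions from the table above (the function name is illustrative, not part of the API):

```python
def detect_backend(embedding):
    """Guess which model produced an embedding from its dimension."""
    dims = {
        1024: "Qwen3-Embedding-0.6B",
        384: "sentence-transformers fallback",
    }
    return dims.get(len(embedding), f"unknown ({len(embedding)} dims)")

print(detect_backend([0.0] * 1024))  # Qwen3-Embedding-0.6B
```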
Docker Configuration
This space uses Docker for stable deployment:
- Base Image: Python 3.11-slim
- Port: 7860
- Health Check: Built-in monitoring
- Non-root User: Security best practices
Performance
- Single Text: ~100-500ms (depending on hardware)
- Batch Processing: Optimized for multiple texts
- Memory Usage: ~2-4GB RAM
- Concurrent Requests: Supports multiple simultaneous requests
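Since the space supports simultaneous requests, a client can fan out several calls with a thread pool. A sketch using `concurrent.futures` — the `embed` function here is a stand-in for a real `requests.post` call so the example runs offline:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(text):
    """Placeholder for a real requests.post call to /api/predict.
    Returns a dummy one-element vector so the sketch runs offline."""
    return [float(len(text))]

texts = ["Text 1", "Text 2", "Text 3"]

# Issue the calls concurrently; map() preserves input order in the results
with ThreadPoolExecutor(max_workers=4) as pool:
    embeddings = list(pool.map(embed, texts))
```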
Integration with PansGPT
This API is specifically designed for the PansGPT application:
- Stable Connection: Docker-based deployment eliminates connection issues
- Consistent Performance: Reliable response times
- Error Handling: Comprehensive error handling and fallbacks
- Monitoring: Built-in health checks for monitoring
Support
For issues or questions:
- Check the health endpoint first: /health
- Review the logs for error details
- Ensure your input format matches the expected structure
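A common source of errors is a malformed request body. A small client-side check of the expected structure, based on the two endpoint shapes shown above (the helper name is illustrative):

```python
def valid_payload(payload):
    """Check that payload matches {"data": [text]} or {"data": [[texts...]]}."""
    if not isinstance(payload, dict) or "data" not in payload:
        return False
    data = payload["data"]
    if not isinstance(data, list) or len(data) != 1:
        return False
    item = data[0]
    if isinstance(item, str):
        return True  # single-text form
    # batch form: a list of strings
    return isinstance(item, list) and all(isinstance(t, str) for t in item)

print(valid_payload({"data": ["Hello, world!"]}))       # True
print(valid_payload({"data": [["Text 1", "Text 2"]]}))  # True
print(valid_payload({"data": "Hello"}))                 # False
```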
Note: This space is optimized for stability and reliability. The Docker-based deployment ensures consistent performance for the PansGPT application.