PansGPT Qwen3 Embedding API

A stable, Docker-based API for generating text embeddings using the Qwen3-Embedding-0.6B model. This space provides a reliable service for the PansGPT application.

Features

  • Single Text Embedding: Generate embeddings for individual texts
  • Batch Processing: Process multiple texts efficiently
  • Similarity Calculation: Compute cosine similarity between embeddings
  • Docker-based: Stable deployment with containerization
  • Health Monitoring: Built-in health check endpoints
  • Fallback Support: Automatic fallback to sentence-transformers if needed

API Endpoints

1. Single Text Embedding

POST /api/predict
Content-Type: application/json

{
    "data": ["Your text here"]
}

2. Batch Text Embedding

POST /api/predict
Content-Type: application/json

{
    "data": [["Text 1", "Text 2", "Text 3"]]
}
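Both endpoints share the same route and differ only in the payload shape. A small hypothetical helper (the `build_payload` name is mine, not part of the API) that produces either form:

```python
def build_payload(texts):
    """Build the /api/predict request body.

    A single string maps to the single-text form {"data": [text]};
    a list of strings maps to the batch form {"data": [[text, ...]]}.
    """
    if isinstance(texts, str):
        return {"data": [texts]}
    return {"data": [list(texts)]}
```

Pass the result as the `json=` argument of `requests.post`, as in the usage examples below.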

3. Health Check

GET /health

Usage Examples

Python

import requests
import json

# Single text embedding
response = requests.post(
    "https://ojochegbeng-pansgpt.hf.space/api/predict",
    json={"data": ["Hello, world!"]}
)
embedding = response.json()["data"][0]

# Batch embedding
response = requests.post(
    "https://ojochegbeng-pansgpt.hf.space/api/predict",
    json={"data": [["Text 1", "Text 2", "Text 3"]]}
)
embeddings = response.json()["data"][0]
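The service's similarity endpoint computes cosine similarity for you, but if you want to compare two returned embeddings client-side, a minimal dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        # Degenerate zero vector; similarity is undefined, return 0.0
        return 0.0
    return dot / (norm_a * norm_b)
```

For the 1024-dimensional vectors this API returns, `numpy.dot` would be faster, but the pure-Python version keeps the example self-contained.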

JavaScript

// Single text embedding
const response = await fetch("https://ojochegbeng-pansgpt.hf.space/api/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ data: ["Hello, world!"] })
});
const embedding = (await response.json()).data[0];

// Batch embedding (separate variable names to avoid redeclaring `response`)
const batchResponse = await fetch("https://ojochegbeng-pansgpt.hf.space/api/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ data: [["Text 1", "Text 2", "Text 3"]] })
});
const embeddings = (await batchResponse.json()).data[0];

Model Information

  • Base Model: Qwen3-Embedding-0.6B
  • Embedding Dimension: 1024 (Qwen3) or 384 (fallback)
  • Max Input Length: 512 tokens
  • Device: Auto-detects CUDA/CPU
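Because inputs beyond 512 tokens are truncated, you may want to clip long texts client-side before sending them. A rough sketch using whitespace splitting as a stand-in for the model's real tokenizer (subword tokenizers usually produce more tokens than words, so leave a margin below 512):

```python
def clip_text(text, max_tokens=512):
    """Roughly limit text length by whitespace-delimited words.

    This under-counts real subword tokens, so pass a conservative
    max_tokens (e.g. 400) rather than the hard 512 limit.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])
```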

Docker Configuration

This space uses Docker for stable deployment:

  • Base Image: Python 3.11-slim
  • Port: 7860
  • Health Check: Built-in monitoring
  • Non-root User: Security best practices

Performance

  • Single Text: ~100-500ms (depending on hardware)
  • Batch Processing: Optimized for multiple texts
  • Memory Usage: ~2-4GB RAM
  • Concurrent Requests: Supports multiple simultaneous requests
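Since the service accepts simultaneous requests, a client can fan out single-text calls with a thread pool. A sketch where `embed_fn` is any callable that posts one text and returns its embedding (the wiring to `requests` is left to you):

```python
from concurrent.futures import ThreadPoolExecutor

def embed_all(texts, embed_fn, max_workers=4):
    """Embed texts concurrently, preserving input order.

    embed_fn: a callable taking one text and returning one embedding,
    e.g. a thin wrapper around requests.post to /api/predict.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_fn, texts))
```

For large workloads, prefer the batch endpoint over many parallel single-text calls; the thread pool is mainly useful for mixed or streaming workloads.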

Integration with PansGPT

This API is specifically designed for the PansGPT application:

  1. Stable Connection: Docker-based deployment eliminates connection issues
  2. Consistent Performance: Reliable response times
  3. Error Handling: Comprehensive error handling and fallbacks
  4. Monitoring: Built-in health checks for monitoring

Support

For issues or questions:

  • Check the health endpoint first: /health
  • Review the logs for error details
  • Ensure your input format matches the expected structure
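For the last point, a small hypothetical validator (`check_payload` is my name, not part of the API) that mirrors the request shapes documented above:

```python
def check_payload(payload):
    """Return True if payload matches the documented request shapes."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list) or len(data) != 1:
        return False
    item = data[0]
    if isinstance(item, str):  # single-text form: {"data": ["text"]}
        return True
    # batch form: {"data": [["t1", "t2", ...]]}
    return isinstance(item, list) and all(isinstance(t, str) for t in item)
```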

Note: This space is optimized for stability and reliability. The Docker-based deployment ensures consistent performance for the PansGPT application.
