# Qwen3 Docker Deployment for PansGPT

This folder contains all the files needed to deploy a stable, Docker-based Qwen3 embedding API to Hugging Face Spaces for your PansGPT application.
## Files Overview

### Core Application Files

- `app.py` - Main FastAPI application with the Qwen3-Embedding-0.6B model
- `Dockerfile` - Optimized Docker configuration for Hugging Face Spaces
- `requirements.txt` - Python dependencies for the application

### Integration Files

- `qwen-embedding-service-docker.ts` - TypeScript service for your PansGPT app
- `test-pansgpt-api.js` - Test script to verify the deployed API

### Deployment Files

- `deploy-to-hf.sh` - Automated deployment script for Hugging Face Spaces
## Quick Start

### 1. Deploy to Hugging Face Spaces

```bash
# Make sure you're logged in to Hugging Face
huggingface-cli login --token YOUR_TOKEN

# Deploy using the script
./deploy-to-hf.sh
```
### 2. Manual Deployment

```bash
# Clone your space
git clone https://YOUR_TOKEN@huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Copy files to the space directory
cp app.py Dockerfile requirements.txt README.md YOUR_SPACE_NAME/

# Commit and push
cd YOUR_SPACE_NAME
git add .
git commit -m "Add Qwen3 embedding API"
git push
```
### 3. Test the Deployment

```bash
# Test the deployed API
node test-pansgpt-api.js
```
## Integration with PansGPT

### Update Your `.env` File

```env
QWEN_API_URL=https://your-username-your-space-name.hf.space/api/predict
```
### Replace Your Embedding Service

- Copy `qwen-embedding-service-docker.ts` to `src/lib/`
- Update your imports to use the new service
- The new service uses direct HTTP calls instead of the Gradio client
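For orientation, a direct-HTTP embedding call can be sketched roughly as below. This is a minimal sketch, not the actual contents of `qwen-embedding-service-docker.ts`; the `buildRequestBody` helper and the `{ data: [...] }` request/response shape are assumptions based on the curl examples later in this README.

```typescript
// Hypothetical sketch of a direct-HTTP embedding call (no Gradio client).
// The real service in qwen-embedding-service-docker.ts may differ.

// Build the JSON body the /api/predict endpoint is assumed to expect:
// the texts wrapped in a single-element "data" array, as in the batch
// curl example below.
function buildRequestBody(texts: string[]): string {
  return JSON.stringify({ data: [texts] });
}

async function generateEmbeddings(
  url: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildRequestBody(texts),
  });
  if (!res.ok) {
    throw new Error(`Embedding request failed with status ${res.status}`);
  }
  // Assumed response shape: { data: number[][] }
  const json = await res.json();
  return json.data as number[][];
}
```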
### Example Usage

```typescript
import { generateEmbeddings } from './qwen-embedding-service-docker';

// Generate embeddings
const embeddings = await generateEmbeddings(["Your text here"]);
```
## API Endpoints

- **Main API**: `POST /api/predict`
- **Health Check**: `GET /health`
- **Web Interface**: Available at your space URL
### API Usage Examples

#### Single Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": ["Your text here"]}'
```
#### Batch Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": [["Text 1", "Text 2", "Text 3"]]}'
```
## Model Information

- **Model**: Qwen3-Embedding-0.6B
- **Dimensions**: 1024
- **Context Length**: 32K tokens
- **Languages**: 100+ languages supported
- **Performance**: State-of-the-art on the MTEB benchmark
## Troubleshooting

### Common Issues

#### Space Not Building
- Check the space logs in Hugging Face
- Ensure all files are properly uploaded
- Verify Dockerfile syntax
#### API Not Responding

- Wait 2-5 minutes for the space to fully start
- Check the health endpoint: `/health`
- Verify the space is running (not sleeping)
#### Embedding Errors
- Check model loading in the logs
- Verify input text format
- Ensure text is not too long (max 512 tokens)
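A cheap client-side guard against over-long inputs can look like the sketch below. The 512-token ceiling comes from the note above; the 4-characters-per-token ratio is only a rough heuristic, not the model's real tokenizer.

```typescript
// Rough pre-flight length check (illustrative sketch, not part of the
// shipped service). Assumes roughly 4 characters per token on average.
const MAX_TOKENS = 512;
const APPROX_CHARS_PER_TOKEN = 4;

// Flag texts whose estimated token count exceeds the limit.
function isLikelyTooLong(text: string): boolean {
  return text.length / APPROX_CHARS_PER_TOKEN > MAX_TOKENS;
}

// Hard-truncate a text to the estimated character budget.
function truncateForEmbedding(text: string): string {
  const maxChars = MAX_TOKENS * APPROX_CHARS_PER_TOKEN;
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```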
### Health Check

```bash
curl https://your-space.hf.space/health
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true
}
```
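For programmatic monitoring, the expected response above can be checked with a small helper like this (a sketch; `isHealthy` and `checkHealth` are illustrative names, not part of the shipped service):

```typescript
// Sketch of a health probe against the /health endpoint described above.
interface HealthResponse {
  status: string;
  model_loaded: boolean;
}

// Treat the space as healthy only if the body parses and matches the
// expected response exactly.
function isHealthy(rawBody: string): boolean {
  try {
    const parsed = JSON.parse(rawBody) as HealthResponse;
    return parsed.status === "healthy" && parsed.model_loaded === true;
  } catch {
    // Non-JSON bodies (e.g. an HTML error page) count as unhealthy.
    return false;
  }
}

async function checkHealth(baseUrl: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  return res.ok && isHealthy(await res.text());
}
```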
## Performance

- **Response Time**: 100-500ms per request
- **Memory Usage**: 2-4GB RAM
- **Concurrent Requests**: Multiple simultaneous requests supported
- **Uptime**: Much more stable than Gradio client connections
## Updates
To update your deployed space:
- Make changes to the files in this folder
- Upload the updated files to your Hugging Face Space
- The space will automatically rebuild with the new changes
## Notes
- This Docker-based deployment is much more stable than the previous Gradio client approach
- The Qwen3 model provides better embeddings than the previous Qwen2.5 model
- All files are optimized for Hugging Face Spaces deployment
- The service includes comprehensive error handling and fallback mechanisms
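As one illustration of what a fallback mechanism can look like (an assumption for the sketch, not the service's actual implementation), a retry wrapper with exponential backoff helps ride out a space that is still waking up from sleep:

```typescript
// Illustrative retry-with-backoff wrapper (hypothetical; not the code
// shipped in qwen-embedding-service-docker.ts).
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: wait 1s, 2s, 4s, ... between attempts.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Any call to the embedding service can then be wrapped, e.g. `withRetry(() => generateEmbeddings(["Your text here"]))`.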
## Support
If you encounter issues:
- Check the space logs in Hugging Face
- Verify your API URL is correct
- Ensure the space is running and not sleeping
- Test with the provided test script
---

**Deployment Status**: ✅ Ready for production use
**Last Updated**: September 2025
**Model Version**: Qwen3-Embedding-0.6B