
# Qwen3 Docker Deployment for PansGPT

This folder contains all the files needed to deploy a stable, Docker-based Qwen3 embedding API to Hugging Face Spaces for your PansGPT application.

πŸ“ Files Overview

Core Application Files

  • app.py - Main FastAPI application with Qwen3-Embedding-0.6B model
  • Dockerfile - Optimized Docker configuration for Hugging Face Spaces
  • requirements.txt - Python dependencies for the application

Integration Files

  • qwen-embedding-service-docker.ts - TypeScript service for your PansGPT app
  • test-pansgpt-api.js - Test script to verify the deployed API

Deployment Files

  • deploy-to-hf.sh - Automated deployment script for Hugging Face Spaces

## 🚀 Quick Start

### 1. Deploy to Hugging Face Spaces

```bash
# Make sure you're logged in to Hugging Face
huggingface-cli login --token YOUR_TOKEN

# Deploy using the script
./deploy-to-hf.sh
```

### 2. Manual Deployment

```bash
# Clone your space
git clone https://YOUR_TOKEN@huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Copy files to the space directory
cp app.py Dockerfile requirements.txt README.md YOUR_SPACE_NAME/

# Commit and push
cd YOUR_SPACE_NAME
git add .
git commit -m "Add Qwen3 embedding API"
git push
```

### 3. Test the Deployment

```bash
# Test the deployed API
node test-pansgpt-api.js
```

## 🔧 Integration with PansGPT

### Update Your .env File

```
QWEN_API_URL=https://your-username-your-space-name.hf.space/api/predict
```
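At startup it helps to validate this value so a missing or malformed URL fails fast instead of surfacing later as a confusing fetch error. A minimal sketch (`resolveQwenApiUrl` is a hypothetical helper, not part of the bundled service):

```typescript
// Resolve and validate the Qwen API URL from an environment map
// (pass process.env in your app). Throws early if the variable is
// missing or not a parseable absolute URL.
export function resolveQwenApiUrl(env: Record<string, string | undefined>): string {
  const url = env.QWEN_API_URL;
  if (!url) {
    throw new Error("QWEN_API_URL is not set; add it to your .env file");
  }
  try {
    return new URL(url).toString();
  } catch {
    throw new Error(`QWEN_API_URL is not a valid URL: ${url}`);
  }
}
```

Typical usage in your app: `const apiUrl = resolveQwenApiUrl(process.env);`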

### Replace Your Embedding Service

1. Copy `qwen-embedding-service-docker.ts` to `src/lib/`
2. Update your imports to use the new service
3. The new service uses direct HTTP calls instead of the Gradio client

### Example Usage

```typescript
import { generateEmbeddings } from './qwen-embedding-service-docker';

// Generate embeddings
const embeddings = await generateEmbeddings(["Your text here"]);
```
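Under the hood, the service replaces the Gradio client with a plain HTTP POST to the Space. A minimal sketch of that call, assuming Node 18+ global `fetch`, the payload shapes from the curl examples below, and embeddings returned under a top-level `data` key (the bundled `qwen-embedding-service-docker.ts` may differ in details):

```typescript
// Build the request body for the Space's /api/predict endpoint.
// One text -> {"data": ["..."]}; several texts -> {"data": [["...", "..."]]},
// matching the single and batch curl examples in this README.
export function buildPredictPayload(texts: string[]): { data: unknown[] } {
  return { data: [texts.length === 1 ? texts[0] : texts] };
}

// Direct-HTTP embedding call (sketch; assumes the response carries
// the embedding vectors under a top-level "data" key).
export async function generateEmbeddingsDirect(
  apiUrl: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetch(apiUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPredictPayload(texts)),
  });
  if (!res.ok) {
    throw new Error(`Embedding request failed: ${res.status} ${res.statusText}`);
  }
  const json = (await res.json()) as { data: number[][] };
  return json.data;
}
```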

## 📊 API Endpoints

- Main API: `POST /api/predict`
- Health Check: `GET /health`
- Web Interface: available at your space URL

### API Usage Examples

#### Single Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": ["Your text here"]}'
```

#### Batch Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": [["Text 1", "Text 2", "Text 3"]]}'
```

## 🎯 Model Information

- Model: Qwen3-Embedding-0.6B
- Dimensions: 1024
- Context Length: 32K tokens
- Languages: 100+ languages supported
- Performance: state-of-the-art results on the MTEB benchmark

πŸ” Troubleshooting

Common Issues

  1. Space Not Building

    • Check the space logs in Hugging Face
    • Ensure all files are properly uploaded
    • Verify Dockerfile syntax
  2. API Not Responding

    • Wait 2-5 minutes for the space to fully start
    • Check the health endpoint: /health
    • Verify the space is running (not sleeping)
  3. Embedding Errors

    • Check model loading in the logs
    • Verify input text format
    • Ensure text is not too long (max 512 tokens)
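If you hit length errors, a rough client-side pre-check can flag oversized inputs before they reach the API. This is only a heuristic sketch: the 4-characters-per-token figure is a common rule of thumb, not an exact count for Qwen's tokenizer:

```typescript
// Rough pre-check for oversized inputs. Token counts are approximated
// as characters / 4 (a common heuristic); an exact count would require
// the model's own tokenizer.
const MAX_TOKENS = 512;
const APPROX_CHARS_PER_TOKEN = 4;

export function likelyTooLong(text: string, maxTokens: number = MAX_TOKENS): boolean {
  return text.length / APPROX_CHARS_PER_TOKEN > maxTokens;
}
```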

### Health Check

```bash
curl https://your-space.hf.space/health
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true
}
```
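The same check can gate requests in code, since a cold Space can be reachable before the model has finished loading. A sketch based on the response shape above (`checkHealth` is a hypothetical helper, not part of the bundled files):

```typescript
interface HealthResponse {
  status: string;
  model_loaded: boolean;
}

// Ready only when the Space reports a healthy status AND the model
// has finished loading.
export function isReady(health: HealthResponse): boolean {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll the /health endpoint of a deployed Space (Node 18+ fetch).
export async function checkHealth(baseUrl: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  if (!res.ok) return false;
  return isReady((await res.json()) as HealthResponse);
}
```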

## 📈 Performance

- Response Time: 100-500 ms per request
- Memory Usage: 2-4 GB RAM
- Concurrent Requests: multiple simultaneous requests supported
- Uptime: much more stable than Gradio client connections
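Because the API handles concurrent requests, a client can embed several batches in parallel. A sketch, written against a caller-supplied embed function so it works with whichever service implementation you use (e.g. the `generateEmbeddings` export from the bundled service):

```typescript
// Embed several batches concurrently. Promise.all preserves input
// order, so results[i] corresponds to batches[i].
export async function embedBatches(
  batches: string[][],
  embed: (texts: string[]) => Promise<number[][]>,
): Promise<number[][][]> {
  return Promise.all(batches.map((batch) => embed(batch)));
}
```

For very large workloads you may still want to cap in-flight requests to stay within the Space's 2-4 GB memory budget.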

## 🔄 Updates

To update your deployed space:

  1. Make changes to the files in this folder
  2. Upload the updated files to your Hugging Face Space
  3. The space will automatically rebuild with the new changes

πŸ“ Notes

  • This Docker-based deployment is much more stable than the previous Gradio client approach
  • The Qwen3 model provides better embeddings than the previous Qwen2.5 model
  • All files are optimized for Hugging Face Spaces deployment
  • The service includes comprehensive error handling and fallback mechanisms

## 🆘 Support

If you encounter issues:

  1. Check the space logs in Hugging Face
  2. Verify your API URL is correct
  3. Ensure the space is running and not sleeping
  4. Test with the provided test script

**Deployment Status**: ✅ Ready for production use
**Last Updated**: September 2025
**Model Version**: Qwen3-Embedding-0.6B