
# Qwen3 Docker Deployment for PansGPT

This folder contains all the files needed to deploy a stable, Docker-based Qwen3 embedding API to Hugging Face Spaces for your PansGPT application.

πŸ“ Files Overview

Core Application Files

  • app.py - Main FastAPI application with Qwen3-Embedding-0.6B model
  • Dockerfile - Optimized Docker configuration for Hugging Face Spaces
  • requirements.txt - Python dependencies for the application

Integration Files

  • qwen-embedding-service-docker.ts - TypeScript service for your PansGPT app
  • test-pansgpt-api.js - Test script to verify the deployed API

Deployment Files

  • deploy-to-hf.sh - Automated deployment script for Hugging Face Spaces

## 🚀 Quick Start

### 1. Deploy to Hugging Face Spaces

```bash
# Make sure you're logged in to Hugging Face
huggingface-cli login --token YOUR_TOKEN

# Deploy using the script
./deploy-to-hf.sh
```

### 2. Manual Deployment

```bash
# Clone your space
git clone https://YOUR_TOKEN@huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME

# Copy files to the space directory
cp app.py Dockerfile requirements.txt README.md YOUR_SPACE_NAME/

# Commit and push
cd YOUR_SPACE_NAME
git add .
git commit -m "Add Qwen3 embedding API"
git push
```

### 3. Test the Deployment

```bash
# Test the deployed API
node test-pansgpt-api.js
```

## 🔧 Integration with PansGPT

### Update Your .env File

```
QWEN_API_URL=https://your-username-your-space-name.hf.space/api/predict
```
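At startup it helps to validate this value so a missing or malformed URL fails fast instead of surfacing later as a confusing fetch error. A minimal sketch (`resolveQwenApiUrl` is a hypothetical helper, not part of the bundled service):

```typescript
// Resolve and validate the Qwen API URL from an environment map
// (pass process.env in your app). Throws early if the variable is
// missing or not a parseable absolute URL.
export function resolveQwenApiUrl(env: Record<string, string | undefined>): string {
  const url = env.QWEN_API_URL;
  if (!url) {
    throw new Error("QWEN_API_URL is not set; add it to your .env file");
  }
  try {
    return new URL(url).toString();
  } catch {
    throw new Error(`QWEN_API_URL is not a valid URL: ${url}`);
  }
}
```

Typical usage in your app: `const apiUrl = resolveQwenApiUrl(process.env);`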

### Replace Your Embedding Service

1. Copy `qwen-embedding-service-docker.ts` to `src/lib/`
2. Update your imports to use the new service
3. The new service uses direct HTTP calls instead of the Gradio client

### Example Usage

```typescript
import { generateEmbeddings } from './qwen-embedding-service-docker';

// Generate embeddings
const embeddings = await generateEmbeddings(["Your text here"]);
```
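Under the hood, the service replaces the Gradio client with a plain HTTP POST to the Space. A minimal sketch of that call, assuming Node 18+ global `fetch`, the payload shapes from the curl examples below, and embeddings returned under a top-level `data` key (the bundled `qwen-embedding-service-docker.ts` may differ in details):

```typescript
// Build the request body for the Space's /api/predict endpoint.
// One text -> {"data": ["..."]}; several texts -> {"data": [["...", "..."]]},
// matching the single and batch curl examples in this README.
export function buildPredictPayload(texts: string[]): { data: unknown[] } {
  return { data: [texts.length === 1 ? texts[0] : texts] };
}

// Direct-HTTP embedding call (sketch; assumes the response carries
// the embedding vectors under a top-level "data" key).
export async function generateEmbeddingsDirect(
  apiUrl: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetch(apiUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPredictPayload(texts)),
  });
  if (!res.ok) {
    throw new Error(`Embedding request failed: ${res.status} ${res.statusText}`);
  }
  const json = (await res.json()) as { data: number[][] };
  return json.data;
}
```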

## 📊 API Endpoints

- Main API: `POST /api/predict`
- Health Check: `GET /health`
- Web Interface: available at your space URL

### API Usage Examples

#### Single Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": ["Your text here"]}'
```

#### Batch Text Embedding

```bash
curl -X POST "https://your-space.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{"data": [["Text 1", "Text 2", "Text 3"]]}'
```

## 🎯 Model Information

- Model: Qwen3-Embedding-0.6B
- Dimensions: 1024
- Context Length: 32K tokens
- Languages: 100+ languages supported
- Performance: state-of-the-art results on the MTEB benchmark

πŸ” Troubleshooting

Common Issues

  1. Space Not Building

    • Check the space logs in Hugging Face
    • Ensure all files are properly uploaded
    • Verify Dockerfile syntax
  2. API Not Responding

    • Wait 2-5 minutes for the space to fully start
    • Check the health endpoint: /health
    • Verify the space is running (not sleeping)
  3. Embedding Errors

    • Check model loading in the logs
    • Verify input text format
    • Ensure text is not too long (max 512 tokens)
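If you hit length errors, a rough client-side pre-check can flag oversized inputs before they reach the API. This is only a heuristic sketch: the 4-characters-per-token figure is a common rule of thumb, not an exact count for Qwen's tokenizer:

```typescript
// Rough pre-check for oversized inputs. Token counts are approximated
// as characters / 4 (a common heuristic); an exact count would require
// the model's own tokenizer.
const MAX_TOKENS = 512;
const APPROX_CHARS_PER_TOKEN = 4;

export function likelyTooLong(text: string, maxTokens: number = MAX_TOKENS): boolean {
  return text.length / APPROX_CHARS_PER_TOKEN > maxTokens;
}
```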

### Health Check

```bash
curl https://your-space.hf.space/health
```

Expected response:

```json
{
  "status": "healthy",
  "model_loaded": true
}
```
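The same check can gate requests in code, since a cold Space can be reachable before the model has finished loading. A sketch based on the response shape above (`checkHealth` is a hypothetical helper, not part of the bundled files):

```typescript
interface HealthResponse {
  status: string;
  model_loaded: boolean;
}

// Ready only when the Space reports a healthy status AND the model
// has finished loading.
export function isReady(health: HealthResponse): boolean {
  return health.status === "healthy" && health.model_loaded === true;
}

// Poll the /health endpoint of a deployed Space (Node 18+ fetch).
export async function checkHealth(baseUrl: string): Promise<boolean> {
  const res = await fetch(`${baseUrl}/health`);
  if (!res.ok) return false;
  return isReady((await res.json()) as HealthResponse);
}
```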

## 📈 Performance

- Response Time: 100-500 ms per request
- Memory Usage: 2-4 GB RAM
- Concurrent Requests: multiple simultaneous requests supported
- Uptime: much more stable than Gradio client connections
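Because the API handles concurrent requests, a client can embed several batches in parallel. A sketch, written against a caller-supplied embed function so it works with whichever service implementation you use (e.g. the `generateEmbeddings` export from the bundled service):

```typescript
// Embed several batches concurrently. Promise.all preserves input
// order, so results[i] corresponds to batches[i].
export async function embedBatches(
  batches: string[][],
  embed: (texts: string[]) => Promise<number[][]>,
): Promise<number[][][]> {
  return Promise.all(batches.map((batch) => embed(batch)));
}
```

For very large workloads you may still want to cap in-flight requests to stay within the Space's 2-4 GB memory budget.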

## 🔄 Updates

To update your deployed space:

  1. Make changes to the files in this folder
  2. Upload the updated files to your Hugging Face Space
  3. The space will automatically rebuild with the new changes

πŸ“ Notes

  • This Docker-based deployment is much more stable than the previous Gradio client approach
  • The Qwen3 model provides better embeddings than the previous Qwen2.5 model
  • All files are optimized for Hugging Face Spaces deployment
  • The service includes comprehensive error handling and fallback mechanisms

## 🆘 Support

If you encounter issues:

  1. Check the space logs in Hugging Face
  2. Verify your API URL is correct
  3. Ensure the space is running and not sleeping
  4. Test with the provided test script

**Deployment Status**: ✅ Ready for production use
**Last Updated**: September 2025
**Model Version**: Qwen3-Embedding-0.6B