medsam-inference / README_INTEGRATION.md
Anigor66

🔗 Integration Guide - Use HF Space in Your Backend

Quick Integration (3 Steps)

Step 1: Copy the client file

# Copy the client to your backend directory
cp medsam_space_client.py ../medsam_space_client.py

Step 2: Update your app.py

Find this code in app.py (around lines 86-104):

# OLD CODE - Remove this:
sam_checkpoint = "models/sam_vit_h_4b8939.pth"
model_type = "vit_h"  # must match the ViT-H checkpoint file above
sam = None
sam_predictor = None

try:
    if os.path.exists(sam_checkpoint):
        sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
        sam.to(device=device)
        sam_predictor = SamPredictor(sam)
        print("SAM model loaded successfully")
    else:
        print(f"Warning: SAM checkpoint not found at {sam_checkpoint}")
except Exception as e:
    print(f"Warning: Failed to load SAM model: {e}")

Replace with:

# NEW CODE - Add this:
from medsam_space_client import MedSAMSpacePredictor

# Initialize Space predictor
MEDSAM_SPACE_URL = os.getenv('MEDSAM_SPACE_URL', 
    'https://YOUR_USERNAME-medsam-inference.hf.space/api/predict')

sam_predictor = MedSAMSpacePredictor(MEDSAM_SPACE_URL)
print("✓ MedSAM Space predictor initialized")

Step 3: Update your .env

cd backend
echo "MEDSAM_SPACE_URL=https://YOUR_USERNAME-medsam-inference.hf.space/api/predict" >> .env
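
Note that `os.getenv` only sees variables that are actually present in the process environment; a `.env` file is not read automatically. If your backend does not already load it (for example with python-dotenv), a minimal standard-library sketch could look like the following. `load_env_file` is a hypothetical helper introduced here for illustration, not part of `medsam_space_client.py`.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Hypothetical helper: copy KEY=VALUE lines from a file into os.environ.

    Variables that are already set win, so a value provided by the shell or
    the hosting platform is never overwritten by the file.
    Returns only the entries that were actually added.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Call `load_env_file()` once at startup, before the `os.getenv('MEDSAM_SPACE_URL', ...)` line in app.py.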

That's it! Your code now uses the HF Space API! 🎉


What Changes?

✅ These STAY THE SAME (No changes needed!)

All your endpoint code stays exactly the same:

@app.route('/api/segment', methods=['POST'])
def segment_with_sam():
    # ... existing code ...
    
    # This works exactly the same!
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get the best mask
    best_mask = masks[np.argmax(scores)]
    
    # ... rest of your code ...
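
The endpoint code can stay unchanged because the client exposes the same two methods the endpoint calls, `set_image` and `predict`. The sketch below illustrates that adapter idea only. It is not the actual `medsam_space_client.py`: the class name, the injected `transport` callable, and the payload and response field names are all assumptions, and it works on raw bytes and plain lists rather than NumPy arrays to stay dependency-free.

```python
import base64
from typing import Callable, Optional, Sequence, Tuple

# Transport signature: (url, json_payload) -> decoded JSON response.
Transport = Callable[[str, dict], dict]


class SpacePredictorSketch:
    """Illustrative adapter only; NOT the real medsam_space_client code.

    It mirrors the two SamPredictor methods the endpoint calls, so the
    calling code does not need to know the model runs remotely.
    """

    def __init__(self, api_url: str, transport: Transport):
        self.api_url = api_url
        self._transport = transport
        self._image_b64: Optional[str] = None

    def set_image(self, image_bytes: bytes) -> None:
        # Only cache the encoded image; no network call happens here.
        self._image_b64 = base64.b64encode(image_bytes).decode("ascii")

    def predict(self, point_coords: Sequence[Sequence[float]],
                point_labels: Sequence[int],
                multimask_output: bool = True) -> Tuple[list, list, None]:
        if self._image_b64 is None:
            raise RuntimeError("call set_image() before predict()")
        # Payload and response field names are assumptions for this sketch.
        payload = {
            "image": self._image_b64,
            "point_coords": [list(p) for p in point_coords],
            "point_labels": list(point_labels),
            "multimask_output": multimask_output,
        }
        response = self._transport(self.api_url, payload)
        # Same 3-tuple arity as SamPredictor.predict; the third slot
        # (low-resolution masks) is unused here.
        return response["masks"], response["scores"], None
```

Injecting the transport keeps the sketch testable without network access; a real client would plug in an HTTP POST at that point.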

🔄 What's Different

Before (Local SAM):

  • Loads 2.5GB model into memory
  • Uses GPU/CPU for inference
  • Fast but requires resources

After (HF Space):

  • No model loading
  • API call to HF Space
  • Slower on the free CPU tier, but no local model memory or GPU needed

Complete Example

Here's a complete before/after comparison:

BEFORE (app.py with local SAM):

from segment_anything import sam_model_registry, SamPredictor

# Initialize SAM locally (loads the 2.5GB ViT-H checkpoint)
sam = sam_model_registry["vit_h"](checkpoint="models/sam_vit_h_4b8939.pth")
sam.to(device=device)
sam_predictor = SamPredictor(sam)

@app.route('/api/segment', methods=['POST'])
def segment():
    data = request.json
    image_data = data.get('image')
    x, y = data.get('x'), data.get('y')
    
    # Decode image
    image_bytes = base64.b64decode(image_data.split(',')[1])
    image = Image.open(BytesIO(image_bytes))
    image_array = np.array(image.convert('RGB'))
    
    # Segment with SAM
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get best mask
    best_mask = masks[np.argmax(scores)]
    
    return jsonify({'success': True})

AFTER (app.py with HF Space):

from medsam_space_client import MedSAMSpacePredictor

# Initialize Space predictor (no model loading!)
sam_predictor = MedSAMSpacePredictor(
    "https://YOUR_USERNAME-medsam-inference.hf.space/api/predict"
)

@app.route('/api/segment', methods=['POST'])
def segment():
    data = request.json
    image_data = data.get('image')
    x, y = data.get('x'), data.get('y')
    
    # Decode image
    image_bytes = base64.b64decode(image_data.split(',')[1])
    image = Image.open(BytesIO(image_bytes))
    image_array = np.array(image.convert('RGB'))
    
    # Segment with SAM Space (SAME CODE!)
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get best mask (SAME CODE!)
    best_mask = masks[np.argmax(scores)]
    
    return jsonify({'success': True})

Notice: Only the initialization changed! Everything else is identical! ✨
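
One optional hardening step, independent of the switch: both versions decode the image with `image_data.split(',')[1]`, which raises `IndexError` when a client sends bare base64 without a `data:` prefix and `AttributeError` when the field is missing. A small helper can accept both forms and fail with a clear error instead. `decode_data_url` is a hypothetical name used for this sketch.

```python
import base64
import binascii


def decode_data_url(image_data: str) -> bytes:
    """Return raw image bytes from a data URL or a bare base64 string."""
    if not isinstance(image_data, str) or not image_data:
        raise ValueError("image must be a non-empty base64 string")
    # "data:image/png;base64,AAAA" -> keep only the part after the comma.
    _, sep, encoded = image_data.partition(",")
    if not sep:
        encoded = image_data  # no prefix: treat the whole string as base64
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is not valid base64") from exc
```

In the endpoint this would replace the `base64.b64decode(image_data.split(',')[1])` line, with a `ValueError` mapped to a 400 response.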


Testing

1. Test the client directly:

# test_client.py
from medsam_space_client import MedSAMSpacePredictor
import numpy as np
from PIL import Image

# Initialize
predictor = MedSAMSpacePredictor(
    "https://YOUR_USERNAME-medsam-inference.hf.space/api/predict"
)

# Load test image
image = np.array(Image.open("test_image.jpg").convert("RGB"))

# Set image
predictor.set_image(image)

# Predict
masks, scores, _ = predictor.predict(
    point_coords=np.array([[200, 150]]),
    point_labels=np.array([1]),
    multimask_output=True
)

print(f"✅ Got {len(masks)} masks")
print(f"   Scores: {scores}")
print(f"   Best score: {scores.max():.4f}")

2. Test your full backend:

# Start your backend
python app.py

# In another terminal, test the endpoint
curl -X POST http://localhost:5000/api/segment \
  -H "Content-Type: application/json" \
  -d '{
    "image": "...",
    "x": 200,
    "y": 150
  }'
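
The `"image"` value in the curl command is elided; the endpoint shown earlier expects a data URL (it splits on the comma). As an alternative to pasting base64 by hand, the request can be built from a file in Python. This is a sketch using only the standard library; the function names are hypothetical, and the default URL simply repeats the one from the curl command.

```python
import base64
import json
import mimetypes
import urllib.request


def build_segment_payload(image_path: str, x: int, y: int) -> dict:
    """Encode an image file as a data URL plus the click coordinates."""
    mime = mimetypes.guess_type(image_path)[0] or "application/octet-stream"
    with open(image_path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return {"image": f"data:{mime};base64,{encoded}", "x": x, "y": y}


def post_segment(payload: dict,
                 url: str = "http://localhost:5000/api/segment",
                 timeout: float = 60.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `post_segment(build_segment_payload("test_image.jpg", 200, 150))` sends the same request as the curl command. The generous timeout leaves room for a sleeping Space to wake up.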

Deployment

Now your backend is lightweight and can deploy to Vercel!

Update requirements.txt for Vercel:

# requirements.txt
Flask==2.3.3
Flask-CORS==4.0.0
requests==2.31.0
Pillow>=10.0.0
numpy>=1.24.0

# No torch, no segment-anything!

Deploy to Vercel:

cd backend

# Create vercel.json
cat > vercel.json << 'EOF'
{
  "version": 2,
  "builds": [{"src": "app.py", "use": "@vercel/python"}],
  "routes": [{"src": "/(.*)", "dest": "app.py"}]
}
EOF

# Deploy
vercel
vercel env add MEDSAM_SPACE_URL
# Paste: https://YOUR_USERNAME-medsam-inference.hf.space/api/predict
vercel --prod

Performance

Local SAM:

  • ✅ Fast: 1-3 seconds
  • ❌ Memory: 2.5GB+
  • ❌ Requires GPU for speed

HF Space (Free CPU):

  • ⚠️ Slower: 5-10 seconds
  • ✅ Memory: None (API call)
  • ⚠️ May sleep (first request slow)

HF Space (GPU T4):

  • ✅ Fast: 1-2 seconds
  • ✅ Memory: None (API call)
  • ✅ Always on
  • 💰 Cost: $0.60/hour
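
To put the hourly GPU price in perspective, an always-on Space at the $0.60/hour rate quoted above comes to roughly $432 over a 30-day month. The helper below is just that arithmetic; the function name is hypothetical and the rate is taken from this guide, so check current pricing before budgeting.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float = 24,
                 days: int = 30) -> float:
    """Estimated cost in dollars for running at hourly_rate, rounded to cents."""
    return round(hourly_rate * hours_per_day * days, 2)


print(monthly_cost(0.60))  # 432.0 for an always-on Space
```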

Troubleshooting

"Failed to get prediction from MedSAM Space"

→ Check MEDSAM_SPACE_URL is correct
→ Check Space is running (visit URL in browser)
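
A quick sanity check on the URL can catch the most common mistakes, such as a leftover placeholder, before any request is sent. The sketch below only encodes the URL pattern used throughout this guide (`https://<name>.hf.space/api/predict`); `check_space_url` is a hypothetical helper, and a Space with a different route would need different rules.

```python
from urllib.parse import urlparse


def check_space_url(url: str) -> list:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    url = url or ""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not (parsed.hostname or "").endswith(".hf.space"):
        problems.append("host should end with .hf.space")
    if "YOUR_USERNAME" in url:
        problems.append("placeholder YOUR_USERNAME was not replaced")
    if not parsed.path.rstrip("/").endswith("/api/predict"):
        problems.append("path should end with /api/predict")
    return problems
```

Running `check_space_url(os.getenv("MEDSAM_SPACE_URL"))` at startup and logging the result makes a misconfigured deployment obvious.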

First request is very slow (20-30s)

→ Normal! Free tier Spaces sleep after inactivity
→ They wake up on first request
→ Subsequent requests are faster
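
If the first call fails while the Space is still waking up, retrying with a growing delay is usually enough. The following is a generic sketch, assuming the client raises an exception when a request fails; `call_with_retry` is a hypothetical helper, not part of `medsam_space_client.py`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 4,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying with exponential backoff (2s, 4s, 8s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # narrow this to the client's real error type
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

For example, `masks, scores, _ = call_with_retry(lambda: sam_predictor.predict(point_coords=..., point_labels=..., multimask_output=True))` wraps the existing call without changing it.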

"Request timeout"

→ Space might be overloaded
→ Try again in a minute
→ Or upgrade to GPU tier


Summary

✅ What you did:

  1. Copied medsam_space_client.py to backend
  2. Changed 5 lines in app.py (just initialization)
  3. Added MEDSAM_SPACE_URL to .env

✅ What stays the same:

  • All your endpoint code
  • All your SAM prediction calls
  • Your entire application logic

✅ What you gained:

  • No more 2.5GB model in memory
  • Can deploy to Vercel/serverless
  • Model hosted on Hugging Face (free!)

🎉 Your backend is now cloud-ready!