# 🔗 Integration Guide - Use HF Space in Your Backend

## Quick Integration (3 Steps)

### Step 1: Copy the client file

```bash
# Copy the client to your backend directory
cp medsam_space_client.py ../medsam_space_client.py
```

### Step 2: Update your app.py

Find this code in `app.py` (around lines 86-104):

```python
# OLD CODE - Remove this:
sam_checkpoint = "models/sam_vit_h_4b8939.pth"
model_type = "vit_h"  # must match the checkpoint file above
sam = None
sam_predictor = None

try:
    if os.path.exists(sam_checkpoint):
        sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
        sam.to(device=device)
        sam_predictor = SamPredictor(sam)
        print("SAM model loaded successfully")
    else:
        print(f"Warning: SAM checkpoint not found at {sam_checkpoint}")
except Exception as e:
    print(f"Warning: Failed to load SAM model: {e}")
```

Replace with:

```python
# NEW CODE - Add this:
from medsam_space_client import MedSAMSpacePredictor

# Initialize Space predictor
MEDSAM_SPACE_URL = os.getenv('MEDSAM_SPACE_URL', 
    'https://YOUR_USERNAME-medsam-inference.hf.space/api/predict')

sam_predictor = MedSAMSpacePredictor(MEDSAM_SPACE_URL)
print("✓ MedSAM Space predictor initialized")
```

### Step 3: Update your .env

```bash
cd backend
echo "MEDSAM_SPACE_URL=https://YOUR_USERNAME-medsam-inference.hf.space/api/predict" >> .env
```

**That's it!** Your code now uses the HF Space API! 🎉
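
Keep in mind that `os.getenv` only reads the process environment, so a `.env` file has to be loaded before `app.py` reads the variable (for example with python-dotenv, if your backend already uses it). As an optional safeguard, here is a minimal sketch of a startup check that fails fast when the URL is missing or still contains the `YOUR_USERNAME` placeholder. The helper name `resolve_space_url` is hypothetical and not part of `medsam_space_client.py`.

```python
import os
from urllib.parse import urlparse

PLACEHOLDER = "YOUR_USERNAME"  # placeholder used throughout this guide


def resolve_space_url(env=None):
    """Return MEDSAM_SPACE_URL from the environment, or raise ValueError.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    url = env.get("MEDSAM_SPACE_URL", "").strip()
    if not url:
        raise ValueError("MEDSAM_SPACE_URL is not set")
    if PLACEHOLDER in url:
        raise ValueError("MEDSAM_SPACE_URL still contains the YOUR_USERNAME placeholder")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"MEDSAM_SPACE_URL is not a valid HTTP(S) URL: {url!r}")
    return url
```

You could then pass `resolve_space_url()` to `MedSAMSpacePredictor(...)` instead of a default URL, so a misconfigured deployment stops at startup rather than on the first segmentation request.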

---

## What Changes?

### ✅ These STAY THE SAME (No changes needed!)

All your endpoint code stays exactly the same:

```python
@app.route('/api/segment', methods=['POST'])
def segment_with_sam():
    # ... existing code ...
    
    # This works exactly the same!
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get the best mask
    best_mask = masks[np.argmax(scores)]
    
    # ... rest of your code ...
```

### 🔄 What's Different

**Before (Local SAM):**
- Loads a 2.5GB model into memory
- Runs inference on your own GPU/CPU
- Fast, but needs local memory and ideally a GPU

**After (HF Space):**
- No model loading
- Each prediction is an API call to the HF Space
- Slower on the free CPU tier, but no local model memory or GPU needed

---

## Complete Example

Here's a complete before/after comparison:

### BEFORE (app.py with local SAM):

```python
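import base64
from io import BytesIO

import numpy as np
import torch
from flask import Flask, jsonify, request
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

app = Flask(__name__)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize SAM locally (loads 2.5GB model); the model type must match the checkpoint
sam = sam_model_registry["vit_h"](checkpoint="models/sam_vit_h_4b8939.pth")
sam.to(device=device)
sam_predictor = SamPredictor(sam)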

@app.route('/api/segment', methods=['POST'])
def segment():
    data = request.json
    image_data = data.get('image')
    x, y = data.get('x'), data.get('y')
    
    # Decode image
    image_bytes = base64.b64decode(image_data.split(',')[1])
    image = Image.open(BytesIO(image_bytes))
    image_array = np.array(image.convert('RGB'))
    
    # Segment with SAM
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get best mask
    best_mask = masks[np.argmax(scores)]
    
    return jsonify({'success': True})
```

### AFTER (app.py with HF Space):

```python
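import base64
from io import BytesIO

import numpy as np
from flask import Flask, jsonify, request
from PIL import Image

from medsam_space_client import MedSAMSpacePredictor

app = Flask(__name__)

# Initialize Space predictor (no model loading!)
sam_predictor = MedSAMSpacePredictor(
    "https://YOUR_USERNAME-medsam-inference.hf.space/api/predict"
)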

@app.route('/api/segment', methods=['POST'])
def segment():
    data = request.json
    image_data = data.get('image')
    x, y = data.get('x'), data.get('y')
    
    # Decode image
    image_bytes = base64.b64decode(image_data.split(',')[1])
    image = Image.open(BytesIO(image_bytes))
    image_array = np.array(image.convert('RGB'))
    
    # Segment with SAM Space (SAME CODE!)
    sam_predictor.set_image(image_array)
    masks, scores, _ = sam_predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )
    
    # Get best mask (SAME CODE!)
    best_mask = masks[np.argmax(scores)]
    
    return jsonify({'success': True})
```

**Notice:** Only the initialization changed! Everything else is identical! ✨
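
Both versions of the endpoint call `image_data.split(',')[1]` directly, which raises an unhandled exception if the request has no `image` field or the value is not a data URL. If you want a clearer error in that case, a minimal sketch of a stricter decoder could look like this. The helper name `decode_data_url` is hypothetical and only uses the standard library.

```python
import base64
import binascii


def decode_data_url(image_data):
    """Return the raw bytes of a base64 data URL, or raise ValueError.

    Hypothetical helper for the /api/segment endpoint shown above.
    """
    if not isinstance(image_data, str) or "," not in image_data:
        raise ValueError("expected a data URL such as 'data:image/jpeg;base64,...'")
    header, _, payload = image_data.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("image payload is not valid base64") from exc
```

In the endpoint you would replace the `base64.b64decode(...)` line with `image_bytes = decode_data_url(image_data)` and return a 400 response when it raises `ValueError`.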

---

## Testing

### 1. Test the client directly:

```python
# test_client.py
from medsam_space_client import MedSAMSpacePredictor
import numpy as np
from PIL import Image

# Initialize
predictor = MedSAMSpacePredictor(
    "https://YOUR_USERNAME-medsam-inference.hf.space/api/predict"
)

# Load test image as RGB (matches what the backend sends)
image = np.array(Image.open("test_image.jpg").convert("RGB"))

# Set image
predictor.set_image(image)

# Predict
masks, scores, _ = predictor.predict(
    point_coords=np.array([[200, 150]]),
    point_labels=np.array([1]),
    multimask_output=True
)

print(f"✅ Got {len(masks)} masks")
print(f"   Scores: {scores}")
print(f"   Best score: {scores.max():.4f}")
```

### 2. Test your full backend:

```bash
# Start your backend
python app.py

# In another terminal, test the endpoint
curl -X POST http://localhost:5000/api/segment \
  -H "Content-Type: application/json" \
  -d '{
    "image": "data:image/jpeg;base64,/9j/4AAQ...",
    "x": 200,
    "y": 150
  }'
```

---

## Deployment

Now your backend is lightweight and can be deployed to Vercel!

### Update requirements.txt for Vercel:

```txt
# requirements.txt
Flask==2.3.3
Flask-CORS==4.0.0
requests==2.31.0
Pillow>=10.0.0
numpy>=1.24.0

# No torch, no segment-anything!
```

### Deploy to Vercel:

```bash
cd backend

# Create vercel.json
cat > vercel.json << 'EOF'
{
  "version": 2,
  "builds": [{"src": "app.py", "use": "@vercel/python"}],
  "routes": [{"src": "/(.*)", "dest": "app.py"}]
}
EOF

# Deploy
vercel
vercel env add MEDSAM_SPACE_URL
# Paste: https://YOUR_USERNAME-medsam-inference.hf.space/api/predict
vercel --prod
```

---

## Performance

### Local SAM:
- ✅ Fast: 1-3 seconds
- ❌ Memory: 2.5GB+
- ❌ Requires GPU for speed

### HF Space (Free CPU):
- ⚠️ Slower: 5-10 seconds
- ✅ Memory: None (API call)
- ⚠️ May sleep (first request slow)

### HF Space (GPU T4):
- ✅ Fast: 1-2 seconds
- ✅ Memory: None (API call)
- ✅ Always on
- 💰 Cost: $0.60/hour
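
To put the GPU price in context, here is a rough estimate of the monthly cost. It uses the $0.60/hour figure quoted above, which may be out of date, so check current Hugging Face pricing before relying on it. The 30-day month and the function name `monthly_gpu_cost` are assumptions for illustration.

```python
def monthly_gpu_cost(hourly_rate=0.60, hours_per_day=24.0, days=30):
    """Estimate the cost in USD of keeping a paid Space running.

    hourly_rate defaults to the figure quoted in this guide; days=30 is
    an approximation of one month.
    """
    return round(hourly_rate * hours_per_day * days, 2)


always_on = monthly_gpu_cost()                  # running around the clock
office_hours = monthly_gpu_cost(hours_per_day=8)  # running 8 hours a day
```

At the quoted rate, an always-on Space costs about $432 per month, and running it 8 hours a day costs about $144.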

---

## Troubleshooting

### "Failed to get prediction from MedSAM Space"
- Check that `MEDSAM_SPACE_URL` is correct
- Check that the Space is running (open the Space page in a browser)

### First request is very slow (20-30s)
- This is normal: free-tier Spaces sleep after inactivity
- They wake up on the first request
- Subsequent requests are faster

### "Request timeout"
- The Space might be overloaded
- Try again in a minute
- Or upgrade to the GPU tier
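
Since cold starts and timeouts are usually temporary, it can help to retry automatically instead of failing on the first error. Below is a minimal sketch of a retry wrapper with exponential backoff. The name `call_with_retry` is hypothetical, and it catches every `Exception` because this guide does not show which exceptions `MedSAMSpacePredictor` raises. Narrow that down once you know.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, 2*base_delay, ... and retry.

    Re-raises the last error once all attempts are used up. `sleep` is
    injectable so the wrapper can be tested without real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

In the endpoint you could wrap the prediction call, for example `call_with_retry(lambda: sam_predictor.predict(...))`. Keep the total wait below your hosting platform's request timeout.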

---

## Summary

✅ **What you did:**
1. Copied `medsam_space_client.py` to backend
2. Changed 5 lines in `app.py` (just initialization)
3. Added `MEDSAM_SPACE_URL` to `.env`

✅ **What stays the same:**
- All your endpoint code
- All your SAM prediction calls
- Your entire application logic

✅ **What you gained:**
- No more 2.5GB model in memory
- Can deploy to Vercel/serverless
- Model hosted on Hugging Face (free on the CPU tier)

🎉 **Your backend is now cloud-ready!**