# Hugging Face Space Setup Guide
## Understanding Your Repositories
You have **TWO** separate repositories:
### 1. Model Repository: `nitish-spz/ABTestPredictor`
- **Purpose**: Store the model files
- **Contains**:
- `multimodal_gated_model_2.7_GGG.pth` (789 MB)
- `multimodal_cat_mappings_GGG.json`
- **Access**: Read-only, Space downloads from here
### 2. Space Repository: `SpiralyzeLLC/ABTestPredictor`
- **Purpose**: Run the Gradio application
- **Contains**: All application code + downloads model from repo #1
- **Access**: This is what deploys and runs
## Required Files in Space Repository
Your Space needs these files (but NOT the large model file):
```
✅ app.py                  # Main application code
✅ requirements.txt        # Python dependencies
✅ packages.txt            # System dependencies (tesseract-ocr)
✅ README.md               # Project documentation
✅ confidence_scores.json  # Confidence data (14 KB)
✅ .gitattributes          # Git LFS config
✅ .dockerignore           # Build optimization

❌ model/ folder           # NOT needed - downloads from model repo
❌ patterbs.json           # NOT needed - removed feature
❌ metadata.js             # NOT needed - removed feature
❌ confidence_scores.js    # NOT needed - use .json instead
```
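If you don't already have a `.dockerignore`, a minimal illustrative sketch (not your repo's actual file; adjust the entries to match what you keep) that excludes the model weights and legacy files from the build context could look like:

```
# .dockerignore - keep large/unused files out of the Space build context
model/
*.pth
metadata.js
confidence_scores.js
frontend.html
index_v2.html
.git/
```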
## How the Model Loading Works
Your `app.py` is configured to:
1. Check if model exists locally in `model/` folder
2. If not, download from `nitish-spz/ABTestPredictor` model repository
3. Cache it for future use
```python
# In app.py lines 707-748
if os.path.exists(MODEL_SAVE_PATH):
    model_path = MODEL_SAVE_PATH
    print("✅ Using local model")
else:
    print("📥 Downloading from Model Hub...")
    model_path = download_model_from_hub()
```
## Deployment Steps
### Step 1: Verify Required Files Exist Locally
```bash
cd /Users/nitish/Spiralyze/HuggingFace/Spaces/ABTestPredictor
# Check essential files
ls -lh app.py requirements.txt packages.txt README.md confidence_scores.json
# Should all exist
```
### Step 2: Remove Large/Unnecessary Files
```bash
# Remove the local model folder (Space will download from model repo)
rm -rf model/
# Remove unused files from old version
rm -f patterbs.json metadata.js confidence_scores.js frontend.html index_v2.html
```
### Step 3: Verify Git Remote Points to Space
```bash
git remote -v
# Should show: https://huggingface.co/spaces/SpiralyzeLLC/ABTestPredictor
```
### Step 4: Commit and Push to Space
```bash
# Add all files
git add .
# Commit
git commit -m "Deploy: Add all application files, download model from hub"
# Push to Space
git push origin main
```
### Step 5: Monitor Build
1. Go to https://huggingface.co/spaces/SpiralyzeLLC/ABTestPredictor
2. Click "Logs" tab
3. Watch the build progress
4. First build takes 5-10 minutes (downloading model)
## If Build Fails
### Check These Files Exist in Space Repo:
```
# Essential files checklist
app.py                 ✅
requirements.txt       ✅
packages.txt           ✅
README.md              ✅
confidence_scores.json ✅
.dockerignore          ✅
.gitattributes         ✅
```
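The checklist above can be automated with a small POSIX-shell helper (an illustrative sketch; run it from the Space repo root):

```shell
# check_file: print OK/MISSING for a path (exit status 0 if the file exists)
check_file() {
    if [ -f "$1" ]; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

for f in app.py requirements.txt packages.txt README.md \
         confidence_scores.json .dockerignore .gitattributes; do
    check_file "$f"
done
```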
### Verify Model Repo is Accessible
Your app downloads from `nitish-spz/ABTestPredictor`. Verify:
1. Go to https://huggingface.co/nitish-spz/ABTestPredictor
2. Check files are visible
3. Make sure it's **public** (not private)
### Check requirements.txt
```bash
cat requirements.txt
```
Should contain:
```
torch
transformers
pandas
scikit-learn
Pillow
gradio
pytesseract
spaces
huggingface_hub
python-dotenv
```
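You can also check from Python which of these packages are resolvable in the Space environment. A quick sketch; note that a few pip names differ from their import names (`Pillow` imports as `PIL`, `scikit-learn` as `sklearn`, `python-dotenv` as `dotenv`):

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# pip package name -> import name (they differ for a few of these)
PACKAGES = {
    "torch": "torch",
    "transformers": "transformers",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "Pillow": "PIL",
    "gradio": "gradio",
    "pytesseract": "pytesseract",
    "spaces": "spaces",
    "huggingface_hub": "huggingface_hub",
    "python-dotenv": "dotenv",
}

for pkg, mod in PACKAGES.items():
    print(pkg, "installed" if is_installed(mod) else "MISSING")
```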
### Check packages.txt
```bash
cat packages.txt
```
Should contain:
```
tesseract-ocr
```
## Common Issues
### Issue 1: "Model file not found"
**Cause**: Model repo is private or inaccessible
**Fix**: Make `nitish-spz/ABTestPredictor` public
### Issue 2: "No module named X"
**Cause**: Missing dependency in requirements.txt
**Fix**: Add the missing package to requirements.txt
### Issue 3: "Tesseract not found"
**Cause**: Missing system dependency
**Fix**: Ensure packages.txt contains `tesseract-ocr`
### Issue 4: Build hangs at "Installing requirements"
**Cause**: PyTorch is large (~2GB)
**Fix**: Wait 5-10 minutes, this is normal
## Space Configuration
Your Space should have these settings:
- **SDK**: Gradio
- **SDK Version**: 4.44.0
- **Hardware**: GPU (recommended: T4 or better)
- **Python Version**: 3.10 (default)
- **Visibility**: Public or Private (your choice)
## File Size Limits
- **Space repo**: Each file < 50MB (except LFS)
- **Model repo**: Files > 10MB should use Git LFS
- **Total Space size**: No hard limit, but keep it reasonable
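For the model repo, files over 10 MB such as the `.pth` weights are tracked through `.gitattributes`. A typical Git LFS rule (shown as an example; verify against your repo's actual `.gitattributes`) looks like:

```
*.pth filter=lfs diff=lfs merge=lfs -text
```

Running `git lfs track "*.pth"` writes this line for you.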
## Success Indicators
- ✅ Build completes without errors
- ✅ Space status shows "Running"
- ✅ Can access the Gradio interface
- ✅ Making predictions returns results
- ✅ Logs show "Successfully loaded model"
## Expected First-Run Behavior
```
🚀 Using device: cuda
🔥 GPU: Tesla T4
📥 Model not found locally, downloading from Model Hub...
📥 Downloading model from Hugging Face Model Hub: nitish-spz/ABTestPredictor
✅ Model downloaded to: /home/user/.cache/huggingface/...
✅ Successfully loaded GGG model weights
✅ Model and processors loaded successfully.
Running on public URL: https://spiralyzellc-abtestpredictor.hf.space
```
## Testing After Deployment
### Test 1: Web Interface
1. Visit your Space URL
2. Upload test images
3. Select categories
4. Click predict
5. Should see results in ~3-5 seconds
### Test 2: API Client
```python
from gradio_client import Client

client = Client("SpiralyzeLLC/ABTestPredictor")
result = client.predict(
    "control.jpg",
    "variant.jpg",
    "SaaS", "B2B", "High-Intent Lead Gen",
    "B2B Software & Tech", "Awareness & Discovery",
    api_name="/predict_with_categorical_data"
)
print(result)
```
## Need Help?
1. Check Space logs for errors
2. Review DEPLOYMENT_FIX.md for detailed troubleshooting
3. Verify all required files are in Space repo
4. Ensure model repo is public and accessible