Deployment Instructions for Hugging Face Spaces
This guide will help you deploy the Tokenizer Playground to Hugging Face Spaces.
Prerequisites
- A Hugging Face account (create one at https://huggingface.co/join)
- Git installed on your local machine
- (Optional) Hugging Face CLI installed:
pip install huggingface-hub
Step 1: Create a New Space
- Go to https://huggingface.co/spaces
- Click on "Create new Space"
- Fill in the following:
- Space name: Choose a unique name (e.g., "tokenizer-playground")
- Select the Space SDK: Choose Gradio
- Select the Space hardware: Start with CPU basic (free tier)
- Repo type: Public or Private (your choice)
- Click "Create Space"
Step 2: Clone Your Space Repository
After creating the space, you'll be redirected to your space page. Clone the repository:
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
Step 3: Add the Application Files
Copy all the files from this project to your Space repository:
# Copy the application files
cp path/to/tokenizer/app.py .
cp path/to/tokenizer/requirements.txt .
cp path/to/tokenizer/README.md .
cp path/to/tokenizer/.gitignore .
Step 4: Commit and Push
git add .
git commit -m "Initial deployment of Tokenizer Playground"
git push
Step 5: Monitor the Build
- Go to your Space URL: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
- Click on the "Files" tab to verify all files are uploaded
- Click on the "Logs" tab to monitor the build process
- The space will automatically build and deploy
Step 6: (Optional) Configure Settings
Secrets and Environment Variables
If you want to use private models or add API keys:
- Go to your Space settings
- Add secrets under "Repository secrets"
- Access them in your code using os.environ['SECRET_NAME']
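As a minimal sketch of this pattern (the secret name HF_TOKEN is hypothetical, used only for illustration), reading a Space secret in app.py might look like:

```python
import os

def get_secret(name, default=None):
    # Space secrets are exposed to the running app as environment variables.
    return os.environ.get(name, default)

# "HF_TOKEN" is a hypothetical secret name used for illustration.
hf_token = get_secret("HF_TOKEN")
if hf_token is None:
    # Fall back gracefully when running locally without the secret set.
    print("HF_TOKEN not set; private models will be unavailable.")
```

The retrieved token could then be passed along to whatever needs it (for example, transformers' from_pretrained accepts a token argument for gated models).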
Hardware Upgrade
For better performance:
- Go to Settings → Hardware
- Select a GPU tier (T4 small, T4 medium, A10G small, etc.)
- Note: GPU tiers are paid options
Persistent Storage
For caching tokenizers:
- Go to Settings → Persistent storage
- Enable persistent storage (paid feature)
- This will cache downloaded models between restarts
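Spaces mounts persistent storage at /data. A sketch of pointing the Hugging Face cache there (via the HF_HOME environment variable) so downloaded tokenizers survive restarts, assuming this runs in app.py before any transformers import reads the variable:

```python
import os

def configure_cache(storage_root="/data"):
    # If persistent storage is mounted, direct the Hugging Face cache
    # there; otherwise leave the default cache location untouched.
    if os.path.isdir(storage_root):
        os.environ.setdefault("HF_HOME", os.path.join(storage_root, ".huggingface"))
        return os.environ["HF_HOME"]
    return None

configure_cache()
```

setdefault is used deliberately: if HF_HOME was already set (for example in the Space's settings), this sketch does not override it.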
Troubleshooting
Common Issues
Build fails with dependency errors
- Check that all packages in requirements.txt are compatible
- Try pinning specific versions if conflicts occur
Space crashes on startup
- Check the logs for error messages
- Ensure app.py calls app.launch() at the end
- Verify that the Python syntax is correct
Models fail to load
- Some models require authentication
- Add your HF token as a secret if needed
- Some models might be too large for free tier
Slow performance
- Consider upgrading to GPU hardware
- Enable persistent storage to cache models
- Reduce the number of pre-loaded models
Resource Limits
Free Tier (CPU basic):
- 2 vCPU
- 16 GB RAM
- No GPU
- Limited concurrent users
Recommendations for Production:
- Use T4 small or medium for a good balance of cost and performance
- Enable persistent storage to avoid re-downloading models
- Consider implementing request queuing for high traffic
Local Testing Before Deployment
Always test locally before deploying:
# Install dependencies
pip install -r requirements.txt
# Run the application
python app.py
# Test in browser at http://localhost:7860
Updating Your Space
To update your deployed Space:
# Make changes to your files
git add .
git commit -m "Update: description of changes"
git push
The Space will automatically rebuild and redeploy.
Using the Hugging Face CLI (Alternative Method)
If you have the Hugging Face CLI installed:
# Login to Hugging Face
huggingface-cli login
# Upload files directly
huggingface-cli upload YOUR_USERNAME/YOUR_SPACE_NAME . . --repo-type=space
Performance Optimization Tips
- Lazy Loading: The app already implements tokenizer caching
- Model Selection: Start with smaller models for testing
- Batch Processing: The compare feature processes models efficiently
- Error Handling: Comprehensive error handling is implemented
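The lazy-loading idea above can be sketched as a small memoized loader. Here the loading function is injected as a parameter so the caching logic stands on its own; in the actual app it would be something like AutoTokenizer.from_pretrained:

```python
from functools import lru_cache

def make_cached_loader(load_fn, maxsize=8):
    # Wrap a tokenizer-loading function so each model name is loaded
    # only once; repeat requests return the cached object. maxsize
    # bounds memory use by evicting the least recently used entries.
    @lru_cache(maxsize=maxsize)
    def load(model_name):
        return load_fn(model_name)
    return load
```

With this, the first request for a model pays the download cost and every later request is an in-memory lookup.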
Security Considerations
- Never commit secrets: Use environment variables for sensitive data
- Model Access: Some models require authentication tokens
- Input Validation: The app validates all inputs
- Rate Limiting: Consider implementing rate limiting for production
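One way to add the rate limiting mentioned above is a token bucket; this is a minimal single-process sketch, not a production implementation (the injectable clock exists only to make it testable):

```python
import time

class TokenBucket:
    # Minimal token-bucket rate limiter: `rate` tokens are added per
    # second, up to a burst `capacity`; each allowed request spends one.
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A Gradio event handler could call allow() at the top and return an error message when it denies; Gradio's built-in queue() is another lever for limiting concurrent work.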
Support
- For Space-specific issues: https://huggingface.co/docs/hub/spaces
- For Gradio issues: https://gradio.app/docs
- For tokenizer issues: https://huggingface.co/docs/transformers/main_classes/tokenizer
Next Steps
After successful deployment:
- Share your Space URL with colleagues
- Embed the Space in websites using the embed feature
- Monitor usage in the Analytics tab
- Collect feedback and iterate on features
- Consider adding more tokenizers based on user needs
Good luck with your deployment! The Tokenizer Playground should provide a valuable tool for the NLP research community.