# Deployment Instructions for Hugging Face Spaces

This guide walks you through deploying the Tokenizer Playground to Hugging Face Spaces.
## Prerequisites

1. A Hugging Face account (create one at https://huggingface.co/join)
2. Git installed on your local machine
3. (Optional) Hugging Face CLI installed: `pip install huggingface-hub`
## Step 1: Create a New Space

1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Fill in the following:
   - **Space name**: Choose a unique name (e.g., "tokenizer-playground")
   - **Select the Space SDK**: Choose **Gradio**
   - **Select the Space hardware**: Start with **CPU basic** (free tier)
   - **Repo type**: Public or Private (your choice)
4. Click "Create Space"
## Step 2: Clone Your Space Repository

After creating the Space, you'll be redirected to your Space page. Clone the repository:

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
cd YOUR_SPACE_NAME
```
## Step 3: Add the Application Files

Copy all the files from this project into your Space repository:

```bash
# Copy the application files
cp path/to/tokenizer/app.py .
cp path/to/tokenizer/requirements.txt .
cp path/to/tokenizer/README.md .
cp path/to/tokenizer/.gitignore .
```
## Step 4: Commit and Push

```bash
git add .
git commit -m "Initial deployment of Tokenizer Playground"
git push
```
## Step 5: Monitor the Build

1. Go to your Space URL: https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
2. Click the "Files" tab to verify all files are uploaded
3. Click the "Logs" tab to monitor the build process
4. The Space will build and deploy automatically
## Step 6: (Optional) Configure Settings

### Secrets and Environment Variables

If you want to use private models or add API keys:

1. Go to your Space settings
2. Add secrets under "Repository secrets"
3. Access them in your code using `os.environ['SECRET_NAME']`
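For example, a defensive way to read a secret at startup (the name `HF_TOKEN` here is only an illustration; use whatever secret name you configured in the Space settings):

```python
import os

# Read a secret configured under "Repository secrets".
# os.environ.get returns None (or a default) instead of raising
# KeyError when the secret is absent, e.g. during local testing.
hf_token = os.environ.get("HF_TOKEN")

if hf_token is None:
    print("HF_TOKEN not set; private models will be unavailable.")
```

Using `.get()` rather than bracket indexing keeps the app from crashing on startup when the secret has not been configured yet.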
### Hardware Upgrade

For better performance:

1. Go to Settings → Hardware
2. Select a GPU tier (T4 small, T4 medium, A10G small, etc.)
3. Note: GPU tiers are paid options
### Persistent Storage

For caching tokenizers:

1. Go to Settings → Persistent storage
2. Enable persistent storage (paid feature)
3. This will cache downloaded models between restarts
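To make downloads actually land on the persistent disk, you can point the Hugging Face cache there before any tokenizer is loaded. Spaces mounts persistent storage at `/data`; treat the exact path as an assumption to verify against your Space's settings:

```python
import os

# Point the Hugging Face cache at the persistent disk so downloaded
# tokenizers survive restarts. This must run before importing
# transformers, which reads HF_HOME at import time.
# "/data" is where Spaces mounts persistent storage (verify for your Space).
os.environ.setdefault("HF_HOME", "/data/.huggingface")
```

`setdefault` leaves any value you set elsewhere (e.g. for local testing) untouched.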
## Troubleshooting

### Common Issues

1. **Build fails with dependency errors**
   - Check that all packages in requirements.txt are compatible
   - Try pinning specific versions if conflicts occur
2. **Space crashes on startup**
   - Check the logs for error messages
   - Ensure the app.py file has `app.launch()` at the end
   - Verify Python syntax is correct
3. **Models fail to load**
   - Some models require authentication
   - Add your HF token as a secret if needed
   - Some models might be too large for the free tier
4. **Slow performance**
   - Consider upgrading to GPU hardware
   - Enable persistent storage to cache models
   - Reduce the number of pre-loaded models
### Resource Limits

**Free Tier (CPU basic):**

- 2 vCPU
- 16 GB RAM
- No GPU
- Limited concurrent users

**Recommendations for Production:**

- Use T4 small or medium for a good balance of cost and performance
- Enable persistent storage to avoid re-downloading models
- Consider implementing request queuing for high traffic
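If you use Gradio's built-in queue (`demo.queue()`), queuing is handled for you. Conceptually, it amounts to serializing work through a bounded queue, roughly like this stdlib-only sketch (all names are illustrative; `len(text.split())` stands in for the real tokenization work):

```python
import queue
import threading

# A bounded queue: when full, new requests are rejected instead of
# piling up and exhausting memory on a small CPU instance.
requests: queue.Queue = queue.Queue(maxsize=8)
results = {}

def worker():
    # A single worker processes requests one at a time.
    while True:
        req_id, text = requests.get()
        if req_id is None:  # sentinel to stop the worker
            break
        results[req_id] = len(text.split())  # stand-in for tokenization
        requests.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(req_id, text):
    """Enqueue a request; return False if the server is saturated."""
    try:
        requests.put_nowait((req_id, text))
        return True
    except queue.Full:
        return False  # tell the client to retry later

submit(1, "hello world")
requests.join()  # wait until queued work is done
```

Rejecting excess requests up front (back-pressure) degrades more gracefully under load than letting every request start at once.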
## Local Testing Before Deployment

Always test locally before deploying:

```bash
# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

# Test in a browser at http://localhost:7860
```
## Updating Your Space

To update your deployed Space:

```bash
# Make changes to your files, then:
git add .
git commit -m "Update: description of changes"
git push
```

The Space will automatically rebuild and redeploy.
## Using the Hugging Face CLI (Alternative Method)

If you have the Hugging Face CLI installed:

```bash
# Log in to Hugging Face
huggingface-cli login

# Upload files directly
huggingface-cli upload YOUR_USERNAME/YOUR_SPACE_NAME . . --repo-type=space
```
## Performance Optimization Tips

1. **Lazy Loading**: The app already implements tokenizer caching
2. **Model Selection**: Start with smaller models for testing
3. **Batch Processing**: The compare feature processes models efficiently
4. **Error Handling**: Comprehensive error handling is implemented
## Security Considerations

1. **Never commit secrets**: Use environment variables for sensitive data
2. **Model Access**: Some models require authentication tokens
3. **Input Validation**: The app validates all inputs
4. **Rate Limiting**: Consider implementing rate limiting for production
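As a sketch of point 4, a per-client sliding-window limiter could look like this (purely illustrative; Gradio also offers its own concurrency controls):

```python
import time
from collections import defaultdict, deque

WINDOW = 60.0  # seconds
LIMIT = 20     # max requests per client per window

_history = defaultdict(deque)  # client_id -> timestamps of recent requests

def allow(client_id, now=None):
    """Return True if this request is within the rate limit."""
    now = time.monotonic() if now is None else now
    q = _history[client_id]
    # Drop timestamps that have fallen out of the window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= LIMIT:
        return False
    q.append(now)
    return True
```

The `now` parameter exists so the logic can be tested deterministically; in production you would call `allow(client_id)` and let it use the monotonic clock.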
## Support

- Space-specific issues: https://huggingface.co/docs/hub/spaces
- Gradio issues: https://gradio.app/docs
- Tokenizer issues: https://huggingface.co/docs/transformers/main_classes/tokenizer
## Next Steps

After successful deployment:

1. Share your Space URL with colleagues
2. Embed the Space in websites using the embed feature
3. Monitor usage in the Analytics tab
4. Collect feedback and iterate on features
5. Consider adding more tokenizers based on user needs

---

Good luck with your deployment! The Tokenizer Playground should provide a valuable tool for the NLP research community.