# Deployment Instructions
## Deploying to Hugging Face Spaces
### Prerequisites
- A Hugging Face account (free)
- Git installed locally
### Steps
1. **Create a new Space on Hugging Face:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name (e.g., "ai-text-assistant")
   - Select "Gradio" as the SDK
   - Choose visibility (Public or Private)
   - Click "Create Space"
2. **Clone your Space repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```
3. **Copy the application files:**
   Copy these files from this project to your Space repository:
   - `app.py`
   - `requirements.txt`
   - `README.md`
   - `.gitignore` (optional)
4. **Commit and push:**
   ```bash
   git add .
   git commit -m "Initial commit: AI Text Assistant"
   git push
   ```
5. **Wait for deployment:**
   - Hugging Face Spaces will automatically detect the changes
   - The build process will install dependencies and start the app
   - This may take 5-10 minutes for the first deployment
   - You can watch the build logs in the Space's "Logs" tab
6. **Access your app:**
   - Once deployed, your app will be available at
     `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
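Note that Spaces reads its deployment settings from YAML front matter at the top of `README.md`, so make sure that block survives the copy. A minimal sketch of what it typically contains (the title, emoji, colors, and `sdk_version` below are placeholders, not values taken from this project):

```yaml
---
title: AI Text Assistant
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
```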
### Local Testing
To test locally before deploying:
```bash
# Install dependencies
pip install -r requirements.txt
# Run the app
python app.py
```
The app will be available at `http://127.0.0.1:7860` (Gradio's default port).
### Configuration Options
#### Hardware
For better performance, you can upgrade your Space's hardware:
- Go to Space Settings → Hardware
- Options include CPU (free), GPU T4 (small fee), GPU A10G, etc.
- The app works on CPU but will be faster with GPU
#### Environment Variables
You can set these in Space Settings → Variables:
- `HF_HOME`: Hugging Face home directory; downloaded models are cached under `$HF_HOME/hub`
- `TRANSFORMERS_CACHE`: custom cache directory for models (deprecated in recent `transformers` releases in favor of `HF_HOME`)
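For example, you can point the model cache at persistent storage before the app imports `transformers`. A minimal sketch; the `/data/.huggingface` path is an assumption (paid Spaces mount persistent storage at `/data`; on free Spaces any writable path works):

```python
import os

# Must run before transformers / huggingface_hub is imported;
# once imported, the library has already resolved its cache paths.
# setdefault keeps any value already configured in Space Settings.
os.environ.setdefault("HF_HOME", "/data/.huggingface")

# Models end up cached under $HF_HOME/hub
print("model cache:", os.path.join(os.environ["HF_HOME"], "hub"))
```

Setting this in Space Settings → Variables has the same effect and avoids ordering concerns entirely.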
### Troubleshooting
**Build fails with memory errors:**
- The models used here are relatively small, but if you still hit memory limits:
  - Upgrade to a higher hardware tier
  - Or call the models through the Hugging Face Inference API instead of loading them in the Space
**App starts slowly:**
- The first run downloads models (~1GB for Qwen, ~1.6GB for BART)
- Subsequent runs will use cached models
- Model loading takes 30-60 seconds on CPU
**Token alternatives not showing:**
- Make sure you hover over the generated words
- The tooltip appears on hover with a slight delay
- Try different browsers if issues persist
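The tooltip content comes from the model's next-token distribution: the alternatives are simply the highest-probability tokens after a softmax over the logits. The app's actual implementation isn't shown here, but the idea can be sketched framework-free (the vocabulary and logit values below are made up):

```python
import math

def top_k_alternatives(logits, vocab, k=3):
    """Return the k most likely tokens with their softmax probabilities."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy example: fake logits over a 4-word vocabulary
vocab = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_alternatives(logits, vocab))
```

In the real app the logits come from the model's forward pass at each generation step, and the tokens are decoded with the model's tokenizer.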
### Performance Notes
- **First Load:** Slow due to model downloads
- **Model Loading:** 30-60 seconds on CPU, 5-10 seconds on GPU
- **Generation Speed:**
- Qwen (0.5B): ~10-20 tokens/sec on CPU, ~100+ tokens/sec on GPU
- BART-large: ~5-10 tokens/sec on CPU, ~50+ tokens/sec on GPU
### Support
For issues or questions:
- Check Hugging Face Spaces documentation: https://huggingface.co/docs/hub/spaces
- Open an issue on the repository
- Contact: Your email/contact info