# Deployment Instructions

## Deploying to Hugging Face Spaces

### Prerequisites
- A Hugging Face account (free)
- Git installed locally

### Steps

1. **Create a new Space on Hugging Face:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name (e.g., "ai-text-assistant")
   - Select "Gradio" as the SDK
   - Choose visibility (Public or Private)
   - Click "Create Space"

2. **Clone your Space repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```

3. **Copy the application files:**
   Copy these files from this project to your Space repository:
   - `app.py`
   - `requirements.txt`
   - `README.md`
   - `.gitignore` (optional)

4. **Commit and push:**
   ```bash
   git add .
   git commit -m "Initial commit: AI Text Assistant"
   git push
   ```

5. **Wait for deployment:**
   - Hugging Face Spaces will automatically detect the changes
   - The build process will install dependencies and start the app
   - This may take 5-10 minutes for the first deployment
   - You can watch the build logs in the Space's "Logs" tab

6. **Access your app:**
   - Once deployed, your app will be available at
     `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
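Before pushing (steps 2 to 4), it can help to verify that the required files actually made it into the Space checkout. Below is a minimal sketch using the file names from step 3; `missing_files` is a hypothetical helper for illustration, not part of the project:

```python
from pathlib import Path

# Files the Space needs, per step 3 above (.gitignore is optional).
REQUIRED = ["app.py", "requirements.txt", "README.md"]

def missing_files(repo_dir="."):
    """Return the required files not yet present in the Space checkout."""
    repo = Path(repo_dir)
    return [name for name in REQUIRED if not (repo / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Still missing before push:", ", ".join(missing))
    else:
        print("All required files present; ready to commit and push.")
```

Run it from inside the Space repository before `git push` to catch a forgotten file early.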

### Local Testing

To test locally before deploying:

```bash
# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py
```

The app will be available at `http://127.0.0.1:7860`.

### Configuration Options

#### Hardware
For better performance, you can upgrade your Space's hardware:
- Go to Space Settings → Hardware
- Options include CPU (free), GPU T4 (small fee), GPU A10G, etc.
- The app works on CPU but will be faster with GPU

#### Environment Variables
You can set these in Space Settings → Variables:
- `TRANSFORMERS_CACHE`: Custom cache directory for models
- `HF_HOME`: Hugging Face home directory
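How the app picks these up depends on its code, but a typical resolution order (sketched below as an assumption about the app's behavior, not a guarantee) prefers `TRANSFORMERS_CACHE`, then `HF_HOME`, then the library default:

```python
import os
from pathlib import Path

def model_cache_dir():
    """Resolve the model cache directory, preferring the Space variables
    above over the default ~/.cache/huggingface/hub location."""
    if os.environ.get("TRANSFORMERS_CACHE"):
        return Path(os.environ["TRANSFORMERS_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"
```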

### Troubleshooting

**Build fails with memory errors:**
- The models are relatively small, so this is uncommon
- If it happens, upgrade to a larger hardware tier
- Or switch to the Hugging Face Inference API instead of loading the models locally

**App starts slowly:**
- The first run downloads models (~1GB for Qwen, ~1.6GB for BART)
- Subsequent runs will use cached models
- Model loading takes 30-60 seconds on CPU

**Token alternatives not showing:**
- Make sure you hover over the generated words
- The tooltip appears on hover with a slight delay
- Try different browsers if issues persist

### Performance Notes

- **First Load:** Slow due to model downloads
- **Model Loading:** 30-60 seconds on CPU, 5-10 seconds on GPU
- **Generation Speed:** 
  - Qwen (0.5B): ~10-20 tokens/sec on CPU, ~100+ tokens/sec on GPU
  - BART-large: ~5-10 tokens/sec on CPU, ~50+ tokens/sec on GPU
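For a rough sense of response time, end-to-end latency is approximately model-load time (first request only, if models load lazily) plus the number of generated tokens divided by throughput. A back-of-envelope helper using the figures above:

```python
def estimated_latency(num_tokens, tokens_per_sec, load_seconds=0.0):
    """Rough end-to-end estimate: optional model-load time plus
    generation time at a steady tokens/sec throughput."""
    return load_seconds + num_tokens / tokens_per_sec

# e.g. a 200-token reply from Qwen on CPU at ~20 tokens/sec,
# including a 45-second first-time model load:
# estimated_latency(200, 20, load_seconds=45)  -> 55.0 seconds
```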

### Support

For issues or questions:
- Check Hugging Face Spaces documentation: https://huggingface.co/docs/hub/spaces
- Open an issue on the repository
- Contact: Your email/contact info