

Deployment Instructions

Deploying to Hugging Face Spaces

Prerequisites

  • A Hugging Face account (free)
  • Git installed locally

Steps

  1. Create a new Space on Hugging Face:

    • Go to https://huggingface.co/spaces
    • Click "Create new Space"
    • Choose a name (e.g., "ai-text-assistant")
    • Select "Gradio" as the SDK
    • Choose visibility (Public or Private)
    • Click "Create Space"
  2. Clone your Space repository:

    git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
    cd YOUR_SPACE_NAME
    
  3. Copy the application files from this project into your Space repository:

    • app.py
    • requirements.txt
    • README.md
    • .gitignore (optional)
  4. Commit and push:

    git add .
    git commit -m "Initial commit: AI Text Assistant"
    git push
    
  5. Wait for deployment:

    • Hugging Face Spaces will automatically detect the changes
    • The build process will install dependencies and start the app
    • This may take 5-10 minutes for the first deployment
    • You can watch the build logs in the Space's "Logs" tab
  6. Access your app:

    • Once deployed, your app will be available at https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
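Steps 2-4 above can be scripted. Below is a minimal sketch; the function name `deploy_space` and the argument layout are my own, and `YOUR_USERNAME` / `YOUR_SPACE_NAME` remain placeholders you must fill in:

```shell
# Sketch: clone the Space, copy the app files in, commit, and push.
# Usage: deploy_space YOUR_USERNAME YOUR_SPACE_NAME /path/to/this/project
deploy_space() {
  user="$1"; space="$2"; src="$3"
  git clone "https://huggingface.co/spaces/$user/$space" &&
  cd "$space" &&
  cp "$src/app.py" "$src/requirements.txt" "$src/README.md" . &&
  git add . &&
  git commit -m "Initial commit: AI Text Assistant" &&
  git push
}
```

Pushing requires that you are authenticated with Hugging Face (for example via `huggingface-cli login` or a Git credential helper); the script does not handle that step.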

Local Testing

To test locally before deploying:

# Install dependencies
pip install -r requirements.txt

# Run the app
python app.py

The app will be available at http://127.0.0.1:7860

Configuration Options

Hardware

For better performance, you can upgrade your Space's hardware:

  • Go to Space Settings → Hardware
  • Options include CPU (free), GPU T4 (small fee), GPU A10G, etc.
  • The app works on CPU but will be faster with GPU

Environment Variables

You can set these in Space Settings → Variables:

  • TRANSFORMERS_CACHE: Custom cache directory for models
  • HF_HOME: Hugging Face home directory
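For local testing, the same variables can be exported in the shell before running `python app.py`. The paths below are illustrative examples, not required values:

```shell
# Example values only: any writable directory works as the cache root
export HF_HOME="$PWD/.hf-cache"                    # Hugging Face home directory
export TRANSFORMERS_CACHE="$HF_HOME/transformers"  # model cache for transformers
```

Pointing both at a persistent directory means the model downloads from the first run are reused on later runs.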

Troubleshooting

Build fails with memory errors:

  • The models are relatively small, but if you do hit memory limits:
    • Upgrade to a higher hardware tier, or
    • Serve the models through the Hugging Face Inference API instead

App starts slowly:

  • The first run downloads models (~1GB for Qwen, ~1.6GB for BART)
  • Subsequent runs will use cached models
  • Model loading takes 30-60 seconds on CPU

Token alternatives not showing:

  • Make sure you hover over the generated words
  • The tooltip appears on hover with a slight delay
  • Try different browsers if issues persist

Performance Notes

  • First Load: Slow due to model downloads
  • Model Loading: 30-60 seconds on CPU, 5-10 seconds on GPU
  • Generation Speed:
    • Qwen (0.5B): ~10-20 tokens/sec on CPU, ~100+ tokens/sec on GPU
    • BART-large: ~5-10 tokens/sec on CPU, ~50+ tokens/sec on GPU
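As a back-of-envelope check on those throughput figures: at roughly 15 tokens/sec (a midpoint of the quoted CPU range for Qwen), a 200-token response takes about 13 seconds. Both numbers here are illustrative:

```shell
# Rough latency estimate from the quoted CPU rates (illustrative numbers)
TOKENS=200   # length of a typical response
RATE=15      # ~tokens/sec for Qwen 0.5B on CPU (midpoint of 10-20)
echo "approx $((TOKENS / RATE))s to generate $TOKENS tokens"
```

The same arithmetic with BART-large's CPU rate (~7 tokens/sec) gives closer to half a minute, which is why summarization feels noticeably slower than generation on the free tier.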

Support

For issues or questions: