# Deployment Instructions
## Deploying to Hugging Face Spaces
### Prerequisites
- A Hugging Face account (free)
- Git installed locally
### Steps
1. **Create a new Space on Hugging Face:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose a name (e.g., "ai-text-assistant")
   - Select "Gradio" as the SDK
   - Choose visibility (Public or Private)
   - Click "Create Space"
2. **Clone your Space repository:**
   ```bash
   git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
   cd YOUR_SPACE_NAME
   ```
3. **Copy the application files:**
   Copy these files from this project to your Space repository:
   - `app.py`
   - `requirements.txt`
   - `README.md`
   - `.gitignore` (optional)
4. **Commit and push:**
   ```bash
   git add .
   git commit -m "Initial commit: AI Text Assistant"
   git push
   ```
5. **Wait for deployment:**
   - Hugging Face Spaces will automatically detect the changes
   - The build process will install dependencies and start the app
   - This may take 5-10 minutes for the first deployment
   - You can watch the build logs in the Space's "Logs" tab
6. **Access your app:**
   - Once deployed, your app will be available at
     `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
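Note that Spaces reads its deployment settings from YAML front matter at the top of `README.md`, so make sure that block survives the copy. A minimal sketch of what it typically contains (the title, emoji, colors, and `sdk_version` below are placeholders, not values taken from this project):

```yaml
---
title: AI Text Assistant
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
```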
### Local Testing
To test locally before deploying:
```bash
# Install dependencies
pip install -r requirements.txt
# Run the app
python app.py
```
The app will be available at `http://127.0.0.1:7860` (Gradio's default port).
### Configuration Options
#### Hardware
For better performance, you can upgrade your Space's hardware:
- Go to Space Settings → Hardware
- Options include CPU (free), GPU T4 (small fee), GPU A10G, etc.
- The app works on CPU but will be faster with GPU
#### Environment Variables
You can set these in Space Settings → Variables:
- `HF_HOME`: Hugging Face home directory; downloaded models are cached under `$HF_HOME/hub`
- `TRANSFORMERS_CACHE`: custom cache directory for models (deprecated in recent `transformers` releases in favor of `HF_HOME`)
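For example, you can point the model cache at persistent storage before the app imports `transformers`. A minimal sketch; the `/data/.huggingface` path is an assumption (paid Spaces mount persistent storage at `/data`; on free Spaces any writable path works):

```python
import os

# Must run before transformers / huggingface_hub is imported;
# once imported, the library has already resolved its cache paths.
# setdefault keeps any value already configured in Space Settings.
os.environ.setdefault("HF_HOME", "/data/.huggingface")

# Models end up cached under $HF_HOME/hub
print("model cache:", os.path.join(os.environ["HF_HOME"], "hub"))
```

Setting this in Space Settings → Variables has the same effect and avoids ordering concerns entirely.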
### Troubleshooting
**Build fails with memory errors:**
- The models used here are relatively small, but if you still hit memory limits:
  - Upgrade to a higher hardware tier
  - Or call the models through the Hugging Face Inference API instead of loading them in the Space
**App starts slowly:**
- The first run downloads models (~1GB for Qwen, ~1.6GB for BART)
- Subsequent runs will use cached models
- Model loading takes 30-60 seconds on CPU
**Token alternatives not showing:**
- Make sure you hover over the generated words
- The tooltip appears on hover with a slight delay
- Try different browsers if issues persist
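The tooltip content comes from the model's next-token distribution: the alternatives are simply the highest-probability tokens after a softmax over the logits. The app's actual implementation isn't shown here, but the idea can be sketched framework-free (the vocabulary and logit values below are made up):

```python
import math

def top_k_alternatives(logits, vocab, k=3):
    """Return the k most likely tokens with their softmax probabilities."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy example: fake logits over a 4-word vocabulary
vocab = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_alternatives(logits, vocab))
```

In the real app the logits come from the model's forward pass at each generation step, and the tokens are decoded with the model's tokenizer.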
### Performance Notes
- **First Load:** Slow due to model downloads
- **Model Loading:** 30-60 seconds on CPU, 5-10 seconds on GPU
- **Generation Speed:**
- Qwen (0.5B): ~10-20 tokens/sec on CPU, ~100+ tokens/sec on GPU
- BART-large: ~5-10 tokens/sec on CPU, ~50+ tokens/sec on GPU
### Support
For issues or questions:
- Check Hugging Face Spaces documentation: https://huggingface.co/docs/hub/spaces
- Open an issue on the repository
- Contact: Your email/contact info