# Doctra Hugging Face Spaces Deployment Guide

## πŸš€ Quick Deployment

### Option 1: Direct Upload to Hugging Face Spaces

1. **Create a new Space**:
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose "Gradio" as the SDK
   - Set the title to "Doctra - Document Parser"

2. **Upload files**:
   - Upload all files from this `hf_space` folder to your Space
   - Make sure `app.py` is in the root directory

3. **Configure environment**:
   - Go to Settings β†’ Secrets
   - Add `VLM_API_KEY` if you want to use VLM features
   - Set the value to your API key (OpenAI, Anthropic, Google, etc.)

### Option 2: Git Repository Deployment

1. **Create a Git repository**:
   ```bash
   git init
   git add .
   git commit -m "Initial Doctra HF Space deployment"
   git remote add origin <your-repo-url>
   git push -u origin main
   ```

2. **Connect to Hugging Face Spaces**:
   - Create a new Space
   - Choose "Git repository" as the source
   - Enter your repository URL
   - Set the app file to `app.py`

### Option 3: Docker Deployment

1. **Build the Docker image**:
   ```bash
   docker build -t doctra-hf-space .
   ```

2. **Run the container**:
   ```bash
   docker run -p 7860:7860 doctra-hf-space
   ```

## πŸ”§ Configuration

### Environment Variables

Set these in your Hugging Face Space settings:

- `VLM_API_KEY`: Your API key for VLM providers
- `GRADIO_SERVER_NAME`: Server hostname (default: `0.0.0.0`)
- `GRADIO_SERVER_PORT`: Server port (default: `7860`)
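
In `app.py`, these variables can be read with the standard library's `os` module. A minimal sketch, assuming the defaults listed above (how Doctra itself consumes the values is up to your `app.py`):

```python
import os

# Read the Space configuration from the environment, falling back to the
# documented defaults. VLM features stay disabled when no key is provided.
vlm_api_key = os.environ.get("VLM_API_KEY")  # None means VLM features off
server_name = os.environ.get("GRADIO_SERVER_NAME", "0.0.0.0")
server_port = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))

print(f"VLM enabled: {vlm_api_key is not None}")
print(f"Serving on {server_name}:{server_port}")
```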

### Hardware Requirements

- **CPU**: Minimum 2 cores recommended
- **RAM**: Minimum 4GB, 8GB+ recommended
- **Storage**: 10GB+ for models and dependencies
- **GPU**: Optional but recommended for faster processing

## πŸ“Š Performance Optimization

### For Hugging Face Spaces

1. **Use CPU-optimized models** when GPU is not available
2. **Reduce DPI settings** for faster processing
3. **Process smaller documents** to avoid memory issues
4. **Enable caching** for repeated operations
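
One way to combine the first two tips is to pick the DPI from the available hardware and document size. The thresholds below are illustrative, not tuned Doctra values, and `choose_dpi` is a hypothetical helper; pass its result to whatever DPI parameter your pipeline exposes:

```python
import importlib.util


def choose_dpi(page_count: int, gpu_available: bool) -> int:
    """Pick a rendering DPI that trades quality for memory headroom.

    Thresholds here are illustrative, not tuned values from Doctra.
    """
    if gpu_available and page_count <= 50:
        return 300  # full quality when resources allow
    if page_count <= 20:
        return 200
    return 150  # large CPU-only jobs: prioritize finishing


def gpu_is_available() -> bool:
    """Cheap check: torch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()
```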

### For Local Deployment

1. **Use GPU acceleration** when available
2. **Increase memory limits** for large documents
3. **Use SSD storage** for better I/O performance
4. **Configure proper logging** for debugging

## πŸ› Troubleshooting

### Common Issues

1. **Import Errors**:
   - Check that all dependencies are in `requirements.txt`
   - Verify Python version compatibility

2. **Memory Issues**:
   - Reduce DPI settings
   - Process smaller documents
   - Increase available memory

3. **API Key Issues**:
   - Verify API key is correctly set
   - Check provider-specific requirements
   - Test API connectivity

4. **File Upload Issues**:
   - Check file size limits
   - Verify file format support
   - Ensure proper permissions

### Debug Mode

To enable debug mode, set:
```bash
export GRADIO_DEBUG=1
```

## πŸ“ˆ Monitoring

### Health Checks

- Monitor CPU and memory usage
- Check disk space availability
- Verify API key validity
- Test document processing pipeline
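
The checks above can be gathered into a small report with the standard library alone (memory usage needs a third-party package such as `psutil`, so it is omitted here). The key names and the `health_report` helper are assumptions for this sketch:

```python
import os
import shutil


def health_report(workdir: str = ".") -> dict:
    """Collect basic health signals; thresholds are left to the caller."""
    usage = shutil.disk_usage(workdir)
    return {
        "cpu_count": os.cpu_count(),
        "disk_free_gb": round(usage.free / 1024**3, 2),
        "vlm_key_set": "VLM_API_KEY" in os.environ,
    }


report = health_report()
print(report)
```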

### Logs

- Application logs: Check Gradio output
- Error logs: Monitor for exceptions
- Performance logs: Track processing times
- User logs: Monitor usage patterns

## πŸ”„ Updates

### Updating the Application

1. **Code updates**: Push changes to your repository
2. **Dependency updates**: Update `requirements.txt`
3. **Model updates**: Download new model versions
4. **Configuration updates**: Modify environment variables

### Version Control

- Use semantic versioning
- Tag releases appropriately
- Maintain changelog
- Test before deployment

## πŸ›‘οΈ Security

### Best Practices

1. **API Keys**: Store securely, never commit to code
2. **File Uploads**: Validate file types and sizes
3. **Rate Limiting**: Implement to prevent abuse
4. **Input Validation**: Sanitize all user inputs
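
For rate limiting, a simple in-process token bucket is a reasonable starting point. Note this only limits one process; a Space behind a load balancer or with multiple workers would need shared state (e.g. Redis):

```python
import time


class TokenBucket:
    """Per-process limiter: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; False means 'reject request'."""
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3)  # ~1 request/s, bursts of 3
```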

### Privacy

- No data is stored permanently
- Files are processed in temporary directories
- API calls are made securely
- User data is not logged
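
The temporary-directory behavior can be sketched with `tempfile.TemporaryDirectory`, which deletes the directory, and everything written into it, when the `with` block exits. The `process_in_tempdir` helper and its return value are placeholders for this illustration:

```python
import tempfile
from pathlib import Path


def process_in_tempdir(data: bytes, filename: str) -> int:
    """Write the upload into a throwaway directory; the directory and
    the file are removed automatically when the with-block exits."""
    with tempfile.TemporaryDirectory(prefix="doctra_") as tmp:
        target = Path(tmp) / filename
        target.write_bytes(data)
        # ... run the parsing pipeline against `target` here ...
        return target.stat().st_size  # placeholder for real results


size = process_in_tempdir(b"%PDF-1.4 minimal", "input.pdf")
```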

## πŸ“ž Support

For issues and questions:

1. **GitHub Issues**: Report bugs and feature requests
2. **Documentation**: Check the main README.md
3. **Community**: Join discussions on Hugging Face
4. **Email**: Contact the development team

## 🎯 Next Steps

After successful deployment:

1. **Test all features** with sample documents
2. **Configure monitoring** and alerting
3. **Set up backups** for important data
4. **Plan for scaling** based on usage
5. **Gather user feedback** for improvements