# Doctra Hugging Face Spaces Deployment Guide

## πŸš€ Quick Deployment

### Option 1: Direct Upload to Hugging Face Spaces

1. **Create a new Space**:
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose "Gradio" as the SDK
   - Set the title to "Doctra - Document Parser"

2. **Upload files**:
   - Upload all files from this `hf_space` folder to your Space
   - Make sure `app.py` is in the root directory

3. **Configure environment**:
   - Go to Settings β†’ Secrets
   - Add `VLM_API_KEY` if you want to use VLM features
   - Set the value to your API key (OpenAI, Anthropic, Google, etc.)

### Option 2: Git Repository Deployment

1. **Create a Git repository**:
   ```bash
   git init
   git add .
   git commit -m "Initial Doctra HF Space deployment"
   git remote add origin <your-repo-url>
   git push -u origin main
   ```

2. **Connect to Hugging Face Spaces**:
   - Create a new Space
   - Choose "Git repository" as the source
   - Enter your repository URL
   - Set the app file to `app.py`

### Option 3: Docker Deployment

1. **Build the Docker image**:
   ```bash
   docker build -t doctra-hf-space .
   ```

2. **Run the container**:
   ```bash
   docker run -p 7860:7860 doctra-hf-space
   ```

## πŸ”§ Configuration

### Environment Variables

Set these in your Hugging Face Space settings:

- `VLM_API_KEY`: Your API key for VLM providers
- `GRADIO_SERVER_NAME`: Server hostname (default: `0.0.0.0`)
- `GRADIO_SERVER_PORT`: Server port (default: `7860`)
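
In `app.py`, these variables can be read with the standard library's `os` module. A minimal sketch, assuming the defaults listed above (how Doctra itself consumes the values is up to your `app.py`):

```python
import os

# Read the Space configuration from the environment, falling back to the
# documented defaults. VLM features stay disabled when no key is provided.
vlm_api_key = os.environ.get("VLM_API_KEY")  # None means VLM features off
server_name = os.environ.get("GRADIO_SERVER_NAME", "0.0.0.0")
server_port = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))

print(f"VLM enabled: {vlm_api_key is not None}")
print(f"Serving on {server_name}:{server_port}")
```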

### Hardware Requirements

- **CPU**: Minimum 2 cores recommended
- **RAM**: Minimum 4GB, 8GB+ recommended
- **Storage**: 10GB+ for models and dependencies
- **GPU**: Optional but recommended for faster processing

## πŸ“Š Performance Optimization

### For Hugging Face Spaces

1. **Use CPU-optimized models** when GPU is not available
2. **Reduce DPI settings** for faster processing
3. **Process smaller documents** to avoid memory issues
4. **Enable caching** for repeated operations
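
One way to combine the first two tips is to pick the DPI from the available hardware and document size. The thresholds below are illustrative, not tuned Doctra values, and `choose_dpi` is a hypothetical helper; pass its result to whatever DPI parameter your pipeline exposes:

```python
import importlib.util


def choose_dpi(page_count: int, gpu_available: bool) -> int:
    """Pick a rendering DPI that trades quality for memory headroom.

    Thresholds here are illustrative, not tuned values from Doctra.
    """
    if gpu_available and page_count <= 50:
        return 300  # full quality when resources allow
    if page_count <= 20:
        return 200
    return 150  # large CPU-only jobs: prioritize finishing


def gpu_is_available() -> bool:
    """Cheap check: torch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()
```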

### For Local Deployment

1. **Use GPU acceleration** when available
2. **Increase memory limits** for large documents
3. **Use SSD storage** for better I/O performance
4. **Configure proper logging** for debugging

## πŸ› Troubleshooting

### Common Issues

1. **Import Errors**:
   - Check that all dependencies are in `requirements.txt`
   - Verify Python version compatibility

2. **Memory Issues**:
   - Reduce DPI settings
   - Process smaller documents
   - Increase available memory

3. **API Key Issues**:
   - Verify API key is correctly set
   - Check provider-specific requirements
   - Test API connectivity

4. **File Upload Issues**:
   - Check file size limits
   - Verify file format support
   - Ensure proper permissions

### Debug Mode

To enable debug mode, set:
```bash
export GRADIO_DEBUG=1
```

## πŸ“ˆ Monitoring

### Health Checks

- Monitor CPU and memory usage
- Check disk space availability
- Verify API key validity
- Test document processing pipeline
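
The checks above can be gathered into a small report with the standard library alone (memory usage needs a third-party package such as `psutil`, so it is omitted here). The key names and the `health_report` helper are assumptions for this sketch:

```python
import os
import shutil


def health_report(workdir: str = ".") -> dict:
    """Collect basic health signals; thresholds are left to the caller."""
    usage = shutil.disk_usage(workdir)
    return {
        "cpu_count": os.cpu_count(),
        "disk_free_gb": round(usage.free / 1024**3, 2),
        "vlm_key_set": "VLM_API_KEY" in os.environ,
    }


report = health_report()
print(report)
```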

### Logs

- Application logs: Check Gradio output
- Error logs: Monitor for exceptions
- Performance logs: Track processing times
- User logs: Monitor usage patterns

## πŸ”„ Updates

### Updating the Application

1. **Code updates**: Push changes to your repository
2. **Dependency updates**: Update `requirements.txt`
3. **Model updates**: Download new model versions
4. **Configuration updates**: Modify environment variables

### Version Control

- Use semantic versioning
- Tag releases appropriately
- Maintain changelog
- Test before deployment

## πŸ›‘οΈ Security

### Best Practices

1. **API Keys**: Store securely, never commit to code
2. **File Uploads**: Validate file types and sizes
3. **Rate Limiting**: Implement to prevent abuse
4. **Input Validation**: Sanitize all user inputs
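
For rate limiting, a simple in-process token bucket is a reasonable starting point. Note this only limits one process; a Space behind a load balancer or with multiple workers would need shared state (e.g. Redis):

```python
import time


class TokenBucket:
    """Per-process limiter: refills `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; False means 'reject request'."""
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3)  # ~1 request/s, bursts of 3
```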

### Privacy

- No data is stored permanently
- Files are processed in temporary directories
- API calls are made securely
- User data is not logged
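
The temporary-directory behavior can be sketched with `tempfile.TemporaryDirectory`, which deletes the directory, and everything written into it, when the `with` block exits. The `process_in_tempdir` helper and its return value are placeholders for this illustration:

```python
import tempfile
from pathlib import Path


def process_in_tempdir(data: bytes, filename: str) -> int:
    """Write the upload into a throwaway directory; the directory and
    the file are removed automatically when the with-block exits."""
    with tempfile.TemporaryDirectory(prefix="doctra_") as tmp:
        target = Path(tmp) / filename
        target.write_bytes(data)
        # ... run the parsing pipeline against `target` here ...
        return target.stat().st_size  # placeholder for real results


size = process_in_tempdir(b"%PDF-1.4 minimal", "input.pdf")
```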

## πŸ“ž Support

For issues and questions:

1. **GitHub Issues**: Report bugs and feature requests
2. **Documentation**: Check the main README.md
3. **Community**: Join discussions on Hugging Face
4. **Email**: Contact the development team

## 🎯 Next Steps

After successful deployment:

1. **Test all features** with sample documents
2. **Configure monitoring** and alerting
3. **Set up backups** for important data
4. **Plan for scaling** based on usage
5. **Gather user feedback** for improvements