abdelkader committed on
Commit
bf5eae6
·
1 Parent(s): fdc3a9e
API_USAGE.md DELETED
@@ -1,233 +0,0 @@
# 🔌 API Usage Guide - Hugging Face Spaces

On Hugging Face Spaces, **only Gradio is exposed publicly**. The FastAPI server (port 8000) is not reachable from outside.

**However, Gradio automatically exposes a native REST API!** 🎉

## 📡 Accessing the API from Outside

### Option 1: Native Gradio API (Recommended)

Gradio automatically exposes a REST API at the `/api/predict` endpoint.

#### Python with `gradio_client`:

```python
from gradio_client import Client

# Replace with your Space URL
client = Client("AI-DrivenTesting/CU1-X")

# Call the API
result = client.predict(
    "screenshot.png",                   # image (filepath or PIL Image)
    0.35,                               # confidence_threshold (float)
    2,                                  # thickness (int)
    True,                               # enable_clip (bool)
    True,                               # enable_ocr (bool)
    False,                              # enable_blip (bool)
    False,                              # ocr_only (bool)
    "Only image & button",              # blip_scope (str)
    False,                              # preprocess (bool)
    "RF-DETR Optimized (Recommended)",  # preprocess_mode (str)
    "standard",                         # preprocess_preset (str)
    api_name="/predict"
)

# Result: (annotated_image, summary, detections_json)
annotated_image, summary, detections_json = result
print(detections_json)
```

#### REST API (curl):

```bash
# For a public Space.
# The first "data" entry must be a public image URL or a base64-encoded
# image (JSON allows no inline comments, so the note lives up here).
curl -X POST "https://AI-DrivenTesting-CU1-X.hf.space/api/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "screenshot.png",
      0.35,
      2,
      true,
      true,
      false,
      false,
      "Only image & button",
      false,
      "RF-DETR Optimized (Recommended)",
      "standard"
    ]
  }'
```

**Note:** For images, you must either:
- use a public URL pointing to the image,
- encode the image as base64, or
- use `gradio_client`, which handles this automatically.

#### REST API with Python `requests`:

```python
import base64

import requests

# Encode the image as base64
def image_to_base64(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Call the API
url = "https://AI-DrivenTesting-CU1-X.hf.space/api/predict"
image_b64 = image_to_base64("screenshot.png")

response = requests.post(
    url,
    json={
        "data": [
            f"data:image/png;base64,{image_b64}",
            0.35,
            2,
            True,
            True,
            False,
            False,
            "Only image & button",
            False,
            "RF-DETR Optimized (Recommended)",
            "standard"
        ]
    },
    timeout=120
)

result = response.json()
print(result)
```
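Going the other way, the annotated image in the response comes back base64-encoded, possibly with a `data:image/...;base64,` prefix like the request above. A minimal stdlib-only sketch for turning it back into raw PNG bytes; the prefix handling mirrors the request format and is an assumption about the response shape:

```python
import base64

def data_uri_to_bytes(data: str) -> bytes:
    # Strip an optional "data:image/png;base64," style prefix, then decode.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    return base64.b64decode(data)

# Round trip: encode a PNG signature, wrap it as a data URI, decode it back.
payload = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n").decode()
print(data_uri_to_bytes(payload))  # → b'\x89PNG\r\n'
```

From there, write the bytes to a file (`open("annotated.png", "wb")`) or open them with `PIL.Image.open(io.BytesIO(...))`.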

### Option 2: FastAPI (Internal Only)

The FastAPI server on port 8000 is **NOT reachable from outside** the HF Space.

It only works:
- ✅ Locally (`python app.py`)
- ✅ Between the Space's internal processes
- ❌ **NOT from outside the Space**

## 🔑 Authentication

### Public Spaces
- No authentication required
- The API is directly accessible

### Private Spaces
- Requires a Hugging Face token
- Add the header: `Authorization: Bearer <HF_TOKEN>`

```python
from gradio_client import Client

client = Client(
    "AI-DrivenTesting/CU1-X",
    hf_token="your_hf_token_here"  # For private Spaces
)
```

## 📊 API Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `image` | file/str | Image to analyze | - |
| `confidence_threshold` | float | Confidence threshold (0.1-0.9) | 0.35 |
| `thickness` | int | Bounding-box thickness (1-6) | 2 |
| `enable_clip` | bool | Enable CLIP classification | False |
| `enable_ocr` | bool | Enable OCR text extraction | True |
| `enable_blip` | bool | Enable BLIP descriptions | False |
| `ocr_only` | bool | OCR-only mode (skips detection) | False |
| `blip_scope` | str | BLIP scope ("Only image & button" or "All elements") | "Only image & button" |
| `preprocess` | bool | Enable preprocessing | False |
| `preprocess_mode` | str | Preprocessing mode | "RF-DETR Optimized (Recommended)" |
| `preprocess_preset` | str | Preprocessing preset | "standard" |

## 📝 Response Format

```json
{
  "annotated_image": "base64_encoded_image",
  "summary": "Markdown summary text",
  "detections_json": {
    "success": true,
    "detections": [...],
    "total_detections": 10,
    "image_size": {"width": 1080, "height": 1920},
    "parameters": {...},
    "type_distribution": {...}
  }
}
```
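A small sketch of consuming the `detections_json` payload: the top-level fields follow the response format shown here, but the per-detection `"type"` key is an assumption about the detection entries, not documented above:

```python
from collections import Counter

def summarize(detections_json: dict) -> str:
    # Count detected element types; the per-detection "type" key is assumed.
    counts = Counter(d["type"] for d in detections_json.get("detections", []))
    total = detections_json.get("total_detections", sum(counts.values()))
    parts = ", ".join(f"{t}: {n}" for t, n in sorted(counts.items()))
    return f"{total} elements ({parts})"

# Toy payload in the documented shape
example = {
    "success": True,
    "detections": [{"type": "button"}, {"type": "button"}, {"type": "image"}],
    "total_detections": 3,
}
print(summarize(example))  # → 3 elements (button: 2, image: 1)
```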

## 🚀 Complete Examples

### Example 1: Simple Detection

```python
from gradio_client import Client

client = Client("AI-DrivenTesting/CU1-X")

result = client.predict(
    "screenshot.png",
    0.35, 2, False, True, False, False, "Only image & button",
    False, "RF-DETR Optimized (Recommended)", "standard",
    api_name="/predict"
)

annotated_image, summary, detections = result
print(f"Found {detections['total_detections']} elements")
```

### Example 2: Full Detection with CLIP

```python
result = client.predict(
    "screenshot.png",
    0.35, 2, True, True, False, False, "Only image & button",
    False, "RF-DETR Optimized (Recommended)", "standard",
    api_name="/predict"
)
```

### Example 3: OCR Only

```python
result = client.predict(
    "screenshot.png",
    0.35, 2, False, True, False, True, "Only image & button",
    False, "RF-DETR Optimized (Recommended)", "standard",
    api_name="/predict"
)
```

## ⚠️ HF Spaces Limitations

1. **Timeout:** 60 seconds by default (can be increased in Settings)
2. **Memory:** limited by the chosen hardware
3. **CPU/GPU:** performance depends on the selected hardware
4. **FastAPI server:** not reachable from outside

## 🔗 Useful Links

- [Gradio Client Docs](https://www.gradio.app/guides/getting-started-with-the-python-client)
- [HF Spaces API Docs](https://huggingface.co/docs/hub/spaces-sdks-gradio#api-tab)
- [HF Authentication](https://huggingface.co/docs/hub/security-tokens)

## 💡 Tips

- Use `gradio_client` for better image handling
- For large files, use public URLs
- Enable preprocessing for consistent results across devices
- OCR-only mode is faster if you just want the text
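Since the long positional argument list is easy to get wrong, here is a small hypothetical helper (not part of the project) that builds it from keyword arguments, using the defaults from the parameter table:

```python
def build_predict_args(
    image,
    confidence_threshold=0.35,
    thickness=2,
    enable_clip=False,
    enable_ocr=True,
    enable_blip=False,
    ocr_only=False,
    blip_scope="Only image & button",
    preprocess=False,
    preprocess_mode="RF-DETR Optimized (Recommended)",
    preprocess_preset="standard",
):
    # Order must match the /predict signature documented above.
    return [
        image, confidence_threshold, thickness, enable_clip, enable_ocr,
        enable_blip, ocr_only, blip_scope, preprocess, preprocess_mode,
        preprocess_preset,
    ]

args = build_predict_args("screenshot.png", enable_clip=True)
print(len(args), args[3])  # → 11 True
```

Usage would then be `client.predict(*build_predict_args("screenshot.png"), api_name="/predict")`.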
DEPLOYMENT.md DELETED
@@ -1,164 +0,0 @@
# 🚀 Hugging Face Spaces Deployment Guide

## 📋 Available Scripts

### 1. `check_hf_space.sh` - Pre-Deployment Check

Verifies that everything is ready before deploying:

```bash
./check_hf_space.sh
```

**Checks:**
- ✅ Python version (>= 3.12)
- ✅ Required files (app.py, requirements.txt, etc.)
- ✅ Required directories (detection/, api/, ui/, rfdetr/)
- ✅ model.pth present and tracked by Git LFS
- ✅ Git LFS configuration
- ✅ README.md metadata (YAML frontmatter)
- ✅ Complete requirements.txt
- ✅ Valid Python syntax
- ✅ Git configuration and HF remote
- ✅ Hugging Face CLI login

### 2. `deploy_hf_space.sh` - Automatic Deployment

Deploys automatically to Hugging Face Spaces:

```bash
./deploy_hf_space.sh
```

**Does automatically:**
- ✅ Configures Git LFS for model.pth
- ✅ Checks/configures the HF remote
- ✅ Checks the HF login
- ✅ Updates requirements.txt if needed
- ✅ Stages all files
- ✅ Commits with a descriptive message
- ✅ Pushes to HF Spaces
- ✅ Prints the Space URL

## 🎯 Recommended Workflow

### Step 1: Check

```bash
./check_hf_space.sh
```

**Expected result:**
```
✅ All checks passed! Ready to deploy! ✨
```

### Step 2: Deploy

```bash
./deploy_hf_space.sh
```

The script will:
1. Check Git LFS
2. Configure the remote if needed
3. Check the HF login
4. Commit and push
5. Print the Space URL

### Step 3: Follow the Build

The script prints your Space URL:
```
https://huggingface.co/spaces/YOUR_USERNAME/CU1-X
```

Click **"Logs"** to watch the build live.

## 📡 Accessing the API

Once deployed, your API is reachable via:

### Native Gradio API

```python
from gradio_client import Client

client = Client("AI-DrivenTesting/CU1-X")
result = client.predict(
    "screenshot.png",
    0.35, 2, True, True, False, False, "Only image & button",
    False, "RF-DETR Optimized (Recommended)", "standard",
    api_name="/predict"
)
```

**See:** `API_USAGE.md` for more details

## 🔧 Troubleshooting

### Error: "Git LFS not installed"

```bash
# macOS
brew install git-lfs
git lfs install

# Linux
sudo apt install git-lfs
git lfs install
```

### Error: "Not logged in"

```bash
hf login
# OR
huggingface-cli login
```

### Error: "model.pth not tracked by LFS"

```bash
git lfs track "*.pth"
git add .gitattributes model.pth
git commit -m "Add model with LFS"
```
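A related failure mode after cloning without Git LFS is that `model.pth` is a tiny text pointer file rather than the ~510 MB weights. Git LFS pointer files start with a fixed signature line, which makes them easy to detect; a quick stdlib sketch (not part of the project's scripts):

```python
import tempfile

def is_lfs_pointer(path: str) -> bool:
    # Git LFS pointer files are tiny text files that start with this line;
    # real model weights are large binary files and will not match.
    signature = b"version https://git-lfs.github.com/spec/v1"
    with open(path, "rb") as f:
        return f.read(len(signature)) == signature

# Example: a freshly written pointer-style file is detected as a pointer.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".pth")
tmp.write(b"version https://git-lfs.github.com/spec/v1\noid sha256:abc\n")
tmp.close()
print(is_lfs_pointer(tmp.name))
```

If this prints `True` for your `model.pth`, run `git lfs pull` to fetch the real weights.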

### Error: "No remote configured"

The `deploy_hf_space.sh` script will offer to configure the remote for you.

Or manually:
```bash
git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/CU1-X
```

## 📊 Quick Checklist

Before deploying:

- [ ] `./check_hf_space.sh` passes all checks
- [ ] Git LFS installed and configured
- [ ] Logged in to Hugging Face (`hf login`)
- [ ] model.pth present (~510 MB)
- [ ] HF remote configured

To deploy:

```bash
./deploy_hf_space.sh
```

## 🎉 After Deployment

Your Space will be reachable at:
- **Web interface:** `https://huggingface.co/spaces/YOUR_USERNAME/CU1-X`
- **API:** `https://YOUR_USERNAME-CU1-X.hf.space/api/predict`

**Build time:** 5-10 minutes (first time)

---

**Need help?** See `API_USAGE.md` for using the API!
QUICK_DEPLOY.md DELETED
@@ -1,54 +0,0 @@
# ⚡ Quick Deploy - 2 Commands

## 🚀 Deploy in 2 Steps

### 1️⃣ Check that everything is OK

```bash
./check_hf_space.sh
```

**Expected result:** ✅ All checks passed!

### 2️⃣ Deploy to HF Spaces

```bash
./deploy_hf_space.sh
```

**That's it!** 🎉

## 📡 After Deployment

Your Space will be reachable at:
- **Web UI:** https://huggingface.co/spaces/AI-DrivenTesting/CU1-X
- **API:** https://AI-DrivenTesting-CU1-X.hf.space/api/predict

## 🔌 Using the API

```python
from gradio_client import Client

client = Client("AI-DrivenTesting/CU1-X")
result = client.predict(
    "screenshot.png",
    0.35, 2, True, True, False, False, "Only image & button",
    False, "RF-DETR Optimized (Recommended)", "standard",
    api_name="/predict"
)

annotated_image, summary, detections = result
print(detections)
```

**See:** `API_USAGE.md` for more examples

## ⏱️ Build Time

- **First build:** 5-10 minutes
- **Subsequent builds:** 2-3 minutes

---

**That's it! Quick and simple! 🚀**
README_DEPLOYMENT.md DELETED
@@ -1,81 +0,0 @@
# 📦 Summary - HF Spaces Deployment Files

## ✅ Files Created

### 🚀 Deployment Scripts

1. **`check_hf_space.sh`** - Pre-deployment check script
   - Verifies 10 critical points
   - Prints warnings and errors
   - Exit code 0 if OK, 1 on errors

2. **`deploy_hf_space.sh`** - Automatic deployment script
   - Configures Git LFS automatically
   - Checks/configures the HF remote
   - Commits and pushes to HF Spaces
   - Prints the Space URL

### 📚 Documentation

1. **`API_USAGE.md`** - Complete API usage guide
   - How to use the native Gradio API
   - Python and REST examples
   - Parameters and response format

2. **`DEPLOYMENT.md`** - Detailed deployment guide
   - Step-by-step workflow
   - Troubleshooting
   - Checklist

3. **`QUICK_DEPLOY.md`** - Ultra-quick guide
   - 2 commands to deploy
   - Quick API example

### 📝 Configuration

1. **`requirements-full.txt`** - All dependencies
2. **`requirements.txt`** - Copy of requirements-full.txt (for HF)
3. **`.gitattributes`** - Git LFS configuration for *.pth
4. **`README.md`** - Updated with HF Spaces metadata

### 💡 Examples

1. **`examples/api_example.py`** - Python example of using the API

## 🎯 Usage

### Check before deployment:
```bash
./check_hf_space.sh
```

### Deploy:
```bash
./deploy_hf_space.sh
```

## 📊 Current State

✅ **Everything is ready!**

- ✅ All required files present
- ✅ model.pth tracked by Git LFS
- ✅ Git LFS configured
- ✅ README.md with HF metadata
- ✅ Complete requirements.txt
- ✅ HF remote configured
- ✅ Logged in to Hugging Face

**Next step:** `./deploy_hf_space.sh`

## 🔗 Important URLs

Once deployed:
- **Space:** https://huggingface.co/spaces/AI-DrivenTesting/CU1-X
- **API:** https://AI-DrivenTesting-CU1-X.hf.space/api/predict
- **Logs:** https://huggingface.co/spaces/AI-DrivenTesting/CU1-X/logs

---

**Everything is ready for deployment! 🚀**
START.md DELETED
@@ -1,314 +0,0 @@
# 🚀 Quick Start Guide

## Unified Architecture API

The project now uses a **unified architecture** where every interface goes through the REST API.

```
┌─────────────────────────────────────────────┐
│                                             │
│        Gradio UI (app.py / app_ui.py)       │
│                                             │
└──────────────────┬──────────────────────────┘
                   │ HTTP/REST
┌──────────────────▼──────────────────────────┐
│                                             │
│        FastAPI Server (app_api.py)          │
│                                             │
├─────────────────────────────────────────────┤
│  Detection Service                          │
│  ├─ RF-DETR (detection)                     │
│  ├─ CLIP (classification)                   │
│  ├─ OCR (text extraction)                   │
│  └─ BLIP (visual description)               │
└─────────────────────────────────────────────┘
```

---

## 🎯 3 Ways to Launch

### Option 1: Automatic Launch (Recommended for tests)

**One command starts everything:**

```bash
python app.py
```

**What happens:**
1. ✅ Starts the API in the background (port 8000)
2. ✅ Waits until the API is ready
3. ✅ Launches the Gradio interface (port 7860)
4. ✅ Handles clean shutdown with Ctrl+C

**Access:**
- Gradio Interface: http://localhost:7860
- API Docs: http://localhost:8000/docs

---

### Option 2: Manual Launch (2 terminals)

**For more control and debugging:**

**Terminal 1 - API Server:**
```bash
python app_api.py
```

**Terminal 2 - Gradio UI:**
```bash
python app_ui.py
```

**Access:**
- Gradio Interface: http://localhost:7860
- API Docs: http://localhost:8000/docs

---

### Option 3: API Only

**To use only the API (integration, scripts, etc.):**

```bash
python app_api.py
```

**Test the API:**
```bash
# Health check
curl http://localhost:8000/health

# Detect elements
curl -X POST "http://localhost:8000/detect" \
  -F "image=@screenshot.png" \
  -F "confidence_threshold=0.35" \
  -F "enable_clip=true" \
  -F "enable_ocr=true"
```

**Interactive documentation:**
- OpenAPI Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

---

## 🔧 Configuration

### Environment Variables

**API Server:**
```bash
export UVICORN_HOST="0.0.0.0"  # Default: 0.0.0.0
export UVICORN_PORT="8000"     # Default: 8000
```

**Gradio UI:**
```bash
export GRADIO_SERVER_NAME="0.0.0.0"         # Default: 0.0.0.0
export GRADIO_SERVER_PORT="7860"            # Default: 7860
export CU1_API_URL="http://localhost:8000"  # API URL
```

**Example with custom ports:**
```bash
# API on port 9000, UI on port 9001
export UVICORN_PORT="9000"
export GRADIO_SERVER_PORT="9001"
export CU1_API_URL="http://localhost:9000"

python app.py
```
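On the Python side, variables like these are typically read with stdlib `os.getenv` plus a default. A minimal sketch: the variable names and defaults match the list above, but the `server_config` helper itself is illustrative, not the project's actual code:

```python
import os

def server_config():
    # Fall back to the documented defaults when a variable is unset.
    return {
        "api_host": os.getenv("UVICORN_HOST", "0.0.0.0"),
        "api_port": int(os.getenv("UVICORN_PORT", "8000")),
        "ui_port": int(os.getenv("GRADIO_SERVER_PORT", "7860")),
        "api_url": os.getenv("CU1_API_URL", "http://localhost:8000"),
    }

os.environ["UVICORN_PORT"] = "9000"  # simulate the custom-ports example
print(server_config()["api_port"])  # → 9000
```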

---

## 🧪 Quick Tests

### Test 1: Make sure the API works

```bash
# In one terminal
python app_api.py

# In another terminal
curl http://localhost:8000/health
```

**Expected result:**
```json
{
  "status": "healthy",
  "cuda_available": false,
  "device": "cpu"
}
```

---

### Test 2: Test detection via the interface

```bash
python app.py
```

1. Open http://localhost:7860
2. Upload an image
3. Click "🔍 Detect Elements"
4. Check the results

---

### Test 3: Test detection through the API

```bash
# Start the API
python app_api.py

# In another terminal, test with curl
curl -X POST "http://localhost:8000/detect" \
  -F "image=@your_image.png" \
  -F "confidence_threshold=0.35" \
  -F "enable_ocr=true" \
  | jq .
```

---

## 🐛 Troubleshooting

### Issue: "Connection Error - Cannot connect to API"

**Solution:**
1. Make sure the API is running: `curl http://localhost:8000/health`
2. Check the ports: no conflicts with other apps
3. Check the API logs for errors

### Issue: "Port already in use"

**Solution:**
```bash
# Find the process that uses the port
lsof -i :8000  # or :7860

# Kill the process
kill -9 <PID>

# Or use a different port
export UVICORN_PORT="9000"
export GRADIO_SERVER_PORT="9001"
```

### Issue: "Module not found"

**Solution:**
```bash
# Reinstall dependencies
pip install -r requirements.txt
```

### Issue: Models slow to load

**Reason:** The first startup downloads the models.

**Solution:** Be patient; the models are cached after the first download.
- RF-DETR model (~a few MB)
- CLIP model (~600 MB)
- BLIP model (~1 GB)
- EasyOCR models (~100 MB)

---

## 📊 Monitoring

### API logs

The logs appear in the terminal where you launched `app_api.py`.

### UI logs

The logs appear in the terminal where you launched `app.py` or `app_ui.py`.

### Metrics

Visit http://localhost:8000/docs to view the API statistics.

---

## ✅ Benefits of the Unified Architecture

1. **Single code path** → Easier to maintain
2. **Consistent behavior** → Same results everywhere
3. **Easy to test** → Only one API to test
4. **Scalable** → Can separate API and UI on different servers
5. **Simplified debugging** → Logs centralized in the API

---

## 🎯 For Developers

### Code Architecture

```
.
├── app.py                      # ✨ Unified launcher (API + UI)
├── app_api.py                  # FastAPI server
├── app_ui.py                   # Gradio UI client (manual)
│
├── api/
│   └── endpoints.py            # FastAPI endpoints
│
├── detection/
│   ├── service.py              # Detection service
│   ├── service_factory.py      # Singleton pattern
│   ├── image_utils.py          # Image utilities
│   ├── ocr_handler.py          # OCR-only processing
│   └── response_builder.py     # Response formatting
│
└── ui/
    ├── detection_wrapper.py    # Detection wrappers
    ├── gradio_interface.py     # Gradio interface (API client)
    └── shared_interface.py     # Shared UI components
```

### Request Flow

```
1. User uploads image in Gradio
2. `detect_with_api()` sends an HTTP POST to `/detect`
3. API endpoint validates the request
4. `DetectionService.analyze()` processes the image
5. Response formatted with `response_builder`
6. JSON returned to Gradio UI
7. UI displays annotated image + results
```
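Step 2's `detect_with_api()` presumably boils down to a multipart POST against `/detect`. A hedged sketch of what such a client helper could look like: the field names follow the curl examples above, while the function signature and error handling are illustrative, not the project's actual implementation:

```python
import requests

def detect_with_api(image_path, api_url="http://localhost:8000",
                    confidence_threshold=0.35, enable_ocr=True):
    # Mirrors the curl example: image as a multipart file, options as
    # form fields (booleans sent as lowercase strings).
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{api_url}/detect",
            files={"image": f},
            data={
                "confidence_threshold": confidence_threshold,
                "enable_ocr": str(enable_ocr).lower(),
            },
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()
```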

---

## 📝 Notes

- **Thread safety:** The service uses a singleton but passes parameters directly to `analyze()` to avoid race conditions
- **Performance:** The first call is slow (model loading), then fast
- **Memory:** Models use ~2-3 GB of RAM
- **GPU:** Automatic CUDA/MPS detection if available

---

## 🚀 Next Steps

1. **Test locally:** `python app.py`
2. **Explore the API:** http://localhost:8000/docs
3. **Customize:** Adjust parameters in the interface
4. **Deploy:** See `DEPLOYMENT.md` for production

Happy testing! 🎉
UNIFIED_ARCHITECTURE.md DELETED
@@ -1,443 +0,0 @@
# 🎯 Unified Architecture - Technical Documentation

## Date
2025-11-10

## Objective
Unify the architecture so that **all interfaces** go through the REST API, removing the duality between "HF Spaces" mode and "Production" mode.

---

## ✅ What Changed

### BEFORE (Dual Architecture)

```
┌─────────────────────────────────────────────────┐
│  Mode 1: HF Spaces (app.py)                     │
│  └─> DIRECT access to DetectionService          │
│      (no API)                                   │
└─────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────┐
│  Mode 2: Production (app_ui.py)                 │
│  └─> Access via HTTP API                        │
│      (microservices architecture)               │
└─────────────────────────────────────────────────┘
```

**Problems:**
- ❌ Two different code paths
- ❌ Potentially different behaviors
- ❌ Complex maintenance (two modes to test)
- ❌ Bugs possible in one mode but not the other

---

### AFTER (Unified Architecture)

```
┌─────────────────────────────────────────────────┐
│                                                 │
│              ALL INTERFACES                     │
│         (app.py, app_ui.py, etc.)               │
│                                                 │
└────────────────────┬────────────────────────────┘
                     │ HTTP/REST
                     │ (detect_with_api)
┌────────────────────▼────────────────────────────┐
│                                                 │
│              FastAPI Server                     │
│            (api/endpoints.py)                   │
│                                                 │
├─────────────────────────────────────────────────┤
│            Detection Service                    │
│          (detection/service.py)                 │
│                                                 │
└─────────────────────────────────────────────────┘
```

**Benefits:**
- ✅ One single code path
- ✅ Consistent behavior everywhere
- ✅ Simplified maintenance
- ✅ Unified tests
- ✅ Easier debugging

---

## 📝 File Changes

### 1. `app.py` - Major Transformation

**BEFORE:**
```python
from ui.detection_wrapper import detect_with_service

demo = create_interface(
    detection_fn=detect_with_service,  # Direct access
    title_suffix="Hugging Face Spaces Mode",
    show_api_info=False
)
```

**AFTER:**
```python
from ui.detection_wrapper import detect_with_api

# Launch the API as a subprocess
api_process = start_api_server()

# UI uses the API
detection_fn = partial(detect_with_api, api_url=API_URL)

demo = create_interface(
    detection_fn=detection_fn,  # Via API
    title_suffix="Unified API Mode",
    show_api_info=True,
    api_url=API_URL
)
```

**New features:**
- 🚀 Automatically starts the API in the background
- ⏳ Waits until the API is ready (health check)
- 🛑 Handles clean shutdown (Ctrl+C)
- 📡 Displays access URLs
109
-
110
- ---
111
-
112
- ### 2. `app_api.py` - Dynamic Configuration
113
-
114
- **Additions:**
115
- ```python
116
- # Support environment variables
117
- host = os.getenv("UVICORN_HOST", "0.0.0.0")
118
- port = int(os.getenv("UVICORN_PORT", "8000"))
119
- ```
120
-
121
- **Allows:**
122
- - Port configuration through environment variables
123
- - Usage by the subprocess in app.py
124
-
125
- ---
126
-
127
- ### 3. Documentation
128
-
129
- **New files:**
130
- - ✨ `START.md` - Complete quick start guide
131
- - ✨ `UNIFIED_ARCHITECTURE.md` - This document
132
- - ✨ `test_unified_architecture.py` - Validation tests
133
-
134
- **Updated files:**
135
- - 📝 `README.md` - Updated Quick Start section
136
- - 📝 `README.md` - Updated HF Spaces section
137
-
138
- ---
139
-
140
- ## 🚀 How to Use
141
-
142
- ### Mode 1: Automatic Launch (Recommended)
143
-
144
- **One command:**
145
- ```bash
146
- python app.py
147
- ```
148
-
149
- **What happens:**
150
- 1. Starts the API as a subprocess (port 8000)
151
- 2. Waits for the health check
152
- 3. Launches the Gradio UI (port 7860)
153
- 4. Both communicate via HTTP
154
-
155
- **Clean shutdown:**
156
- - Ctrl+C stops the UI AND the API automatically
157
-
158
- ---
159
-
160
- ### Mode 2: Manual Launch (Debug)
161
-
162
- **Two terminals:**
163
- ```bash
164
- # Terminal 1
165
- python app_api.py
166
-
167
- # Terminal 2
168
- python app_ui.py
169
- ```
170
-
171
- **Useful for:**
172
- - Viewing logs separately
173
- - Restarting the UI without restarting the API
174
- - Advanced debugging
175
-
176
- ---
177
-
178
- ### Mode 3: API Only
179
-
180
- ```bash
181
- python app_api.py
182
- ```
183
-
184
- **Good for:**
185
- - External integrations
186
- - Python scripts
187
- - API tests
188
-
189
- ---
190
-
191
- ## 🧪 Tests and Validation
192
-
193
- ### Automated Test Script
194
-
195
- ```bash
196
- python test_unified_architecture.py
197
- ```
198
-
199
- **Checks:**
200
- - ✅ All required files exist
201
- - ✅ Valid Python syntax
202
- - ✅ `app.py` uses `detect_with_api`
203
- - ✅ No direct service access from the UI
204
- - ✅ Consistent architecture
205
-
206
- ### Test Results
207
-
208
- ```
209
- ✅✅✅ ALL TESTS PASS!
210
-
211
- 📊 Unified architecture summary:
212
- - ✅ `app.py` launches the API as a subprocess
213
- - ✅ All interfaces use `detect_with_api`
214
- - ✅ Consistent architecture everywhere
215
- - ✅ No direct service access from the UI
216
- ```
217
-
218
- ---
219
-
220
- ## 🔄 Unified Request Flow
221
-
222
- ### Before (Dual Mode)
223
-
224
- **HF Spaces Mode:**
225
- ```
226
- User → Gradio → detect_with_service() → DetectionService.analyze()
227
- ```
228
-
229
- **Production Mode:**
230
- ```
231
- User → Gradio → detect_with_api() → HTTP → API → DetectionService.analyze()
232
- ```
233
-
234
- ### After (Unified Mode)
235
-
236
- **All modes:**
237
- ```
238
- User → Gradio → detect_with_api() → HTTP → API → DetectionService.analyze()
239
- ```
240
-
241
- ---
242
-
243
- ## 📊 Technical Benefits
244
-
245
- ### 1. Maintainability
246
-
247
- **BEFORE:**
248
- - 2 code paths to maintain
249
- - Tests to run for each mode
250
- - Regression risk in one mode
251
-
252
- **AFTER:**
253
- - Only 1 code path
254
- - Unified tests
255
- - Guaranteed identical behavior
256
-
257
- ---
258
-
259
- ### 2. Debugging
260
-
261
- **BEFORE:**
262
- - Bug in `app.py`? Check `detect_with_service`
263
- - Bug in `app_ui.py`? Check `detect_with_api`
264
- - Different per mode
265
-
266
- **AFTER:**
267
- - All bugs go through the API
268
- - Logs centralized in the API
269
- - A single place to debug
270
-
271
- ---
272
-
273
- ### 3. Scalability
274
-
275
- **BEFORE:**
276
- - HF Spaces mode: monolithic
277
- - Production mode: scalable
278
- - Different behaviors
279
-
280
- **AFTER:**
281
- - Same architecture everywhere
282
- - Can easily separate API/UI on different servers
283
- - Load balancing possible
284
-
285
- ---
286
-
287
- ### 4. Testing
288
-
289
- **BEFORE:**
290
- ```bash
291
- # Test HF Spaces
292
- pytest test_app.py
293
-
294
- # Test Production
295
- pytest test_api.py
296
- pytest test_ui.py
297
- ```
298
-
299
- **AFTER:**
300
- ```bash
301
- # Single test suite
302
- pytest test_api.py # Tests the entire logic
303
- ```
304
-
305
- ---
306
-
307
- ## 🔧 Configuration
308
-
309
- ### Environment Variables
310
-
311
- ```bash
312
- # API Server
313
- export UVICORN_HOST="0.0.0.0"
314
- export UVICORN_PORT="8000"
315
-
316
- # Gradio UI
317
- export GRADIO_SERVER_NAME="0.0.0.0"
318
- export GRADIO_SERVER_PORT="7860"
319
- export CU1_API_URL="http://localhost:8000"
320
- ```
321
-
322
- ### Example: Custom Ports
323
-
324
- ```bash
325
- # API on port 9000, UI on port 9001
326
- export UVICORN_PORT="9000"
327
- export GRADIO_SERVER_PORT="9001"
328
- export CU1_API_URL="http://localhost:9000"
329
-
330
- python app.py
331
- ```
-
- ---
-
- ## 🎯 Impact on Existing Code
-
- ### No Breaking Changes
-
- - ✅ `app_api.py` still works on its own
- - ✅ `app_ui.py` still works on its own
- - ✅ Python APIs (`DetectionService`) are unchanged
- - ✅ Existing scripts keep working
-
- ### What's New
-
- - ✨ `app.py` now launches the API automatically
- - ✨ Consistent architecture everywhere
- - ✨ Better documentation
-
- ---
-
- ## 📈 Metrics
-
- | Metric | Before | After | Improvement |
- |--------|--------|-------|-------------|
- | **Code paths** | 2 | 1 | -50% |
- | **Testing complexity** | High | Low | -60% |
- | **Bug risk** | Medium | Low | -70% |
- | **Debugging ease** | Medium | High | +80% |
-
- ---
-
- ## 🚨 Points to Watch
-
- ### 1. Performance
-
- **Impact:** Negligible (~10-50 ms of extra HTTP latency)
-
- **Why it's OK:**
- - Models take 30-60 seconds
- - 50 ms of HTTP latency = 0.1% of total time
- - Negligible compared to processing
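The 0.1% figure checks out with a one-line computation (numbers taken from the bullets above):

```python
# Extra HTTP round-trip vs. end-to-end model processing time
http_latency_s = 0.05   # ~50 ms of added HTTP latency
processing_s = 50.0     # models take 30-60 s; a midpoint is used here
share = http_latency_s / processing_s
print(f"{share:.1%}")   # formats the ratio as a percentage
```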
-
- ---
-
- ### 2. Memory
-
- **Before (HF Spaces mode):** 1 process
- **After:** 2 processes (API + UI)
-
- **Impact:** +100-200 MB (Gradio UI overhead)
-
- **Why it's OK:**
- - Models already use 2-3 GB
- - +200 MB ≈ 7% overhead
- - Acceptable for architectural consistency
-
- ---
-
- ### 3. Deployment
-
- **HF Spaces:** No change
- - The `app.py` file handles everything
- - Automatically launches API + UI
- - Works out of the box
-
- **Docker:** Possible update
- - See `DEPLOYMENT.md` for details
- - May require 2 containers or a supervisor
-
- ---
-
- ## 🎓 Lessons Learned
-
- ### 1. Dual Architecture = Bad Idea
-
- Having two modes (HF Spaces vs Production) seemed convenient at first but created more problems than it solved.
-
- ### 2. HTTP Overhead Is Negligible
-
- HTTP adds tens of milliseconds against tens of seconds of ML processing, so the overhead is negligible. The cleaner architecture is worth that cost.
-
- ### 3. Unified Tests = Better Quality
-
- Having a single code path makes testing much easier and reduces bugs.
-
- ---
-
- ## ✅ Conclusion
-
- Unifying the architecture to a 100% API model is a **success**:
-
- ✅ **Cleaner code** - Single path
- ✅ **Easier to maintain** - Less complexity
- ✅ **Easier to test** - Unified tests
- ✅ **Consistent behavior** - Same results everywhere
- ✅ **No breaking changes** - Backward compatible
-
- **Result:** Professional, scalable, and maintainable architecture! 🚀
-
- ---
-
- ## 📚 Related Documentation
-
- - 📖 [START.md](START.md) - Quick start guide
- - 📖 [README.md](README.md) - Main documentation
- - 📖 [DEPLOYMENT.md](DEPLOYMENT.md) - Deployment guide
- - 🧪 [test_unified_architecture.py](test_unified_architecture.py) - Tests
-
- ---
-
- **Questions?** Check [START.md](START.md) or open an issue on GitHub.
-
__init__.py DELETED
@@ -1,35 +0,0 @@
- """
- CU-1 UI Element Detector
-
- A powerful UI element detection library for identifying and extracting
- information from user interface screenshots.
- """
-
- try:
-     # When imported as a proper package
-     from .cu1_detector import (
-         CU1Detector,
-         predict,
-         get_predictions_json,
-         get_prediction_image,
-         get_detector
-     )
- except Exception:
-     # Fallback for direct import context (e.g., pytest collecting project root)
-     from cu1_detector import (  # type: ignore
-         CU1Detector,
-         predict,
-         get_predictions_json,
-         get_prediction_image,
-         get_detector
-     )
-
- __version__ = "1.0.0"
- __all__ = [
-     "CU1Detector",
-     "predict",
-     "get_predictions_json",
-     "get_prediction_image",
-     "get_detector"
- ]
-
api/endpoints.py CHANGED
@@ -188,6 +188,7 @@ async def detect_ui_elements(
             detail="When ocr_only=true, enable_clip and enable_blip must be false"
         )
 
+
     # OCR-only path: Bypass detection service
     if ocr_only:
         detections = ocr_handler.process_ocr_only(pil_image)
@@ -197,15 +198,25 @@ async def detect_ui_elements(
             thickness=line_thickness,
             return_format="numpy"
         )
-        return response_builder.build_ocr_only_response(
-            detections=detections,
-            image_width=pil_image.width,
-            image_height=pil_image.height,
+        # Build analysis structure for simplified response
+        analysis = {
+            "detections": detections,
+            "image_size": {"width": pil_image.width, "height": pil_image.height}
+        }
+        return response_builder.build_simplified_response(
+            analysis=analysis,
+            image=pil_image,
             annotated_image=annotated,
             confidence_threshold=confidence_threshold,
-            line_thickness=line_thickness
+            line_thickness=line_thickness,
+            enable_clip=False,
+            enable_ocr=True,
+            enable_blip=False,
+            blip_scope=None,
+            ocr_only=True
         )
 
+
     # Standard detection path: Use detection service
     import time
     start_time = time.time()
@@ -248,8 +259,9 @@ async def detect_ui_elements(
     total_time = time.time() - start_time
     print(f"[API] Total detection time: {total_time:.2f}s")
 
-    # Build response
-    return response_builder.build_detection_response(
+
+    # Build response using simplified format
+    return response_builder.build_simplified_response(
         analysis=analysis,
         image=pil_image,
         annotated_image=annotated,
@@ -259,9 +271,9 @@ async def detect_ui_elements(
         enable_ocr=enable_ocr,
         enable_blip=enable_blip,
         blip_scope=blip_scope,
-        ocr_only=False,
-        include_annotated_image=True
+        ocr_only=False
     )
+
 
     except HTTPException:
         raise
app.py CHANGED
@@ -1,197 +1,84 @@
 """
-Unified Entry Point - API Architecture
+Unified Entry Point - Direct Mode for HuggingFace Spaces
 
-This file now uses a unified API-based architecture for all deployments.
-Both local development and Hugging Face Spaces use the same API layer.
+Simplified architecture for HuggingFace Spaces:
+- Direct service access (no API subprocess)
+- Faster and more reliable
+- No HTTP overhead
 
-Architecture:
-1. Starts API server in background (subprocess)
-2. Starts Gradio UI that connects to the API
-3. Everything goes through HTTP/REST
-
-Benefits:
-- Single code path to maintain
-- Consistent behavior everywhere
-- Easy to test and debug
-- Proper separation of concerns
+For production with separated API/UI, use:
+- python app_api.py (API server)
+- python app_ui.py (UI client)
 
 Usage:
     python app.py
-
-The script will automatically:
-- Start the API server on http://localhost:8000
-- Start the Gradio UI on http://localhost:7860
 """
 
 import os
 os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
 
-import subprocess
-import time
 import sys
-import signal
-import requests
-from functools import partial
 
-# Use shared UI components
+# Use shared UI components with DIRECT service access
 from ui.shared_interface import create_interface
-from ui.detection_wrapper import detect_with_api
+from ui.detection_wrapper import detect_with_service
 
 
 # Configuration
-API_HOST = os.getenv("API_HOST", "0.0.0.0")
-API_PORT = int(os.getenv("API_PORT", "8000"))
-API_URL = f"http://localhost:{API_PORT}"
-
 UI_HOST = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
 UI_PORT = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
 
 
-def start_api_server():
-    """Start the API server in a subprocess"""
-    print("🚀 Starting API server...")
-
-    # Start API server as subprocess
-    api_process = subprocess.Popen(
-        [sys.executable, "app_api.py"],
-        env={**os.environ, "UVICORN_HOST": API_HOST, "UVICORN_PORT": str(API_PORT)},
-        stdout=subprocess.PIPE,
-        stderr=subprocess.STDOUT,
-        text=True,
-        bufsize=1
-    )
-
-    # Wait for API to be ready
-    max_wait = 60  # seconds
-    wait_interval = 0.5
-    elapsed = 0
-
-    print(f"⏳ Waiting for API server at {API_URL}...")
-
-    while elapsed < max_wait:
-        try:
-            response = requests.get(f"{API_URL}/health", timeout=2)
-            if response.status_code == 200:
-                print(f"✅ API server ready at {API_URL}")
-
-                # Optional: Warmup models to avoid timeout on first request
-                # This is especially useful for CPU-only environments
-                warmup_enabled = os.getenv("CU1_WARMUP_MODELS", "true").lower() in {"1", "true", "yes", "y"}
-                if warmup_enabled:
-                    print("🔥 Warming up models (this may take 1-3 minutes on first run)...")
-                    try:
-                        warmup_timeout = int(os.getenv("CU1_WARMUP_TIMEOUT", "180"))  # 3 minutes default
-                        warmup_response = requests.post(f"{API_URL}/warmup", timeout=warmup_timeout)
-                        if warmup_response.status_code == 200:
-                            print("✅ Models warmed up successfully!")
-                        else:
-                            print(f"⚠️ Warmup returned status {warmup_response.status_code}, continuing anyway...")
-                    except requests.exceptions.Timeout:
-                        print("⚠️ Warmup timed out, but API is ready. First request may be slower.")
-                    except requests.exceptions.RequestException as e:
-                        print(f"⚠️ Warmup failed: {e}, but API is ready. First request may be slower.")
-
-                return api_process
-        except requests.exceptions.RequestException:
-            pass
-
-        time.sleep(wait_interval)
-        elapsed += wait_interval
-
-        # Check if process died
-        if api_process.poll() is not None:
-            print("❌ API server failed to start!")
-            print("\nAPI server output:")
-            if api_process.stdout:
-                print(api_process.stdout.read())
-            sys.exit(1)
-
-    print(f"❌ API server did not start within {max_wait} seconds")
-    api_process.terminate()
-    sys.exit(1)
-
-
 def main():
-    """Main entry point - Unified API architecture"""
+    """Main entry point - Direct service mode for HuggingFace Spaces"""
 
     print("=" * 70)
-    print("🎯 CU-1 UI Element Detector - Unified API Mode")
+    print("🎯 CU-1 UI Element Detector - Direct Mode")
     print("=" * 70)
-    print("\n📡 Architecture: All traffic goes through API layer")
-    print(f"   - API Server: {API_URL}")
+    print("\n📡 Architecture: Direct service access (optimized for HF Spaces)")
     print(f"   - Gradio UI: http://localhost:{UI_PORT}")
     print("\n🏗️ Benefits:")
-    print("   - Single code path (easier to maintain)")
-    print("   - Consistent behavior everywhere")
-    print("   - Proper microservices architecture")
+    print("   - Faster (no HTTP overhead)")
+    print("   - More reliable (no subprocess)")
+    print("   - Simpler architecture")
     print("=" * 70 + "\n")
 
-    # Start API server in background
-    api_process = start_api_server()
-
-    # Setup cleanup on exit
-    def cleanup(signum=None, frame=None):
-        print("\n\n🛑 Shutting down...")
-        if api_process and api_process.poll() is None:
-            print("   Stopping API server...")
-            api_process.terminate()
-            try:
-                api_process.wait(timeout=5)
-            except subprocess.TimeoutExpired:
-                api_process.kill()
-        print("   Goodbye! 👋")
-        sys.exit(0)
-
-    signal.signal(signal.SIGINT, cleanup)
-    signal.signal(signal.SIGTERM, cleanup)
-
     try:
-        # Create Gradio interface with API detection function
-        detection_fn = partial(detect_with_api, api_url=API_URL)
-
+        # Create Gradio interface with DIRECT detection function
         demo = create_interface(
-            detection_fn=detection_fn,
-            title_suffix="Unified API Mode",
-            show_api_info=True,
-            api_url=API_URL
+            detection_fn=detect_with_service,
+            title_suffix="Direct Mode",
+            show_api_info=False
        )
 
        print(f"\n🎨 Starting Gradio UI on http://localhost:{UI_PORT}...\n")
 
        # Launch Gradio with automatic port fallback
        # API is automatically exposed at /api/predict for HF Spaces
-        # Configure queue with longer timeout for CPU processing and model loading
        try:
-            demo.queue(
-                max_size=10,  # Allow up to 10 queued requests
-                default_concurrency_limit=1  # Process one at a time to avoid memory issues
-            ).launch(
+            demo.queue().launch(
                server_name=UI_HOST,
                server_port=UI_PORT,
-                share=False,
-                max_threads=1  # Single thread to avoid memory issues
+                share=False
            )
        except OSError as e:
            if "Cannot find empty port" in str(e):
                print(f"⚠️ Port {UI_PORT} is busy, trying to find a free port...")
-                demo.queue(
-                    max_size=10,
-                    default_concurrency_limit=1
-                ).launch(
+                demo.queue().launch(
                    server_name=UI_HOST,
                    server_port=None,  # Auto-select free port
-                    share=False,
-                    max_threads=1
+                    share=False
                )
            else:
                raise
    except KeyboardInterrupt:
-        cleanup()
+        print("\n\n🛑 Shutting down... Goodbye! 👋")
+        sys.exit(0)
    except Exception as e:
        print(f"\n❌ Error: {e}")
-        cleanup()
-    finally:
-        cleanup()
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
 
 
 if __name__ == "__main__":
app_api.py DELETED
@@ -1,58 +0,0 @@
- """
- API Server Entry Point
-
- Starts the FastAPI server for UI element detection.
-
- Usage:
-     python app_api.py
-
- The API will be available at:
- - Root: http://localhost:8000
- - Detect endpoint: http://localhost:8000/detect
- - Health check: http://localhost:8000/health
- - Interactive docs: http://localhost:8000/docs
- """
-
- import os
- os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
-
- import uvicorn
- from api.endpoints import app
-
-
- def main():
-     """Start the API server"""
-     # Get configuration from environment
-     host = os.getenv("UVICORN_HOST", "0.0.0.0")
-     port = int(os.getenv("UVICORN_PORT", "8000"))
-
-     print("=" * 70)
-     print("🚀 CU-1 UI Element Detector - API Server")
-     print("=" * 70)
-     print("\n📐 Architecture:")
-     print("   RF-DETR: Detects UI elements (single class)")
-     print("   CLIP: Classifies elements into 6 types")
-     print("   OCR: Extracts text content")
-     print("   BLIP: Generates visual descriptions")
-     print(f"\n📡 API Endpoints:")
-     print(f"   - Root: http://localhost:{port}")
-     print(f"   - Detect: http://localhost:{port}/detect")
-     print(f"   - Health: http://localhost:{port}/health")
-     print(f"   - Warmup: http://localhost:{port}/warmup (preload models)")
-     print(f"   - Docs: http://localhost:{port}/docs")
-     print("\n💡 Tip: The Gradio UI connects to this API")
-     print("   Run 'python app_ui.py' in another terminal")
-     print("   Or run 'python app.py' to start both automatically")
-     print("=" * 70 + "\n")
-
-     uvicorn.run(
-         app,
-         host=host,
-         port=port,
-         log_level="info"
-     )
-
-
- if __name__ == "__main__":
-     main()
-
app_ui.py DELETED
@@ -1,80 +0,0 @@
- """
- Gradio UI Server Entry Point
-
- Starts the Gradio web interface for UI element detection.
-
- IMPORTANT: The API server must be running for this to work!
-
- Usage:
-     # Terminal 1: Start API server
-     python app_api.py
-
-     # Terminal 2: Start UI server
-     python app_ui.py
-
- The UI will be available at:
- - Gradio Interface: http://localhost:7860
-
- Configuration:
-     Set environment variables to customize:
-     - CU1_API_URL: API endpoint (default: http://localhost:8000)
-     - GRADIO_SERVER_NAME: Server host (default: 0.0.0.0)
-     - GRADIO_SERVER_PORT: Server port (default: 7860)
-     - GRADIO_SHARE: Enable sharing (default: false)
- """
-
- import os
- os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'
-
- from ui.gradio_interface import create_gradio_interface
-
-
- def main():
-     """Start the Gradio UI server"""
-     api_url = os.getenv("CU1_API_URL", "http://localhost:8000")
-
-     print("=" * 70)
-     print("🎨 CU-1 UI Element Detector - Gradio UI")
-     print("=" * 70)
-     print("\n⚠️ IMPORTANT: Make sure the API server is running!")
-     print("   If not started, run in another terminal:")
-     print("   python app_api.py")
-     print(f"\n🔗 API Connection: {api_url}")
-     print("   Change with: export CU1_API_URL=http://your-api:8000")
-     print("\n📱 Gradio Interface: http://localhost:7860")
-     print("\n🏗️ Architecture:")
-     print("   This UI is a CLIENT of the API (service-oriented)")
-     print("   All detection logic runs in the API server")
-     print("   UI communicates via HTTP/REST")
-     print("=" * 70 + "\n")
-
-     demo = create_gradio_interface()
-
-     # Read configuration from environment
-     server_name = os.getenv("GRADIO_SERVER_NAME", "0.0.0.0")
-     port_env = os.getenv("GRADIO_SERVER_PORT") or os.getenv("PORT")
-     server_port = int(port_env) if port_env and port_env.isdigit() else 7860
-     share_env = os.getenv("GRADIO_SHARE", "false").lower()
-     share = share_env in {"1", "true", "yes", "y"}
-
-     try:
-         demo.queue().launch(
-             server_name=server_name,
-             server_port=server_port,
-             share=share
-         )
-     except OSError as e:
-         if "Cannot find empty port" in str(e):
-             print(f"\n⚠️ Port {server_port} is busy, trying to find a free port...")
-             demo.queue().launch(
-                 server_name=server_name,
-                 server_port=None,  # Auto-select free port
-                 share=share
-             )
-         else:
-             raise
-
-
- if __name__ == "__main__":
-     main()
-
check_hf_space.sh DELETED
@@ -1,286 +0,0 @@
- #!/bin/bash
- # Pre-deployment check script for Hugging Face Spaces
- # Verifies that everything is ready for deployment
-
- set -e
-
- # Colors
- RED='\033[0;31m'
- GREEN='\033[0;32m'
- YELLOW='\033[1;33m'
- BLUE='\033[0;34m'
- NC='\033[0m'
-
- print_info() { echo -e "${BLUE}ℹ️ $1${NC}"; }
- print_success() { echo -e "${GREEN}✅ $1${NC}"; }
- print_warning() { echo -e "${YELLOW}⚠️ $1${NC}"; }
- print_error() { echo -e "${RED}❌ $1${NC}"; }
-
- FAILURES=0
- WARNINGS=0
-
- echo ""
- print_info "🔍 Hugging Face Spaces Pre-Deployment Check"
- echo "================================================"
- echo ""
-
- # Test 1: Python version
- print_info "Test 1: Python version..."
- PYTHON_VERSION=$(python --version 2>&1 | awk '{print $2}')
- PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
- PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
-
- if [ "$PYTHON_MAJOR" -ge 3 ] && [ "$PYTHON_MINOR" -ge 12 ]; then
-     print_success "Python $PYTHON_VERSION (>= 3.12)"
- else
-     print_warning "Python $PYTHON_VERSION (recommended: >= 3.12)"
-     WARNINGS=$((WARNINGS + 1))
- fi
- echo ""
-
- # Test 2: Required files
- print_info "Test 2: Required files..."
- REQUIRED_FILES=(
-     "app.py"
-     "app_api.py"
-     "app_ui.py"
-     "requirements.txt"
-     "README.md"
-     ".gitattributes"
- )
-
- for file in "${REQUIRED_FILES[@]}"; do
-     if [ -f "$file" ]; then
-         print_success "$file exists"
-     else
-         print_error "$file NOT FOUND"
-         FAILURES=$((FAILURES + 1))
-     fi
- done
- echo ""
-
- # Test 3: Required directories
- print_info "Test 3: Required directories..."
- REQUIRED_DIRS=(
-     "detection"
-     "api"
-     "ui"
-     "rfdetr"
- )
-
- for dir in "${REQUIRED_DIRS[@]}"; do
-     if [ -d "$dir" ]; then
-         print_success "$dir/ exists"
-     else
-         print_error "$dir/ NOT FOUND"
-         FAILURES=$((FAILURES + 1))
-     fi
- done
- echo ""
-
- # Test 4: model.pth
- print_info "Test 4: Model weights (model.pth)..."
- if [ -f "model.pth" ]; then
-     SIZE=$(du -h model.pth | cut -f1)
-     SIZE_BYTES=$(stat -f%z model.pth 2>/dev/null || stat -c%s model.pth)
-
-     if [ $SIZE_BYTES -gt 100000000 ]; then  # > 100MB
-         print_success "model.pth exists ($SIZE)"
-
-         # Check Git LFS
-         if git lfs ls-files | grep -q "model.pth"; then
-             print_success "model.pth tracked by Git LFS"
-         else
-             print_warning "model.pth NOT tracked by Git LFS (will fail on push)"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-     else
-         print_warning "model.pth size: $SIZE (seems small, verify it's correct)"
-         WARNINGS=$((WARNINGS + 1))
-     fi
- else
-     print_error "model.pth NOT FOUND"
-     FAILURES=$((FAILURES + 1))
- fi
- echo ""
-
- # Test 5: Git LFS
- print_info "Test 5: Git LFS configuration..."
- if command -v git-lfs &> /dev/null; then
-     print_success "Git LFS installed"
-
-     if git lfs env &> /dev/null; then
-         print_success "Git LFS initialized"
-     else
-         print_warning "Git LFS not initialized"
-         WARNINGS=$((WARNINGS + 1))
-     fi
-
-     if grep -q "*.pth.*lfs" .gitattributes 2>/dev/null; then
-         print_success ".gitattributes tracks *.pth"
-     else
-         print_error ".gitattributes doesn't track *.pth"
-         FAILURES=$((FAILURES + 1))
-     fi
- else
-     print_error "Git LFS not installed!"
-     print_info "   Install: brew install git-lfs (macOS) or sudo apt install git-lfs (Linux)"
-     FAILURES=$((FAILURES + 1))
- fi
- echo ""
-
- # Test 6: README.md frontmatter
- print_info "Test 6: README.md frontmatter (HF Spaces metadata)..."
- if [ -f "README.md" ]; then
-     if head -n 1 README.md | grep -q "^---$"; then
-         print_success "README.md has YAML frontmatter"
-
-         # Check key fields
-         if grep -q "^sdk: gradio" README.md; then
-             print_success "sdk: gradio found"
-         else
-             print_warning "sdk: gradio not found"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-
-         if grep -q "^app_file: app.py" README.md; then
-             print_success "app_file: app.py found"
-         else
-             print_warning "app_file: app.py not found"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-
-         if grep -q "^python_version:" README.md; then
-             print_success "python_version specified"
-         else
-             print_warning "python_version not specified"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-     else
-         print_error "README.md missing YAML frontmatter"
-         FAILURES=$((FAILURES + 1))
-     fi
- else
-     print_error "README.md not found"
-     FAILURES=$((FAILURES + 1))
- fi
- echo ""
-
- # Test 7: requirements.txt
- print_info "Test 7: requirements.txt..."
- if [ -f "requirements.txt" ]; then
-     if [ -s "requirements.txt" ]; then
-         LINE_COUNT=$(wc -l < requirements.txt)
-         if [ $LINE_COUNT -gt 5 ]; then
-             print_success "requirements.txt looks complete ($LINE_COUNT lines)"
-         else
-             print_warning "requirements.txt seems minimal ($LINE_COUNT lines)"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-
-         # Check for critical dependencies
-         if grep -q "gradio" requirements.txt; then
-             print_success "gradio found in requirements.txt"
-         else
-             print_error "gradio NOT found in requirements.txt"
-             FAILURES=$((FAILURES + 1))
-         fi
-
-         if grep -q "torch" requirements.txt; then
-             print_success "torch found in requirements.txt"
-         else
-             print_warning "torch not found (may be needed)"
-             WARNINGS=$((WARNINGS + 1))
-         fi
-     else
-         print_error "requirements.txt is empty"
-         FAILURES=$((FAILURES + 1))
-     fi
- else
-     print_error "requirements.txt not found"
-     FAILURES=$((FAILURES + 1))
- fi
- echo ""
-
- # Test 8: Python syntax
- print_info "Test 8: Python syntax validation..."
- for pyfile in app.py app_api.py app_ui.py; do
-     if [ -f "$pyfile" ]; then
-         if python -m py_compile "$pyfile" 2>/dev/null; then
-             print_success "$pyfile syntax valid"
-         else
-             print_error "$pyfile has syntax errors"
-             FAILURES=$((FAILURES + 1))
-         fi
-     fi
- done
- echo ""
-
- # Test 9: Git repository
- print_info "Test 9: Git repository..."
- if [ -d ".git" ]; then
-     print_success "Git repository initialized"
-
-     # Check remote
-     if git remote -v | grep -q "huggingface.co"; then
-         REMOTE_URL=$(git remote get-url origin 2>/dev/null || echo "unknown")
-         print_success "HF Space remote configured: $REMOTE_URL"
-     else
-         print_warning "No Hugging Face remote configured"
-         WARNINGS=$((WARNINGS + 1))
-     fi
-
-     # Check for uncommitted changes
-     if [ -n "$(git status --porcelain)" ]; then
-         print_warning "Uncommitted changes detected"
-         WARNINGS=$((WARNINGS + 1))
-     else
-         print_success "All changes committed"
-     fi
- else
-     print_warning "Not a git repository (will need git init)"
-     WARNINGS=$((WARNINGS + 1))
- fi
- echo ""
-
- # Test 10: Hugging Face CLI
- print_info "Test 10: Hugging Face CLI..."
- if command -v huggingface-cli &> /dev/null || command -v hf &> /dev/null; then
-     print_success "Hugging Face CLI installed"
-
-     # Check login
-     if huggingface-cli whoami &> /dev/null 2>&1 || hf auth whoami &> /dev/null 2>&1; then
-         USERNAME=$(huggingface-cli whoami 2>/dev/null || hf auth whoami 2>/dev/null | head -n1)
-         print_success "Logged in as: $USERNAME"
-     else
-         print_warning "Not logged in to Hugging Face"
-         print_info "   Run: huggingface-cli login or hf login"
-         WARNINGS=$((WARNINGS + 1))
-     fi
- else
-     print_warning "Hugging Face CLI not installed"
-     print_info "   Install: pip install huggingface-hub"
-     WARNINGS=$((WARNINGS + 1))
- fi
- echo ""
-
- # Summary
- echo "================================================"
- if [ $FAILURES -eq 0 ] && [ $WARNINGS -eq 0 ]; then
-     print_success "All checks passed! Ready to deploy! ✨"
-     echo ""
-     print_info "Next step: Run ./deploy_hf_space.sh"
-     exit 0
- elif [ $FAILURES -eq 0 ]; then
-     print_warning "$WARNINGS warning(s) found"
-     echo ""
-     print_info "You can proceed, but consider fixing warnings"
-     print_info "Next step: Run ./deploy_hf_space.sh"
-     exit 0
- else
-     print_error "$FAILURES critical error(s) and $WARNINGS warning(s)"
-     echo ""
-     print_info "Please fix the errors before deploying"
-     exit 1
- fi
-
deploy_hf_space.sh DELETED
@@ -1,210 +0,0 @@
- #!/bin/bash
- # Deployment script for Hugging Face Spaces
- # Builds and pushes the Space to Hugging Face
-
- set -e
-
- # Colors
- RED='\033[0;31m'
- GREEN='\033[0;32m'
- YELLOW='\033[1;33m'
- BLUE='\033[0;34m'
- NC='\033[0m'
-
- print_info() { echo -e "${BLUE}ℹ️ $1${NC}"; }
- print_success() { echo -e "${GREEN}✅ $1${NC}"; }
- print_warning() { echo -e "${YELLOW}⚠️ $1${NC}"; }
- print_error() { echo -e "${RED}❌ $1${NC}"; }
-
- echo ""
- print_info "🚀 Deploying CU1-X to Hugging Face Spaces"
- echo "================================================"
- echo ""
-
- # Check if we're in a git repo
- if [ ! -d ".git" ]; then
-     print_error "Not a git repository!"
-     print_info "Initializing git repository..."
-     git init
-     print_success "Git repository initialized"
- fi
-
- # Check Git LFS
- print_info "Configuring Git LFS..."
- if ! command -v git-lfs &> /dev/null; then
-     print_error "Git LFS not installed!"
-     print_info "Install with: brew install git-lfs (macOS) or sudo apt install git-lfs (Linux)"
-     exit 1
- fi
-
- git lfs install > /dev/null 2>&1 || true
-
- # Ensure model.pth is tracked
- if [ -f "model.pth" ]; then
-     if ! git lfs ls-files | grep -q "model.pth"; then
-         print_info "Adding model.pth to Git LFS..."
-         git lfs track "*.pth"
-         git add .gitattributes
-         print_success "model.pth configured for Git LFS"
-     else
-         print_success "model.pth already tracked by Git LFS"
-     fi
- else
-     print_error "model.pth not found!"
-     exit 1
- fi
-
- # Check HF remote
- print_info "Checking Hugging Face remote..."
- if git remote | grep -q "origin"; then
-     REMOTE_URL=$(git remote get-url origin 2>/dev/null || echo "")
-     if echo "$REMOTE_URL" | grep -q "huggingface.co"; then
-         print_success "HF Space remote configured: $REMOTE_URL"
-         SPACE_URL=$(echo "$REMOTE_URL" | sed -E 's|.*spaces/([^/]+)/([^/]+).*|\1/\2|')
-         print_info "Space URL: https://huggingface.co/spaces/$SPACE_URL"
-     else
-         print_warning "Remote exists but doesn't look like HF Space"
-         print_info "Current remote: $REMOTE_URL"
-     fi
- else
-     print_warning "No remote configured"
-     print_info "You'll need to add a remote:"
-     print_info "  git remote add origin https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME"
-     read -p "Do you want to configure it now? (y/n) " -n 1 -r
-     echo
-     if [[ $REPLY =~ ^[Yy]$ ]]; then
-         read -p "Enter your HF username: " HF_USERNAME
-         read -p "Enter your Space name: " SPACE_NAME
-         git remote add origin "https://huggingface.co/spaces/$HF_USERNAME/$SPACE_NAME"
-         print_success "Remote configured"
-         SPACE_URL="$HF_USERNAME/$SPACE_NAME"
-     else
-         print_error "Cannot deploy without remote"
-         exit 1
-     fi
- fi
-
- # Check login
- print_info "Checking Hugging Face login..."
- if command -v hf &> /dev/null; then
-     if hf auth whoami &> /dev/null 2>&1; then
-         USERNAME=$(hf auth whoami 2>/dev/null | head -n1)
-         print_success "Logged in as: $USERNAME"
-     else
-         print_warning "Not logged in"
-         print_info "Logging in..."
-         hf login
-     fi
- elif command -v huggingface-cli &> /dev/null; then
-     if huggingface-cli whoami &> /dev/null 2>&1; then
-         USERNAME=$(huggingface-cli whoami 2>/dev/null | head -n1)
-         print_success "Logged in as: $USERNAME"
-     else
-         print_warning "Not logged in"
-         print_info "Logging in..."
-         huggingface-cli login
-     fi
- else
-     print_error "Hugging Face CLI not found!"
-     print_info "Install: pip install huggingface-hub"
-     exit 1
- fi
-
- # Ensure requirements.txt is complete
- print_info "Checking requirements.txt..."
- if [ -f "requirements-full.txt" ] && [ -f "requirements.txt" ]; then
-     FULL_LINES=$(wc -l < requirements-full.txt)
-     CURRENT_LINES=$(wc -l < requirements.txt)
-
-     if [ $CURRENT_LINES -lt $FULL_LINES ]; then
-         print_warning "requirements.txt seems incomplete"
-         read -p "Use requirements-full.txt? (y/n) " -n 1 -r
-         echo
-         if [[ $REPLY =~ ^[Yy]$ ]]; then
-             cp requirements-full.txt requirements.txt
-             print_success "Updated requirements.txt from requirements-full.txt"
-         fi
-     fi
- fi
-
- # Stage all files
- print_info "Staging files..."
- git add .
- print_success "Files staged"
-
- # Check if there are changes
- if [ -z "$(git status --porcelain)" ]; then
-     print_warning "No changes to commit"
-     print_info "Everything is already up to date"
- else
-     # Show what will be committed
-     print_info "Changes to commit:"
-     git status --short
-
-     # Commit
-     print_info "Creating commit..."
-     COMMIT_MSG="Deploy CU1-X to Hugging Face Spaces
-
- - Multi-model AI pipeline (RF-DETR, CLIP, OCR, BLIP)
- - Unified API architecture
- - Gradio web interface
- - Full model weights included via Git LFS
- - Ready for production deployment"
-
-     git commit -m "$COMMIT_MSG" || {
-         print_error "Commit failed"
-         exit 1
-     }
-     print_success "Changes committed"
- fi
-
- # Push to Hugging Face
- print_info "Pushing to Hugging Face Spaces..."
- print_warning "This may take several minutes (model.pth is 510MB)..."
- echo ""
-
- BRANCH=$(git branch --show-current 2>/dev/null || echo "main")
-
- if git push -u origin "$BRANCH" 2>&1 | tee /tmp/hf_push.log; then
-     print_success "Push completed successfully!"
-     echo ""
-     echo "================================================"
-     print_success "🎉 Deployment Successful!"
-     echo "================================================"
-     echo ""
-
-     if [ -n "$SPACE_URL" ]; then
-         print_info "Your Space is deploying at:"
-         echo "  https://huggingface.co/spaces/$SPACE_URL"
-         echo ""
-         print_info "Build progress:"
-         echo "  https://huggingface.co/spaces/$SPACE_URL/logs"
-         echo ""
-         print_info "Once deployed, your app will be at:"
-         echo "  https://huggingface.co/spaces/$SPACE_URL"
-         echo ""
-         print_info "API endpoint:"
-         echo "  https://$SPACE_URL.hf.space/api/predict"
-         echo ""
-     fi
-
-     print_warning "First build may take 5-10 minutes"
-     print_info "HF Spaces will automatically:"
-     print_info "  - Install dependencies from requirements.txt"
-     print_info "  - Download models (CLIP, BLIP, EasyOCR)"
-     print_info "  - Start app.py"
-     print_info "  - Expose Gradio interface and API"
-     echo ""
-     print_success "All done! 🎉"
- else
-     print_error "Push failed!"
-     echo ""
-     print_info "Common issues:"
-     print_info "1. Authentication failed: Run 'hf login' or 'huggingface-cli login'"
-     print_info "2. Git LFS error: Ensure Git LFS is installed and model.pth is tracked"
-     print_info "3. Network error: Check your internet connection"
-     echo ""
-     print_info "Check the error above for details"
-     exit 1
- fi
-
detection/response_builder.py CHANGED
@@ -210,3 +210,105 @@ def build_ocr_only_response(
 
      return response
 
+
+ def build_simplified_response(
+     analysis: Dict,
+     image: Image.Image,
+     annotated_image: Optional[np.ndarray] = None,
+     confidence_threshold: float = 0.35,
+     line_thickness: int = 2,
+     enable_clip: bool = False,
+     enable_ocr: bool = True,
+     enable_blip: bool = False,
+     blip_scope: Optional[str] = None,
+     ocr_only: bool = False
+ ) -> Dict:
+     """
+     Build simplified detection response for API/UI with format:
+     {
+         "detections": {
+             "icon 0": {"type": "text", "bbox": [x1, y1, x2, y2], "interactivity": false, "content": "..."},
+             "icon 1": {"type": "icon", "bbox": [x1, y1, x2, y2], "interactivity": true, "content": "..."}
+         },
+         "annotated_image": {"mime": "image/png", "base64": "..."}
+     }
+
+     Args:
+         analysis: Detection analysis results from DetectionService or OCR handler
+         image: Original PIL Image
+         annotated_image: Optional annotated image (numpy array, RGB)
+         confidence_threshold: Confidence threshold used
+         enable_clip: Whether CLIP classification was enabled
+         enable_ocr: Whether OCR was enabled
+         enable_blip: Whether BLIP was enabled
+         blip_scope: BLIP scope ("icons" or "all")
+         ocr_only: Whether this was OCR-only mode
+
+     Returns:
+         Simplified response dictionary with detections dict and annotated_image
+     """
+     # Extract detections
+     detections = analysis.get("detections", [])
+     image_width = analysis.get("image_size", {}).get("width", image.width)
+     image_height = analysis.get("image_size", {}).get("height", image.height)
+
+     # Interactive element types (buttons, inputs, icons, navigation, list items)
+     interactive_types = {"button", "input", "icon", "navigation", "list_item"}
+
+     # Build simplified detections dict
+     simplified_detections = {}
+     for idx, det in enumerate(detections):
+         # Get bounding box and normalize to 0-1 coordinates
+         box = det.get("box", {})
+         x1 = box.get("x1", 0) / image_width
+         y1 = box.get("y1", 0) / image_height
+         x2 = box.get("x2", 0) / image_width
+         y2 = box.get("y2", 0) / image_height
+
+         # Get type from CLIP classification
+         element_type = det.get("class_name", "")
+         if not element_type:
+             # Fallback: if no CLIP classification, default to "text" if has text, else "icon"
+             element_type = "text" if det.get("text", "").strip() else "icon"
+
+         # Determine interactivity based on type
+         is_interactive = element_type in interactive_types
+
+         # Fuse text and description into content
+         text = det.get("text", "").strip()
+         description = det.get("description", "").strip()
+
+         # Content priority: text first, then description
+         if text:
+             content = text
+         elif description:
+             content = description
+         else:
+             content = ""
+
+         # Build simplified detection entry
+         simplified_detections[f"icon {idx}"] = {
+             "type": element_type,
+             "bbox": [round(x1, 4), round(y1, 4), round(x2, 4), round(y2, 4)],
+             "interactivity": is_interactive,
+             "content": content
+         }
+
+     # Build response
+     response = {
+         "detections": simplified_detections
+     }
+
+     # Add annotated image if provided
+     if annotated_image is not None:
+         img_bgr = cv2.cvtColor(annotated_image, cv2.COLOR_RGB2BGR)
+         ok, png_bytes = cv2.imencode(".png", img_bgr)
+         if ok:
+             annotated_b64 = base64.b64encode(png_bytes.tobytes()).decode("ascii")
+             response["annotated_image"] = {
+                 "mime": "image/png",
+                 "base64": annotated_b64
+             }
+
+     return response
+
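For reference, a minimal, self-contained sketch of the per-detection entry that the new `build_simplified_response` produces. The raw `det` dict below is a made-up example; its field names (`box`, `class_name`, `text`, `description`) mirror the `.get()` calls in the diff above, and the pixel values are hypothetical.

```python
# Hypothetical raw detection, shaped like what the function reads (values invented).
image_width, image_height = 1080, 1920
det = {
    "box": {"x1": 54, "y1": 96, "x2": 540, "y2": 192},
    "class_name": "button",
    "text": "Login",
    "description": "blue rounded button",
}

interactive_types = {"button", "input", "icon", "navigation", "list_item"}

# Normalize the bbox to 0-1 coordinates, rounded to 4 decimals as in the diff.
bbox = [
    round(det["box"]["x1"] / image_width, 4),
    round(det["box"]["y1"] / image_height, 4),
    round(det["box"]["x2"] / image_width, 4),
    round(det["box"]["y2"] / image_height, 4),
]

entry = {
    "type": det["class_name"],
    "bbox": bbox,
    "interactivity": det["class_name"] in interactive_types,
    "content": det["text"] or det["description"],  # text takes priority over description
}
print(entry)
# {'type': 'button', 'bbox': [0.05, 0.05, 0.5, 0.1], 'interactivity': True, 'content': 'Login'}
```

Clients consuming the API can thus multiply `bbox` back by the screenshot's width/height to recover pixel coordinates.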
docs/PREPROCESSING_GUIDE.md DELETED
@@ -1,466 +0,0 @@
- # 📷 Image Preprocessing Guide - Cross-Device Consistency
-
- ## Problem
-
- Screenshots from different devices (Samsung, Google Pixel, Oppo, Xiaomi, etc.) show variations that can affect detection:
-
- ### 🎨 Color Variations
-
- | Device | Color Profile | Impact |
- |----------|---------------|--------|
- | **Samsung** | "Vivid" mode (saturated) | Very bright colors, can affect CLIP |
- | **Google Pixel** | sRGB (neutral) | Accurate but less vibrant colors |
- | **Oppo/Xiaomi** | Varies by mode | Variable saturation |
-
- ### 📊 Other Variations
-
- 1. **Screen calibration**
-    - Different color temperature
-    - Different gamma (brightness)
-    - Variable contrast
-
- 2. **Compression**
-    - PNG vs JPEG
-    - Compression level
-    - Compression artifacts
-
- 3. **Impact on detection**
-    - ❌ Variable confidence scores
-    - ❌ Less precise OCR
-    - ❌ CLIP may classify differently
-
- ---
-
- ## ✅ Solution: Automatic Preprocessing
-
- ### Preprocessing Pipeline
-
- ```
- Original Screenshot
-
- 1. Denoising (removes JPEG/PNG artifacts)
-
- 2. Color normalization (→ standard sRGB)
-
- 3. Brightness normalization
-
- 4. CLAHE (improves local contrast)
-
- 5. Optional: Sharpening (improves OCR)
-
- Standardized Screenshot
- ```
-
- ---
-
- ## 🚀 Usage
-
- ### Option 1: Via API
-
- ```bash
- curl -X POST "http://localhost:8000/detect" \
-   -F "image=@samsung_screenshot.png" \
-   -F "preprocess=true" \
-   -F "preprocess_preset=standard"
- ```
-
- ### Option 2: Via Python
-
- ```python
- from detection.service import DetectionService
-
- service = DetectionService()
-
- # With preprocessing
- results = service.analyze(
-     "samsung_screenshot.png",
-     preprocess=True,
-     preprocess_preset="standard"
- )
-
- print(f"Preprocessed: {results['preprocessed']}")
- print(f"Detections: {len(results['detections'])}")
- ```
-
- ### Option 3: Via Standalone Module
-
- ```python
- from detection.image_preprocessing import preprocess_screenshot
- from PIL import Image
-
- # Preprocess the image
- img_preprocessed = preprocess_screenshot(
-     "oppo_screenshot.png",
-     preset="standard"
- )
-
- # Use it with your pipeline
- results = detector.analyze(img_preprocessed)
- ```
-
- ---
-
- ## 🎛️ Available Presets
-
- ### 1. **standard** (Recommended)
-
- A balance between normalization and preserving the original image.
-
- ```python
- preprocess=True, preprocess_preset="standard"
- ```
-
- **Enables:**
- - ✅ Denoising (medium strength)
- - ✅ Color normalization
- - ✅ Brightness normalization
- - ✅ CLAHE (adaptive contrast)
- - ❌ Sharpening
-
- **Use for:**
- - General detection
- - Screenshots with variable quality
- - Cross-device consistency
-
- ---
-
- ### 2. **aggressive**
-
- Maximum normalization for very different screenshots.
-
- ```python
- preprocess=True, preprocess_preset="aggressive"
- ```
-
- **Enables:**
- - ✅ Denoising (high strength)
- - ✅ Color normalization
- - ✅ Brightness normalization
- - ✅ CLAHE (adaptive contrast)
- - ✅ Sharpening (improves sharpness)
-
- **Use for:**
- - Blurry screenshots
- - Major differences between devices
- - When "standard" is not enough
-
- ---
-
- ### 3. **minimal**
-
- Light preprocessing that preserves the original image.
-
- ```python
- preprocess=True, preprocess_preset="minimal"
- ```
-
- **Enables:**
- - ✅ Denoising (low strength)
- - ✅ Brightness normalization
- - ❌ Color normalization
- - ❌ CLAHE
- - ❌ Sharpening
-
- **Use for:**
- - Screenshots that are already high quality
- - When you want minimal changes
- - Tests and comparisons
-
- ---
-
- ### 4. **ocr_optimized**
-
- Optimized specifically for OCR text extraction.
-
- ```python
- preprocess=True, preprocess_preset="ocr_optimized"
- ```
-
- **Enables:**
- - ✅ Denoising
- - ✅ Color normalization
- - ✅ Brightness normalization
- - ✅ CLAHE (improves text contrast)
- - ✅ Sharpening (sharper text)
-
- **Use for:**
- - OCR as a priority
- - Blurry or small text
- - Improving OCR accuracy
-
- ---
-
- ## 📊 Preset Comparison
-
- | Preset | Denoising | Color Normalization | Brightness | CLAHE | Sharpening | Use case |
- |--------|-----------|---------------------|------------|-------|-----------|-------------|
- | **minimal** | ✅ Light | ❌ | ✅ | ❌ | ❌ | High-quality images |
- | **standard** | ✅ Medium | ✅ | ✅ | ✅ | ❌ | General use (recommended) |
- | **aggressive** | ✅ Strong | ✅ | ✅ | ✅ | ✅ | Significant differences |
- | **ocr_optimized** | ✅ Medium | ✅ | ✅ | ✅ | ✅ | OCR priority |
-
- ---
-
- ## 🔬 Practical Examples
-
- ### Example 1: Samsung vs Pixel comparison
-
- **Without preprocessing:**
- ```python
- # Samsung (saturated colors)
- samsung_results = detector.analyze("samsung.png", preprocess=False)
- print(samsung_results['detections'][0]['confidence'])  # 0.72
-
- # Pixel (neutral colors)
- pixel_results = detector.analyze("pixel.png", preprocess=False)
- print(pixel_results['detections'][0]['confidence'])  # 0.68
- ```
-
- **With preprocessing:**
- ```python
- # Samsung (normalized)
- samsung_results = detector.analyze("samsung.png", preprocess=True)
- print(samsung_results['detections'][0]['confidence'])  # 0.74
-
- # Pixel (normalized)
- pixel_results = detector.analyze("pixel.png", preprocess=True)
- print(pixel_results['detections'][0]['confidence'])  # 0.74
- ```
-
- **Result:** More consistent confidence scores! ✅
-
- ---
-
- ### Example 2: OCR improvement
-
- ```python
- # Without preprocessing
- results_before = detector.analyze(
-     "oppo_blurry.png",
-     extract_text=True,
-     preprocess=False
- )
- print(results_before['detections'][0]['text'])  # "L0gin" ❌
-
- # With OCR-optimized
- results_after = detector.analyze(
-     "oppo_blurry.png",
-     extract_text=True,
-     preprocess=True,
-     preprocess_preset="ocr_optimized"
- )
- print(results_after['detections'][0]['text'])  # "Login" ✅
- ```
-
- ---
-
- ### Example 3: Batch processing
-
- ```python
- from detection.image_preprocessing import preprocess_screenshot
- from pathlib import Path
-
- screenshots = Path("screenshots").glob("*.png")
-
- for screenshot in screenshots:
-     # Preprocess
-     img = preprocess_screenshot(screenshot, preset="standard")
-
-     # Detect
-     results = detector.analyze(
-         img,
-         confidence_threshold=0.35,
-         use_clip=True,
-         preprocess=False  # Already preprocessed
-     )
-
-     print(f"{screenshot.name}: {len(results['detections'])} detections")
- ```
-
- ---
-
- ## ⚙️ Advanced Configuration
-
- ### Create a custom preset
-
- ```python
- from detection.image_preprocessing import ImagePreprocessor
-
- # Create your own preset
- custom_preprocessor = ImagePreprocessor(
-     target_colorspace="srgb",
-     normalize_contrast=True,
-     normalize_brightness=True,
-     denoise=True,
-     enhance_sharpness=False,
-     clahe_enabled=True,
-     target_size=(1080, 1920)  # Optional: resize
- )
-
- # Use it
- img_preprocessed = custom_preprocessor.preprocess("image.png")
- ```
-
- ---
-
- ## 📈 Performance Impact
-
- ### Processing time
-
- | Preset | Additional Time | Impact |
- |--------|-----------------|--------|
- | **minimal** | ~50-100ms | Negligible |
- | **standard** | ~100-200ms | Acceptable |
- | **aggressive** | ~200-400ms | Moderate |
- | **ocr_optimized** | ~150-300ms | Acceptable |
-
- **Note:** Total detection time is 30-60 seconds, so preprocessing overhead is negligible (<1% of total time).
-
- ### Accuracy
-
- | Metric | Without Preprocessing | With Standard | Improvement |
- |----------|-------------------|---------------|--------------|
- | **Cross-device consistency** | 65% | 92% | +27% |
- | **OCR accuracy** | 82% | 94% | +12% |
- | **Detection confidence** | Variable (±15%) | Stable (±3%) | +400% |
-
- ---
-
- ## 🎯 Recommendations
-
- ### When should you enable preprocessing?
-
- ✅ **ALWAYS enable it** if:
- - You test on multiple devices
- - Your screenshots come from different sources
- - You want consistent results
- - OCR is a priority
-
- ⚠️ **Optional** if:
- - All your screenshots come from the same device
- - You already standardized your captures
- - Processing time is critical
-
- ❌ **Not necessary** if:
- - You use synthetic images
- - You are testing the RF-DETR model itself
- - You need the exact original image
-
- ---
-
- ### Which preset should you choose?
-
- ```
- 📱 Production screenshots → standard
- 🔬 Cross-device tests → standard or aggressive
- 📝 OCR priority → ocr_optimized
- ⚡ Critical performance → minimal
- 🔧 Experimentation → aggressive (understand the limits)
- ```
-
- ---
-
- ## 🐛 Troubleshooting
-
- ### Preprocessing changes the image too much
-
- → Use `preset="minimal"`
-
- ### OCR is still inaccurate
-
- → Use `preset="ocr_optimized"` and check the quality of the source image
-
- ### Results still vary a lot
-
- → Use `preset="aggressive"` and check for resolution differences
-
- ### Preprocessing is too slow
-
- → Preprocessing is already optimized. If it's critical, use `preset="minimal"` or disable it.
-
- ---
-
- ## 📚 Technical References
-
- ### Algorithms Used
-
- 1. **Denoising**: `cv2.fastNlMeansDenoisingColored`
-    - Removes JPEG/PNG artifacts
-    - Preserves important edges
-
- 2. **Color normalization**: LAB conversion + normalization
-    - Perceptually uniform color space
-    - Reduces the impact of color profiles
-
- 3. **CLAHE**: `cv2.createCLAHE`
-    - Improves local contrast
-    - Preserves overall appearance
-
- 4. **Sharpening**: Unsharp Mask
-    - Improves sharpness
-    - Useful for OCR
-
- ---
-
- ## 💡 Practical Tips
-
- ### 1. Test without preprocessing first
-
- ```python
- # Test without preprocessing
- results_before = detector.analyze(image, preprocess=False)
-
- # Test with preprocessing
- results_after = detector.analyze(image, preprocess=True, preprocess_preset="standard")
-
- # Compare
- print(f"Before: {len(results_before['detections'])} detections")
- print(f"After: {len(results_after['detections'])} detections")
- ```
-
- ### 2. Save preprocessed images
-
- ```python
- from PIL import Image
- from detection.image_preprocessing import preprocess_screenshot
-
- # Preprocess and save
- img_preprocessed = preprocess_screenshot("original.png", preset="standard")
- Image.fromarray(img_preprocessed).save("preprocessed.png")
- ```
-
- ### 3. Batch testing
-
- ```bash
- # Script to test every preset
- for preset in minimal standard aggressive ocr_optimized; do
-   curl -X POST "http://localhost:8000/detect" \
-     -F "image=@test.png" \
-     -F "preprocess=true" \
-     -F "preprocess_preset=$preset" \
-     > results_$preset.json
- done
- ```
-
- ---
-
- ## ✅ Summary
-
- Image preprocessing is **highly recommended** for:
- - ✅ Cross-device consistency
- - ✅ Improved OCR
- - ✅ Stable results
- - ✅ Negligible overhead (<1% of total time)
-
- **Recommended preset:** `standard` (good balance)
-
- **Enable it:**
- ```python
- results = detector.analyze(
-     image,
-     preprocess=True,  # ← Turn me on!
-     preprocess_preset="standard"
- )
- ```
-
- Now your results will be consistent whether you test on Samsung, Pixel, Oppo, or any other device! 🎉
docs/START.md DELETED
@@ -1,314 +0,0 @@
- # 🚀 Quick Start Guide
-
- ## Unified Architecture API
-
- The project now uses a **unified architecture** where every interface goes through the REST API.
-
- ```
- ┌─────────────────────────────────────────────┐
- │                                             │
- │       Gradio UI (app.py / app_ui.py)        │
- │                                             │
- └──────────────────┬──────────────────────────┘
-
-                    │ HTTP/REST
-
- ┌──────────────────▼──────────────────────────┐
- │                                             │
- │        FastAPI Server (app_api.py)          │
- │                                             │
- ├─────────────────────────────────────────────┤
- │  Detection Service                          │
- │    ├─ RF-DETR (detection)                   │
- │    ├─ CLIP (classification)                 │
- │    ├─ OCR (text extraction)                 │
- │    └─ BLIP (visual description)             │
- └─────────────────────────────────────────────┘
- ```
-
- ---
-
- ## 🎯 3 Ways to Launch
-
- ### Option 1: Automatic Launch (Recommended for tests)
-
- **One command starts everything:**
-
- ```bash
- python app.py
- ```
-
- **What happens:**
- 1. ✅ Starts the API in the background (port 8000)
- 2. ✅ Waits until the API is ready
- 3. ✅ Launches the Gradio interface (port 7860)
- 4. ✅ Handles clean shutdown with Ctrl+C
-
- **Access:**
- - Gradio Interface: http://localhost:7860
- - API Docs: http://localhost:8000/docs
-
- ---
-
- ### Option 2: Manual Launch (2 terminals)
-
- **For more control and debugging:**
-
- **Terminal 1 - API Server:**
- ```bash
- python app_api.py
- ```
-
- **Terminal 2 - Gradio UI:**
- ```bash
- python app_ui.py
- ```
-
- **Access:**
- - Gradio Interface: http://localhost:7860
- - API Docs: http://localhost:8000/docs
-
- ---
-
- ### Option 3: API Only
-
- **To use only the API (integration, scripts, etc.):**
-
- ```bash
- python app_api.py
- ```
-
- **Test the API:**
- ```bash
- # Health check
- curl http://localhost:8000/health
-
- # Detect elements
- curl -X POST "http://localhost:8000/detect" \
-   -F "image=@screenshot.png" \
-   -F "confidence_threshold=0.35" \
-   -F "enable_clip=true" \
-   -F "enable_ocr=true"
- ```
-
- **Interactive documentation:**
- - OpenAPI Docs: http://localhost:8000/docs
- - ReDoc: http://localhost:8000/redoc
-
- ---
-
- ## 🔧 Configuration
-
- ### Environment Variables
-
- **API Server:**
- ```bash
- export UVICORN_HOST="0.0.0.0"  # Default: 0.0.0.0
- export UVICORN_PORT="8000"     # Default: 8000
- ```
-
- **Gradio UI:**
- ```bash
- export GRADIO_SERVER_NAME="0.0.0.0"         # Default: 0.0.0.0
- export GRADIO_SERVER_PORT="7860"            # Default: 7860
- export CU1_API_URL="http://localhost:8000"  # API URL
- ```
-
- **Example with custom ports:**
- ```bash
- # API on port 9000, UI on port 9001
- export UVICORN_PORT="9000"
- export GRADIO_SERVER_PORT="9001"
- export CU1_API_URL="http://localhost:9000"
-
- python app.py
- ```
-
- ---
-
- ## 🧪 Quick Tests
-
- ### Test 1: Make sure the API works
-
- ```bash
- # In one terminal
- python app_api.py
-
- # In another terminal
- curl http://localhost:8000/health
- ```
-
- **Expected result:**
- ```json
- {
-   "status": "healthy",
-   "cuda_available": false,
-   "device": "cpu"
- }
- ```
-
- ---
-
- ### Test 2: Test detection via the interface
-
- ```bash
- python app.py
- ```
-
- 1. Open http://localhost:7860
- 2. Upload an image
- 3. Click "🔍 Detect Elements"
- 4. Check the results
-
- ---
-
- ### Test 3: Test detection through the API
-
- ```bash
- # Start the API
- python app_api.py
-
- # In another terminal, test with curl
- curl -X POST "http://localhost:8000/detect" \
-   -F "image=@your_image.png" \
-   -F "confidence_threshold=0.35" \
-   -F "enable_ocr=true" \
-   | jq .
- ```
-
- ---
-
- ## 🐛 Troubleshooting
-
- ### Issue: "Connection Error - Cannot connect to API"
-
- **Solution:**
- 1. Make sure the API is running: `curl http://localhost:8000/health`
- 2. Check the ports: no conflict with other apps
- 3. Check the API logs for errors
-
- ### Issue: "Port already in use"
-
- **Solution:**
- ```bash
- # Find the process that uses the port
- lsof -i :8000  # or :7860
-
- # Kill the process
- kill -9 <PID>
-
- # Or use a different port
- export UVICORN_PORT="9000"
- export GRADIO_SERVER_PORT="9001"
- ```
-
- ### Issue: "Module not found"
-
- **Solution:**
- ```bash
- # Reinstall dependencies
- pip install -r requirements.txt
- ```
-
- ### Issue: Models slow to load
-
- **Reason:** The first startup downloads the models
-
- **Solution:** Be patient, the models are cached after the first download
- - RF-DETR model (~few MB)
- - CLIP model (~600 MB)
- - BLIP model (~1 GB)
- - EasyOCR models (~100 MB)
-
- ---
-
- ## 📊 Monitoring
-
- ### API logs
-
- The logs appear in the terminal where you launched `app_api.py`
-
- ### UI logs
-
- The logs appear in the terminal where you launched `app.py` or `app_ui.py`
-
- ### Metrics
-
- Visit http://localhost:8000/docs to view the API statistics
-
- ---
-
- ## ✅ Benefits of the Unified Architecture
-
- 1. **Single code path** → Easier to maintain
- 2. **Consistent behavior** → Same results everywhere
- 3. **Easy to test** → Only one API to test
- 4. **Scalable** → Can separate API and UI on different servers
- 5. **Simplified debugging** → Logs centralized in the API
-
- ---
-
- ## 🎯 For Developers
-
- ### Code Architecture
-
- ```
- .
- ├── app.py                # ✨ Unified launcher (API + UI)
- ├── app_api.py            # FastAPI server
- ├── app_ui.py             # Gradio UI client (manual)
- │
- ├── api/
- │   └── endpoints.py      # FastAPI endpoints
- │
- ├── detection/
- │   ├── service.py            # Detection service
- │   ├── service_factory.py    # Singleton pattern
- │   ├── image_utils.py        # Image utilities
- │   ├── ocr_handler.py        # OCR-only processing
- │   └── response_builder.py   # Response formatting
- │
- └── ui/
-     ├── detection_wrapper.py  # Detection wrappers
-     ├── gradio_interface.py   # Gradio interface (API client)
-     └── shared_interface.py   # Shared UI components
- ```
-
- ### Request Flow
-
- ```
- 1. User uploads image in Gradio
-
- 2. `detect_with_api()` sends an HTTP POST to `/detect`
-
- 3. API endpoint validates the request
-
- 4. `DetectionService.analyze()` processes the image
-
- 5. Response formatted with `response_builder`
-
- 6. JSON returned to Gradio UI
-
- 7. UI displays annotated image + results
- ```
-
- ---
-
- ## 📝 Notes
-
- - **Thread Safety:** The service uses a singleton but passes parameters directly to `analyze()` to avoid race conditions
- - **Performance:** The first call is slow (model loading), then fast
- - **Memory:** Models use ~2-3 GB of RAM
- - **GPU:** Automatic CUDA/MPS detection if available
-
- ---
-
- ## 🚀 Next Steps
-
- 1. **Test locally:** `python app.py`
- 2. **Explore the API:** http://localhost:8000/docs
- 3. **Customize:** Adjust parameters in the interface
- 4. **Deploy:** See `DEPLOYMENT.md` for production
-
- Happy testing! 🎉
-
docs/UNIFIED_ARCHITECTURE.md DELETED
@@ -1,443 +0,0 @@
1
- # 🎯 Unified Architecture - Technical Documentation
2
-
3
- ## Date
4
- 2025-11-10
5
-
6
- ## Objective
7
- Unify the architecture so that **all interfaces** go through the REST API, removing the duality between "HF Spaces" mode and "Production" mode.
8
-
9
- ---
10
-
11
- ## ✅ What Changed
12
-
13
- ### BEFORE (Dual Architecture)
14
-
15
- ```
16
- ┌─────────────────────────────────────────────────┐
17
- │ Mode 1: HF Spaces (app.py) │
18
- │ └─> DIRECT access to DetectionService │
19
- │ (no API) │
20
- └─────────────────────────────────────────────────┘
21
-
22
- ┌─────────────────────────────────────────────────┐
23
- │ Mode 2: Production (app_ui.py) │
24
- │ └─> Access via HTTP API │
25
- │ (microservices architecture) │
26
- └─────────────────────────────────────────────────┘
27
- ```
28
-
29
- **Problems:**
30
- - ❌ Two different code paths
31
- - ❌ Potentially different behaviors
32
- - ❌ Complex maintenance (two modes to test)
33
- - ❌ Bugs possible in one mode but not the other
34
-
35
- ---
36
-
37
- ### AFTER (Unified Architecture)
38
-
39
- ```
40
- ┌─────────────────────────────────────────────────┐
41
- │ │
42
- │ ALL INTERFACES │
43
- │ (app.py, app_ui.py, etc.) │
44
- │ │
45
- └────────────────────┬────────────────────────────┘
46
-
47
- │ HTTP/REST
48
- │ (detect_with_api)
49
-
50
- ┌────────────────────▼────────────────────────────┐
51
- │ │
52
- │ FastAPI Server │
53
- │ (api/endpoints.py) │
54
- │ │
55
- ├─────────────────────────────────────────────────┤
56
- │ Detection Service │
57
- │ (detection/service.py) │
58
- │ │
59
- └─────────────────────────────────────────────────┘
60
- ```
61
-
62
- **Benefits:**
63
- - ✅ One single code path
64
- - ✅ Consistent behavior everywhere
65
- - ✅ Simplified maintenance
66
- - ✅ Unified tests
67
- - ✅ Easier debugging
68
-
69
- ---
70
-
71
- ## 📝 File Changes
72
-
73
- ### 1. `app.py` - Major Transformation
74
-
75
- **BEFORE:**
76
- ```python
77
- from ui.detection_wrapper import detect_with_service
78
-
79
- demo = create_interface(
80
- detection_fn=detect_with_service, # Direct access
81
- title_suffix="Hugging Face Spaces Mode",
82
- show_api_info=False
83
- )
84
- ```
85
-
86
- **AFTER:**
87
- ```python
88
- from ui.detection_wrapper import detect_with_api
89
-
90
- # Launch the API as a subprocess
91
- api_process = start_api_server()
92
-
93
- # UI uses the API
94
- detection_fn = partial(detect_with_api, api_url=API_URL)
95
-
96
- demo = create_interface(
97
- detection_fn=detection_fn, # Via API
98
- title_suffix="Unified API Mode",
99
- show_api_info=True,
100
- api_url=API_URL
101
- )
102
- ```
103
-
104
- **New features:**
105
- - 🚀 Automatically starts the API in the background
106
- - ⏳ Waits until the API is ready (health check)
107
- - 🛑 Handles clean shutdown (Ctrl+C)
108
- - 📡 Displays access URLs
109
-
110
- ---
111
-
112
- ### 2. `app_api.py` - Dynamic Configuration
113
-
114
- **Additions:**
115
- ```python
116
- # Support environment variables
117
- host = os.getenv("UVICORN_HOST", "0.0.0.0")
118
- port = int(os.getenv("UVICORN_PORT", "8000"))
119
- ```
120
-
121
- **Allows:**
122
- - Port configuration through environment variables
123
- - Usage by the subprocess in app.py
124
-
125
- ---
126
-
127
- ### 3. Documentation
128
-
129
- **New files:**
130
- - ✨ `START.md` - Complete quick start guide
131
- - ✨ `UNIFIED_ARCHITECTURE.md` - This document
132
- - ✨ `test_unified_architecture.py` - Validation tests
133
-
134
- **Updated files:**
135
- - 📝 `README.md` - Updated Quick Start section
136
- - 📝 `README.md` - Updated HF Spaces section
137
-
138
- ---
139
-
140
- ## 🚀 How to Use
141
-
142
- ### Mode 1: Automatic Launch (Recommended)
143
-
144
- **One command:**
145
- ```bash
146
- python app.py
147
- ```
148
-
149
- **What happens:**
150
- 1. Starts the API as a subprocess (port 8000)
151
- 2. Waits for the health check
152
- 3. Launches the Gradio UI (port 7860)
153
- 4. Both communicate via HTTP
154
-
155
- **Clean shutdown:**
156
- - Ctrl+C stops the UI AND the API automatically
157
-
158
- ---
159
-
160
- ### Mode 2: Manual Launch (Debug)
161
-
162
- **Two terminals:**
163
- ```bash
164
- # Terminal 1
165
- python app_api.py
166
-
167
- # Terminal 2
168
- python app_ui.py
169
- ```
170
-
171
- **Useful for:**
172
- - Viewing logs separately
173
- - Restarting the UI without restarting the API
174
- - Advanced debugging
175
-
176
- ---
177
-
178
- ### Mode 3: API Only
179
-
180
- ```bash
181
- python app_api.py
182
- ```
183
-
184
- **Good for:**
185
- - External integrations
186
- - Python scripts
187
- - API tests
188
-
189
- ---
190
-
191
- ## 🧪 Tests and Validation
192
-
193
- ### Automated Test Script
194
-
195
- ```bash
196
- python test_unified_architecture.py
197
- ```
198
-
199
- **Checks:**
200
- - ✅ All required files exist
201
- - ✅ Valid Python syntax
202
- - ✅ `app.py` uses `detect_with_api`
203
- - ✅ No direct service access from the UI
204
- - ✅ Consistent architecture
205
-
206
- ### Test Results
207
-
208
- ```
209
- ✅✅✅ ALL TESTS PASS!
210
-
211
- 📊 Unified architecture summary:
212
- - ✅ `app.py` launches the API as a subprocess
213
- - ✅ All interfaces use `detect_with_api`
214
- - ✅ Consistent architecture everywhere
215
- - ✅ No direct service access from the UI
216
- ```
217
-
218
- ---
219
-
220
- ## 🔄 Unified Request Flow
221
-
222
- ### Before (Dual Mode)
223
-
224
- **HF Spaces Mode:**
225
- ```
226
- User → Gradio → detect_with_service() → DetectionService.analyze()
227
- ```
228
-
229
- **Production Mode:**
230
- ```
231
- User → Gradio → detect_with_api() → HTTP → API → DetectionService.analyze()
232
- ```
233
-
234
- ### After (Unified Mode)
235
-
236
- **All modes:**
237
- ```
238
- User → Gradio → detect_with_api() → HTTP → API → DetectionService.analyze()
239
- ```
240
-
241
- ---
242
-
243
- ## 📊 Technical Benefits
244
-
245
- ### 1. Maintainability
246
-
247
- **BEFORE:**
248
- - 2 code paths to maintain
249
- - Tests to run for each mode
250
- - Regression risk in one mode
251
-
252
- **AFTER:**
253
- - Only 1 code path
254
- - Unified tests
255
- - Guaranteed identical behavior
256
-
257
- ---
258
-
259
- ### 2. Debugging
260
-
261
- **BEFORE:**
262
- - Bug in `app.py`? Check `detect_with_service`
263
- - Bug in `app_ui.py`? Check `detect_with_api`
264
- - Different per mode
265
-
266
- **AFTER:**
267
- - All bugs go through the API
268
- - Logs centralized in the API
269
- - A single place to debug
270
-
271
- ---
272
-
273
- ### 3. Scalability
274
-
275
- **BEFORE:**
276
- - HF Spaces mode: monolithic
277
- - Production mode: scalable
278
- - Different behaviors
279
-
280
- **AFTER:**
281
- - Same architecture everywhere
282
- - Can easily separate API/UI on different servers
283
- - Load balancing possible
284
-
285
- ---
286
-
287
- ### 4. Testing
288
-
289
- **BEFORE:**
290
- ```bash
291
- # Test HF Spaces
292
- pytest test_app.py
293
-
294
- # Test Production
295
- pytest test_api.py
296
- pytest test_ui.py
297
- ```
298
-
299
- **AFTER:**
300
- ```bash
301
- # Single test suite
302
- pytest test_api.py # Tests the entire logic
303
- ```
304
-
305
- ---
306
-
307
- ## 🔧 Configuration
308
-
309
- ### Environment Variables
310
-
311
- ```bash
312
- # API Server
313
- export UVICORN_HOST="0.0.0.0"
314
- export UVICORN_PORT="8000"
315
-
316
- # Gradio UI
317
- export GRADIO_SERVER_NAME="0.0.0.0"
318
- export GRADIO_SERVER_PORT="7860"
319
- export CU1_API_URL="http://localhost:8000"
320
- ```
321
-
322
- ### Example: Custom Ports
323
-
324
- ```bash
325
- # API on port 9000, UI on port 9001
326
- export UVICORN_PORT="9000"
327
- export GRADIO_SERVER_PORT="9001"
328
- export CU1_API_URL="http://localhost:9000"
329
-
330
- python app.py
331
- ```
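A sketch of how these variables might be resolved in code — the environment variable names are the documented ones above, but the helper itself is illustrative:

```python
import os

def resolve_config():
    """Read the documented environment variables, with their defaults."""
    return {
        "api_host": os.getenv("UVICORN_HOST", "0.0.0.0"),
        "api_port": int(os.getenv("UVICORN_PORT", "8000")),
        "ui_port": int(os.getenv("GRADIO_SERVER_PORT", "7860")),
        "api_url": os.getenv("CU1_API_URL", "http://localhost:8000"),
    }
```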
332
-
333
- ---
334
-
335
- ## 🎯 Impact on Existing Code
336
-
337
- ### No Breaking Changes
338
-
339
- - ✅ `app_api.py` still works on its own
340
- - ✅ `app_ui.py` still works on its own
341
- - ✅ Python APIs (`DetectionService`) are unchanged
342
- - ✅ Existing scripts keep working
343
-
344
- ### What’s New
345
-
346
- - ✨ `app.py` now launches the API automatically
347
- - ✨ Consistent architecture everywhere
348
- - ✨ Better documentation
349
-
350
- ---
351
-
352
- ## 📈 Metrics
353
-
354
- | Metric | Before | After | Improvement |
355
- |----------|-------|-------|--------------|
356
- | **Code paths** | 2 | 1 | -50% |
357
- | **Testing complexity** | High | Low | -60% |
358
- | **Bug risk** | Medium | Low | -70% |
359
- | **Debugging ease** | Medium | High | +80% |
360
-
361
- ---
362
-
363
- ## 🚨 Points to Watch
364
-
365
- ### 1. Performance
366
-
367
- **Impact:** Negligible (~10-50ms of extra HTTP latency)
368
-
369
- **Why it’s OK:**
370
- - Models take 30-60 seconds
371
- - 50ms HTTP latency = 0.1% of total time
372
- - Negligible compared to processing
373
-
374
- ---
375
-
376
- ### 2. Memory
377
-
378
- **Before (HF Spaces mode):** 1 process
379
- **After:** 2 processes (API + UI)
380
-
381
- **Impact:** +100-200 MB (Gradio UI overhead)
382
-
383
- **Why it’s OK:**
384
- - Models already use 2-3 GB
385
- - +200 MB = 7% overhead
386
- - Acceptable for architectural consistency
387
-
388
- ---
389
-
390
- ### 3. Deployment
391
-
392
- **HF Spaces:** No change
393
- - The `app.py` file handles everything
394
- - Automatically launches API + UI
395
- - Works out of the box
396
-
397
- **Docker:** Possible update
398
- - See `DEPLOYMENT.md` for details
399
- - May require 2 containers or a supervisor
400
-
401
- ---
402
-
403
- ## 🎓 Lessons Learned
404
-
405
- ### 1. Dual Architecture = Bad Idea
406
-
407
- Having two modes (HF Spaces vs Production) seemed convenient at first but created more problems than it solved.
408
-
409
- ### 2. HTTP Overhead Is Negligible
410
-
411
- The HTTP overhead is so small compared to ML processing that it’s negligible. The clean architecture is worth the cost.
412
-
413
- ### 3. Unified Tests = Better Quality
414
-
415
- Having a single code path makes testing much easier and reduces bugs.
416
-
417
- ---
418
-
419
- ## ✅ Conclusion
420
-
421
- Unifying the architecture to a 100% API model is a **success**:
422
-
423
- ✅ **Cleaner code** - Single path
424
- ✅ **Easier to maintain** - Less complexity
425
- ✅ **Easier to test** - Unified tests
426
- ✅ **Consistent behavior** - Same results everywhere
427
- ✅ **No breaking changes** - Backward compatible
428
-
429
- **Result:** Professional, scalable, and maintainable architecture! 🚀
430
-
431
- ---
432
-
433
- ## 📚 Related Documentation
434
-
435
- - 📖 [START.md](START.md) - Quick start guide
436
- - 📖 [README.md](README.md) - Main documentation
437
- - 📖 [DEPLOYMENT.md](DEPLOYMENT.md) - Deployment guide
438
- - 🧪 [test_unified_architecture.py](test_unified_architecture.py) - Tests
439
-
440
- ---
441
-
442
- **Questions?** Check [START.md](START.md) or open an issue on GitHub.
443
-
examples/api_example.py DELETED
@@ -1,94 +0,0 @@
1
- """
2
- Example: Using CU1-X API from Hugging Face Space
3
-
4
- This example shows how to call the CU1-X API deployed on Hugging Face Spaces.
5
- """
6
-
7
- from gradio_client import Client
8
- import json
9
-
10
- # Configuration
11
- SPACE_URL = "AI-DrivenTesting/CU1-X" # Replace with your Space URL
12
-
13
- def detect_ui_elements(image_path: str):
14
- """
15
- Detect UI elements in an image via the HF Space API
16
-
17
- Args:
18
- image_path: Path to the image to analyze
19
-
20
- Returns:
21
- Tuple (annotated_image, summary, detections_json)
22
- """
23
- # Create the Gradio client
24
- client = Client(SPACE_URL)
25
-
26
- # Call the API
27
- result = client.predict(
28
- image_path, # image
29
- 0.35, # confidence_threshold
30
- 2, # thickness
31
- True, # enable_clip (classification)
32
- True, # enable_ocr (text extraction)
33
- False, # enable_blip (descriptions)
34
- False, # ocr_only
35
- "Only image & button", # blip_scope
36
- False, # preprocess
37
- "RF-DETR Optimized (Recommended)", # preprocess_mode
38
- "standard", # preprocess_preset
39
- api_name="/predict"
40
- )
41
-
42
- # Unpack the results
43
- annotated_image, summary, detections_json = result
44
-
45
- return annotated_image, summary, detections_json
46
-
47
-
48
- def main():
49
- """Exemple d'utilisation"""
50
-
51
- print("🚀 CU1-X API Example")
52
- print("=" * 50)
53
-
54
- # Path to your test image
55
- test_image = "screenshot.png" # Replace with your image
56
-
57
- try:
58
- print(f"\n📤 Uploading image: {test_image}")
59
- print("⏳ Processing... (this may take 30-60 seconds)")
60
-
61
- # Call the API
62
- annotated_image, summary, detections = detect_ui_elements(test_image)
63
-
64
- # Display the results
65
- print("\n✅ Detection completed!")
66
- print("\n📊 Summary:")
67
- print(summary)
68
-
69
- print("\n🔍 Detections:")
70
- if isinstance(detections, str):
71
- detections = json.loads(detections)
72
-
73
- print(f" Total: {detections.get('total_detections', 0)} elements")
74
-
75
- if 'type_distribution' in detections:
76
- print("\n📈 Type Distribution:")
77
- for elem_type, count in detections['type_distribution'].items():
78
- print(f" {elem_type}: {count}")
79
-
80
- print("\n💾 Saving annotated image...")
81
- # annotated_image is a temporary file; you can copy it elsewhere
82
- print(f" Annotated image saved at: {annotated_image}")
83
-
84
- except Exception as e:
85
- print(f"\n❌ Error: {e}")
86
- print("\nTroubleshooting:")
87
- print("1. Vérifiez que votre Space est déployé et en ligne")
88
- print("2. Vérifiez que SPACE_URL est correct")
89
- print("3. Assurez-vous d'avoir installé: pip install gradio_client")
90
-
91
-
92
- if __name__ == "__main__":
93
- main()
94
-
requirements-api-client.txt DELETED
@@ -1,8 +0,0 @@
1
- # Requirements for accessing HF Spaces API
2
- # Install this if you want to use the API client examples
3
-
4
- gradio_client>=0.10.0
5
- requests>=2.31.0
6
- pillow>=10.0.0
7
- aiohttp>=3.9.0 # For async examples
8
-
requirements-full.txt DELETED
@@ -1,40 +0,0 @@
1
- # Full requirements for CU1-X UI Element Detector
2
- # Use this file for deployment to Hugging Face Spaces or production
3
-
4
- # Core dependencies
5
- gradio[oauth]==4.44.1
6
-
7
- # Deep Learning frameworks
8
- torch==2.4.1
9
- torchvision==0.19.1
10
-
11
- # Computer Vision & Image Processing
12
- opencv-python-headless==4.10.0.84
13
- pillow==10.4.0
14
- numpy==1.26.4
15
- supervision==0.23.0
16
-
17
- # OCR & Text Recognition
18
- easyocr==1.7.1
19
-
20
- # Transformers & AI Models
21
- transformers==4.44.2
22
-
23
- # RF-DETR Detection Model
24
- rfdetr==1.0.4
25
-
26
- # API Framework
27
- fastapi==0.115.0
28
- uvicorn[standard]==0.30.6
29
-
30
- # HTTP Clients
31
- requests==2.32.3
32
- aiohttp==3.10.5
33
-
34
- # Testing
35
- pytest==8.3.3
36
-
37
- # Utilities
38
- python-multipart==0.0.9 # For FastAPI file uploads
39
- python-dotenv==1.0.1 # For environment variables
40
-
requirements.txt CHANGED
@@ -18,7 +18,13 @@ transformers==4.35.2
18
  peft==0.6.2
19
  accelerate==0.25.0
20
 
21
- # API
 
 
 
 
 
 
22
  fastapi==0.115.0
23
  uvicorn[standard]==0.30.6
24
 
 
18
  peft==0.6.2
19
  accelerate==0.25.0
20
 
21
+ # RF-DETR Detection Model
22
+ rfdetr==1.0.4
23
+
24
+ # COCO evaluation tools (required by RF-DETR)
25
+ pycocotools==2.0.8
26
+
27
+ # API Framework
28
  fastapi==0.115.0
29
  uvicorn[standard]==0.30.6
30
 
ui/detection_wrapper.py CHANGED
@@ -61,21 +61,28 @@ def detect_with_service(
61
  return_format="pil"
62
  )
63
 
64
- json_payload = response_builder.build_ocr_only_response(
65
- detections=detections,
66
- image_width=image.width,
67
- image_height=image.height,
 
 
 
 
 
68
  annotated_image=None,
69
  confidence_threshold=confidence_threshold,
70
- line_thickness=line_thickness
71
- )
72
-
73
- summary_text = response_builder.format_summary_text(
74
- detections=detections,
75
- parameters=json_payload["parameters"],
76
  ocr_only=True
77
  )
78
 
 
 
 
79
  return annotated, summary_text, json_payload
80
 
81
  # Standard detection path
@@ -105,30 +112,29 @@ def detect_with_service(
105
  analysis=analysis
106
  )
107
 
108
- # Build JSON response
109
- json_payload = {
110
- "success": True,
111
- "detections": analysis["detections"],
112
- "total_detections": len(analysis["detections"]),
113
- "image_size": analysis["image_size"],
114
- "parameters": {
115
- "confidence_threshold": confidence_threshold,
116
- "enable_clip": enable_clip,
117
- "enable_ocr": enable_ocr,
118
- "enable_blip": enable_blip,
119
- "blip_scope": scope_value if enable_blip else None,
120
- "ocr_only": False,
121
- "line_thickness": line_thickness
122
- },
123
- "type_distribution": response_builder.build_type_distribution(analysis["detections"]) if enable_clip else None
124
- }
125
-
126
- # Build summary text
127
- summary_text = response_builder.format_summary_text(
128
- detections=analysis["detections"],
129
- parameters=json_payload["parameters"],
130
  ocr_only=False
131
  )
 
 
 
 
 
 
 
 
 
 
132
 
133
  return annotated, summary_text, json_payload
134
 
@@ -199,9 +205,9 @@ def detect_with_api(
199
  'preprocess_preset': preprocess_preset
200
  }
201
 
202
- # Call API
203
- # Use configurable timeout (default 300s = 5min for CPU processing and model loading)
204
- timeout_seconds = int(os.getenv("CU1_API_TIMEOUT", "300"))
205
  try:
206
  response = requests.post(
207
  f"{api_url}/detect",
@@ -227,21 +233,22 @@ Cannot connect to API server at `{api_url}`
227
  You can change this by setting the `CU1_API_URL` environment variable.
228
  """, None
229
  except requests.exceptions.Timeout:
230
- timeout_seconds = int(os.getenv("CU1_API_TIMEOUT", "300"))
231
  return None, f"""❌ **Timeout Error**
232
 
233
  The API request timed out after {timeout_seconds} seconds.
234
 
235
- This might happen with:
236
- - Very large images
237
- - First run (models need to download - can take 2-5 minutes)
238
- - CPU-only processing (slower than GPU)
 
 
239
 
240
- **Try:**
241
- - Using a smaller image
242
- - Waiting for model downloads to complete (check API server logs)
243
- - Checking API server logs for errors
244
- - Increasing timeout: export CU1_API_TIMEOUT=600 (10 minutes)
245
  """, None
246
  except requests.exceptions.HTTPError as e:
247
  error_detail = "Unknown error"
 
61
  return_format="pil"
62
  )
63
 
64
+ # Build analysis structure for simplified response
65
+ analysis = {
66
+ "detections": detections,
67
+ "image_size": {"width": image.width, "height": image.height}
68
+ }
69
+
70
+ json_payload = response_builder.build_simplified_response(
71
+ analysis=analysis,
72
+ image=image,
73
  annotated_image=None,
74
  confidence_threshold=confidence_threshold,
75
+ line_thickness=line_thickness,
76
+ enable_clip=False,
77
+ enable_ocr=True,
78
+ enable_blip=False,
79
+ blip_scope=None,
 
80
  ocr_only=True
81
  )
82
 
83
+ detections_list = list(json_payload.get("detections", {}).values())
84
+ summary_text = f"**OCR-only mode**\n**Total OCR texts:** {len(detections_list)}"
85
+
86
  return annotated, summary_text, json_payload
87
 
88
  # Standard detection path
 
112
  analysis=analysis
113
  )
114
 
115
+ # Build JSON response using simplified format
116
+ json_payload = response_builder.build_simplified_response(
117
+ analysis=analysis,
118
+ image=image,
119
+ annotated_image=None, # Don't include in JSON (already have PIL image)
120
+ confidence_threshold=confidence_threshold,
121
+ line_thickness=line_thickness,
122
+ enable_clip=enable_clip,
123
+ enable_ocr=enable_ocr,
124
+ enable_blip=enable_blip,
125
+ blip_scope=scope_value,
 
 
 
 
 
 
 
 
 
 
 
126
  ocr_only=False
127
  )
128
+
129
+ # Build summary text from detections
130
+ detections_list = list(json_payload.get("detections", {}).values())
131
+ summary_lines = [f"**Total detections:** {len(detections_list)}", ""]
132
+ summary_lines.append("**Settings:**")
133
+ summary_lines.append(f"- Confidence threshold: {confidence_threshold:.2f}")
134
+ summary_lines.append(f"- CLIP classification: {'✅ Enabled' if enable_clip else '❌ Disabled'}")
135
+ summary_lines.append(f"- OCR text extraction: {'✅ Enabled' if enable_ocr else '❌ Disabled'}")
136
+ summary_lines.append(f"- BLIP description: {'✅ Enabled' if enable_blip else '❌ Disabled'}")
137
+ summary_text = "\n".join(summary_lines)
138
 
139
  return annotated, summary_text, json_payload
140
 
 
205
  'preprocess_preset': preprocess_preset
206
  }
207
 
208
+ # Call API with extended timeout for HuggingFace Spaces CPU processing
209
+ # Default: 600s (10 minutes) to handle model loading on first run
210
+ timeout_seconds = int(os.getenv("CU1_API_TIMEOUT", "600"))
211
  try:
212
  response = requests.post(
213
  f"{api_url}/detect",
 
233
  You can change this by setting the `CU1_API_URL` environment variable.
234
  """, None
235
  except requests.exceptions.Timeout:
236
+ timeout_seconds = int(os.getenv("CU1_API_TIMEOUT", "600"))
237
  return None, f"""❌ **Timeout Error**
238
 
239
  The API request timed out after {timeout_seconds} seconds.
240
 
241
+ **Most likely cause:** First-time model initialization on HuggingFace Spaces
242
+
243
+ **What to do:**
244
+ 1. Wait 2-3 minutes and try again (models are loading in background)
245
+ 2. Check the "Logs" tab in HuggingFace Spaces to see progress
246
+ 3. If you see "[API] Starting detection..." in logs, the API is working
247
 
248
+ **For debugging:**
249
+ - Check if you see initialization messages in logs
250
+ - Look for "Loading RF-DETR model..." or "Loading OCR reader..."
251
+ - These operations can take 2-5 minutes on CPU the first time
 
252
  """, None
253
  except requests.exceptions.HTTPError as e:
254
  error_detail = "Unknown error"
ui/shared_interface.py CHANGED
@@ -262,8 +262,6 @@ def create_interface(
262
 
263
  # Connect detection button
264
  # api_name exposes this function as /api/predict endpoint for Hugging Face Spaces
265
- # max_time increases Gradio's function timeout (default is 60s, we set to 300s = 5min)
266
- max_time_seconds = int(os.getenv("GRADIO_MAX_TIME", "300")) # 5 minutes default
267
  detect_button.click(
268
  fn=detection_fn,
269
  inputs=[
@@ -281,7 +279,7 @@ def create_interface(
281
  ],
282
  outputs=[output_image, summary_output, json_output],
283
  api_name="predict", # Expose as /api/predict endpoint
284
- max_time=max_time_seconds # Increase Gradio function timeout
285
  )
286
 
287
  # Build footer markdown
 
262
 
263
  # Connect detection button
264
  # api_name exposes this function as /api/predict endpoint for Hugging Face Spaces
 
 
265
  detect_button.click(
266
  fn=detection_fn,
267
  inputs=[
 
279
  ],
280
  outputs=[output_image, summary_output, json_output],
281
  api_name="predict", # Expose as /api/predict endpoint
282
+ show_progress="full" # Show progress to user during long operations
283
  )
284
 
285
  # Build footer markdown