Ali Mohsin committed on
Commit 45b7274 · 1 Parent(s): aa9a482

10000 final fixes hopefully
API_DOCUMENTATION.md DELETED
@@ -1,412 +0,0 @@
# Dressify API Documentation

## Overview

The Dressify API provides personalized outfit recommendations using advanced deep learning models. The API supports an expanded tag system for fine-grained control over recommendations.

## Base URL

```
https://your-domain.com/api
```

## Authentication

All endpoints (except `/health` and `/tags`) require an API key in the `X-API-Key` header:

```http
X-API-Key: your-api-key-here
```

## Endpoints

### 1. Health Check

**GET** `/health`

Check API health and model status.

**Response:**
```json
{
  "status": "ok",
  "device": "cuda",
  "resnet": "resnet_v1",
  "vit": "vit_v1"
}
```

---

### 2. Get Available Tags

**GET** `/tags`

Get all available tag options for API integration.

**Response:**
```json
{
  "tag_categories": {
    "occasion": ["casual", "business", "formal", ...],
    "weather": ["any", "hot", "warm", "cold", ...],
    "style": ["casual", "smart_casual", "formal", ...],
    "color_preference": ["neutral", "monochromatic", ...],
    ...
  },
  "description": "Available tags for personalized outfit recommendations",
  "usage": {
    "primary_tags": ["occasion", "weather", "style"],
    "optional_tags": ["color_preference", "fit_preference", ...]
  }
}
```
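Since `/tags` requires no API key, it is a convenient first call when integrating. A minimal client-side sketch (the `primary_tag_options` helper is hypothetical, not part of the API; the base URL is the placeholder used throughout this doc):

```python
BASE_URL = "https://your-domain.com/api"  # placeholder domain from this doc

def primary_tag_options(catalogue: dict) -> dict:
    """Map each primary tag category to its allowed values (hypothetical helper)."""
    return {
        category: catalogue["tag_categories"][category]
        for category in catalogue["usage"]["primary_tags"]
    }

# Live call (using requests; no X-API-Key needed for /tags, per Authentication above):
#   catalogue = requests.get(f"{BASE_URL}/tags", timeout=10).json()
#   print(primary_tag_options(catalogue))
```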

---

### 3. Validate Tags

**POST** `/tags/validate`

Validate tag values before making a recommendation request.

**Request Body:**
```json
{
  "occasion": "formal",
  "weather": "cold",
  "style": "elegant",
  "color_preference": "monochromatic"
}
```

**Response:**
```json
{
  "valid": true,
  "errors": [],
  "validated_tags": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    "color_preference": "monochromatic"
  }
}
```
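To avoid a round trip for obviously bad values, a client can pre-screen tags against the catalogue returned by `/tags` before calling `/tags/validate`. A sketch (the `local_validate` helper is an assumption, mirroring the API's `errors` string format shown in Error Handling):

```python
def local_validate(tags: dict, tag_categories: dict) -> list:
    """Pre-screen tag values client-side; returns error strings in the
    same style as the API's `errors` field (hypothetical helper)."""
    errors = []
    for category, value in tags.items():
        allowed = tag_categories.get(category)
        if allowed is not None and value not in allowed:
            errors.append(f"Invalid value '{value}' for category '{category}'")
    return errors

# The authoritative check is still the endpoint itself:
#   requests.post(f"{BASE_URL}/tags/validate", json=tags,
#                 headers={"X-API-Key": API_KEY})
```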

---

### 4. Generate Embeddings

**POST** `/embed`

Generate embeddings for clothing item images.

**Request Body:**
```json
{
  "image_urls": ["https://example.com/image1.jpg"],
  "images_base64": []
}
```

**Response:**
```json
{
  "embeddings": [[0.123, 0.456, ...]],
  "model_version": "resnet_v1"
}
```
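Building the request body for local files means base64-encoding each image. A sketch (the helper name is hypothetical; whether the API expects raw base64 or a data-URI prefix is not stated above, so plain base64 is assumed):

```python
import base64

def embed_payload(image_paths=(), image_urls=()):
    """Assemble a POST /embed body (field names from the doc above)."""
    images_b64 = []
    for path in image_paths:
        with open(path, "rb") as f:
            # plain base64, no data-URI prefix (assumption)
            images_b64.append(base64.b64encode(f.read()).decode("ascii"))
    return {"image_urls": list(image_urls), "images_base64": images_b64}
```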

---

### 5. Compose Outfits (Enhanced with Tags)

**POST** `/compose`

Generate personalized outfit recommendations with expanded tag support.

#### Request Format 1: Tag-Based (Recommended)

**Request Body:**
```json
{
  "items": [
    {
      "id": "item_1",
      "image_url": "https://example.com/shirt.jpg",
      "category": "shirt",
      "embedding": null
    },
    {
      "id": "item_2",
      "image_url": "https://example.com/pants.jpg",
      "category": "pants",
      "embedding": null
    }
  ],
  "occasion": "formal",
  "weather": "cold",
  "style": "elegant",
  "num_outfits": 5,
  "color_preference": "monochromatic",
  "fit_preference": "tailored",
  "material_preference": "wool",
  "season": "winter",
  "time_of_day": "evening",
  "personal_style": "sophisticated"
}
```

#### Request Format 2: Context Dict (Legacy)

**Request Body:**
```json
{
  "items": [...],
  "context": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    "num_outfits": 5
  }
}
```

#### Response:
```json
{
  "outfits": [
    {
      "item_ids": ["item_1", "item_2", "item_3"],
      "items": [
        {
          "id": "item_1",
          "category": "jacket",
          "category_type": "outerwear"
        },
        {
          "id": "item_2",
          "category": "shirt",
          "category_type": "upper"
        },
        {
          "id": "item_3",
          "category": "pants",
          "category_type": "bottom"
        }
      ],
      "score": 1.85,
      "base_score": 0.25,
      "categories": ["jacket", "shirt", "pants"],
      "category_types": ["outerwear", "upper", "bottom"],
      "outfit_size": 3,
      "is_valid": true,
      "template": {
        "name": "formal",
        "style": "professional, elegant, sophisticated",
        "style_score": 0.95,
        "color_score": 0.88,
        "colors": ["navy", "white", "gray"],
        "accessory_limit": 4
      }
    }
  ],
  "version": "vit_v1",
  "tags_processed": true,
  "context_used": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    ...
  }
}
```

---

## Tag System

### Primary Tags (High Priority)

These tags have the highest influence on recommendations:

- **occasion**: Event type (casual, business, formal, wedding, date, etc.)
- **weather**: Weather conditions (any, hot, warm, cold, rain, snow, etc.)
- **style**: Fashion aesthetic (casual, smart_casual, formal, elegant, etc.)

### Secondary Tags (Medium Priority)

These tags refine recommendations:

- **color_preference**: Color scheme (neutral, monochromatic, bold, etc.)
- **fit_preference**: Fit type (fitted, loose, tailored, etc.)
- **material_preference**: Fabric type (cotton, wool, silk, etc.)
- **personal_style**: Personal style (conservative, bold, timeless, etc.)

### Tertiary Tags (Lower Priority)

These provide additional context:

- **season**: Current season (spring, summer, fall, winter)
- **time_of_day**: When the outfit will be worn (morning, afternoon, evening, night)
- **budget**: Price range preference (luxury, premium, affordable, budget)
- **age_group**: Age group (teen, young_adult, adult, mature)
- **gender**: Gender preference (male, female, non_binary, unisex)

---

## Tag Processing

The API automatically:

1. **Validates** all tag values
2. **Resolves conflicts** between mutually exclusive tags
3. **Applies synergies** between complementary tags
4. **Prioritizes** tags based on importance
5. **Generates preferences** for the recommendation engine

### Tag Conflicts

Some tags conflict and cannot be used together:
- `hot` conflicts with `cold`, `freezing`, `snow`
- `formal` conflicts with `casual`, `sporty`
- `loose` conflicts with `fitted`, `tight`
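The conflict pairs above can be mirrored client-side to fail fast before a request. A sketch (the dictionary below only encodes the pairs listed in this doc; the server holds the authoritative table):

```python
# Only the conflict pairs documented above; the server's table may be larger.
CONFLICTS = {
    "hot": {"cold", "freezing", "snow"},
    "formal": {"casual", "sporty"},
    "loose": {"fitted", "tight"},
}

def conflicting_pairs(tag_values):
    """Return (tag, rival) pairs present in `tag_values` that this doc lists as conflicting."""
    values = set(tag_values)
    pairs = []
    for tag, rivals in CONFLICTS.items():
        if tag in values:
            pairs.extend((tag, rival) for rival in sorted(rivals & values))
    return pairs
```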

### Tag Synergies

Some tags work well together:
- `formal` + `elegant` + `sophisticated` + `tailored`
- `casual` + `comfortable` + `relaxed` + `practical`
- `sporty` + `athletic` + `comfortable` + `moisture_wicking`

---

## Example Usage

### Python Example

```python
import requests

API_KEY = "your-api-key"
BASE_URL = "https://your-domain.com/api"

# Prepare items
items = [
    {
        "id": "shirt_1",
        "image_url": "https://example.com/shirt.jpg",
        "category": "shirt"
    },
    {
        "id": "pants_1",
        "image_url": "https://example.com/pants.jpg",
        "category": "pants"
    }
]

# Make recommendation request
response = requests.post(
    f"{BASE_URL}/compose",
    json={
        "items": items,
        "occasion": "formal",
        "weather": "cold",
        "style": "elegant",
        "num_outfits": 5,
        "color_preference": "monochromatic",
        "fit_preference": "tailored",
        "material_preference": "wool"
    },
    headers={"X-API-Key": API_KEY}
)

result = response.json()
outfits = result["outfits"]
```

### JavaScript Example

```javascript
const API_KEY = 'your-api-key';
const BASE_URL = 'https://your-domain.com/api';

const items = [
  {
    id: 'shirt_1',
    image_url: 'https://example.com/shirt.jpg',
    category: 'shirt'
  },
  {
    id: 'pants_1',
    image_url: 'https://example.com/pants.jpg',
    category: 'pants'
  }
];

fetch(`${BASE_URL}/compose`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': API_KEY
  },
  body: JSON.stringify({
    items: items,
    occasion: 'formal',
    weather: 'cold',
    style: 'elegant',
    num_outfits: 5,
    color_preference: 'monochromatic',
    fit_preference: 'tailored',
    material_preference: 'wool'
  })
})
  .then(response => response.json())
  .then(data => {
    console.log('Outfits:', data.outfits);
  });
```

---

## Error Handling

### Invalid Tags

```json
{
  "error": "Invalid tags provided",
  "errors": [
    "Invalid value 'invalid_occasion' for category 'occasion'"
  ],
  "valid_tag_options": {
    "occasion": ["casual", "business", "formal", ...]
  }
}
```

### Models Not Loaded

```json
{
  "error": "Models not trained or loaded properly",
  "details": ["ResNet: No trained weights found"],
  "message": "Please ensure models are trained..."
}
```
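Both documented error shapes can be funneled into one human-readable message. A sketch (the helper is hypothetical and only distinguishes the two bodies shown above):

```python
def summarize_error(status_code: int, body: dict) -> str:
    """Collapse the two documented error bodies into one message (hypothetical helper)."""
    if "errors" in body:   # "Invalid Tags" shape
        return f"{body.get('error', 'error')}: " + "; ".join(body["errors"])
    if "details" in body:  # "Models Not Loaded" shape
        return f"{body.get('error', 'error')}: " + "; ".join(body["details"])
    return f"HTTP {status_code}: unrecognized error body"
```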

---

## Rate Limits

- Default: 100 requests per minute
- Burst: 10 requests per second
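A client staying under these limits can retry with exponential backoff when throttled. A sketch (treating HTTP 429 as the throttling signal is an assumption; the doc states the limits but not the status code returned):

```python
import time

def with_backoff(call, max_tries=5, base_delay=1.0):
    """Retry `call` (which returns (status, body)) with exponential backoff on 429."""
    for attempt in range(max_tries):
        status, body = call()
        if status != 429:  # 429 Too Many Requests (assumed throttle signal)
            return status, body
        time.sleep(base_delay * (2 ** attempt))
    return status, body
```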

---

## Support

For API support, please contact: support@dressify.com
PRODUCTION_DEPLOYMENT.md DELETED
@@ -1,310 +0,0 @@
# 🚀 Production Deployment Guide for Dressify

## Overview

This guide explains how to deploy Dressify as a production-ready outfit recommendation service using the official Polyvore dataset splits.

## 🎯 Key Changes Made

### 1. **Official Split Usage** ✅
- **Before**: System tried to create random 70/15/15 splits
- **After**: System uses official splits from `nondisjoint/` and `disjoint/` folders
- **Benefit**: Reproducible, research-grade results

### 2. **Robust Dataset Detection** 🔍
- Automatically detects official splits in multiple locations
- Falls back to metadata extraction if needed
- No more random split creation by default

### 3. **Production-Ready Startup** 🚀
- Comprehensive error handling and diagnostics
- Clear status reporting
- Automatic dataset verification

## 📁 Dataset Structure

The system expects this structure after download:

```
data/Polyvore/
├── images/                        # Extracted from images.zip
├── nondisjoint/                   # Official splits (preferred)
│   ├── train.json                 # 31.8 MB - Training outfits
│   ├── valid.json                 # 2.99 MB - Validation outfits
│   └── test.json                  # 5.97 MB - Test outfits
├── disjoint/                      # Alternative official splits
│   ├── train.json                 # 9.65 MB - Training outfits
│   ├── valid.json                 # 1.72 MB - Validation outfits
│   └── test.json                  # 8.36 MB - Test outfits
├── polyvore_item_metadata.json    # 105 MB - Item metadata
├── polyvore_outfit_titles.json    # 6.97 MB - Outfit information
└── categories.csv                 # 4.91 KB - Category mappings
```

## 🚀 Deployment Steps

### Step 1: Initial Setup
```bash
# Clone the repository
git clone <your-repo>
cd recomendation

# Install dependencies
pip install -r requirements.txt
```

### Step 2: Dataset Preparation
```bash
# Run the startup fix script
python startup_fix.py
```

This script will:
1. ✅ Download the Polyvore dataset from Hugging Face
2. ✅ Extract images from images.zip
3. ✅ Detect official splits in nondisjoint/ and disjoint/
4. ✅ Create training splits from official data
5. ✅ Verify all components are ready

### Step 3: Verify Dataset
```bash
# Check dataset status
python -c "
from utils.data_fetch import check_dataset_structure
import json
structure = check_dataset_structure('data/Polyvore')
print(json.dumps(structure, indent=2))
"
```

Expected output:
```json
{
  "status": "ready",
  "images": {
    "exists": true,
    "count": 100000,
    "extensions": [".jpg", ".jpeg", ".png"]
  },
  "splits": {
    "nondisjoint": {
      "train.json": {"exists": true, "size_mb": 31.8},
      "valid.json": {"exists": true, "size_mb": 2.99},
      "test.json": {"exists": true, "size_mb": 5.97}
    }
  }
}
```

### Step 4: Launch Application
```bash
# Start the main application
python app.py
```

The system will:
1. 🔍 Check dataset status
2. ✅ Load official splits
3. 🚀 Launch the Gradio interface
4. 🎯 Be ready for training and inference

## 🔧 Troubleshooting

### Issue: "No official splits found"

**Cause**: The dataset download didn't include the split files.

**Solution**:
```bash
# Check what was downloaded
ls -la data/Polyvore/

# Re-run the data fetcher
python -c "
from utils.data_fetch import ensure_dataset_ready
ensure_dataset_ready()
"
```

### Issue: "Dataset preparation failed"

**Cause**: The prepare script couldn't parse the official splits.

**Solution**:
```bash
# Check split file format
head -20 data/Polyvore/nondisjoint/train.json

# Run preparation manually
python scripts/prepare_polyvore.py --root data/Polyvore
```

### Issue: "Out of memory during training"

**Cause**: GPU memory insufficient for the default batch sizes.

**Solution**: Use the Advanced Training interface to reduce batch sizes:
- ResNet: Reduce from 64 to 16-32
- ViT: Reduce from 32 to 8-16
- Enable mixed precision (AMP)

## 🎯 Production Configuration

### Environment Variables
```bash
export EXPORT_DIR="models/exports"
export POLYVORE_ROOT="data/Polyvore"
export CUDA_VISIBLE_DEVICES="0"  # Specify GPU
```
-
160
- ### Docker Deployment
161
- ```bash
162
- # Build image
163
- docker build -t dressify .
164
-
165
- # Run container
166
- docker run -p 7860:7860 -p 8000:8000 \
167
- -v $(pwd)/data:/app/data \
168
- -v $(pwd)/models:/app/models \
169
- dressify
170
- ```
171
-
172
- ### Hugging Face Space
173
- 1. Upload the entire `recomendation/` folder
174
- 2. Set Space type to "Gradio"
175
- 3. The system auto-bootstraps on first run
176
- 4. Uses official splits for production-quality results
177
-
178
- ## 📊 Expected Performance
179
-
180
- ### Dataset Statistics
181
- - **Total Images**: ~100,000 fashion items
182
- - **Training Outfits**: ~50,000 (nondisjoint) or ~20,000 (disjoint)
183
- - **Validation Outfits**: ~5,000 (nondisjoint) or ~2,000 (disjoint)
184
- - **Test Outfits**: ~10,000 (nondisjoint) or ~4,000 (disjoint)
185
-
186
- ### Training Times (L4 GPU)
187
- - **ResNet Item Embedder**: 2-4 hours (20 epochs)
188
- - **ViT Outfit Encoder**: 1-2 hours (30 epochs)
189
- - **Total**: 3-6 hours for full training
190
-
191
- ### Inference Performance
192
- - **Item Embedding**: < 50ms per image
193
- - **Outfit Generation**: < 100ms per outfit
194
- - **Memory Usage**: ~2-4 GB GPU VRAM
195
-
196
- ## 🔬 Research vs Production
197
-
198
- ### Research Mode
199
- ```bash
200
- # Use disjoint splits (smaller, more challenging)
201
- python scripts/prepare_polyvore.py --root data/Polyvore
202
- # Automatically uses disjoint/ splits
203
- ```
204
-
205
- ### Production Mode
206
- ```bash
207
- # Use nondisjoint splits (larger, more robust)
208
- python scripts/prepare_polyvore.py --root data/Polyvore
209
- # Automatically uses nondisjoint/ splits (default)
210
- ```
211
-
212
- ## 📝 Monitoring & Logging
213
-
214
- ### Training Logs
215
- ```bash
216
- # Check training progress
217
- tail -f models/exports/training.log
218
-
219
- # Monitor GPU usage
220
- nvidia-smi -l 1
221
- ```
222
-
223
- ### System Health
224
- ```bash
225
- # Health check endpoint
226
- curl http://localhost:8000/health
227
-
228
- # Expected response
229
- {
230
- "status": "ok",
231
- "device": "cuda:0",
232
- "resnet": "resnet50_v2",
233
- "vit": "vit_outfit_v1"
234
- }
235
- ```
236
-
237
- ## 🚨 Emergency Procedures
238
-
239
- ### Dataset Corruption
240
- ```bash
241
- # Remove corrupted data
242
- rm -rf data/Polyvore/splits/
243
-
244
- # Re-run preparation
245
- python startup_fix.py
246
- ```
247
-
248
- ### Model Issues
249
- ```bash
250
- # Remove corrupted models
251
- rm -rf models/exports/*.pth
252
-
253
- # Re-train from scratch
254
- python train_resnet.py --data_root data/Polyvore --epochs 20
255
- python train_vit_triplet.py --data_root data/Polyvore --epochs 30
256
- ```
257
-
258
- ### System Recovery
259
- ```bash
260
- # Full system reset
261
- rm -rf data/Polyvore/
262
- rm -rf models/exports/
263
-
264
- # Fresh start
265
- python startup_fix.py
266
- ```
267
-
268
- ## ✅ Production Checklist
269
-
270
- - [ ] Dataset downloaded successfully (2.5GB+ images)
271
- - [ ] Official splits detected in nondisjoint/ or disjoint/
272
- - [ ] Training splits created in data/Polyvore/splits/
273
- - [ ] Models can be trained without errors
274
- - [ ] Inference service responds to health checks
275
- - [ ] Gradio interface loads successfully
276
- - [ ] Advanced training controls work
277
- - [ ] Model checkpoints can be saved/loaded
278
-
279
- ## 🎉 Success Indicators
280
-
281
- When everything is working correctly, you should see:
282
-
283
- ```
284
- ✅ Dataset ready at: data/Polyvore
285
- 📊 Images: 100000 files
286
- 📋 polyvore_item_metadata.json: 105.0 MB
287
- 📋 polyvore_outfit_titles.json: 6.97 MB
288
- 🎯 Official splits found:
289
- ✅ nondisjoint/train.json (31.8 MB)
290
- ✅ nondisjoint/valid.json (2.99 MB)
291
- ✅ nondisjoint/test.json (5.97 MB)
292
- 🎉 Using official splits from dataset!
293
- ✅ Dataset preparation completed successfully!
294
- ✅ All required splits verified!
295
- 🚀 Your Dressify system is ready to use!
296
- ```
297
-
298
- ## 📞 Support
299
-
300
- If you encounter issues:
301
-
302
- 1. **Check the logs** for specific error messages
303
- 2. **Verify dataset structure** matches expected layout
304
- 3. **Run startup_fix.py** for automated diagnostics
305
- 4. **Check GPU memory** and reduce batch sizes if needed
306
- 5. **Ensure official splits** are present in nondisjoint/ or disjoint/
307
-
308
- ---
309
-
310
- **🎯 Your Dressify system is now production-ready with official dataset splits!**
PROJECT_SUMMARY.md DELETED
@@ -1,261 +0,0 @@
# Dressify - Complete Project Summary

## 🎯 Project Overview

**Dressify** is a **production-ready, research-grade** outfit recommendation system that automatically downloads the Polyvore dataset, trains state-of-the-art models, and provides a sophisticated Gradio interface for wardrobe uploads and outfit generation.

## 🏗️ System Architecture

### Core Components

1. **Data Pipeline** (`utils/data_fetch.py`)
   - Automatic download of the Stylique/Polyvore dataset from the HF Hub
   - Smart image extraction and organization
   - Robust split detection (root, nondisjoint, disjoint)
   - Fallback to deterministic 70/15/15 splits if official splits are missing

2. **Model Architecture**
   - **ResNet Item Embedder** (`models/resnet_embedder.py`)
     - ImageNet-pretrained ResNet50 backbone
     - 512D projection head with L2 normalization
     - Triplet loss training for item compatibility
   - **ViT Outfit Encoder** (`models/vit_outfit.py`)
     - 6-layer transformer encoder
     - 8 attention heads, 4x feed-forward multiplier
     - Outfit-level compatibility scoring
     - Cosine distance triplet loss

3. **Training Pipeline**
   - **ResNet Training** (`train_resnet.py`)
     - Semi-hard negative mining
     - Mixed precision training with autocast
     - Channels-last memory format for CUDA
     - Automatic checkpointing and best model saving
   - **ViT Training** (`train_vit_triplet.py`)
     - Frozen ResNet embeddings as input
     - Outfit-level triplet mining
     - Validation with early stopping
     - Comprehensive metrics logging

4. **Inference Service** (`inference.py`)
   - On-the-fly image embedding
   - Slot-aware outfit composition
   - Candidate generation with category constraints
   - Compatibility scoring and ranking

5. **Web Interface** (`app.py`)
   - **Gradio UI**: Wardrobe upload, outfit generation, preview stitching
   - **FastAPI**: REST endpoints for embedding and composition
   - **Auto-bootstrap**: Background dataset prep and training
   - **Status Dashboard**: Real-time progress monitoring

## 🚀 Key Features

### Research-Grade Training
- **Triplet Loss**: Semi-hard negative mining for better embeddings
- **Mixed Precision**: CUDA-optimized training with autocast
- **Advanced Augmentation**: Random crop, flip, color jitter, random erasing
- **Curriculum Learning**: Progressive difficulty increase (configurable)

### Production-Ready Infrastructure
- **Self-Contained**: No external dependencies or environment variables
- **Auto-Recovery**: Handles missing splits and corrupted data gracefully
- **Background Processing**: Non-blocking dataset preparation and training
- **Model Versioning**: Automatic checkpoint management and best model saving

### Advanced UI/UX
- **Multi-File Upload**: Drag & drop wardrobe images with previews
- **Category Editing**: Manual category assignment for better slot awareness
- **Context Awareness**: Occasion, weather, style preferences
- **Visual Output**: Stitched outfit previews + structured JSON data

## 📊 Expected Performance

### Training Metrics
- **Item Embedder**: Triplet accuracy > 85%, validation loss < 0.1
- **Outfit Encoder**: Compatibility AUC > 0.8, precision > 0.75
- **Training Time**: ResNet ~2-4h, ViT ~1-2h on an L4 GPU

### Inference Performance
- **Latency**: < 100ms per outfit on GPU, < 500ms on CPU
- **Throughput**: 100+ outfits/second on a modern GPU
- **Memory**: ~2GB VRAM for full models, ~500MB for lightweight variants

## 🔧 Configuration & Customization

### Training Configs
- **Item Training** (`configs/item.yaml`): Backbone, embedding dim, loss params
- **Outfit Training** (`configs/outfit.yaml`): Transformer layers, attention heads
- **Hardware Settings**: Mixed precision, channels-last, gradient clipping

### Model Variants
- **Lightweight**: MobileNetV3 + small transformer (CPU-friendly)
- **Standard**: ResNet50 + medium transformer (balanced)
- **Research**: ResNet101 + large transformer (high performance)

## 🚀 Deployment Options

### 1. Hugging Face Space (Recommended)
```bash
# Deploy to HF Space
./scripts/deploy_space.sh

# Customize Space settings
SPACE_NAME=my-dressify SPACE_HARDWARE=gpu-t4 ./scripts/deploy_space.sh
```

### 2. Local Development
```bash
# Setup environment
pip install -r requirements.txt

# Launch app (auto-downloads dataset)
python app.py

# Manual training
./scripts/train_item.sh
./scripts/train_outfit.sh
```

### 3. Docker Deployment
```bash
# Build and run
docker build -t dressify .
docker run -p 7860:7860 -p 8000:8000 dressify
```

## 📁 Project Structure

```
recomendation/
├── app.py                      # Main FastAPI + Gradio app
├── inference.py                # Inference service
├── models/
│   ├── resnet_embedder.py      # ResNet50 + projection
│   └── vit_outfit.py           # Transformer encoder
├── data/
│   └── polyvore.py             # PyTorch datasets
├── scripts/
│   ├── prepare_polyvore.py     # Dataset preparation
│   ├── train_item.sh           # ResNet training script
│   ├── train_outfit.sh         # ViT training script
│   └── deploy_space.sh         # HF Space deployment
├── utils/
│   ├── data_fetch.py           # HF dataset downloader
│   ├── transforms.py           # Image transforms
│   ├── triplet_mining.py       # Semi-hard negative mining
│   ├── hf_utils.py             # HF Hub integration
│   └── export.py               # Model export utilities
├── configs/
│   ├── item.yaml               # ResNet training config
│   └── outfit.yaml             # ViT training config
├── tests/
│   └── test_system.py          # Comprehensive tests
├── requirements.txt            # Dependencies
├── Dockerfile                  # Container deployment
└── README.md                   # Documentation
```

## 🧪 Testing & Validation

### Smoke Tests
```bash
# Run comprehensive tests
python -m pytest tests/test_system.py -v

# Test individual components
python -c "from models.resnet_embedder import ResNetItemEmbedder; print('✅ ResNet OK')"
python -c "from models.vit_outfit import OutfitCompatibilityModel; print('✅ ViT OK')"
```

### Training Validation
```bash
# Quick training runs
EPOCHS=1 BATCH_SIZE=8 ./scripts/train_item.sh
EPOCHS=1 BATCH_SIZE=4 ./scripts/train_outfit.sh
```

## 🔬 Research Contributions

### Novel Approaches
1. **Hybrid Architecture**: ResNet embeddings + Transformer compatibility
2. **Semi-Hard Mining**: Intelligent negative sample selection
3. **Slot Awareness**: Category-constrained outfit composition
4. **Auto-Bootstrap**: Self-contained dataset preparation and training

### Technical Innovations
- **Mixed Precision Training**: CUDA-optimized with autocast
- **Channels-Last Memory Format**: Improved GPU throughput
- **Background Processing**: Non-blocking system initialization
- **Robust Data Handling**: Graceful fallback for missing splits

## 📈 Future Enhancements

### Model Improvements
- **Multi-Modal**: Text descriptions + visual features
- **Attention Visualization**: Interpretable outfit compatibility
- **Style Transfer**: Generate outfit variations
- **Personalization**: User preference learning

### System Features
- **Real-Time Training**: Continuous model improvement
- **A/B Testing**: Multiple model variants
- **Performance Monitoring**: Automated quality metrics
- **Scalable Deployment**: Multi-GPU, distributed training

## 🤝 Integration Examples

### Next.js + Supabase
```typescript
// Complete integration example in README.md
// Database schema with RLS policies
// API endpoints for wardrobe management
// Real-time outfit recommendations
```

### API Usage
```bash
# Health check
curl http://localhost:8000/health

# Image embedding
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"images": ["base64_image_1"]}'

# Outfit composition
curl -X POST http://localhost:8000/compose \
  -H "Content-Type: application/json" \
  -d '{"items": [{"id": "item1", "embedding": [0.1, ...]}], "context": {"occasion": "casual"}}'
```

## 📚 Academic References

### Core Technologies
- **Triplet Loss**: FaceNet, Deep Metric Learning
- **Transformer Architecture**: Attention Is All You Need, ViT
- **Outfit Compatibility**: Fashion Recommendation Systems
- **Dataset Preparation**: Polyvore, Fashion-MNIST

### Research Papers
- ResNet: Deep Residual Learning for Image Recognition
- ViT: An Image is Worth 16x16 Words
- Triplet Loss: FaceNet: A Unified Embedding for Face Recognition
- Fashion AI: Learning Fashion Compatibility with Visual Similarity

## 🎉 Conclusion

**Dressify** represents a **complete, production-ready** outfit recommendation system that combines:

- **Research Excellence**: State-of-the-art deep learning architectures
- **Production Quality**: Robust error handling, auto-recovery, monitoring
- **User Experience**: Intuitive interface, real-time feedback, visual output
- **Developer Experience**: Comprehensive testing, clear documentation, easy deployment

The system is designed to be **self-contained**, **scalable**, and **research-grade**, making it suitable for both academic research and commercial deployment. With automatic dataset preparation, intelligent training, and sophisticated inference, Dressify provides a complete solution for outfit recommendation that requires minimal setup and maintenance.

---

**Built with ❤️ for the fashion AI community**
QUICK_START_TRAINING.md DELETED
@@ -1,229 +0,0 @@
- # 🚀 Quick Start: Advanced Training Interface
-
- ## Overview
-
- The Dressify system now provides **comprehensive parameter control** for both ResNet and ViT training directly from the Gradio interface. You can tweak every aspect of model training without editing code!
-
- ## 🎯 What You Can Control
-
- ### ResNet Item Embedder
- - **Architecture**: Backbone (ResNet50/101), embedding dimension, dropout
- - **Training**: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- - **Hardware**: Mixed precision, memory format, gradient clipping
-
- ### ViT Outfit Encoder
- - **Architecture**: Transformer layers, attention heads, feed-forward multiplier, dropout
- - **Training**: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- - **Strategy**: Mining strategy, augmentation level, random seed
-
- ### Advanced Settings
- - **Learning Rate**: Warmup epochs, scheduler type, early stopping patience
- - **Optimization**: Mixed precision, channels-last memory, gradient clipping
- - **Reproducibility**: Random seed, deterministic training
-
- ## 🚀 Quick Start Steps
-
- ### 1. Launch the App
- ```bash
- python app.py
- ```
-
- ### 2. Go to Advanced Training Tab
- - Click on the **"🔬 Advanced Training"** tab
- - You'll see comprehensive parameter controls organized in sections
-
- ### 3. Choose Your Training Mode
-
- #### Quick Training (Basic)
- - Set ResNet epochs: 5-10
- - Set ViT epochs: 10-20
- - Click **"🚀 Start Quick Training"**
-
- #### Advanced Training (Custom)
- - Adjust **all parameters** to your liking
- - Click **"🎯 Start Advanced Training"**
-
- ### 4. Monitor Progress
- - Watch the training log for real-time updates
- - Check the Status tab for system health
- - Download models from the Downloads tab when complete
-
- ## 🔬 Parameter Tuning Examples
-
- ### Fast Experimentation
- ```yaml
- # Quick test (5-10 minutes)
- ResNet: epochs=5, batch_size=16, lr=1e-3
- ViT: epochs=10, batch_size=16, lr=5e-4
- ```
-
- ### Standard Training
- ```yaml
- # Balanced quality (1-2 hours)
- ResNet: epochs=20, batch_size=64, lr=1e-3
- ViT: epochs=30, batch_size=32, lr=5e-4
- ```
-
- ### High Quality Training
- ```yaml
- # Production models (4-6 hours)
- ResNet: epochs=50, batch_size=32, lr=5e-4
- ViT: epochs=100, batch_size=16, lr=1e-4
- ```
-
- ### Research Experiments
- ```yaml
- # Maximum capacity
- ResNet: backbone=resnet101, embedding_dim=768
- ViT: layers=8, heads=12, mining_strategy=hardest
- ```
-
- ## 🎯 Key Parameters to Experiment With
-
- ### High Impact (Try First)
- 1. **Learning Rate**: 1e-4 to 1e-2
- 2. **Batch Size**: 16 to 128
- 3. **Triplet Margin**: 0.1 to 0.5
- 4. **Epochs**: 5 to 100
-
- ### Medium Impact
- 1. **Embedding Dimension**: 256, 512, 768, 1024
- 2. **Transformer Layers**: 4, 6, 8, 12
- 3. **Optimizer**: AdamW, Adam, SGD, RMSprop
-
- ### Fine-tuning
- 1. **Weight Decay**: 1e-6 to 1e-1
- 2. **Dropout**: 0.0 to 0.5
- 3. **Attention Heads**: 4, 8, 16
-
- ## 📊 Training Workflow
-
- ### 1. **Start Simple** 🚀
- - Use default parameters first
- - Run quick training (5-10 epochs)
- - Verify system works
-
- ### 2. **Experiment Systematically** 🔍
- - Change **one parameter at a time**
- - Start with learning rate and batch size
- - Document every change
-
- ### 3. **Validate Results** ✅
- - Compare training curves
- - Check validation metrics
- - Ensure improvements are consistent
-
- ### 4. **Scale Up** 📈
- - Use best parameters for longer training
- - Increase epochs gradually
- - Monitor for overfitting
-
- ## 🧪 Monitoring Training
-
- ### What to Watch
- - **Training Loss**: Should decrease steadily
- - **Validation Loss**: Should decrease without overfitting
- - **Training Time**: Per epoch timing
- - **GPU Memory**: VRAM usage
-
- ### Success Signs
- - Smooth loss curves
- - Consistent improvement
- - Good generalization
-
- ### Warning Signs
- - Loss spikes or plateaus
- - Validation loss increases
- - Training becomes unstable
-
- ## 🔧 Advanced Features
-
- ### Mixed Precision Training
- - **Enable**: Faster training, less memory
- - **Disable**: More stable, higher precision
- - **Default**: Enabled (recommended)
-
- ### Triplet Mining Strategies
- - **Semi-hard**: Balanced difficulty (default)
- - **Hardest**: Maximum challenge
- - **Random**: Simple but less effective
-
- ### Data Augmentation
- - **Minimal**: Basic transforms
- - **Standard**: Balanced augmentation (default)
- - **Aggressive**: Heavy augmentation
-
- ## 📝 Best Practices
-
- ### 1. **Document Everything** 📚
- - Save parameter combinations
- - Record training results
- - Note hardware specifications
-
- ### 2. **Start Small** 🔬
- - Test with few epochs first
- - Validate promising combinations
- - Scale up gradually
-
- ### 3. **Monitor Resources** 💻
- - Watch GPU memory usage
- - Check training time per epoch
- - Balance quality vs. speed
-
- ### 4. **Save Checkpoints** 💾
- - Models are saved automatically
- - Keep intermediate checkpoints
- - Download final models
-
- ## 🚨 Common Issues & Solutions
-
- ### Training Too Slow
- - **Reduce batch size**
- - **Increase learning rate**
- - **Use mixed precision**
- - **Reduce embedding dimension**
-
- ### Training Unstable
- - **Reduce learning rate**
- - **Increase batch size**
- - **Enable gradient clipping**
- - **Check data quality**
-
- ### Out of Memory
- - **Reduce batch size**
- - **Reduce embedding dimension**
- - **Use mixed precision**
- - **Reduce transformer layers**
-
- ### Poor Results
- - **Increase epochs**
- - **Adjust learning rate**
- - **Try different optimizers**
- - **Check data preprocessing**
-
- ## 📚 Next Steps
-
- ### 1. **Read the Full Guide**
- - See `TRAINING_PARAMETERS.md` for detailed explanations
- - Understand parameter impact and trade-offs
-
- ### 2. **Run Experiments**
- - Start with quick training
- - Experiment with different parameters
- - Document your findings
-
- ### 3. **Optimize for Your Use Case**
- - Balance quality vs. speed
- - Consider hardware constraints
- - Aim for reproducible results
-
- ### 4. **Share Results**
- - Document successful configurations
- - Share insights with the community
- - Contribute to best practices
-
- ---
-
- **🎉 You're ready to start experimenting!**
-
- *Remember: Start simple, change one thing at a time, and document everything. Happy training! 🚀*
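For intuition on the **Triplet Margin** parameter listed above, here is a minimal numpy sketch of the triplet objective (illustrative only — the actual training loops live in `train_resnet.py` and `train_vit_triplet.py`, and this helper function is hypothetical):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge on squared Euclidean distances: the positive must sit closer
    # to the anchor than the negative does, by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # close to the anchor
n = np.array([1.0, 0.0])  # far from the anchor
print(triplet_loss(a, p, n, margin=0.3))  # well-separated triplet -> 0.0
```

A larger margin demands a wider gap between positive and negative distances, which is why 0.1–0.5 is the suggested sweep range.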
README_HF_SETUP.md DELETED
@@ -1,60 +0,0 @@
- # Hugging Face Setup Guide
-
- ## 🔐 Setting Up Hugging Face Authentication
-
- ### 1. Get Your HF Token
- - Go to https://huggingface.co/settings/tokens
- - Create a new token with **Write** permissions
- - Copy the token (starts with `hf_...`)
-
- ### 2. Set Environment Variables
-
- #### Option A: In Hugging Face Spaces (Recommended)
- 1. Go to your Space settings
- 2. Add these secrets:
- - `HF_TOKEN`: Your Hugging Face token
- - `HF_USERNAME`: Your Hugging Face username (e.g., "Stylique")
-
- #### Option B: Local Development
- ```bash
- export HF_TOKEN="hf_your_token_here"
- export HF_USERNAME="your_username"
- ```
-
- ### 3. Verify Setup
- ```bash
- source setup_hf.sh
- ```
-
- ## 🚀 What Happens Next
-
- Once environment variables are set, the system will automatically:
- - ✅ Authenticate with Hugging Face
- - ✅ Upload trained models to `{HF_USERNAME}/dressify-models`
- - ✅ Upload datasets to `{HF_USERNAME}/Dressify-Helper`
- - ✅ Create repositories if they don't exist
-
- ## 🔒 Security Notes
-
- - **Never commit tokens to git**
- - **Use environment variables or HF Spaces secrets**
- - **Tokens are automatically masked in logs**
-
- ## 📁 Repository Structure
-
- After successful upload:
- ```
- {HF_USERNAME}/dressify-models/
- ├── resnet_item_embedder_best.pth
- ├── vit_outfit_model_best.pth
- ├── resnet_metrics.json
- └── vit_metrics.json
-
- {HF_USERNAME}/Dressify-Helper/
- ├── train.json
- ├── valid.json
- ├── test.json
- ├── outfit_triplets_train.json
- ├── outfit_triplets_valid.json
- └── outfit_triplets_test.json
- ```
RECOMMENDATION_PIPELINE_EXPLAINED.md ADDED
@@ -0,0 +1,340 @@
+ # 🎯 How Dressify Recommendations Actually Work
+
+ ## ✅ **YES - Both ResNet and ViT are used during inference!**
+
+ This document explains the complete recommendation pipeline and proves that both deep learning models are actively used.
+
+ ---
+
+ ## 📊 **Complete Recommendation Pipeline**
+
+ ### **Step 1: Image Input & Category Detection**
+ **Location:** `inference.py:356-384`
+
+ ```python
+ # User uploads wardrobe images
+ items = [
+ {"id": "item_0", "image": <PIL.Image>, "category": None},
+ {"id": "item_1", "image": <PIL.Image>, "category": None},
+ ...
+ ]
+
+ # For each item:
+ for item in items:
+ # 1. Auto-detect category using CLIP (if available)
+ category = self._detect_category_with_clip(item["image"])
+ # OR fallback to filename-based detection
+
+ # 2. Generate embedding if not provided
+ if embedding is None:
+ embedding = self.embed_images([item["image"]])[0]
+ ```
+
+ **What happens:**
+ - Each clothing item image is processed
+ - Category is detected (shirt, pants, shoes, etc.) using CLIP or filename
+ - If no embedding exists, it's generated using **ResNet**
+
+ ---
+
+ ### **Step 2: ResNet Generates Item Embeddings** ⭐
+ **Location:** `inference.py:313-337` → `embed_images()`
+
+ ```python
+ @torch.inference_mode()
+ def embed_images(self, images: List[Image.Image]) -> List[np.ndarray]:
+ # Transform images to tensor
+ batch = torch.stack([self.transform(img) for img in images])
+ batch = batch.to(self.device, memory_format=torch.channels_last)
+
+ # ✅ RESNET IS CALLED HERE!
+ use_amp = (self.device == "cuda")
+ with torch.autocast(device_type=("cuda" if use_amp else "cpu"), enabled=use_amp):
+ emb = self.resnet(batch) # <-- RESNET FORWARD PASS
+
+ # Normalize embeddings
+ emb = nn.functional.normalize(emb, dim=-1)
+ result = [e.detach().cpu().numpy().astype(np.float32) for e in emb]
+ return result
+ ```
+
+ **What ResNet does:**
+ - Takes raw clothing item images (224x224 RGB)
+ - Passes through ResNet50 backbone (pretrained on ImageNet)
+ - Generates **512-dimensional embeddings** for each item
+ - These embeddings capture visual features (color, texture, style, pattern)
+
+ **Example:**
+ - Input: Image of a blue shirt → ResNet → Output: `[0.123, -0.456, 0.789, ...]` (512-dim vector)
+
+ ---
+
+ ### **Step 3: Tag Processing & Context Building**
+ **Location:** `inference.py:490-545`
+
+ ```python
+ # Process user tags (occasion, weather, style, etc.)
+ processed_tags = self.tag_processor.process_tags(context)
+
+ # Build outfit template based on tags
+ template = outfit_templates[outfit_style].copy()
+ # Apply weather/occasion modifications
+ # Generate constraints (min_items, max_items, accessory_limit)
+ ```
+
+ **What happens:**
+ - User preferences (formal, cold weather, elegant style) are processed
+ - Outfit templates are selected and modified
+ - Constraints are generated (e.g., formal requires 4-5 items, needs outerwear)
+
+ ---
+
+ ### **Step 4: Candidate Outfit Generation**
+ **Location:** `inference.py:910-1092`
+
+ ```python
+ # Generate many candidate outfit combinations
+ candidates = []
+ for _ in range(num_samples): # Typically 50-100+ candidates
+ subset = []
+
+ # Strategy-based generation:
+ # - Strategy 0: Core outfit (shirt + pants + shoes + accessories)
+ # - Strategy 1: Accessory-focused
+ # - Strategy 2: Flexible combination
+
+ # Add items based on context (formal, casual, etc.)
+ if occasion == "formal" and outerwear:
+ subset.append(jacket)
+ subset.append(shirt)
+ subset.append(pants)
+ subset.append(shoes)
+
+ candidates.append(subset)
+ ```
+
+ **What happens:**
+ - System generates **50-100+ candidate outfit combinations**
+ - Each candidate is a list of item indices (e.g., `[0, 3, 7, 12]`)
+ - Candidates are generated using:
+ - Category pools (uppers, bottoms, shoes, outerwear, accessories)
+ - Context-aware strategies (formal vs casual)
+ - Randomization for variety
+
+ ---
+
+ ### **Step 5: ViT Scores Outfit Compatibility** ⭐⭐
+ **Location:** `inference.py:1094-1103` → `score_subset()`
+
+ ```python
+ def score_subset(idx_subset: List[int]) -> float:
+ # Get embeddings for items in this outfit
+ embs = torch.tensor(
+ np.stack([proc_items[i]["embedding"] for i in idx_subset], axis=0),
+ dtype=torch.float32,
+ device=self.device,
+ ) # Shape: (N, 512) where N = number of items in outfit
+
+ embs = embs.unsqueeze(0) # Shape: (1, N, 512) - batch dimension
+
+ # ✅ VIT IS CALLED HERE!
+ s = self.vit.score_compatibility(embs).item() # <-- VIT FORWARD PASS
+ return float(s)
+ ```
+
+ **What ViT does:**
+ - Takes **multiple item embeddings** (e.g., jacket, shirt, pants, shoes)
+ - Passes through **Vision Transformer encoder**:
+ - Transformer processes the sequence of item embeddings
+ - Learns relationships between items (do they go together?)
+ - Outputs a **compatibility score** (higher = better match)
+
+ **ViT Architecture:**
+ ```python
+ # From models/vit_outfit.py
+ class OutfitCompatibilityModel(nn.Module):
+ def forward(self, tokens: torch.Tensor) -> torch.Tensor:
+ # tokens: (B, N, D) - batch of outfits, each with N items, D-dim embeddings
+ h = self.encoder(tokens) # Transformer encoder
+ pooled = h.mean(dim=1) # Average pooling across items
+ score = self.compatibility_head(pooled) # Final compatibility score
+ return score.squeeze(-1)
+ ```
+
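As a shape-level illustration of this pooling-and-scoring step (a numpy stand-in with random weights, not the trained model — `tokens`, `pooled`, and the head weights `w` here are all synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1, 4, 512))         # (B=1 outfit, N=4 items, D=512 embeddings)
pooled = tokens.mean(axis=1)                  # average pooling across items -> (1, 512)
w = rng.normal(size=(512, 1)) / np.sqrt(512)  # stand-in for the compatibility head
score = pooled @ w                            # one scalar score per outfit -> (1, 1)
print(pooled.shape, score.shape)
```

The real model applies a Transformer encoder before pooling; the point is that a whole outfit of N items collapses to a single compatibility scalar.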
+ **Example:**
+ - Input: `[jacket_emb, shirt_emb, pants_emb, shoes_emb]` (4 items × 512 dims)
+ - ViT Processing: Transformer analyzes relationships
+ - Output: `0.85` (high compatibility score)
+
+ ---
+
+ ### **Step 6: Scoring & Ranking**
+ **Location:** `inference.py:1266-1274`
+
+ ```python
+ # Score all valid candidates
+ scored = []
+ for subset in valid_candidates:
+ base_score = score_subset(subset) # <-- ViT score (0.0 to 1.0+)
+
+ # Apply penalties and bonuses
+ adjusted_score = calculate_outfit_penalty(subset, base_score)
+ # - Penalties: missing categories, duplicates, wrong context
+ # - Bonuses: color harmony, style coherence, complete sets
+
+ scored.append((subset, adjusted_score, base_score))
+
+ # Sort by adjusted score (highest first)
+ scored.sort(key=lambda x: x[1], reverse=True)
+ ```
+
+ **What happens:**
+ - Each candidate outfit gets:
+ 1. **Base score from ViT** (0.0 to ~1.0+)
+ 2. **Penalties** (e.g., -500 if formal without jacket)
+ 3. **Bonuses** (e.g., +0.6 for color harmony, +0.4 for style coherence)
+ - Final score = base_score + penalties + bonuses
+ - Outfits are ranked by final score
+
+ ---
+
+ ### **Step 7: Final Selection & Deduplication**
+ **Location:** `inference.py:1276-1300`
+
+ ```python
+ # Remove duplicate outfits
+ seen_outfits = set()
+ unique_scored = []
+ for subset, adjusted_score, base_score in scored:
+ normalized = normalize_outfit(subset) # Sort item IDs
+ if normalized not in seen_outfits:
+ seen_outfits.add(normalized)
+ unique_scored.append((subset, adjusted_score, base_score))
+
+ # Select top N with randomization
+ topk = unique_scored[:num_outfits]
+ ```
+
+ **What happens:**
+ - Duplicate outfits (same items, different order) are removed
+ - Top N outfits are selected
+ - Some randomization is added for variety
+
+ ---
+
+ ## 🔍 **Proof: Both Models Are Used**
+
+ ### **Evidence 1: ResNet Usage**
+ ```python
+ # Line 330 in inference.py
+ emb = self.resnet(batch) # ✅ ResNet forward pass
+ ```
+ - Called in `embed_images()` method
+ - Generates embeddings for every clothing item
+ - **Called during inference** when items don't have pre-computed embeddings
+
+ ### **Evidence 2: ViT Usage**
+ ```python
+ # Line 1102 in inference.py
+ s = self.vit.score_compatibility(embs).item() # ✅ ViT forward pass
+ ```
+ - Called in `score_subset()` function
+ - Scores **every candidate outfit** (50-100+ times per recommendation request)
+ - **Called during inference** to rank outfit combinations
+
+ ### **Evidence 3: Model Loading**
+ ```python
+ # Lines 49-50, 285-286 in inference.py
+ self.resnet, self.resnet_loaded = self._load_resnet()
+ self.vit, self.vit_loaded = self._load_vit()
+
+ # Models are loaded and set to eval mode
+ if self.resnet_loaded:
+ self.resnet = self.resnet.to(self.device).eval()
+ if self.vit_loaded:
+ self.vit = self.vit.to(self.device).eval()
+ ```
+
+ ---
+
+ ## 📈 **Complete Flow Diagram**
+
+ ```
+ User Input
+
+ [Upload Images] → [CLIP Category Detection]
+
+ [ResNet Embedding Generation] ← ✅ RESNET USED HERE
+
+ [512-dim Embeddings for Each Item]
+
+ [Tag Processing] → [Context Building]
+
+ [Candidate Generation] → [50-100+ Outfit Combinations]
+
+ [ViT Compatibility Scoring] ← ✅ VIT USED HERE (50-100+ times)
+
+ [Penalty/Bonus Adjustment]
+
+ [Ranking & Deduplication]
+
+ [Top N Recommendations]
+ ```
+
+ ---
+
+ ## 🎯 **Key Points**
+
+ 1. **ResNet is used:**
+ - Generates embeddings for each clothing item
+ - Called once per item (or uses cached embeddings)
+ - Output: 512-dimensional feature vectors
+
+ 2. **ViT is used:**
+ - Scores compatibility of outfit combinations
+ - Called **50-100+ times** per recommendation request (once per candidate)
+ - Output: Compatibility score (0.0 to ~1.0+)
+
+ 3. **Both models work together:**
+ - ResNet provides item-level understanding
+ - ViT provides outfit-level compatibility
+ - Together they create personalized, context-aware recommendations
+
+ 4. **The system is NOT just rule-based:**
+ - Deep learning models (ResNet + ViT) provide the core intelligence
+ - Rules and heuristics (penalties/bonuses) refine the results
+ - Tags and context guide the generation process
+
+ ---
+
+ ## 🔬 **Technical Details**
+
+ ### **ResNet Architecture:**
+ - **Backbone:** ResNet50 (pretrained on ImageNet)
+ - **Input:** 224×224 RGB images
+ - **Output:** 512-dimensional embeddings
+ - **Purpose:** Extract visual features from clothing items
+
+ ### **ViT Architecture:**
+ - **Encoder:** Transformer with 4-6 layers, 8 attention heads
+ - **Input:** Sequence of item embeddings (variable length, 2-6 items)
+ - **Output:** Single compatibility score
+ - **Purpose:** Learn which items go well together
+
+ ### **Training:**
+ - **ResNet:** Trained with triplet loss on item triplets
+ - **ViT:** Trained with triplet loss on outfit triplets (anchor, positive, negative)
+ - **Both:** Use early stopping, best model checkpointing
+
+ ---
+
+ ## ✅ **Conclusion**
+
+ **YES - Both ResNet and ViT are actively used during inference!**
+
+ - **ResNet** generates item embeddings (visual understanding)
+ - **ViT** scores outfit compatibility (relationship learning)
+ - Together they create intelligent, personalized recommendations
+
+ The system is a **true deep learning pipeline**, not just rule-based filtering!
+
app.py CHANGED
@@ -17,6 +17,15 @@ import json
 from inference import InferenceService
 from utils.data_fetch import ensure_dataset_ready
 from utils.tag_system import get_all_tag_options, validate_tags, TagProcessor
 
 # Global state
 BOOT_STATUS = "starting"
@@ -335,6 +344,18 @@ def get_tags() -> dict:
 }
 }
 
 @app.post("/tags/validate")
 def validate_request_tags(tags: Dict[str, Any], x_api_key: Optional[str] = Header(None)) -> dict:
 """
@@ -389,20 +410,52 @@ def test_recommend() -> dict:
 
 @app.post("/embed")
 def embed(req: EmbedRequest, x_api_key: Optional[str] = Header(None)) -> dict:
 require_api_key(x_api_key)
 images: List[Image.Image] = []
 if req.image_urls:
 for url in req.image_urls:
- resp = requests.get(url, timeout=20)
- resp.raise_for_status()
- images.append(Image.open(io.BytesIO(resp.content)).convert("RGB"))
 if req.images_base64:
 for b64 in req.images_base64:
- images.append(Image.open(io.BytesIO(base64.b64decode(b64))).convert("RGB"))
 if not images:
- raise HTTPException(status_code=400, detail="No images provided")
 embs = service.embed_images(images)
- return {"embeddings": [e.tolist() for e in embs], "model_version": service.resnet_version}
 
 
 @app.post("/compose")
@@ -498,14 +551,11 @@ def artifacts() -> dict:
 # --------- Gradio UI ---------
 
 def _load_images_from_files(files: List[str]) -> List[Image.Image]:
- images: List[Image.Image] = []
- for fp in files:
- try:
- with Image.open(fp) as im:
- images.append(im.convert("RGB"))
- except Exception:
- continue
- return images
 
 
 def gradio_embed(files: List[str]):
@@ -870,9 +920,9 @@ def start_training_simple(dataset_size: str, res_epochs: int, vit_epochs: int):
 # Train ResNet first and wait for completion
 log_message += f"\n🚀 Starting ResNet training on {dataset_size} samples...\n"
 resnet_result = subprocess.run([
- "python", "train_resnet.py", "--data_root", DATASET_ROOT, "--epochs", str(res_epochs),
 "--batch_size", "4", "--lr", "1e-3", "--early_stopping_patience", "3",
- "--out", os.path.join(export_dir, "resnet_item_embedder.pth")
 ] + dataset_args, capture_output=True, text=True, check=False)
 
 if resnet_result.returncode == 0:
@@ -897,7 +947,7 @@
 
 log_message += f"\n🚀 Starting ViT training on {dataset_size} samples...\n"
 vit_result = subprocess.run([
- "python", "train_vit_triplet.py", "--data_root", DATASET_ROOT, "--epochs", str(vit_epochs),
 "--batch_size", "4", "--lr", "5e-4", "--early_stopping_patience", "5",
 "--max_samples", "5000", "--triplet_margin", "0.5", "--gradient_clip", "1.0",
 "--warmup_epochs", "2", "--export", os.path.join(export_dir, "vit_outfit_model.pth")
@@ -956,8 +1006,14 @@ with gr.Blocks(fill_height=True, title="Dressify - Advanced Outfit Recommendatio
 
 with gr.Tab("🎨 Recommend"):
 gr.Markdown("### 🎯 Personalized Outfit Recommendations\n*Upload your wardrobe and customize recommendations with advanced tag preferences*")
 
- inp2 = gr.Files(label="Upload wardrobe images", file_types=["image"], file_count="multiple")
 
 with gr.Accordion("🎯 Primary Tags (Required)", open=True):
 with gr.Row():
 from inference import InferenceService
 from utils.data_fetch import ensure_dataset_ready
 from utils.tag_system import get_all_tag_options, validate_tags, TagProcessor
+ from utils.image_utils import (
+ load_images_from_files,
+ load_image_from_bytes,
+ load_image_from_url,
+ is_image_file,
+ get_supported_formats,
+ get_supported_extensions,
+ ensure_rgb_image
+ )
 
 # Global state
 BOOT_STATUS = "starting"

 }
 }
 
+ @app.get("/image-formats")
+ def get_image_formats() -> dict:
+ """
+ Get all supported image formats for API integration.
+ """
+ return {
+ "supported_formats": get_supported_formats(),
+ "supported_extensions": get_supported_extensions(),
+ "description": "All major image formats are supported including JPG, PNG, WEBP, GIF, BMP, TIFF, and more",
+ "note": "Images are automatically converted to RGB mode for model processing"
+ }
+
 @app.post("/tags/validate")
 def validate_request_tags(tags: Dict[str, Any], x_api_key: Optional[str] = Header(None)) -> dict:
 """

 
 @app.post("/embed")
 def embed(req: EmbedRequest, x_api_key: Optional[str] = Header(None)) -> dict:
+ """
+ Generate embeddings for images with comprehensive format support.
+ Supports JPG, PNG, WEBP, GIF, BMP, TIFF, and other major formats.
+ """
 require_api_key(x_api_key)
 images: List[Image.Image] = []
+ errors = []
+
+ # Load from URLs
 if req.image_urls:
 for url in req.image_urls:
+ img = load_image_from_url(url, timeout=20, convert_to_rgb=True, raise_on_error=False)
+ if img is not None:
+ images.append(img)
+ else:
+ errors.append(f"Failed to load image from URL: {url}")
+
+ # Load from base64
 if req.images_base64:
 for b64 in req.images_base64:
+ try:
+ image_bytes = base64.b64decode(b64)
+ img = load_image_from_bytes(image_bytes, convert_to_rgb=True, raise_on_error=False)
+ if img is not None:
+ images.append(img)
+ else:
+ errors.append("Failed to load image from base64")
+ except Exception as e:
+ errors.append(f"Error decoding base64 image: {str(e)}")
+
 if not images:
+ error_msg = "No images provided or all images failed to load"
+ if errors:
+ error_msg += f". Errors: {', '.join(errors[:3])}"
+ raise HTTPException(status_code=400, detail=error_msg)
+
+ # Ensure all images are RGB
+ images = [ensure_rgb_image(img) for img in images]
+
 embs = service.embed_images(images)
+ return {
+ "embeddings": [e.tolist() for e in embs],
+ "model_version": service.resnet_version,
+ "images_loaded": len(images),
+ "errors": errors if errors else None
+ }
 
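A client-side sketch of building a request body for the reworked `/embed` endpoint (the field names `images_base64` and `image_urls` come from `EmbedRequest` in this diff; the helper function itself is hypothetical, and the `X-API-Key` header is still required when actually POSTing):

```python
import base64
import json

def build_embed_payload(image_bytes_list, image_urls=None):
    # POST /embed accepts base64-encoded raw image bytes and/or image URLs.
    payload = {
        "images_base64": [base64.b64encode(b).decode("ascii") for b in image_bytes_list],
    }
    if image_urls:
        payload["image_urls"] = list(image_urls)
    return payload

# Demo with placeholder bytes; in practice read a real image file.
payload = build_embed_payload([b"fake-image-bytes"], ["https://example.com/shirt.jpg"])
print(json.dumps(payload)[:80])
```

Because the endpoint now skips unloadable inputs instead of raising, the response's `images_loaded` and `errors` fields are worth checking on the client side.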
 
 @app.post("/compose")

 # --------- Gradio UI ---------
 
 def _load_images_from_files(files: List[str]) -> List[Image.Image]:
+ """
+ Load images from file paths with comprehensive format support.
+ Supports JPG, PNG, WEBP, GIF, BMP, TIFF, and other major formats.
+ """
+ return load_images_from_files(files, convert_to_rgb=True, skip_errors=True)
 
 
 def gradio_embed(files: List[str]):

 # Train ResNet first and wait for completion
 log_message += f"\n🚀 Starting ResNet training on {dataset_size} samples...\n"
 resnet_result = subprocess.run([
+ "python", "train_resnet.py", "--data_root", DATASET_ROOT, "--epochs", str(res_epochs),
 "--batch_size", "4", "--lr", "1e-3", "--early_stopping_patience", "3",
+ "--out", os.path.join(export_dir, "resnet_item_embedder.pth")
 ] + dataset_args, capture_output=True, text=True, check=False)
 
 if resnet_result.returncode == 0:

 
 log_message += f"\n🚀 Starting ViT training on {dataset_size} samples...\n"
 vit_result = subprocess.run([
+ "python", "train_vit_triplet.py", "--data_root", DATASET_ROOT, "--epochs", str(vit_epochs),
 "--batch_size", "4", "--lr", "5e-4", "--early_stopping_patience", "5",
 "--max_samples", "5000", "--triplet_margin", "0.5", "--gradient_clip", "1.0",
 "--warmup_epochs", "2", "--export", os.path.join(export_dir, "vit_outfit_model.pth")

 
 with gr.Tab("🎨 Recommend"):
 gr.Markdown("### 🎯 Personalized Outfit Recommendations\n*Upload your wardrobe and customize recommendations with advanced tag preferences*")
+ gr.Markdown(f"**Supported Formats:** {', '.join(get_supported_extensions())} (JPG, PNG, WEBP, GIF, BMP, TIFF, and more)")
 
+ inp2 = gr.Files(
+ label="Upload wardrobe images",
+ file_types=["image"],
+ file_count="multiple",
+ type="filepath"
+ )
 
 with gr.Accordion("🎯 Primary Tags (Required)", open=True):
 with gr.Row():
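`utils/image_utils` itself is not included in this commit; as an assumption-laden stand-in, the extension helpers imported above presumably behave like the sketch below (function names taken from the import list, bodies and the exact extension set hypothetical):

```python
import os

# Hypothetical superset matching the formats named in the docstrings above.
SUPPORTED_EXTENSIONS = [".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp", ".tiff", ".tif"]

def get_supported_extensions():
    return list(SUPPORTED_EXTENSIONS)

def is_image_file(path: str) -> bool:
    # Case-insensitive extension check, mirroring the filtering in
    # utils/artifact_manager.py later in this diff.
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS

print(is_image_file("shirt.PNG"), is_image_file("notes.txt"))  # True False
```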
inference.py CHANGED
@@ -16,6 +16,7 @@ from utils.transforms import build_inference_transform
16
  from models.resnet_embedder import ResNetItemEmbedder
  from models.vit_outfit import OutfitCompatibilityModel
  from utils.tag_system import TagProcessor, get_all_tag_options, validate_tags
+ from utils.image_utils import ensure_rgb_image, validate_image_format
  
  
  def _get_device() -> str:
@@ -312,6 +313,10 @@ class InferenceService:
  
      @torch.inference_mode()
      def embed_images(self, images: List[Image.Image]) -> List[np.ndarray]:
+         """
+         Generate embeddings for images with comprehensive format support.
+         All images are validated and converted to RGB before processing.
+         """
          print(f"🔍 DEBUG: embed_images called with {len(images)} images")
          if len(images) == 0:
              print("🔍 DEBUG: No images provided, returning empty list")
@@ -321,9 +326,27 @@ class InferenceService:
          if self.resnet is None:
              print("🔍 DEBUG: ResNet model is None, returning empty list")
              return []
+ 
+         # Validate and convert all images to RGB
+         processed_images = []
+         for i, img in enumerate(images):
+             is_valid, error_msg = validate_image_format(img)
+             if not is_valid:
+                 print(f"⚠️ Skipping invalid image {i}: {error_msg}")
+                 continue
+ 
+             # Ensure RGB mode (required for ResNet)
+             rgb_img = ensure_rgb_image(img)
+             processed_images.append(rgb_img)
+ 
+         if len(processed_images) == 0:
+             print("⚠️ No valid images after processing")
+             return []
+ 
+         print(f"🔍 DEBUG: Processing {len(processed_images)} valid images")
  
          try:
-             batch = torch.stack([self.transform(img) for img in images])
+             batch = torch.stack([self.transform(img) for img in processed_images])
              batch = batch.to(self.device, memory_format=torch.channels_last)
              use_amp = (self.device == "cuda")
              with torch.autocast(device_type=("cuda" if use_amp else "cpu"), enabled=use_amp):
@@ -334,6 +357,8 @@ class InferenceService:
              return result
          except Exception as e:
              print(f"🔍 DEBUG: Error in embed_images: {e}")
+             import traceback
+             traceback.print_exc()
              return []
  
      @torch.inference_mode()
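The validate-then-convert loop above drops unusable inputs before batching and composites any transparency onto a white background. A minimal standalone sketch of that conversion step (the `ensure_rgb` helper here is illustrative only, not the repo's `ensure_rgb_image`):

```python
from PIL import Image

def ensure_rgb(img: Image.Image) -> Image.Image:
    """Convert any PIL image to RGB, compositing transparency onto white."""
    if img.mode == "RGB":
        return img
    if img.mode in ("RGBA", "LA", "P"):
        if img.mode == "P":
            img = img.convert("RGBA")  # palette images may carry transparency
        background = Image.new("RGB", img.size, (255, 255, 255))
        background.paste(img, mask=img.split()[-1])  # alpha channel as mask
        return background
    return img.convert("RGB")

# A fully transparent RGBA pixel becomes white after conversion
rgba = Image.new("RGBA", (2, 2), (255, 0, 0, 0))
print(ensure_rgb(rgba).getpixel((0, 0)))  # → (255, 255, 255)
```

Without the white-background composite, `Image.convert("RGB")` on an RGBA image simply drops the alpha channel, so transparent regions would keep their (often garbage) color values.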
utils/artifact_manager.py CHANGED
@@ -90,7 +90,10 @@ class ArtifactManager:
          images_dir = os.path.join(self.data_dir, "images")
          if os.path.exists(images_dir):
              try:
-                 image_files = [f for f in os.listdir(images_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png', '.webp'))]
+                 # Support all major image formats
+                 from utils.image_utils import get_supported_extensions
+                 supported_exts = tuple(ext.lower() for ext in get_supported_extensions())
+                 image_files = [f for f in os.listdir(images_dir) if f.lower().endswith(supported_exts)]
                  info["images_count"] = len(image_files)
              except:
                  pass
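The widened filter above works because `str.endswith` accepts a tuple of suffixes and the filename is lowercased first, making the match case-insensitive. A standalone sketch, with a hypothetical `SUPPORTED_EXTS` subset standing in for `get_supported_extensions()`:

```python
# Hypothetical subset of the extensions EXTENSION_TO_FORMAT covers
SUPPORTED_EXTS = (".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp", ".tiff")

def count_images(filenames):
    """Count filenames whose extension (case-insensitive) is supported."""
    return sum(1 for f in filenames if f.lower().endswith(SUPPORTED_EXTS))

print(count_images(["a.JPG", "b.webp", "notes.txt", "c.TIFF"]))  # → 3
```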
utils/image_utils.py ADDED
@@ -0,0 +1,374 @@
+ """
+ Comprehensive Image Format Support Utilities
+ 
+ This module provides robust image loading and processing that supports
+ all major image formats including JPG, PNG, WEBP, GIF, BMP, TIFF, etc.
+ """
+ 
+ import io
+ from typing import List, Optional, Tuple, Union
+ from pathlib import Path
+ 
+ from PIL import Image, ImageFile, UnidentifiedImageError
+ import requests
+ 
+ 
+ # Enable PIL to load truncated images
+ ImageFile.LOAD_TRUNCATED_IMAGES = True
+ 
+ # Supported image formats
+ SUPPORTED_FORMATS = {
+     # Raster formats
+     'JPEG', 'JPG',   # JPEG
+     'PNG',           # PNG
+     'WEBP',          # WebP
+     'GIF',           # GIF (static frames)
+     'BMP',           # Bitmap
+     'TIFF', 'TIF',   # TIFF
+     'ICO',           # Icon
+     'PCX',           # PC Paintbrush
+     'PPM',           # Portable Pixmap
+     'PBM',           # Portable Bitmap
+     'PGM',           # Portable Graymap
+     'XBM',           # X Bitmap
+     'XPM',           # X Pixmap
+     # Additional formats if available
+     'HEIF', 'HEIC',  # HEIF/HEIC (if pillow-heif installed)
+     'AVIF',          # AVIF (if pillow-avif-plugin installed)
+ }
+ 
+ # File extensions mapping
+ EXTENSION_TO_FORMAT = {
+     '.jpg': 'JPEG',
+     '.jpeg': 'JPEG',
+     '.png': 'PNG',
+     '.webp': 'WEBP',
+     '.gif': 'GIF',
+     '.bmp': 'BMP',
+     '.tiff': 'TIFF',
+     '.tif': 'TIFF',
+     '.ico': 'ICO',
+     '.pcx': 'PCX',
+     '.ppm': 'PPM',
+     '.pbm': 'PBM',
+     '.pgm': 'PGM',
+     '.xbm': 'XBM',
+     '.xpm': 'XPM',
+     '.heif': 'HEIF',
+     '.heic': 'HEIC',
+     '.avif': 'AVIF',
+ }
+ 
+ 
+ def is_image_file(filepath: Union[str, Path]) -> bool:
+     """
+     Check if a file is a supported image format based on extension.
+ 
+     Args:
+         filepath: Path to the file
+ 
+     Returns:
+         True if the file appears to be a supported image format
+     """
+     path = Path(filepath)
+     ext = path.suffix.lower()
+     return ext in EXTENSION_TO_FORMAT
+ 
+ 
+ def get_image_format(filepath: Union[str, Path]) -> Optional[str]:
+     """
+     Get the image format from file extension.
+ 
+     Args:
+         filepath: Path to the file
+ 
+     Returns:
+         Format name (e.g., 'JPEG', 'PNG') or None if unknown
+     """
+     path = Path(filepath)
+     ext = path.suffix.lower()
+     return EXTENSION_TO_FORMAT.get(ext)
+ 
+ 
+ def load_image_from_file(
+     filepath: Union[str, Path],
+     convert_to_rgb: bool = True,
+     raise_on_error: bool = False
+ ) -> Optional[Image.Image]:
+     """
+     Load an image from a file path, supporting all major formats.
+ 
+     Args:
+         filepath: Path to the image file
+         convert_to_rgb: Convert image to RGB mode (required for models)
+         raise_on_error: If True, raise exception on error; if False, return None
+ 
+     Returns:
+         PIL Image object or None if loading failed
+     """
+     try:
+         path = Path(filepath)
+ 
+         # Check if file exists
+         if not path.exists():
+             if raise_on_error:
+                 raise FileNotFoundError(f"Image file not found: {filepath}")
+             return None
+ 
+         # Check if it's a supported format
+         if not is_image_file(path):
+             if raise_on_error:
+                 raise ValueError(f"Unsupported image format: {filepath}")
+             print(f"⚠️ Skipping unsupported format: {filepath}")
+             return None
+ 
+         # Open and verify it's actually an image
+         with Image.open(path) as img:
+             img.verify()
+ 
+         # Re-open for actual use (verify() invalidates the image object)
+         img = Image.open(path)
+ 
+         # Convert to RGB if needed (required for deep learning models)
+         if convert_to_rgb and img.mode != 'RGB':
+             # Handle different modes
+             if img.mode in ('RGBA', 'LA', 'P'):
+                 # Create white background for transparency
+                 background = Image.new('RGB', img.size, (255, 255, 255))
+                 if img.mode == 'P':
+                     img = img.convert('RGBA')
+                 if img.mode in ('RGBA', 'LA'):
+                     background.paste(img, mask=img.split()[-1])  # Use alpha channel as mask
+                 img = background
+             else:
+                 img = img.convert('RGB')
+ 
+         return img
+ 
+     except UnidentifiedImageError:
+         error_msg = f"❌ Cannot identify image format: {filepath}"
+         if raise_on_error:
+             raise ValueError(error_msg)
+         print(error_msg)
+         return None
+     except Exception as e:
+         error_msg = f"❌ Error loading image {filepath}: {str(e)}"
+         if raise_on_error:
+             raise
+         print(error_msg)
+         return None
+ 
+ 
+ def load_image_from_bytes(
+     image_bytes: bytes,
+     convert_to_rgb: bool = True,
+     raise_on_error: bool = False
+ ) -> Optional[Image.Image]:
+     """
+     Load an image from bytes, supporting all major formats.
+ 
+     Args:
+         image_bytes: Image data as bytes
+         convert_to_rgb: Convert image to RGB mode (required for models)
+         raise_on_error: If True, raise exception on error; if False, return None
+ 
+     Returns:
+         PIL Image object or None if loading failed
+     """
+     try:
+         # Open from bytes and verify it's actually an image
+         img = Image.open(io.BytesIO(image_bytes))
+         img.verify()
+ 
+         # Re-open for actual use (verify() invalidates the image object)
+         img = Image.open(io.BytesIO(image_bytes))
+ 
+         # Convert to RGB if needed
+         if convert_to_rgb and img.mode != 'RGB':
+             if img.mode in ('RGBA', 'LA', 'P'):
+                 background = Image.new('RGB', img.size, (255, 255, 255))
+                 if img.mode == 'P':
+                     img = img.convert('RGBA')
+                 if img.mode in ('RGBA', 'LA'):
+                     background.paste(img, mask=img.split()[-1])
+                 img = background
+             else:
+                 img = img.convert('RGB')
+ 
+         return img
+ 
+     except UnidentifiedImageError:
+         error_msg = "❌ Cannot identify image format from bytes"
+         if raise_on_error:
+             raise ValueError(error_msg)
+         print(error_msg)
+         return None
+     except Exception as e:
+         error_msg = f"❌ Error loading image from bytes: {str(e)}"
+         if raise_on_error:
+             raise
+         print(error_msg)
+         return None
+ 
+ 
+ def load_image_from_url(
+     url: str,
+     timeout: int = 20,
+     convert_to_rgb: bool = True,
+     raise_on_error: bool = False
+ ) -> Optional[Image.Image]:
+     """
+     Load an image from a URL, supporting all major formats.
+ 
+     Args:
+         url: URL to the image
+         timeout: Request timeout in seconds
+         convert_to_rgb: Convert image to RGB mode (required for models)
+         raise_on_error: If True, raise exception on error; if False, return None
+ 
+     Returns:
+         PIL Image object or None if loading failed
+     """
+     try:
+         resp = requests.get(url, timeout=timeout, stream=True)
+         resp.raise_for_status()
+ 
+         # Check content type
+         content_type = resp.headers.get('Content-Type', '').lower()
+         if not any(fmt in content_type for fmt in ['image', 'jpeg', 'png', 'webp', 'gif']):
+             if raise_on_error:
+                 raise ValueError(f"URL does not point to an image: {url}")
+             print(f"⚠️ URL does not appear to be an image: {url}")
+             return None
+ 
+         # Load from bytes
+         return load_image_from_bytes(resp.content, convert_to_rgb, raise_on_error)
+ 
+     except requests.RequestException as e:
+         error_msg = f"❌ Error fetching image from URL {url}: {str(e)}"
+         if raise_on_error:
+             raise
+         print(error_msg)
+         return None
+     except Exception as e:
+         error_msg = f"❌ Error loading image from URL {url}: {str(e)}"
+         if raise_on_error:
+             raise
+         print(error_msg)
+         return None
264
+
265
+
266
+ def load_images_from_files(
267
+ filepaths: List[Union[str, Path]],
268
+ convert_to_rgb: bool = True,
269
+ skip_errors: bool = True
270
+ ) -> List[Image.Image]:
271
+ """
272
+ Load multiple images from file paths, supporting all major formats.
273
+
274
+ Args:
275
+ filepaths: List of paths to image files
276
+ convert_to_rgb: Convert images to RGB mode (required for models)
277
+ skip_errors: If True, skip files that fail to load; if False, raise on first error
278
+
279
+ Returns:
280
+ List of PIL Image objects (only successfully loaded images)
281
+ """
282
+ images = []
283
+ loaded_count = 0
284
+ failed_count = 0
285
+
286
+ for fp in filepaths:
287
+ img = load_image_from_file(fp, convert_to_rgb, raise_on_error=not skip_errors)
288
+ if img is not None:
289
+ images.append(img)
290
+ loaded_count += 1
291
+ else:
292
+ failed_count += 1
293
+
294
+ if failed_count > 0:
295
+ print(f"⚠️ Loaded {loaded_count} images, {failed_count} failed")
296
+
297
+ return images
298
+
299
+
300
+ def validate_image_format(img: Image.Image) -> Tuple[bool, Optional[str]]:
301
+ """
302
+ Validate that an image is in a supported format and ready for processing.
303
+
304
+ Args:
305
+ img: PIL Image object
306
+
307
+ Returns:
308
+ Tuple of (is_valid, error_message)
309
+ """
310
+ if img is None:
311
+ return False, "Image is None"
312
+
313
+ if not hasattr(img, 'mode'):
314
+ return False, "Invalid image object"
315
+
316
+ # Check if format is supported
317
+ if hasattr(img, 'format') and img.format:
318
+ if img.format not in SUPPORTED_FORMATS:
319
+ return False, f"Unsupported format: {img.format}"
320
+
321
+ # Check if image has valid size
322
+ if img.size[0] == 0 or img.size[1] == 0:
323
+ return False, "Image has zero dimensions"
324
+
325
+ return True, None
326
+
327
+
328
+ def ensure_rgb_image(img: Image.Image) -> Image.Image:
329
+ """
330
+ Ensure an image is in RGB mode, converting if necessary.
331
+
332
+ Args:
333
+ img: PIL Image object
334
+
335
+ Returns:
336
+ RGB mode PIL Image
337
+ """
338
+ if img.mode == 'RGB':
339
+ return img
340
+
341
+ if img.mode in ('RGBA', 'LA', 'P'):
342
+ # Handle transparency
343
+ background = Image.new('RGB', img.size, (255, 255, 255))
344
+ if img.mode == 'P':
345
+ img = img.convert('RGBA')
346
+ if img.mode in ('RGBA', 'LA'):
347
+ if img.mode == 'RGBA':
348
+ background.paste(img, mask=img.split()[-1])
349
+ else:
350
+ background.paste(img, mask=img.split()[-1])
351
+ return background
352
+ else:
353
+ return img.convert('RGB')
354
+
355
+
356
+ def get_supported_formats() -> List[str]:
357
+ """
358
+ Get list of all supported image formats.
359
+
360
+ Returns:
361
+ List of format names
362
+ """
363
+ return sorted(list(SUPPORTED_FORMATS))
364
+
365
+
366
+ def get_supported_extensions() -> List[str]:
367
+ """
368
+ Get list of all supported file extensions.
369
+
370
+ Returns:
371
+ List of file extensions (with dots)
372
+ """
373
+ return sorted(list(EXTENSION_TO_FORMAT.keys()))
374
+
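The verify-then-reopen pattern used by the byte-loading helpers above can be condensed: decode the buffer eagerly with `load()` and fall back to `None` on any decode failure. A minimal standalone sketch (the `load_rgb_from_bytes` name is illustrative, not the module's API):

```python
import io
from PIL import Image

def load_rgb_from_bytes(data: bytes):
    """Decode image bytes and return an RGB image, or None if undecodable."""
    try:
        img = Image.open(io.BytesIO(data))
        img.load()  # force decode now, while the buffer is alive
    except Exception:
        return None
    return img.convert("RGB") if img.mode != "RGB" else img

# Round-trip: encode a PNG in memory, then load it back as RGB
buf = io.BytesIO()
Image.new("RGBA", (4, 4), (0, 128, 255, 255)).save(buf, format="PNG")
img = load_rgb_from_bytes(buf.getvalue())
print(img.mode, img.size)  # → RGB (4, 4)
```

Note that a plain `convert("RGB")` is enough here for fully opaque images; the white-background compositing in `ensure_rgb_image` matters only when the alpha channel actually carries transparency.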