Spaces: Paused

Ali Mohsin committed on
Commit · 45b7274
Parent(s): aa9a482

10000 final fixes hopefully

Files changed:
- API_DOCUMENTATION.md +0 -412
- PRODUCTION_DEPLOYMENT.md +0 -310
- PROJECT_SUMMARY.md +0 -261
- QUICK_START_TRAINING.md +0 -229
- README_HF_SETUP.md +0 -60
- RECOMMENDATION_PIPELINE_EXPLAINED.md +340 -0
- app.py +74 -18
- inference.py +26 -1
- utils/artifact_manager.py +4 -1
- utils/image_utils.py +374 -0

API_DOCUMENTATION.md
DELETED
@@ -1,412 +0,0 @@

# Dressify API Documentation

## Overview

The Dressify API provides personalized outfit recommendations using advanced deep learning models. The API supports an expanded tag system for fine-grained control over recommendations.

## Base URL

```
https://your-domain.com/api
```

## Authentication

All endpoints (except `/health` and `/tags`) require an API key in the `X-API-Key` header:

```http
X-API-Key: your-api-key-here
```

## Endpoints

### 1. Health Check

**GET** `/health`

Check API health and model status.

**Response:**
```json
{
  "status": "ok",
  "device": "cuda",
  "resnet": "resnet_v1",
  "vit": "vit_v1"
}
```

---

### 2. Get Available Tags

**GET** `/tags`

Get all available tag options for API integration.

**Response:**
```json
{
  "tag_categories": {
    "occasion": ["casual", "business", "formal", ...],
    "weather": ["any", "hot", "warm", "cold", ...],
    "style": ["casual", "smart_casual", "formal", ...],
    "color_preference": ["neutral", "monochromatic", ...],
    ...
  },
  "description": "Available tags for personalized outfit recommendations",
  "usage": {
    "primary_tags": ["occasion", "weather", "style"],
    "optional_tags": ["color_preference", "fit_preference", ...]
  }
}
```

---

### 3. Validate Tags

**POST** `/tags/validate`

Validate tag values before making a recommendation request.

**Request Body:**
```json
{
  "occasion": "formal",
  "weather": "cold",
  "style": "elegant",
  "color_preference": "monochromatic"
}
```

**Response:**
```json
{
  "valid": true,
  "errors": [],
  "validated_tags": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    "color_preference": "monochromatic"
  }
}
```
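A client can mirror this endpoint locally to fail fast before a network round trip. A minimal sketch, assuming a dict-based tag set; the allowed-value lists below are illustrative only, not the full server-side lists (fetch those from `GET /tags`):

```python
# Client-side pre-validation mirroring the shape of POST /tags/validate.
# ALLOWED_TAGS is a hypothetical subset; the real lists come from GET /tags.
ALLOWED_TAGS = {
    "occasion": {"casual", "business", "formal", "wedding", "date"},
    "weather": {"any", "hot", "warm", "cold", "rain", "snow"},
    "style": {"casual", "smart_casual", "formal", "elegant"},
    "color_preference": {"neutral", "monochromatic", "bold"},
}

def validate_tags(tags: dict) -> dict:
    """Return the same {valid, errors, validated_tags} shape as the endpoint."""
    errors = []
    validated = {}
    for category, value in tags.items():
        allowed = ALLOWED_TAGS.get(category)
        if allowed is None:
            errors.append(f"Unknown tag category '{category}'")
        elif value not in allowed:
            errors.append(f"Invalid value '{value}' for category '{category}'")
        else:
            validated[category] = value
    return {"valid": not errors, "errors": errors, "validated_tags": validated}
```

This keeps obviously malformed requests from ever reaching the API, while the server remains the source of truth.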
---

### 4. Generate Embeddings

**POST** `/embed`

Generate embeddings for clothing item images.

**Request Body:**
```json
{
  "image_urls": ["https://example.com/image1.jpg"],
  "images_base64": []
}
```

**Response:**
```json
{
  "embeddings": [[0.123, 0.456, ...]],
  "model_version": "resnet_v1"
}
```
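For images that are not publicly reachable by URL, `images_base64` accepts inline data. A sketch of preparing that field — standard base64 of the raw file bytes is an assumption here; confirm the expected encoding against the server:

```python
import base64

def image_to_base64(path: str) -> str:
    """Read an image file and return its base64-encoded bytes as ASCII text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def build_embed_payload(urls=(), paths=()) -> dict:
    """Assemble a /embed request body from URLs and/or local files."""
    return {
        "image_urls": list(urls),
        "images_base64": [image_to_base64(p) for p in paths],
    }
```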
---

### 5. Compose Outfits (Enhanced with Tags)

**POST** `/compose`

Generate personalized outfit recommendations with expanded tag support.

#### Request Format 1: Tag-Based (Recommended)

**Request Body:**
```json
{
  "items": [
    {
      "id": "item_1",
      "image_url": "https://example.com/shirt.jpg",
      "category": "shirt",
      "embedding": null
    },
    {
      "id": "item_2",
      "image_url": "https://example.com/pants.jpg",
      "category": "pants",
      "embedding": null
    }
  ],
  "occasion": "formal",
  "weather": "cold",
  "style": "elegant",
  "num_outfits": 5,
  "color_preference": "monochromatic",
  "fit_preference": "tailored",
  "material_preference": "wool",
  "season": "winter",
  "time_of_day": "evening",
  "personal_style": "sophisticated"
}
```

#### Request Format 2: Context Dict (Legacy)

**Request Body:**
```json
{
  "items": [...],
  "context": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    "num_outfits": 5
  }
}
```

#### Response:
```json
{
  "outfits": [
    {
      "item_ids": ["item_1", "item_2", "item_3"],
      "items": [
        {
          "id": "item_1",
          "category": "jacket",
          "category_type": "outerwear"
        },
        {
          "id": "item_2",
          "category": "shirt",
          "category_type": "upper"
        },
        {
          "id": "item_3",
          "category": "pants",
          "category_type": "bottom"
        }
      ],
      "score": 1.85,
      "base_score": 0.25,
      "categories": ["jacket", "shirt", "pants"],
      "category_types": ["outerwear", "upper", "bottom"],
      "outfit_size": 3,
      "is_valid": true,
      "template": {
        "name": "formal",
        "style": "professional, elegant, sophisticated",
        "style_score": 0.95,
        "color_score": 0.88,
        "colors": ["navy", "white", "gray"],
        "accessory_limit": 4
      }
    }
  ],
  "version": "vit_v1",
  "tags_processed": true,
  "context_used": {
    "occasion": "formal",
    "weather": "cold",
    "style": "elegant",
    ...
  }
}
```

---

## Tag System

### Primary Tags (High Priority)

These tags have the highest influence on recommendations:

- **occasion**: Event type (casual, business, formal, wedding, date, etc.)
- **weather**: Weather conditions (any, hot, warm, cold, rain, snow, etc.)
- **style**: Fashion aesthetic (casual, smart_casual, formal, elegant, etc.)

### Secondary Tags (Medium Priority)

These tags refine recommendations:

- **color_preference**: Color scheme (neutral, monochromatic, bold, etc.)
- **fit_preference**: Fit type (fitted, loose, tailored, etc.)
- **material_preference**: Fabric type (cotton, wool, silk, etc.)
- **personal_style**: Personal style (conservative, bold, timeless, etc.)

### Tertiary Tags (Lower Priority)

These provide additional context:

- **season**: Current season (spring, summer, fall, winter)
- **time_of_day**: When the outfit will be worn (morning, afternoon, evening, night)
- **budget**: Price range preference (luxury, premium, affordable, budget)
- **age_group**: Age group (teen, young_adult, adult, mature)
- **gender**: Gender preference (male, female, non_binary, unisex)

---

## Tag Processing

The API automatically:

1. **Validates** all tag values
2. **Resolves conflicts** between contradictory tags
3. **Applies synergies** between complementary tags
4. **Prioritizes** tags by importance
5. **Generates preferences** for the recommendation engine

### Tag Conflicts

Some tags conflict and cannot be used together:

- `hot` conflicts with `cold`, `freezing`, `snow`
- `formal` conflicts with `casual`, `sporty`
- `loose` conflicts with `fitted`, `tight`

### Tag Synergies

Some tags work well together:

- `formal` + `elegant` + `sophisticated` + `tailored`
- `casual` + `comfortable` + `relaxed` + `practical`
- `sporty` + `athletic` + `comfortable` + `moisture_wicking`
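The conflict check above can be mirrored client-side. A minimal sketch using only the conflict pairs listed in this section (the server's full table may be larger):

```python
# Conflict pairs as documented above; the server-side table may be larger.
CONFLICTS = [
    ({"hot"}, {"cold", "freezing", "snow"}),
    ({"formal"}, {"casual", "sporty"}),
    ({"loose"}, {"fitted", "tight"}),
]

def find_conflicts(tag_values: set) -> list:
    """Return (tag_a, tag_b) pairs from the set that contradict each other."""
    hits = []
    for left, right in CONFLICTS:
        for a in sorted(left & tag_values):
            for b in sorted(right & tag_values):
                hits.append((a, b))
    return hits
```

An empty result means the tag set is at least conflict-free; synergy weighting remains a server-side concern.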
---

## Example Usage

### Python Example

```python
import requests

API_KEY = "your-api-key"
BASE_URL = "https://your-domain.com/api"

# Prepare items
items = [
    {
        "id": "shirt_1",
        "image_url": "https://example.com/shirt.jpg",
        "category": "shirt"
    },
    {
        "id": "pants_1",
        "image_url": "https://example.com/pants.jpg",
        "category": "pants"
    }
]

# Make recommendation request
response = requests.post(
    f"{BASE_URL}/compose",
    json={
        "items": items,
        "occasion": "formal",
        "weather": "cold",
        "style": "elegant",
        "num_outfits": 5,
        "color_preference": "monochromatic",
        "fit_preference": "tailored",
        "material_preference": "wool"
    },
    headers={"X-API-Key": API_KEY}
)

result = response.json()
outfits = result["outfits"]
```

### JavaScript Example

```javascript
const API_KEY = 'your-api-key';
const BASE_URL = 'https://your-domain.com/api';

const items = [
  {
    id: 'shirt_1',
    image_url: 'https://example.com/shirt.jpg',
    category: 'shirt'
  },
  {
    id: 'pants_1',
    image_url: 'https://example.com/pants.jpg',
    category: 'pants'
  }
];

fetch(`${BASE_URL}/compose`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': API_KEY
  },
  body: JSON.stringify({
    items: items,
    occasion: 'formal',
    weather: 'cold',
    style: 'elegant',
    num_outfits: 5,
    color_preference: 'monochromatic',
    fit_preference: 'tailored',
    material_preference: 'wool'
  })
})
  .then(response => response.json())
  .then(data => {
    console.log('Outfits:', data.outfits);
  });
```

---

## Error Handling

### Invalid Tags

```json
{
  "error": "Invalid tags provided",
  "errors": [
    "Invalid value 'invalid_occasion' for category 'occasion'"
  ],
  "valid_tag_options": {
    "occasion": ["casual", "business", "formal", ...]
  }
}
```

### Models Not Loaded

```json
{
  "error": "Models not trained or loaded properly",
  "details": ["ResNet: No trained weights found"],
  "message": "Please ensure models are trained..."
}
```
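Clients can turn these error bodies into exceptions rather than branching on dict keys everywhere. A sketch based only on the field names shown in the examples above (`error`, `errors`, `details`):

```python
class DressifyAPIError(Exception):
    """Raised when the API returns an error body like the ones shown above."""
    def __init__(self, error, details=None):
        super().__init__(error)
        self.details = details or []

def raise_for_api_error(body: dict) -> dict:
    """Pass through a successful response body; raise on an error body."""
    if "error" in body:
        details = body.get("errors") or body.get("details") or []
        raise DressifyAPIError(body["error"], details)
    return body
```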
---

## Rate Limits

- Default: 100 requests per minute
- Burst: 10 requests per second
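A client-side guard for these limits can be sketched as a sliding-window counter; this is an illustrative helper, not part of the API:

```python
from collections import deque

class SlidingWindowLimiter:
    """Client-side guard for the documented limits:
    100 requests per minute, bursts up to 10 per second."""
    def __init__(self, max_per_minute=100, max_per_second=10):
        self.max_per_minute = max_per_minute
        self.max_per_second = max_per_second
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        """Record and permit a request at time `now` (seconds) if within limits."""
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()  # drop entries outside the minute window
        last_second = sum(1 for t in self.timestamps if now - t < 1)
        if len(self.timestamps) >= self.max_per_minute or last_second >= self.max_per_second:
            return False
        self.timestamps.append(now)
        return True
```

Passing the clock in explicitly keeps the limiter deterministic and easy to test; in production you would call `allow(time.monotonic())` before each request and back off when it returns `False`.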
---

## Support

For API support, please contact: support@dressify.com
PRODUCTION_DEPLOYMENT.md
DELETED
@@ -1,310 +0,0 @@

# 🚀 Production Deployment Guide for Dressify

## Overview

This guide explains how to deploy Dressify as a production-ready outfit recommendation service using the official Polyvore dataset splits.

## 🎯 Key Changes Made

### 1. **Official Split Usage** ✅
- **Before**: System tried to create random 70/15/15 splits
- **After**: System uses official splits from `nondisjoint/` and `disjoint/` folders
- **Benefit**: Reproducible, research-grade results

### 2. **Robust Dataset Detection** 🔍
- Automatically detects official splits in multiple locations
- Falls back to metadata extraction if needed
- No more random split creation by default
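The detection step above can be sketched as a simple path probe. This is a hypothetical standalone version; the real implementation lives in `utils/data_fetch.py`:

```python
from pathlib import Path

SPLIT_FILES = ("train.json", "valid.json", "test.json")

def find_official_splits(root: str):
    """Probe the dataset root for complete official split folders,
    preferring nondisjoint/ over disjoint/."""
    for folder in ("nondisjoint", "disjoint"):
        candidate = Path(root) / folder
        if all((candidate / f).is_file() for f in SPLIT_FILES):
            return str(candidate)
    return None  # caller falls back to metadata extraction
```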
### 3. **Production-Ready Startup** 🚀
- Comprehensive error handling and diagnostics
- Clear status reporting
- Automatic dataset verification

## 📁 Dataset Structure

The system expects this structure after download:

```
data/Polyvore/
├── images/                       # Extracted from images.zip
├── nondisjoint/                  # Official splits (preferred)
│   ├── train.json                # 31.8 MB - Training outfits
│   ├── valid.json                # 2.99 MB - Validation outfits
│   └── test.json                 # 5.97 MB - Test outfits
├── disjoint/                     # Alternative official splits
│   ├── train.json                # 9.65 MB - Training outfits
│   ├── valid.json                # 1.72 MB - Validation outfits
│   └── test.json                 # 8.36 MB - Test outfits
├── polyvore_item_metadata.json   # 105 MB - Item metadata
├── polyvore_outfit_titles.json   # 6.97 MB - Outfit information
└── categories.csv                # 4.91 KB - Category mappings
```

## 🚀 Deployment Steps

### Step 1: Initial Setup
```bash
# Clone the repository
git clone <your-repo>
cd recomendation

# Install dependencies
pip install -r requirements.txt
```

### Step 2: Dataset Preparation
```bash
# Run the startup fix script
python startup_fix.py
```

This script will:
1. ✅ Download the Polyvore dataset from Hugging Face
2. ✅ Extract images from images.zip
3. ✅ Detect official splits in nondisjoint/ and disjoint/
4. ✅ Create training splits from official data
5. ✅ Verify all components are ready

### Step 3: Verify Dataset
```bash
# Check dataset status
python -c "
from utils.data_fetch import check_dataset_structure
import json
structure = check_dataset_structure('data/Polyvore')
print(json.dumps(structure, indent=2))
"
```

Expected output:
```json
{
  "status": "ready",
  "images": {
    "exists": true,
    "count": 100000,
    "extensions": [".jpg", ".jpeg", ".png"]
  },
  "splits": {
    "nondisjoint": {
      "train.json": {"exists": true, "size_mb": 31.8},
      "valid.json": {"exists": true, "size_mb": 2.99},
      "test.json": {"exists": true, "size_mb": 5.97}
    }
  }
}
```

### Step 4: Launch Application
```bash
# Start the main application
python app.py
```

The system will:
1. 🔍 Check dataset status
2. ✅ Load official splits
3. 🚀 Launch the Gradio interface
4. 🎯 Be ready for training and inference

## 🔧 Troubleshooting

### Issue: "No official splits found"

**Cause**: The dataset download didn't include the split files.

**Solution**:
```bash
# Check what was downloaded
ls -la data/Polyvore/

# Re-run data fetcher
python -c "
from utils.data_fetch import ensure_dataset_ready
ensure_dataset_ready()
"
```

### Issue: "Dataset preparation failed"

**Cause**: The prepare script couldn't parse the official splits.

**Solution**:
```bash
# Check split file format
head -20 data/Polyvore/nondisjoint/train.json

# Run preparation manually
python scripts/prepare_polyvore.py --root data/Polyvore
```

### Issue: "Out of memory during training"

**Cause**: GPU memory insufficient for default batch sizes.

**Solution**: Use the Advanced Training interface to reduce batch sizes:
- ResNet: Reduce from 64 to 16-32
- ViT: Reduce from 32 to 8-16
- Enable mixed precision (AMP)

## 🎯 Production Configuration

### Environment Variables
```bash
export EXPORT_DIR="models/exports"
export POLYVORE_ROOT="data/Polyvore"
export CUDA_VISIBLE_DEVICES="0"  # Specify GPU
```

### Docker Deployment
```bash
# Build image
docker build -t dressify .

# Run container
docker run -p 7860:7860 -p 8000:8000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/models:/app/models \
  dressify
```

### Hugging Face Space
1. Upload the entire `recomendation/` folder
2. Set the Space type to "Gradio"
3. The system auto-bootstraps on first run
4. Uses official splits for production-quality results

## 📊 Expected Performance

### Dataset Statistics
- **Total Images**: ~100,000 fashion items
- **Training Outfits**: ~50,000 (nondisjoint) or ~20,000 (disjoint)
- **Validation Outfits**: ~5,000 (nondisjoint) or ~2,000 (disjoint)
- **Test Outfits**: ~10,000 (nondisjoint) or ~4,000 (disjoint)

### Training Times (L4 GPU)
- **ResNet Item Embedder**: 2-4 hours (20 epochs)
- **ViT Outfit Encoder**: 1-2 hours (30 epochs)
- **Total**: 3-6 hours for full training

### Inference Performance
- **Item Embedding**: < 50ms per image
- **Outfit Generation**: < 100ms per outfit
- **Memory Usage**: ~2-4 GB GPU VRAM

## 🔬 Research vs Production

### Research Mode
```bash
# Use disjoint splits (smaller, more challenging)
python scripts/prepare_polyvore.py --root data/Polyvore
# Automatically uses disjoint/ splits
```

### Production Mode
```bash
# Use nondisjoint splits (larger, more robust)
python scripts/prepare_polyvore.py --root data/Polyvore
# Automatically uses nondisjoint/ splits (default)
```

## 📝 Monitoring & Logging

### Training Logs
```bash
# Check training progress
tail -f models/exports/training.log

# Monitor GPU usage
nvidia-smi -l 1
```

### System Health
```bash
# Health check endpoint
curl http://localhost:8000/health

# Expected response
{
  "status": "ok",
  "device": "cuda:0",
  "resnet": "resnet50_v2",
  "vit": "vit_outfit_v1"
}
```

## 🚨 Emergency Procedures

### Dataset Corruption
```bash
# Remove corrupted data
rm -rf data/Polyvore/splits/

# Re-run preparation
python startup_fix.py
```

### Model Issues
```bash
# Remove corrupted models
rm -rf models/exports/*.pth

# Re-train from scratch
python train_resnet.py --data_root data/Polyvore --epochs 20
python train_vit_triplet.py --data_root data/Polyvore --epochs 30
```

### System Recovery
```bash
# Full system reset
rm -rf data/Polyvore/
rm -rf models/exports/

# Fresh start
python startup_fix.py
```

## ✅ Production Checklist

- [ ] Dataset downloaded successfully (2.5GB+ of images)
- [ ] Official splits detected in nondisjoint/ or disjoint/
- [ ] Training splits created in data/Polyvore/splits/
- [ ] Models can be trained without errors
- [ ] Inference service responds to health checks
- [ ] Gradio interface loads successfully
- [ ] Advanced training controls work
- [ ] Model checkpoints can be saved/loaded

## 🎉 Success Indicators

When everything is working correctly, you should see:

```
✅ Dataset ready at: data/Polyvore
📊 Images: 100000 files
📋 polyvore_item_metadata.json: 105.0 MB
📋 polyvore_outfit_titles.json: 6.97 MB
🎯 Official splits found:
   ✅ nondisjoint/train.json (31.8 MB)
   ✅ nondisjoint/valid.json (2.99 MB)
   ✅ nondisjoint/test.json (5.97 MB)
🎉 Using official splits from dataset!
✅ Dataset preparation completed successfully!
✅ All required splits verified!
🚀 Your Dressify system is ready to use!
```

## 📞 Support

If you encounter issues:

1. **Check the logs** for specific error messages
2. **Verify dataset structure** matches the expected layout
3. **Run startup_fix.py** for automated diagnostics
4. **Check GPU memory** and reduce batch sizes if needed
5. **Ensure official splits** are present in nondisjoint/ or disjoint/

---

**🎯 Your Dressify system is now production-ready with official dataset splits!**
PROJECT_SUMMARY.md
DELETED
@@ -1,261 +0,0 @@

# Dressify - Complete Project Summary

## 🎯 Project Overview

**Dressify** is a **production-ready, research-grade** outfit recommendation system that automatically downloads the Polyvore dataset, trains state-of-the-art models, and provides a sophisticated Gradio interface for wardrobe uploads and outfit generation.

## 🏗️ System Architecture

### Core Components

1. **Data Pipeline** (`utils/data_fetch.py`)
   - Automatic download of Stylique/Polyvore dataset from HF Hub
   - Smart image extraction and organization
   - Robust split detection (root, nondisjoint, disjoint)
   - Fallback to deterministic 70/15/15 splits if official splits missing

2. **Model Architecture**
   - **ResNet Item Embedder** (`models/resnet_embedder.py`)
     - ImageNet-pretrained ResNet50 backbone
     - 512D projection head with L2 normalization
     - Triplet loss training for item compatibility
   - **ViT Outfit Encoder** (`models/vit_outfit.py`)
     - 6-layer transformer encoder
     - 8 attention heads, 4x feed-forward multiplier
     - Outfit-level compatibility scoring
     - Cosine distance triplet loss
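The embedding conventions above (L2-normalized vectors, cosine-distance triplet loss) can be illustrated without the full models. A framework-free sketch of the loss, not the actual training code:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as the 512D projection head does."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_distance(a, b):
    """1 - cosine similarity; inputs are assumed L2-normalized."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on cosine distance: pull the positive closer
    than the negative by at least `margin`."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, cosine_distance(a, p) - cosine_distance(a, n) + margin)
```

The margin value here is illustrative; the real hyperparameters live in `configs/item.yaml` and `configs/outfit.yaml`.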
3. **Training Pipeline**
   - **ResNet Training** (`train_resnet.py`)
     - Semi-hard negative mining
     - Mixed precision training with autocast
     - Channels-last memory format for CUDA
     - Automatic checkpointing and best model saving
   - **ViT Training** (`train_vit_triplet.py`)
     - Frozen ResNet embeddings as input
     - Outfit-level triplet mining
     - Validation with early stopping
     - Comprehensive metrics logging

4. **Inference Service** (`inference.py`)
   - On-the-fly image embedding
   - Slot-aware outfit composition
   - Candidate generation with category constraints
   - Compatibility scoring and ranking
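Slot-aware composition can be sketched as grouping the wardrobe by category slot and enumerating one item per slot. This is a simplified stand-in for the real candidate generator in `inference.py`, with a hypothetical default slot set:

```python
from itertools import product

def candidate_outfits(items, slots=("upper", "bottom", "shoes")):
    """Group items by category_type and yield one combination per slot set.
    `items` are dicts with "id" and "category_type" keys, as in the API."""
    by_slot = {slot: [] for slot in slots}
    for item in items:
        if item["category_type"] in by_slot:
            by_slot[item["category_type"]].append(item)
    if not all(by_slot.values()):
        return []  # a required slot has no candidates
    return [
        [it["id"] for it in combo]
        for combo in product(*(by_slot[s] for s in slots))
    ]
```

Each candidate would then be scored by the ViT outfit encoder and ranked.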
5. **Web Interface** (`app.py`)
|
| 49 |
-
- **Gradio UI**: Wardrobe upload, outfit generation, preview stitching
|
| 50 |
-
- **FastAPI**: REST endpoints for embedding and composition
|
| 51 |
-
- **Auto-bootstrap**: Background dataset prep and training
|
| 52 |
-
- **Status Dashboard**: Real-time progress monitoring
|
| 53 |
-
|
| 54 |
-
## 🚀 Key Features
|
| 55 |
-
|
| 56 |
-
### Research-Grade Training
|
| 57 |
-
- **Triplet Loss**: Semi-hard negative mining for better embeddings
|
| 58 |
-
- **Mixed Precision**: CUDA-optimized training with autocast
|
| 59 |
-
- **Advanced Augmentation**: Random crop, flip, color jitter, random erasing
|
| 60 |
-
- **Curriculum Learning**: Progressive difficulty increase (configurable)

### Production-Ready Infrastructure
- **Self-Contained**: No external dependencies or environment variables
- **Auto-Recovery**: Handles missing splits, corrupted data gracefully
- **Background Processing**: Non-blocking dataset preparation and training
- **Model Versioning**: Automatic checkpoint management and best model saving

### Advanced UI/UX
- **Multi-File Upload**: Drag & drop wardrobe images with previews
- **Category Editing**: Manual category assignment for better slot awareness
- **Context Awareness**: Occasion, weather, style preferences
- **Visual Output**: Stitched outfit previews + structured JSON data

## 📊 Expected Performance

### Training Metrics
- **Item Embedder**: Triplet accuracy > 85%, validation loss < 0.1
- **Outfit Encoder**: Compatibility AUC > 0.8, precision > 0.75
- **Training Time**: ResNet ~2-4h, ViT ~1-2h on L4 GPU

### Inference Performance
- **Latency**: < 100ms per outfit on GPU, < 500ms on CPU
- **Throughput**: 100+ outfits/second on modern GPU
- **Memory**: ~2GB VRAM for full models, ~500MB for lightweight variants

## 🔧 Configuration & Customization

### Training Configs
- **Item Training** (`configs/item.yaml`): Backbone, embedding dim, loss params
- **Outfit Training** (`configs/outfit.yaml`): Transformer layers, attention heads
- **Hardware Settings**: Mixed precision, channels-last, gradient clipping

### Model Variants
- **Lightweight**: MobileNetV3 + small transformer (CPU-friendly)
- **Standard**: ResNet50 + medium transformer (balanced)
- **Research**: ResNet101 + large transformer (high performance)

## 🚀 Deployment Options

### 1. Hugging Face Space (Recommended)
```bash
# Deploy to HF Space
./scripts/deploy_space.sh

# Customize Space settings
SPACE_NAME=my-dressify SPACE_HARDWARE=gpu-t4 ./scripts/deploy_space.sh
```

### 2. Local Development
```bash
# Setup environment
pip install -r requirements.txt

# Launch app (auto-downloads dataset)
python app.py

# Manual training
./scripts/train_item.sh
./scripts/train_outfit.sh
```

### 3. Docker Deployment
```bash
# Build and run
docker build -t dressify .
docker run -p 7860:7860 -p 8000:8000 dressify
```

## 📁 Project Structure

```
recomendation/
├── app.py                      # Main FastAPI + Gradio app
├── inference.py                # Inference service
├── models/
│   ├── resnet_embedder.py      # ResNet50 + projection
│   └── vit_outfit.py           # Transformer encoder
├── data/
│   └── polyvore.py             # PyTorch datasets
├── scripts/
│   ├── prepare_polyvore.py     # Dataset preparation
│   ├── train_item.sh           # ResNet training script
│   ├── train_outfit.sh         # ViT training script
│   └── deploy_space.sh         # HF Space deployment
├── utils/
│   ├── data_fetch.py           # HF dataset downloader
│   ├── transforms.py           # Image transforms
│   ├── triplet_mining.py       # Semi-hard negative mining
│   ├── hf_utils.py             # HF Hub integration
│   └── export.py               # Model export utilities
├── configs/
│   ├── item.yaml               # ResNet training config
│   └── outfit.yaml             # ViT training config
├── tests/
│   └── test_system.py          # Comprehensive tests
├── requirements.txt            # Dependencies
├── Dockerfile                  # Container deployment
└── README.md                   # Documentation
```

## 🧪 Testing & Validation

### Smoke Tests
```bash
# Run comprehensive tests
python -m pytest tests/test_system.py -v

# Test individual components
python -c "from models.resnet_embedder import ResNetItemEmbedder; print('✅ ResNet OK')"
python -c "from models.vit_outfit import OutfitCompatibilityModel; print('✅ ViT OK')"
```

### Training Validation
```bash
# Quick training runs
EPOCHS=1 BATCH_SIZE=8 ./scripts/train_item.sh
EPOCHS=1 BATCH_SIZE=4 ./scripts/train_outfit.sh
```

## 🔬 Research Contributions

### Novel Approaches
1. **Hybrid Architecture**: ResNet embeddings + Transformer compatibility
2. **Semi-Hard Mining**: Intelligent negative sample selection
3. **Slot Awareness**: Category-constrained outfit composition
4. **Auto-Bootstrap**: Self-contained dataset preparation and training

### Technical Innovations
- **Mixed Precision Training**: CUDA-optimized with autocast
- **Channels-Last Memory**: Improved GPU memory efficiency
- **Background Processing**: Non-blocking system initialization
- **Robust Data Handling**: Graceful fallback for missing splits

## 📈 Future Enhancements

### Model Improvements
- **Multi-Modal**: Text descriptions + visual features
- **Attention Visualization**: Interpretable outfit compatibility
- **Style Transfer**: Generate outfit variations
- **Personalization**: User preference learning

### System Features
- **Real-Time Training**: Continuous model improvement
- **A/B Testing**: Multiple model variants
- **Performance Monitoring**: Automated quality metrics
- **Scalable Deployment**: Multi-GPU, distributed training

## 🤝 Integration Examples

### Next.js + Supabase
```typescript
// Complete integration example in README.md
// Database schema with RLS policies
// API endpoints for wardrobe management
// Real-time outfit recommendations
```

### API Usage
```bash
# Health check
curl http://localhost:8000/health

# Image embedding
curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"images": ["base64_image_1"]}'

# Outfit composition
curl -X POST http://localhost:8000/compose \
  -H "Content-Type: application/json" \
  -d '{"items": [{"id": "item1", "embedding": [0.1, ...]}], "context": {"occasion": "casual"}}'
```

## 📚 Academic References

### Core Technologies
- **Triplet Loss**: FaceNet, Deep Metric Learning
- **Transformer Architecture**: Attention Is All You Need, ViT
- **Outfit Compatibility**: Fashion Recommendation Systems
- **Dataset Preparation**: Polyvore, Fashion-MNIST

### Research Papers
- ResNet: Deep Residual Learning for Image Recognition
- ViT: An Image is Worth 16x16 Words
- Triplet Loss: FaceNet: A Unified Embedding for Face Recognition
- Fashion AI: Learning Fashion Compatibility with Visual Similarity

## 🎉 Conclusion

**Dressify** represents a **complete, production-ready** outfit recommendation system that combines:

- **Research Excellence**: State-of-the-art deep learning architectures
- **Production Quality**: Robust error handling, auto-recovery, monitoring
- **User Experience**: Intuitive interface, real-time feedback, visual output
- **Developer Experience**: Comprehensive testing, clear documentation, easy deployment

The system is designed to be **self-contained**, **scalable**, and **research-grade**, making it suitable for both academic research and commercial deployment. With automatic dataset preparation, intelligent training, and sophisticated inference, Dressify provides a complete solution for outfit recommendation that requires minimal setup and maintenance.

---

**Built with ❤️ for the fashion AI community**
QUICK_START_TRAINING.md
DELETED

# 🚀 Quick Start: Advanced Training Interface

## Overview

The Dressify system now provides **comprehensive parameter control** for both ResNet and ViT training directly from the Gradio interface. You can tweak every aspect of model training without editing code!

## 🎯 What You Can Control

### ResNet Item Embedder
- **Architecture**: Backbone (ResNet50/101), embedding dimension, dropout
- **Training**: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- **Hardware**: Mixed precision, memory format, gradient clipping

### ViT Outfit Encoder
- **Architecture**: Transformer layers, attention heads, feed-forward multiplier, dropout
- **Training**: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- **Strategy**: Mining strategy, augmentation level, random seed

### Advanced Settings
- **Learning Rate**: Warmup epochs, scheduler type, early stopping patience
- **Optimization**: Mixed precision, channels-last memory, gradient clipping
- **Reproducibility**: Random seed, deterministic training

## 🚀 Quick Start Steps

### 1. Launch the App
```bash
python app.py
```

### 2. Go to Advanced Training Tab
- Click on the **"🔬 Advanced Training"** tab
- You'll see comprehensive parameter controls organized in sections

### 3. Choose Your Training Mode

#### Quick Training (Basic)
- Set ResNet epochs: 5-10
- Set ViT epochs: 10-20
- Click **"🚀 Start Quick Training"**

#### Advanced Training (Custom)
- Adjust **all parameters** to your liking
- Click **"🎯 Start Advanced Training"**

### 4. Monitor Progress
- Watch the training log for real-time updates
- Check the Status tab for system health
- Download models from the Downloads tab when complete

## 🔬 Parameter Tuning Examples

### Fast Experimentation
```yaml
# Quick test (5-10 minutes)
ResNet: epochs=5, batch_size=16, lr=1e-3
ViT: epochs=10, batch_size=16, lr=5e-4
```

### Standard Training
```yaml
# Balanced quality (1-2 hours)
ResNet: epochs=20, batch_size=64, lr=1e-3
ViT: epochs=30, batch_size=32, lr=5e-4
```

### High Quality Training
```yaml
# Production models (4-6 hours)
ResNet: epochs=50, batch_size=32, lr=5e-4
ViT: epochs=100, batch_size=16, lr=1e-4
```

### Research Experiments
```yaml
# Maximum capacity
ResNet: backbone=resnet101, embedding_dim=768
ViT: layers=8, heads=12, mining_strategy=hardest
```

## 🎯 Key Parameters to Experiment With

### High Impact (Try First)
1. **Learning Rate**: 1e-4 to 1e-2
2. **Batch Size**: 16 to 128
3. **Triplet Margin**: 0.1 to 0.5
4. **Epochs**: 5 to 100

### Medium Impact
1. **Embedding Dimension**: 256, 512, 768, 1024
2. **Transformer Layers**: 4, 6, 8, 12
3. **Optimizer**: AdamW, Adam, SGD, RMSprop

### Fine-tuning
1. **Weight Decay**: 1e-6 to 1e-1
2. **Dropout**: 0.0 to 0.5
3. **Attention Heads**: 4, 8, 16
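To see why the triplet margin (0.1 to 0.5 above) matters, here is the standard triplet loss it controls, sketched in NumPy. This is illustrative, not the project's training code: a larger margin keeps more triplets "active" (non-zero loss), pushing embeddings apart more aggressively.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(d(a,p) - d(a,n) + margin, 0).
    Zero once the negative is at least `margin` farther than the positive."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(d_ap - d_an + margin, 0.0)
```

With margin 0.2, a negative only 0.1 farther than the positive still incurs loss 0.1; raise the margin to 0.5 and the same triplet incurs 0.4, so training pressure increases.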

## 📊 Training Workflow

### 1. **Start Simple** 🚀
- Use default parameters first
- Run quick training (5-10 epochs)
- Verify system works

### 2. **Experiment Systematically** 🔍
- Change **one parameter at a time**
- Start with learning rate and batch size
- Document every change

### 3. **Validate Results** ✅
- Compare training curves
- Check validation metrics
- Ensure improvements are consistent

### 4. **Scale Up** 📈
- Use best parameters for longer training
- Increase epochs gradually
- Monitor for overfitting

## 🧪 Monitoring Training

### What to Watch
- **Training Loss**: Should decrease steadily
- **Validation Loss**: Should decrease without overfitting
- **Training Time**: Per epoch timing
- **GPU Memory**: VRAM usage

### Success Signs
- Smooth loss curves
- Consistent improvement
- Good generalization

### Warning Signs
- Loss spikes or plateaus
- Validation loss increases
- Training becomes unstable

## 🔧 Advanced Features

### Mixed Precision Training
- **Enable**: Faster training, less memory
- **Disable**: More stable, higher precision
- **Default**: Enabled (recommended)

### Triplet Mining Strategies
- **Semi-hard**: Balanced difficulty (default)
- **Hardest**: Maximum challenge
- **Random**: Simple but less effective

### Data Augmentation
- **Minimal**: Basic transforms
- **Standard**: Balanced augmentation (default)
- **Aggressive**: Heavy augmentation

## 📝 Best Practices

### 1. **Document Everything** 📚
- Save parameter combinations
- Record training results
- Note hardware specifications

### 2. **Start Small** 🔬
- Test with few epochs first
- Validate promising combinations
- Scale up gradually

### 3. **Monitor Resources** 💻
- Watch GPU memory usage
- Check training time per epoch
- Balance quality vs. speed

### 4. **Save Checkpoints** 💾
- Models are saved automatically
- Keep intermediate checkpoints
- Download final models

## 🚨 Common Issues & Solutions

### Training Too Slow
- **Reduce batch size**
- **Increase learning rate**
- **Use mixed precision**
- **Reduce embedding dimension**

### Training Unstable
- **Reduce learning rate**
- **Increase batch size**
- **Enable gradient clipping**
- **Check data quality**

### Out of Memory
- **Reduce batch size**
- **Reduce embedding dimension**
- **Use mixed precision**
- **Reduce transformer layers**

### Poor Results
- **Increase epochs**
- **Adjust learning rate**
- **Try different optimizers**
- **Check data preprocessing**

## 📚 Next Steps

### 1. **Read the Full Guide**
- See `TRAINING_PARAMETERS.md` for detailed explanations
- Understand parameter impact and trade-offs

### 2. **Run Experiments**
- Start with quick training
- Experiment with different parameters
- Document your findings

### 3. **Optimize for Your Use Case**
- Balance quality vs. speed
- Consider hardware constraints
- Aim for reproducible results

### 4. **Share Results**
- Document successful configurations
- Share insights with the community
- Contribute to best practices

---

**🎉 You're ready to start experimenting!**

*Remember: Start simple, change one thing at a time, and document everything. Happy training! 🚀*
README_HF_SETUP.md
DELETED

# Hugging Face Setup Guide

## 🔐 Setting Up Hugging Face Authentication

### 1. Get Your HF Token
- Go to https://huggingface.co/settings/tokens
- Create a new token with **Write** permissions
- Copy the token (starts with `hf_...`)

### 2. Set Environment Variables

#### Option A: In Hugging Face Spaces (Recommended)
1. Go to your Space settings
2. Add these secrets:
   - `HF_TOKEN`: Your Hugging Face token
   - `HF_USERNAME`: Your Hugging Face username (e.g., "Stylique")

#### Option B: Local Development
```bash
export HF_TOKEN="hf_your_token_here"
export HF_USERNAME="your_username"
```

### 3. Verify Setup
```bash
source setup_hf.sh
```

## 🚀 What Happens Next

Once environment variables are set, the system will automatically:
- ✅ Authenticate with Hugging Face
- ✅ Upload trained models to `{HF_USERNAME}/dressify-models`
- ✅ Upload datasets to `{HF_USERNAME}/Dressify-Helper`
- ✅ Create repositories if they don't exist

## 🔒 Security Notes

- **Never commit tokens to git**
- **Use environment variables or HF Spaces secrets**
- **Tokens are automatically masked in logs**

## 📁 Repository Structure

After successful upload:
```
{HF_USERNAME}/dressify-models/
├── resnet_item_embedder_best.pth
├── vit_outfit_model_best.pth
├── resnet_metrics.json
└── vit_metrics.json

{HF_USERNAME}/Dressify-Helper/
├── train.json
├── valid.json
├── test.json
├── outfit_triplets_train.json
├── outfit_triplets_valid.json
└── outfit_triplets_test.json
```
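The automatic upload described above can be sketched with the real `huggingface_hub` API (`HfApi.create_repo` and `HfApi.upload_file`). This is a minimal illustration of the flow, not the project's `utils/hf_utils.py`; the helper names and checkpoint path are assumptions.

```python
import os

def model_repo_id(username: str) -> str:
    # Repo naming convention used throughout this guide
    return f"{username}/dressify-models"

def upload_best_checkpoint(path: str = "checkpoints/resnet_item_embedder_best.pth"):
    # Deferred import so the helpers above work without the library installed
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN"])
    repo_id = model_repo_id(os.environ["HF_USERNAME"])
    api.create_repo(repo_id, exist_ok=True)  # create the repo if missing
    api.upload_file(
        path_or_fileobj=path,
        path_in_repo=os.path.basename(path),
        repo_id=repo_id,
    )
```

With `HF_USERNAME=Stylique`, checkpoints land in `Stylique/dressify-models`, matching the repository structure shown above.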
RECOMMENDATION_PIPELINE_EXPLAINED.md
ADDED
| 1 |
+
# 🎯 How Dressify Recommendations Actually Work
|
| 2 |
+
|
| 3 |
+
## ✅ **YES - Both ResNet and ViT are used during inference!**
|
| 4 |
+
|
| 5 |
+
This document explains the complete recommendation pipeline and proves that both deep learning models are actively used.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## 📊 **Complete Recommendation Pipeline**
|
| 10 |
+
|
| 11 |
+
### **Step 1: Image Input & Category Detection**
|
| 12 |
+
**Location:** `inference.py:356-384`
|
| 13 |
+
|
| 14 |
+
```python
|
| 15 |
+
# User uploads wardrobe images
|
| 16 |
+
items = [
|
| 17 |
+
{"id": "item_0", "image": <PIL.Image>, "category": None},
|
| 18 |
+
{"id": "item_1", "image": <PIL.Image>, "category": None},
|
| 19 |
+
...
|
| 20 |
+
]
|
| 21 |
+
|
| 22 |
+
# For each item:
|
| 23 |
+
for item in items:
|
| 24 |
+
# 1. Auto-detect category using CLIP (if available)
|
| 25 |
+
category = self._detect_category_with_clip(item["image"])
|
| 26 |
+
# OR fallback to filename-based detection
|
| 27 |
+
|
| 28 |
+
# 2. Generate embedding if not provided
|
| 29 |
+
if embedding is None:
|
| 30 |
+
embedding = self.embed_images([item["image"]])[0]
|
| 31 |
+
```
|
| 32 |
+
|
| 33 |
+
**What happens:**
|
| 34 |
+
- Each clothing item image is processed
|
| 35 |
+
- Category is detected (shirt, pants, shoes, etc.) using CLIP or filename
|
| 36 |
+
- If no embedding exists, it's generated using **ResNet**
|
| 37 |
+
|
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
### **Step 2: ResNet Generates Item Embeddings** ⭐
|
| 41 |
+
**Location:** `inference.py:313-337` → `embed_images()`
|
| 42 |
+
|
| 43 |
+
```python
|
| 44 |
+
@torch.inference_mode()
|
| 45 |
+
def embed_images(self, images: List[Image.Image]) -> List[np.ndarray]:
|
| 46 |
+
# Transform images to tensor
|
| 47 |
+
batch = torch.stack([self.transform(img) for img in images])
|
| 48 |
+
batch = batch.to(self.device, memory_format=torch.channels_last)
|
| 49 |
+
|
| 50 |
+
# ✅ RESNET IS CALLED HERE!
|
| 51 |
+
use_amp = (self.device == "cuda")
|
| 52 |
+
with torch.autocast(device_type=("cuda" if use_amp else "cpu"), enabled=use_amp):
|
| 53 |
+
emb = self.resnet(batch) # <-- RESNET FORWARD PASS
|
| 54 |
+
|
| 55 |
+
# Normalize embeddings
|
| 56 |
+
emb = nn.functional.normalize(emb, dim=-1)
|
| 57 |
+
result = [e.detach().cpu().numpy().astype(np.float32) for e in emb]
|
| 58 |
+
return result
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
**What ResNet does:**
|
| 62 |
+
- Takes raw clothing item images (224x224 RGB)
|
| 63 |
+
- Passes through ResNet50 backbone (pretrained on ImageNet)
|
| 64 |
+
- Generates **512-dimensional embeddings** for each item
|
| 65 |
+
- These embeddings capture visual features (color, texture, style, pattern)
|
| 66 |
+
|
| 67 |
+
**Example:**
|
| 68 |
+
- Input: Image of a blue shirt → ResNet → Output: `[0.123, -0.456, 0.789, ...]` (512-dim vector)
|
| 69 |
+
|
| 70 |
+
---
|
| 71 |
+
|
| 72 |
+
### **Step 3: Tag Processing & Context Building**
|
| 73 |
+
**Location:** `inference.py:490-545`
|
| 74 |
+
|
| 75 |
+
```python
|
| 76 |
+
# Process user tags (occasion, weather, style, etc.)
|
| 77 |
+
processed_tags = self.tag_processor.process_tags(context)
|
| 78 |
+
|
| 79 |
+
# Build outfit template based on tags
|
| 80 |
+
template = outfit_templates[outfit_style].copy()
|
| 81 |
+
# Apply weather/occasion modifications
|
| 82 |
+
# Generate constraints (min_items, max_items, accessory_limit)
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
**What happens:**
|
| 86 |
+
- User preferences (formal, cold weather, elegant style) are processed
|
| 87 |
+
- Outfit templates are selected and modified
|
| 88 |
+
- Constraints are generated (e.g., formal requires 4-5 items, needs outerwear)
|
| 89 |
+
|
| 90 |
+
---
|
| 91 |
+
|
| 92 |
+
### **Step 4: Candidate Outfit Generation**
|
| 93 |
+
**Location:** `inference.py:910-1092`
|
| 94 |
+
|
| 95 |
+
```python
|
| 96 |
+
# Generate many candidate outfit combinations
|
| 97 |
+
candidates = []
|
| 98 |
+
for _ in range(num_samples): # Typically 50-100+ candidates
|
| 99 |
+
subset = []
|
| 100 |
+
|
| 101 |
+
# Strategy-based generation:
|
| 102 |
+
# - Strategy 0: Core outfit (shirt + pants + shoes + accessories)
|
| 103 |
+
# - Strategy 1: Accessory-focused
|
| 104 |
+
# - Strategy 2: Flexible combination
|
| 105 |
+
|
| 106 |
+
# Add items based on context (formal, casual, etc.)
|
| 107 |
+
if occasion == "formal" and outerwear:
|
| 108 |
+
subset.append(jacket)
|
| 109 |
+
subset.append(shirt)
|
| 110 |
+
subset.append(pants)
|
| 111 |
+
subset.append(shoes)
|
| 112 |
+
|
| 113 |
+
candidates.append(subset)
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
**What happens:**
|
| 117 |
+
- System generates **50-100+ candidate outfit combinations**
|
| 118 |
+
- Each candidate is a list of item indices (e.g., `[0, 3, 7, 12]`)
|
| 119 |
+
- Candidates are generated using:
|
| 120 |
+
- Category pools (uppers, bottoms, shoes, outerwear, accessories)
|
| 121 |
+
- Context-aware strategies (formal vs casual)
|
| 122 |
+
- Randomization for variety
|
| 123 |
+
|
| 124 |
+
---
|
| 125 |
+
|
| 126 |
+
### **Step 5: ViT Scores Outfit Compatibility** ⭐⭐
|
| 127 |
+
**Location:** `inference.py:1094-1103` → `score_subset()`
|
| 128 |
+
|
| 129 |
+
```python
|
| 130 |
+
def score_subset(idx_subset: List[int]) -> float:
|
| 131 |
+
# Get embeddings for items in this outfit
|
| 132 |
+
embs = torch.tensor(
|
| 133 |
+
np.stack([proc_items[i]["embedding"] for i in idx_subset], axis=0),
|
| 134 |
+
dtype=torch.float32,
|
| 135 |
+
device=self.device,
|
| 136 |
+
) # Shape: (N, 512) where N = number of items in outfit
|
| 137 |
+
|
| 138 |
+
embs = embs.unsqueeze(0) # Shape: (1, N, 512) - batch dimension
|
| 139 |
+
|
| 140 |
+
# ✅ VIT IS CALLED HERE!
|
| 141 |
+
s = self.vit.score_compatibility(embs).item() # <-- VIT FORWARD PASS
|
| 142 |
+
return float(s)
|
| 143 |
+
```
|
| 144 |
+
|
| 145 |
+
**What ViT does:**
|
| 146 |
+
- Takes **multiple item embeddings** (e.g., jacket, shirt, pants, shoes)
|
| 147 |
+
- Passes through **Vision Transformer encoder**:
|
| 148 |
+
- Transformer processes the sequence of item embeddings
|
| 149 |
+
- Learns relationships between items (do they go together?)
|
| 150 |
+
- Outputs a **compatibility score** (higher = better match)
|
| 151 |
+
|
| 152 |
+
**ViT Architecture:**
|
| 153 |
+
```python
|
| 154 |
+
# From models/vit_outfit.py
|
| 155 |
+
class OutfitCompatibilityModel(nn.Module):
|
| 156 |
+
def forward(self, tokens: torch.Tensor) -> torch.Tensor:
|
| 157 |
+
# tokens: (B, N, D) - batch of outfits, each with N items, D-dim embeddings
|
| 158 |
+
h = self.encoder(tokens) # Transformer encoder
|
| 159 |
+
pooled = h.mean(dim=1) # Average pooling across items
|
| 160 |
+
score = self.compatibility_head(pooled) # Final compatibility score
|
| 161 |
+
return score.squeeze(-1)
|
| 162 |
+
```
|
| 163 |
+
|
| 164 |
+
**Example:**
|
| 165 |
+
- Input: `[jacket_emb, shirt_emb, pants_emb, shoes_emb]` (4 items × 512 dims)
|
| 166 |
+
- ViT Processing: Transformer analyzes relationships
|
| 167 |
+
- Output: `0.85` (high compatibility score)
|
| 168 |
+
|
| 169 |
+
---

### **Step 6: Scoring & Ranking**
**Location:** `inference.py:1266-1274`

```python
# Score all valid candidates
scored = []
for subset in valid_candidates:
    base_score = score_subset(subset)  # <-- ViT score (0.0 to 1.0+)

    # Apply penalties and bonuses
    adjusted_score = calculate_outfit_penalty(subset, base_score)
    # - Penalties: missing categories, duplicates, wrong context
    # - Bonuses: color harmony, style coherence, complete sets

    scored.append((subset, adjusted_score, base_score))

# Sort by adjusted score (highest first)
scored.sort(key=lambda x: x[1], reverse=True)
```

**What happens:**
- Each candidate outfit gets:
  1. A **base score from the ViT** (0.0 to ~1.0+)
  2. **Penalties** (e.g., -500 if formal without a jacket)
  3. **Bonuses** (e.g., +0.6 for color harmony, +0.4 for style coherence)
- Final score = base_score + penalties + bonuses
- Outfits are ranked by final score

---
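The score-adjust-sort logic above can be sketched with toy numbers (the penalty and bonus values here are illustrative, not the production constants from `calculate_outfit_penalty`):

```python
# Toy version of the scoring loop: each candidate gets a base score
# (from the ViT in the real system), then penalties and bonuses are added.

def adjust(base, missing_jacket_formal=False, color_harmony=False):
    score = base
    if missing_jacket_formal:
        score -= 500.0   # hard penalty: formal outfit without a jacket
    if color_harmony:
        score += 0.6     # bonus: colors work together
    return score

candidates = [
    ("outfit_a", 0.90, dict(missing_jacket_formal=True)),
    ("outfit_b", 0.70, dict(color_harmony=True)),
    ("outfit_c", 0.80, dict()),
]

scored = [(name, adjust(base, **flags), base) for name, base, flags in candidates]
scored.sort(key=lambda x: x[1], reverse=True)  # highest adjusted score first
print([name for name, _, _ in scored])  # ['outfit_b', 'outfit_c', 'outfit_a']
```

Note how a large penalty can demote an outfit with the best base score, which is exactly why the final ranking differs from the raw ViT ranking.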

### **Step 7: Final Selection & Deduplication**
**Location:** `inference.py:1276-1300`

```python
# Remove duplicate outfits
seen_outfits = set()
unique_scored = []
for subset, adjusted_score, base_score in scored:
    normalized = normalize_outfit(subset)  # Sort item IDs
    if normalized not in seen_outfits:
        seen_outfits.add(normalized)
        unique_scored.append((subset, adjusted_score, base_score))

# Select top N with randomization
topk = unique_scored[:num_outfits]
```

**What happens:**
- Duplicate outfits (same items, different order) are removed
- The top N outfits are selected
- Some randomization is added for variety

---
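The deduplication step works because sorting item IDs gives a canonical key, so item order no longer matters; a minimal sketch (hypothetical item IDs):

```python
# Two outfits with the same items in a different order are duplicates.
# Sorting the item IDs yields a canonical key for the seen-set.

def normalize_outfit(item_ids):
    return tuple(sorted(item_ids))

ranked = [
    ["shirt_1", "pants_2", "shoes_3"],
    ["pants_2", "shirt_1", "shoes_3"],   # same outfit, different order
    ["shirt_1", "pants_2", "shoes_9"],
]

seen, unique = set(), []
for outfit in ranked:
    key = normalize_outfit(outfit)
    if key not in seen:
        seen.add(key)
        unique.append(outfit)

print(len(unique))  # 2
```

Because the input list is already sorted by score, keeping the first occurrence of each key preserves the best-scoring copy of every outfit.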

## 🔍 **Proof: Both Models Are Used**

### **Evidence 1: ResNet Usage**
```python
# Line 330 in inference.py
emb = self.resnet(batch)  # ✅ ResNet forward pass
```
- Called in the `embed_images()` method
- Generates embeddings for every clothing item
- **Called during inference** when items don't have pre-computed embeddings

### **Evidence 2: ViT Usage**
```python
# Line 1102 in inference.py
s = self.vit.score_compatibility(embs).item()  # ✅ ViT forward pass
```
- Called in the `score_subset()` function
- Scores **every candidate outfit** (50-100+ times per recommendation request)
- **Called during inference** to rank outfit combinations

### **Evidence 3: Model Loading**
```python
# Lines 49-50, 285-286 in inference.py
self.resnet, self.resnet_loaded = self._load_resnet()
self.vit, self.vit_loaded = self._load_vit()

# Models are loaded and set to eval mode
if self.resnet_loaded:
    self.resnet = self.resnet.to(self.device).eval()
if self.vit_loaded:
    self.vit = self.vit.to(self.device).eval()
```

---

## 📈 **Complete Flow Diagram**

```
User Input
    ↓
[Upload Images] → [CLIP Category Detection]
    ↓
[ResNet Embedding Generation]   ← ✅ RESNET USED HERE
    ↓
[512-dim Embeddings for Each Item]
    ↓
[Tag Processing] → [Context Building]
    ↓
[Candidate Generation] → [50-100+ Outfit Combinations]
    ↓
[ViT Compatibility Scoring]     ← ✅ VIT USED HERE (50-100+ times)
    ↓
[Penalty/Bonus Adjustment]
    ↓
[Ranking & Deduplication]
    ↓
[Top N Recommendations]
```

---

## 🎯 **Key Points**

1. **ResNet is used:**
   - Generates embeddings for each clothing item
   - Called once per item (or reuses cached embeddings)
   - Output: 512-dimensional feature vectors

2. **ViT is used:**
   - Scores the compatibility of outfit combinations
   - Called **50-100+ times** per recommendation request (once per candidate)
   - Output: a compatibility score (0.0 to ~1.0+)

3. **Both models work together:**
   - ResNet provides item-level understanding
   - ViT provides outfit-level compatibility
   - Together they create personalized, context-aware recommendations

4. **The system is NOT just rule-based:**
   - Deep learning models (ResNet + ViT) provide the core intelligence
   - Rules and heuristics (penalties/bonuses) refine the results
   - Tags and context guide the generation process

---

## 🔬 **Technical Details**

### **ResNet Architecture:**
- **Backbone:** ResNet50 (pretrained on ImageNet)
- **Input:** 224×224 RGB images
- **Output:** 512-dimensional embeddings
- **Purpose:** Extract visual features from clothing items

### **ViT Architecture:**
- **Encoder:** Transformer with 4-6 layers and 8 attention heads
- **Input:** Sequence of item embeddings (variable length, 2-6 items)
- **Output:** Single compatibility score
- **Purpose:** Learn which items go well together

### **Training:**
- **ResNet:** Trained with triplet loss on item pairs
- **ViT:** Trained with triplet loss on outfit triplets (anchor, positive, negative)
- **Both:** Use early stopping and best-model checkpointing

---
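The triplet objective used for both models can be written out directly; this is a plain-Python sketch of the standard triplet margin loss (the actual training uses PyTorch's batched implementation):

```python
import math

# Triplet margin loss: pull the anchor toward the positive and push it
# away from the negative by at least `margin`:
#   L = max(0, d(anchor, positive) - d(anchor, negative) + margin)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor   = [1.0, 0.0]
positive = [1.0, 0.1]   # close to the anchor -> small d(a, p)
negative = [0.0, 1.0]   # far from the anchor -> large d(a, n)

loss = triplet_loss(anchor, positive, negative, margin=0.5)
print(round(loss, 3))  # 0.0  (this triplet already satisfies the margin)
```

The `--triplet_margin 0.5` flag in the training commands below corresponds to the `margin` parameter here.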

## ✅ **Conclusion**

**YES - both ResNet and ViT are actively used during inference!**

- **ResNet** generates item embeddings (visual understanding)
- **ViT** scores outfit compatibility (relationship learning)
- Together they create intelligent, personalized recommendations

The system is a **true deep learning pipeline**, not just rule-based filtering!
app.py
CHANGED

@@ -17,6 +17,15 @@ import json

```python
from inference import InferenceService
from utils.data_fetch import ensure_dataset_ready
from utils.tag_system import get_all_tag_options, validate_tags, TagProcessor
from utils.image_utils import (
    load_images_from_files,
    load_image_from_bytes,
    load_image_from_url,
    is_image_file,
    get_supported_formats,
    get_supported_extensions,
    ensure_rgb_image
)

# Global state
BOOT_STATUS = "starting"
```

@@ -335,6 +344,18 @@ def get_tags() -> dict:

```python
@app.get("/image-formats")
def get_image_formats() -> dict:
    """
    Get all supported image formats for API integration.
    """
    return {
        "supported_formats": get_supported_formats(),
        "supported_extensions": get_supported_extensions(),
        "description": "All major image formats are supported including JPG, PNG, WEBP, GIF, BMP, TIFF, and more",
        "note": "Images are automatically converted to RGB mode for model processing"
    }
```

@@ -389,20 +410,52 @@ def test_recommend() -> dict:

```python
@app.post("/embed")
def embed(req: EmbedRequest, x_api_key: Optional[str] = Header(None)) -> dict:
    """
    Generate embeddings for images with comprehensive format support.
    Supports JPG, PNG, WEBP, GIF, BMP, TIFF, and other major formats.
    """
    require_api_key(x_api_key)
    images: List[Image.Image] = []
    errors = []

    # Load from URLs
    if req.image_urls:
        for url in req.image_urls:
            img = load_image_from_url(url, timeout=20, convert_to_rgb=True, raise_on_error=False)
            if img is not None:
                images.append(img)
            else:
                errors.append(f"Failed to load image from URL: {url}")

    # Load from base64
    if req.images_base64:
        for b64 in req.images_base64:
            try:
                image_bytes = base64.b64decode(b64)
                img = load_image_from_bytes(image_bytes, convert_to_rgb=True, raise_on_error=False)
                if img is not None:
                    images.append(img)
                else:
                    errors.append("Failed to load image from base64")
            except Exception as e:
                errors.append(f"Error decoding base64 image: {str(e)}")

    if not images:
        error_msg = "No images provided or all images failed to load"
        if errors:
            error_msg += f". Errors: {', '.join(errors[:3])}"
        raise HTTPException(status_code=400, detail=error_msg)

    # Ensure all images are RGB
    images = [ensure_rgb_image(img) for img in images]

    embs = service.embed_images(images)
    return {
        "embeddings": [e.tolist() for e in embs],
        "model_version": service.resnet_version,
        "images_loaded": len(images),
        "errors": errors if errors else None
    }
```

@@ -498,14 +551,11 @@ def artifacts() -> dict:

```python
# --------- Gradio UI ---------

def _load_images_from_files(files: List[str]) -> List[Image.Image]:
    """
    Load images from file paths with comprehensive format support.
    Supports JPG, PNG, WEBP, GIF, BMP, TIFF, and other major formats.
    """
    return load_images_from_files(files, convert_to_rgb=True, skip_errors=True)
```

@@ -870,9 +920,9 @@ def start_training_simple(dataset_size: str, res_epochs: int, vit_epochs: int):

```python
    # Train ResNet first and wait for completion
    log_message += f"\n🚀 Starting ResNet training on {dataset_size} samples...\n"
    resnet_result = subprocess.run([
        "python", "train_resnet.py", "--data_root", DATASET_ROOT, "--epochs", str(res_epochs),
        "--batch_size", "4", "--lr", "1e-3", "--early_stopping_patience", "3",
        "--out", os.path.join(export_dir, "resnet_item_embedder.pth")
    ] + dataset_args, capture_output=True, text=True, check=False)

    if resnet_result.returncode == 0:
```

@@ -897,7 +947,7 @@ def start_training_simple(dataset_size: str, res_epochs: int, vit_epochs: int):

```python
    log_message += f"\n🚀 Starting ViT training on {dataset_size} samples...\n"
    vit_result = subprocess.run([
        "python", "train_vit_triplet.py", "--data_root", DATASET_ROOT, "--epochs", str(vit_epochs),
        "--batch_size", "4", "--lr", "5e-4", "--early_stopping_patience", "5",
        "--max_samples", "5000", "--triplet_margin", "0.5", "--gradient_clip", "1.0",
        "--warmup_epochs", "2", "--export", os.path.join(export_dir, "vit_outfit_model.pth")
```

@@ -956,8 +1006,14 @@ with gr.Blocks(fill_height=True, title="Dressify - Advanced Outfit Recommendatio

```python
    with gr.Tab("🎨 Recommend"):
        gr.Markdown("### 🎯 Personalized Outfit Recommendations\n*Upload your wardrobe and customize recommendations with advanced tag preferences*")
        gr.Markdown(f"**Supported Formats:** {', '.join(get_supported_extensions())} (JPG, PNG, WEBP, GIF, BMP, TIFF, and more)")

        inp2 = gr.Files(
            label="Upload wardrobe images",
            file_types=["image"],
            file_count="multiple",
            type="filepath"
        )

        with gr.Accordion("🎯 Primary Tags (Required)", open=True):
            with gr.Row():
```
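For clients of the updated `/embed` endpoint, the base64 path is a plain encode/decode round-trip; a stdlib-only sketch of the transport step (the payload here is a placeholder, not a real PNG, and the decode-to-PIL step is elided):

```python
import base64

# A client sends raw image bytes as base64; the server decodes them back
# before handing the bytes to the image loader. Any byte payload round-trips.
fake_image_bytes = b"\x89PNG\r\n\x1a\n...truncated..."  # placeholder bytes

b64 = base64.b64encode(fake_image_bytes).decode("ascii")   # client side
decoded = base64.b64decode(b64)                            # server side

print(decoded == fake_image_bytes)  # True
```

In a real request, `b64` would be one entry of `images_base64` in the `/embed` JSON body.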
inference.py
CHANGED

@@ -16,6 +16,7 @@ from utils.transforms import build_inference_transform

```python
from models.resnet_embedder import ResNetItemEmbedder
from models.vit_outfit import OutfitCompatibilityModel
from utils.tag_system import TagProcessor, get_all_tag_options, validate_tags
from utils.image_utils import ensure_rgb_image, validate_image_format
```

@@ -312,6 +313,10 @@

```python
    @torch.inference_mode()
    def embed_images(self, images: List[Image.Image]) -> List[np.ndarray]:
        """
        Generate embeddings for images with comprehensive format support.
        All images are validated and converted to RGB before processing.
        """
        print(f"🔍 DEBUG: embed_images called with {len(images)} images")
        if len(images) == 0:
            print("🔍 DEBUG: No images provided, returning empty list")
```

@@ -321,9 +326,27 @@

```python
        if self.resnet is None:
            print("🔍 DEBUG: ResNet model is None, returning empty list")
            return []

        # Validate and convert all images to RGB
        processed_images = []
        for i, img in enumerate(images):
            is_valid, error_msg = validate_image_format(img)
            if not is_valid:
                print(f"⚠️ Skipping invalid image {i}: {error_msg}")
                continue

            # Ensure RGB mode (required for ResNet)
            rgb_img = ensure_rgb_image(img)
            processed_images.append(rgb_img)

        if len(processed_images) == 0:
            print("⚠️ No valid images after processing")
            return []

        print(f"🔍 DEBUG: Processing {len(processed_images)} valid images")

        try:
            batch = torch.stack([self.transform(img) for img in processed_images])
            batch = batch.to(self.device, memory_format=torch.channels_last)
            use_amp = (self.device == "cuda")
            with torch.autocast(device_type=("cuda" if use_amp else "cpu"), enabled=use_amp):
```

@@ -334,6 +357,8 @@

```python
            return result
        except Exception as e:
            print(f"🔍 DEBUG: Error in embed_images: {e}")
            import traceback
            traceback.print_exc()
            return []

    @torch.inference_mode()
```
utils/artifact_manager.py
CHANGED

@@ -90,7 +90,10 @@ class ArtifactManager:

```python
        images_dir = os.path.join(self.data_dir, "images")
        if os.path.exists(images_dir):
            try:
                # Support all major image formats
                from utils.image_utils import get_supported_extensions
                supported_exts = tuple(ext.lower() for ext in get_supported_extensions())
                image_files = [f for f in os.listdir(images_dir) if f.lower().endswith(supported_exts)]
                info["images_count"] = len(image_files)
            except:
                pass
```
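The extension filter added above reduces to a case-insensitive suffix check; a stdlib-only sketch (the extension tuple is a hypothetical subset of what `get_supported_extensions()` returns):

```python
# Count image files by extension, as the updated ArtifactManager does.
# `str.endswith` accepts a tuple, so one call checks every extension.
supported_exts = ('.jpg', '.jpeg', '.png', '.webp', '.gif', '.bmp', '.tiff')

filenames = ["a.JPG", "b.png", "notes.txt", "c.webp", "archive.zip"]
image_files = [f for f in filenames if f.lower().endswith(supported_exts)]

print(len(image_files))  # 3
```

Lowercasing the filename first is what makes `a.JPG` match the lowercase extension tuple.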
utils/image_utils.py
ADDED
|
@@ -0,0 +1,374 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Comprehensive Image Format Support Utilities
|
| 3 |
+
|
| 4 |
+
This module provides robust image loading and processing that supports
|
| 5 |
+
all major image formats including JPG, PNG, WEBP, GIF, BMP, TIFF, etc.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import io
|
| 9 |
+
from typing import List, Optional, Tuple, Union
|
| 10 |
+
from pathlib import Path
|
| 11 |
+
|
| 12 |
+
from PIL import Image, ImageFile, UnidentifiedImageError
|
| 13 |
+
import requests
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
# Enable PIL to load truncated images
|
| 17 |
+
ImageFile.LOAD_TRUNCATED_IMAGES = True
|
| 18 |
+
|
| 19 |
+
# Supported image formats
|
| 20 |
+
SUPPORTED_FORMATS = {
|
| 21 |
+
# Raster formats
|
| 22 |
+
'JPEG', 'JPG', # JPEG
|
| 23 |
+
'PNG', # PNG
|
| 24 |
+
'WEBP', # WebP
|
| 25 |
+
'GIF', # GIF (static frames)
|
| 26 |
+
'BMP', # Bitmap
|
| 27 |
+
'TIFF', 'TIF', # TIFF
|
| 28 |
+
'ICO', # Icon
|
| 29 |
+
'PCX', # PC Paintbrush
|
| 30 |
+
'PPM', # Portable Pixmap
|
| 31 |
+
'PBM', # Portable Bitmap
|
| 32 |
+
'PGM', # Portable Graymap
|
| 33 |
+
'XBM', # X Bitmap
|
| 34 |
+
'XPM', # X Pixmap
|
| 35 |
+
# Additional formats if available
|
| 36 |
+
'HEIF', 'HEIC', # HEIF/HEIC (if pillow-heif installed)
|
| 37 |
+
'AVIF', # AVIF (if pillow-avif-plugin installed)
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
# File extensions mapping
|
| 41 |
+
EXTENSION_TO_FORMAT = {
|
| 42 |
+
'.jpg': 'JPEG',
|
| 43 |
+
'.jpeg': 'JPEG',
|
| 44 |
+
'.png': 'PNG',
|
| 45 |
+
'.webp': 'WEBP',
|
| 46 |
+
'.gif': 'GIF',
|
| 47 |
+
'.bmp': 'BMP',
|
| 48 |
+
'.tiff': 'TIFF',
|
| 49 |
+
'.tif': 'TIFF',
|
| 50 |
+
'.ico': 'ICO',
|
| 51 |
+
'.pcx': 'PCX',
|
| 52 |
+
'.ppm': 'PPM',
|
| 53 |
+
'.pbm': 'PBM',
|
| 54 |
+
'.pgm': 'PGM',
|
| 55 |
+
'.xbm': 'XBM',
|
| 56 |
+
'.xpm': 'XPM',
|
| 57 |
+
'.heif': 'HEIF',
|
| 58 |
+
'.heic': 'HEIC',
|
| 59 |
+
'.avif': 'AVIF',
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
def is_image_file(filepath: Union[str, Path]) -> bool:
|
| 64 |
+
"""
|
| 65 |
+
Check if a file is a supported image format based on extension.
|
| 66 |
+
|
| 67 |
+
Args:
|
| 68 |
+
filepath: Path to the file
|
| 69 |
+
|
| 70 |
+
Returns:
|
| 71 |
+
True if the file appears to be a supported image format
|
| 72 |
+
"""
|
| 73 |
+
path = Path(filepath)
|
| 74 |
+
ext = path.suffix.lower()
|
| 75 |
+
return ext in EXTENSION_TO_FORMAT
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
def get_image_format(filepath: Union[str, Path]) -> Optional[str]:
|
| 79 |
+
"""
|
| 80 |
+
Get the image format from file extension.
|
| 81 |
+
|
| 82 |
+
Args:
|
| 83 |
+
filepath: Path to the file
|
| 84 |
+
|
| 85 |
+
Returns:
|
| 86 |
+
Format name (e.g., 'JPEG', 'PNG') or None if unknown
|
| 87 |
+
"""
|
| 88 |
+
path = Path(filepath)
|
| 89 |
+
ext = path.suffix.lower()
|
| 90 |
+
return EXTENSION_TO_FORMAT.get(ext)
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
def load_image_from_file(
|
| 94 |
+
filepath: Union[str, Path],
|
| 95 |
+
convert_to_rgb: bool = True,
|
| 96 |
+
raise_on_error: bool = False
|
| 97 |
+
) -> Optional[Image.Image]:
|
| 98 |
+
"""
|
| 99 |
+
Load an image from a file path, supporting all major formats.
|
| 100 |
+
|
| 101 |
+
Args:
|
| 102 |
+
filepath: Path to the image file
|
| 103 |
+
convert_to_rgb: Convert image to RGB mode (required for models)
|
| 104 |
+
raise_on_error: If True, raise exception on error; if False, return None
|
| 105 |
+
|
| 106 |
+
Returns:
|
| 107 |
+
PIL Image object or None if loading failed
|
| 108 |
+
"""
|
| 109 |
+
try:
|
| 110 |
+
path = Path(filepath)
|
| 111 |
+
|
| 112 |
+
# Check if file exists
|
| 113 |
+
if not path.exists():
|
| 114 |
+
if raise_on_error:
|
| 115 |
+
raise FileNotFoundError(f"Image file not found: {filepath}")
|
| 116 |
+
return None
|
| 117 |
+
|
| 118 |
+
# Check if it's a supported format
|
| 119 |
+
if not is_image_file(path):
|
| 120 |
+
if raise_on_error:
|
| 121 |
+
raise ValueError(f"Unsupported image format: {filepath}")
|
| 122 |
+
print(f"⚠️ Skipping unsupported format: {filepath}")
|
| 123 |
+
return None
|
| 124 |
+
|
| 125 |
+
# Open and load image
|
| 126 |
+
with Image.open(path) as img:
|
| 127 |
+
# Verify it's actually an image
|
| 128 |
+
img.verify()
|
| 129 |
+
|
| 130 |
+
# Re-open for actual use (verify() closes the file)
|
| 131 |
+
img = Image.open(path)
|
| 132 |
+
|
| 133 |
+
# Convert to RGB if needed (required for deep learning models)
|
| 134 |
+
if convert_to_rgb:
|
| 135 |
+
if img.mode != 'RGB':
|
| 136 |
+
# Handle different modes
|
| 137 |
+
if img.mode in ('RGBA', 'LA', 'P'):
|
| 138 |
+
# Create white background for transparency
|
| 139 |
+
background = Image.new('RGB', img.size, (255, 255, 255))
|
| 140 |
+
if img.mode == 'P':
|
| 141 |
+
img = img.convert('RGBA')
|
| 142 |
+
if img.mode in ('RGBA', 'LA'):
|
| 143 |
+
background.paste(img, mask=img.split()[-1]) # Use alpha channel as mask
|
| 144 |
+
img = background
|
| 145 |
+
else:
|
| 146 |
+
img = img.convert('RGB')
|
| 147 |
+
|
| 148 |
+
return img
|
| 149 |
+
|
| 150 |
+
except UnidentifiedImageError:
|
| 151 |
+
error_msg = f"❌ Cannot identify image format: {filepath}"
|
| 152 |
+
if raise_on_error:
|
| 153 |
+
raise ValueError(error_msg)
|
| 154 |
+
print(error_msg)
|
| 155 |
+
return None
|
| 156 |
+
except Exception as e:
|
| 157 |
+
error_msg = f"❌ Error loading image {filepath}: {str(e)}"
|
| 158 |
+
if raise_on_error:
|
| 159 |
+
raise
|
| 160 |
+
print(error_msg)
|
| 161 |
+
return None
|
| 162 |
+
|
| 163 |
+
|
| 164 |
+
def load_image_from_bytes(
|
| 165 |
+
image_bytes: bytes,
|
| 166 |
+
convert_to_rgb: bool = True,
|
| 167 |
+
raise_on_error: bool = False
|
| 168 |
+
) -> Optional[Image.Image]:
|
| 169 |
+
"""
|
| 170 |
+
Load an image from bytes, supporting all major formats.
|
| 171 |
+
|
| 172 |
+
Args:
|
| 173 |
+
image_bytes: Image data as bytes
|
| 174 |
+
convert_to_rgb: Convert image to RGB mode (required for models)
|
| 175 |
+
raise_on_error: If True, raise exception on error; if False, return None
|
| 176 |
+
|
| 177 |
+
Returns:
|
| 178 |
+
PIL Image object or None if loading failed
|
| 179 |
+
"""
|
| 180 |
+
try:
|
| 181 |
+
# Open from bytes
|
| 182 |
+
img = Image.open(io.BytesIO(image_bytes))
|
| 183 |
+
|
| 184 |
+
# Verify it's actually an image
|
| 185 |
+
img.verify()
|
| 186 |
+
|
| 187 |
+
# Re-open for actual use
|
| 188 |
+
img = Image.open(io.BytesIO(image_bytes))
|
| 189 |
+
|
| 190 |
+
# Convert to RGB if needed
|
| 191 |
+
if convert_to_rgb:
|
| 192 |
+
if img.mode != 'RGB':
|
| 193 |
+
if img.mode in ('RGBA', 'LA', 'P'):
|
| 194 |
+
background = Image.new('RGB', img.size, (255, 255, 255))
|
| 195 |
+
if img.mode == 'P':
|
| 196 |
+
img = img.convert('RGBA')
|
| 197 |
+
if img.mode in ('RGBA', 'LA'):
|
| 198 |
+
background.paste(img, mask=img.split()[-1])
|
| 199 |
+
img = background
|
| 200 |
+
else:
|
| 201 |
+
img = img.convert('RGB')
|
| 202 |
+
|
| 203 |
+
return img
|
| 204 |
+
|
| 205 |
+
except UnidentifiedImageError:
|
| 206 |
+
error_msg = "❌ Cannot identify image format from bytes"
|
| 207 |
+
if raise_on_error:
|
| 208 |
+
raise ValueError(error_msg)
|
| 209 |
+
print(error_msg)
|
| 210 |
+
return None
|
| 211 |
+
except Exception as e:
|
| 212 |
+
error_msg = f"❌ Error loading image from bytes: {str(e)}"
|
| 213 |
+
if raise_on_error:
|
| 214 |
+
raise
|
| 215 |
+
print(error_msg)
|
| 216 |
+
return None
|
| 217 |
+
|
| 218 |
+
|
| 219 |
+
def load_image_from_url(
|
| 220 |
+
url: str,
|
| 221 |
+
timeout: int = 20,
|
| 222 |
+
convert_to_rgb: bool = True,
|
| 223 |
+
raise_on_error: bool = False
|
| 224 |
+
) -> Optional[Image.Image]:
|
| 225 |
+
"""
|
| 226 |
+
Load an image from a URL, supporting all major formats.
|
| 227 |
+
|
| 228 |
+
Args:
|
| 229 |
+
url: URL to the image
|
| 230 |
+
timeout: Request timeout in seconds
|
| 231 |
+
convert_to_rgb: Convert image to RGB mode (required for models)
|
| 232 |
+
raise_on_error: If True, raise exception on error; if False, return None
|
| 233 |
+
|
| 234 |
+
Returns:
|
| 235 |
+
PIL Image object or None if loading failed
|
| 236 |
+
"""
|
| 237 |
+
try:
|
| 238 |
+
resp = requests.get(url, timeout=timeout, stream=True)
|
| 239 |
+
+        resp.raise_for_status()
+
+        # Check content type
+        content_type = resp.headers.get('Content-Type', '').lower()
+        if not any(fmt in content_type for fmt in ['image', 'jpeg', 'png', 'webp', 'gif']):
+            if raise_on_error:
+                raise ValueError(f"URL does not point to an image: {url}")
+            print(f"⚠️ URL does not appear to be an image: {url}")
+            return None
+
+        # Load from bytes
+        return load_image_from_bytes(resp.content, convert_to_rgb, raise_on_error)
+
+    except requests.RequestException as e:
+        error_msg = f"❌ Error fetching image from URL {url}: {str(e)}"
+        if raise_on_error:
+            raise
+        print(error_msg)
+        return None
+    except Exception as e:
+        error_msg = f"❌ Error loading image from URL {url}: {str(e)}"
+        if raise_on_error:
+            raise
+        print(error_msg)
+        return None
+
+
+def load_images_from_files(
+    filepaths: List[Union[str, Path]],
+    convert_to_rgb: bool = True,
+    skip_errors: bool = True
+) -> List[Image.Image]:
+    """
+    Load multiple images from file paths, supporting all major formats.
+
+    Args:
+        filepaths: List of paths to image files
+        convert_to_rgb: Convert images to RGB mode (required for models)
+        skip_errors: If True, skip files that fail to load; if False, raise on first error
+
+    Returns:
+        List of PIL Image objects (only successfully loaded images)
+    """
+    images = []
+    loaded_count = 0
+    failed_count = 0
+
+    for fp in filepaths:
+        img = load_image_from_file(fp, convert_to_rgb, raise_on_error=not skip_errors)
+        if img is not None:
+            images.append(img)
+            loaded_count += 1
+        else:
+            failed_count += 1
+
+    if failed_count > 0:
+        print(f"⚠️ Loaded {loaded_count} images, {failed_count} failed")
+
+    return images
+
+
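The `skip_errors` pattern in `load_images_from_files` (collect successes, count failures, report once) generalizes to any per-item loader. A minimal sketch of the same aggregation logic, using a stand-in loader instead of `load_image_from_file` (the names `load_many` and `demo_loader` are illustrative, not part of this repo):

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

def load_many(paths: List[str], loader: Callable[[str], Optional[T]]) -> List[T]:
    """Collect items the loader returns, skipping and counting failures."""
    loaded: List[T] = []
    failed = 0
    for p in paths:
        item = loader(p)
        if item is None:
            failed += 1
        else:
            loaded.append(item)
    if failed:
        # Mirrors the one-line summary printed by load_images_from_files
        print(f"Loaded {len(loaded)} items, {failed} failed")
    return loaded

# Stand-in loader: "succeeds" only for paths with a known image extension.
demo_loader = lambda p: p if p.endswith((".jpg", ".png")) else None
result = load_many(["a.jpg", "notes.txt", "b.png"], demo_loader)
```

Here `result` is `["a.jpg", "b.png"]` and one failure is reported, matching the skip-and-summarize behavior above.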
+def validate_image_format(img: Image.Image) -> Tuple[bool, Optional[str]]:
+    """
+    Validate that an image is in a supported format and ready for processing.
+
+    Args:
+        img: PIL Image object
+
+    Returns:
+        Tuple of (is_valid, error_message)
+    """
+    if img is None:
+        return False, "Image is None"
+
+    if not hasattr(img, 'mode'):
+        return False, "Invalid image object"
+
+    # Check if format is supported
+    if hasattr(img, 'format') and img.format:
+        if img.format not in SUPPORTED_FORMATS:
+            return False, f"Unsupported format: {img.format}"
+
+    # Check if image has valid size
+    if img.size[0] == 0 or img.size[1] == 0:
+        return False, "Image has zero dimensions"
+
+    return True, None
+
+
+def ensure_rgb_image(img: Image.Image) -> Image.Image:
+    """
+    Ensure an image is in RGB mode, converting if necessary.
+
+    Args:
+        img: PIL Image object
+
+    Returns:
+        RGB mode PIL Image
+    """
+    if img.mode == 'RGB':
+        return img
+
+    if img.mode in ('RGBA', 'LA', 'P'):
+        # Flatten transparency onto a white background
+        if img.mode == 'P':
+            img = img.convert('RGBA')
+        background = Image.new('RGB', img.size, (255, 255, 255))
+        background.paste(img, mask=img.split()[-1])
+        return background
+
+    return img.convert('RGB')
+
+
+def get_supported_formats() -> List[str]:
+    """
+    Get list of all supported image formats.
+
+    Returns:
+        List of format names
+    """
+    return sorted(SUPPORTED_FORMATS)
+
+
+def get_supported_extensions() -> List[str]:
+    """
+    Get list of all supported file extensions.
+
+    Returns:
+        List of file extensions (with dots)
+    """
+    return sorted(EXTENSION_TO_FORMAT.keys())
+