Upload README.md
README.md
CHANGED
|
@@ -1,347 +1,279 @@
|
|
| 1 |
-
# 👔 Outfit Compatibility System
|
| 2 |
-
|
| 3 |
-
> **AI-Powered Fashion Recommendation Engine**
|
| 4 |
-
> Graduation Project - January 2026
|
| 5 |
-
|
| 6 |
-
A deep-learning-based outfit compatibility analysis and recommendation system. It suggests compatible pieces for the clothing items users upload and scores outfit combinations.
|
| 7 |
-
|
| 8 |
---
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
-
|
| 13 |
-
|
| 14 |
-
-
|
| 15 |
-
-
|
| 16 |
-
-
|
| 17 |
-
-
|
| 18 |
-
-
|
| 19 |
-
-
|
| 20 |
-
|
| 21 |
---
|
| 22 |
|
| 23 |
-
#
|
| 24 |
|
| 25 |
-
|
| 26 |
-
|---------|----------|
|
| 27 |
-
| **Compatibility Score** | Scores how well the selected items go together, from 0 to 100% |
|
| 28 |
-
| **Smart Recommendation** | Uses AI to recommend the products that best fit the current outfit |
|
| 29 |
-
| **Outfit Completion** | Automatically fills missing categories with compatible products |
|
| 30 |
-
| **Custom Item Upload** | Lets users upload and analyze a photo of their own clothing |
|
| 31 |
-
| **Similar Item Search** | Finds the products in the database most similar to the uploaded item |
|
| 32 |
-
| **Category Filtering** | Filters by category, such as tops, bottoms, accessories, and shoes |
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
-
|
| 39 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 40 |
-
│ FRONTEND │
|
| 41 |
-
│ (HTML/CSS/JavaScript) │
|
| 42 |
-
│ http://localhost:3000 │
|
| 43 |
-
└─────────────────────────┬───────────────────────────────────────┘
|
| 44 |
-
│
|
| 45 |
-
▼
|
| 46 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 47 |
-
│ API GATEWAY │
|
| 48 |
-
│ (FastAPI - Port 8000) │
|
| 49 |
-
│ /api/compatibility, /api/recommend, /api/complete-outfit │
|
| 50 |
-
└───────┬─────────────────┬─────────────────┬─────────────────────┘
|
| 51 |
-
│ │ │
|
| 52 |
-
▼ ▼ ▼
|
| 53 |
-
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
|
| 54 |
-
│ PostgreSQL │ │ Vector Index │ │ Outfit │
|
| 55 |
-
│ (Port 5432) │ │ (Port 8003) │ │ Transformer │
|
| 56 |
-
│ 5000+ Products │ │ FAISS Index │ │ (Port 8002) │
|
| 57 |
-
└───────────────┘ └───────────────┘ └───────┬───────┘
|
| 58 |
-
│
|
| 59 |
-
▼
|
| 60 |
-
┌───────────────┐
|
| 61 |
-
│ Feature │
|
| 62 |
-
│ Extractor │
|
| 63 |
-
│ (Port 8001) │
|
| 64 |
-
│ ResNet+LaBSE │
|
| 65 |
-
└───────────────┘
|
| 66 |
-
```
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
|
| 71 |
-
|--------|------|-----------|----------|
|
| 72 |
-
| **API Gateway** | 8000 | FastAPI | Main API service; handles all requests |
|
| 73 |
-
| **Frontend** | 3000 | HTML/JS | Web interface |
|
| 74 |
-
| **PostgreSQL** | 5432 | PostgreSQL | Product database (5000+ products) |
|
| 75 |
-
| **Vector Index** | 8003 | FAISS | Vector similarity search (4938 vectors) |
|
| 76 |
-
| **Outfit Transformer** | 8002 | PyTorch | CIR model, compatibility scoring |
|
| 77 |
-
| **Feature Extractor** | 8001 | ResNet+LaBSE | Visual and text feature extraction |
|
| 78 |
|
| 79 |
-
--
|
| 80 |
-
|
| 81 |
-
## 🚀 Installation
|
| 82 |
|
| 83 |
-
|
| 84 |
|
| 85 |
-
-
|
| 86 |
-
- PostgreSQL 14+
|
| 87 |
-
- Docker (optional)
|
| 88 |
|
| 89 |
-
|
| 90 |
|
| 91 |
-
```bash
|
| 92 |
-
# Create a virtual environment
|
| 93 |
-
python3 -m venv venv
|
| 94 |
-
source venv/bin/activate
|
| 95 |
-
|
| 96 |
-
# Install dependencies
|
| 97 |
-
pip install torch torchvision
|
| 98 |
-
pip install transformers sentence-transformers
|
| 99 |
-
pip install fastapi uvicorn
|
| 100 |
-
pip install psycopg2-binary faiss-cpu
|
| 101 |
-
pip install numpy pillow requests
|
| 102 |
```
|
| 103 |
-
|
| 104 |
-
### 2. Prepare the Database
|
| 105 |
-
|
| 106 |
-
```bash
|
| 107 |
-
# Start PostgreSQL
|
| 108 |
-
docker-compose up -d database
|
| 109 |
-
|
| 110 |
-
# Load the data
|
| 111 |
-
python feed_db.py
|
| 112 |
```
|
| 113 |
|
| 114 |
-
|
| 115 |
|
| 116 |
-
```
|
| 117 |
-
#
|
| 118 |
-
|
|
|
|
| 119 |
|
| 120 |
-
#
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
|
| 125 |
-
|
|
|
|
| 126 |
|
| 127 |
-
|
| 128 |
|
| 129 |
-
##
|
| 130 |
|
| 131 |
-
```
|
| 132 |
-
|
| 133 |
-
|
| 134 |
```
|
| 135 |
|
| 136 |
-
|
| 137 |
-
1. Click products in the catalog to add them to the outfit
|
| 138 |
-
2. Use "Uyumluluk Kontrolü" (Compatibility Check) to get a score
|
| 139 |
-
3. Use "Öneriler Al" (Get Recommendations) to see suitable products
|
| 140 |
-
4. Use "Kombini Tamamla" (Complete Outfit) to fill the missing categories
|
| 141 |
|
| 142 |
-
|
| 143 |
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
|
| 148 |
-
#
|
| 149 |
-
python create_outfit.py --list --category Tops
|
| 150 |
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
|
| 155 |
-
|
| 156 |
|
| 157 |
-
##
|
| 158 |
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
GET /api/products?category=Tops&limit=20
|
| 162 |
```
|
| 163 |
|
| 164 |
-
###
|
| 165 |
-
```http
|
| 166 |
-
POST /api/compatibility
|
| 167 |
-
Content-Type: application/json
|
| 168 |
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
"details": {
|
| 178 |
-
"items": [
|
| 179 |
-
{"name": "Strap top", "score": 0.85},
|
| 180 |
-
{"name": "Jerry jogger", "score": 0.81}
|
| 181 |
-
]
|
| 182 |
-
}
|
| 183 |
-
}
|
| 184 |
```
|
| 185 |
|
| 186 |
-
###
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
```
|
| 194 |
|
| 195 |
-
###
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
```
|
| 204 |
|
| 205 |
-
##
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
}
|
| 218 |
```
|
| 219 |
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
## 🧠 Model Details
|
| 223 |
-
|
| 224 |
-
### CIR (Compositional Image Retrieval) Model
|
| 225 |
-
|
| 226 |
-
A model based on the Outfit Transformer architecture, specialized for outfit compatibility.
|
| 227 |
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
| **Optimizer** | AdamW |
|
| 236 |
-
| **Learning Rate** | 1e-4 |
|
| 237 |
-
|
| 238 |
-
### Feature Extraction
|
| 239 |
-
|
| 240 |
-
| Component | Model | Output |
|
| 241 |
-
|---------|-------|-------|
|
| 242 |
-
| **Visual** | ResNet-18 (pretrained) | 64-dim |
|
| 243 |
-
| **Text** | LaBSE (sentence-transformers) | 64-dim |
|
| 244 |
-
| **Combined** | Concat + Projection | 128-dim |
|
| 245 |
-
|
| 246 |
-
### Training Results
|
| 247 |
-
|
| 248 |
-
```
|
| 249 |
-
Polyvore Disjoint Test Set:
|
| 250 |
-
├── Compatibility AUC: 0.84
|
| 251 |
-
├── FITB Accuracy: 0.67
|
| 252 |
-
└── Top-5 Recall: 0.78
|
| 253 |
-
```
|
| 254 |
-
|
| 255 |
-
---
|
| 256 |
-
|
| 257 |
-
## 📊 Dataset
|
| 258 |
-
|
| 259 |
-
### H&M Fashion Dataset
|
| 260 |
-
- **Source**: Kaggle H&M Personalized Fashion Recommendations
|
| 261 |
-
- **Number of Products**: 5,003 (in the database)
|
| 262 |
-
- **Number of Vectors**: 4,938 (in the index)
|
| 263 |
-
- **Categories**: Tops, Bottoms, Innerwear, Accessories, Shoes, Outerwear
|
| 264 |
-
|
| 265 |
-
### Polyvore Outfits (Training)
|
| 266 |
-
- **Number of Outfits**: 68,306
|
| 267 |
-
- **Number of Items**: 365,054
|
| 268 |
-
- **Usage**: Model training
|
| 269 |
-
|
| 270 |
-
---
|
| 271 |
-
|
| 272 |
-
## 📁 Project Structure
|
| 273 |
-
|
| 274 |
-
```
|
| 275 |
-
Outfit-Compatibility/
|
| 276 |
-
├── README.md # This file
|
| 277 |
-
├── Makefile # Command management
|
| 278 |
-
├── docker-compose.yml # Docker services
|
| 279 |
-
│
|
| 280 |
-
├── frontend/
|
| 281 |
-
│ └── index.html # Web interface
|
| 282 |
-
│
|
| 283 |
-
├── services/
|
| 284 |
-
│ ├── api/ # API Gateway
|
| 285 |
-
│ │ └── src/main.py
|
| 286 |
-
│ │
|
| 287 |
-
│ ├── database/ # PostgreSQL init
|
| 288 |
-
│ │ └── init/01_init.sql
|
| 289 |
-
│ │
|
| 290 |
-
│ ├── vector-index/ # FAISS index
|
| 291 |
-
│ │ └── src/main.py
|
| 292 |
-
│ │
|
| 293 |
-
│ ├── feature-extractor/ # ResNet + LaBSE
|
| 294 |
-
│ │ └── src/encoders.py
|
| 295 |
-
│ │
|
| 296 |
-
│ └── outfit-transformer/ # CIR Model
|
| 297 |
-
│ ├── src/models/cir_engine.py
|
| 298 |
-
│ └── src/models/cir_model.py
|
| 299 |
-
│
|
| 300 |
-
├── data/
|
| 301 |
-
│ ├── cir_model_best.pth # Trained model
|
| 302 |
-
│ ├── vectors_hm.json # Product vectors
|
| 303 |
-
│ ├── hm_dataset/ # H&M data
|
| 304 |
-
│ └── polyvore_outfits/ # Polyvore data
|
| 305 |
-
│
|
| 306 |
-
├── colab/ # Google Colab training scripts
|
| 307 |
-
│ ├── colab_train_cir.py
|
| 308 |
-
│ └── loss.py
|
| 309 |
-
│
|
| 310 |
-
├── create_outfit.py # CLI tool
|
| 311 |
-
└── feed_db.py # Database loading
|
| 312 |
-
```
|
| 313 |
-
|
| 314 |
-
---
|
| 315 |
-
|
| 316 |
-
## 🛠 Makefile Commands
|
| 317 |
-
|
| 318 |
-
```bash
|
| 319 |
-
make help # Show all commands
|
| 320 |
-
make start # Start the API and the frontend
|
| 321 |
-
make api # Start only the API
|
| 322 |
-
make frontend # Start only the frontend
|
| 323 |
-
make stop # Stop the services
|
| 324 |
-
make clean # Docker cleanup
|
| 325 |
-
make db # Start the database
|
| 326 |
```
|
| 327 |
|
| 328 |
-
|
| 329 |
-
|
| 330 |
-
## 📝 License
|
| 331 |
|
| 332 |
MIT License
|
| 333 |
-
|
| 334 |
-
---
|
| 335 |
-
|
| 336 |
-
## 👤 Developer
|
| 337 |
-
|
| 338 |
-
**Graduation Project** - January 2026
|
| 339 |
-
|
| 340 |
-
---
|
| 341 |
-
|
| 342 |
-
## 🙏 Acknowledgments
|
| 343 |
-
|
| 344 |
-
- Polyvore Outfits Dataset
|
| 345 |
-
- H&M Personalized Fashion Recommendations (Kaggle)
|
| 346 |
-
- Outfit Transformer Paper
|
| 347 |
-
- Hugging Face Transformers
|
| 1 |
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
- tr
|
| 6 |
+
tags:
|
| 7 |
+
- fashion
|
| 8 |
+
- outfit-recommendation
|
| 9 |
+
- multimodal
|
| 10 |
+
- transformer
|
| 11 |
+
- image-text
|
| 12 |
+
- complementary-item-retrieval
|
| 13 |
+
- pytorch
|
| 14 |
+
datasets:
|
| 15 |
+
- polyvore
|
| 16 |
+
pipeline_tag: feature-extraction
|
| 17 |
---
|
| 18 |
|
| 19 |
+
# Outfit Transformer CIR (Complementary Item Retrieval)
|
| 20 |
|
| 21 |
+
A multimodal Transformer model for **fashion outfit completion** and **complementary item retrieval**. Given a partial outfit (e.g., a t-shirt and jeans), the model predicts the ideal embedding for a missing item (e.g., shoes) that would complete the outfit harmoniously.
|
| 22 |
|
| 23 |
+
## Model Description
|
| 24 |
|
| 25 |
+
This model is based on the OutfitTransformer architecture proposed by **Sarkar et al.** (see the original paper reference below), with several key modifications:
|
| 26 |
|
| 27 |
+
### Differences from Original Paper
|
| 28 |
|
| 29 |
+
| Aspect | Original (Sarkar et al.) | This Implementation |
|
| 30 |
+
|--------|--------------------------|---------------------|
|
| 31 |
+
| **Text Encoder** | BERT (768-dim) | **LaBSE** (768-dim) |
|
| 32 |
+
| **Text Language** | English only | Multilingual (109 languages) |
|
| 33 |
+
| **Negative Sampling** | Random | **Hard Negative Mining** (same category) |
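
The same-category negative sampling in the table can be illustrated with a small sampler. This is a minimal sketch, not the repository's code: the helper name `sample_negatives`, the `item_category` mapping, and the choice to draw the remaining negatives uniformly from the rest of the catalog are all assumptions made for illustration.

```python
import random


def sample_negatives(target_id, item_category, num_negatives=10, hard_ratio=0.5, rng=None):
    """Pick negatives for one target item (hypothetical helper).

    item_category: dict mapping item id -> category name.
    Hard negatives share the target's category; the rest are random.
    """
    rng = rng or random.Random()
    target_cat = item_category[target_id]

    # Hard negatives: other items from the same category as the target.
    same = [i for i, c in item_category.items() if c == target_cat and i != target_id]
    num_hard = min(int(num_negatives * hard_ratio), len(same))
    hard = rng.sample(same, num_hard)

    # Remaining negatives: any other item not already chosen (assumption).
    chosen = set(hard) | {target_id}
    pool = [i for i in item_category if i not in chosen]
    easy = rng.sample(pool, min(num_negatives - num_hard, len(pool)))
    return hard + easy


catalog = {f"shoe_{i}": "shoes" for i in range(20)}
catalog.update({f"top_{i}": "tops" for i in range(20)})
negatives = sample_negatives("shoe_0", catalog, rng=random.Random(0))
```

Same-category negatives force the model to separate, say, two pairs of shoes by style rather than by category alone, which is the harder and more useful distinction for retrieval.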
|
| 34 |
|
| 35 |
+
### Why LaBSE instead of BERT?
|
| 36 |
|
| 37 |
+
[LaBSE (Language-agnostic BERT Sentence Embedding)](https://huggingface.co/sentence-transformers/LaBSE) was chosen because:
|
| 38 |
|
| 39 |
+
1. **Multilingual Support**: Works with 109 languages, enabling Turkish/English fashion descriptions
|
| 40 |
+
2. **Cross-lingual Alignment**: "Mavi tişört" and "blue t-shirt" produce similar embeddings
|
| 41 |
+
3. **Same Dimensionality**: Still outputs 768-dim vectors, compatible with the original architecture
|
| 42 |
+
4. **Production Ready**: Better suited to real-world e-commerce catalogs, where product descriptions are not always in English
|
| 43 |
|
| 44 |
+
### Loss Function: Set-wise Outfit Ranking Loss
|
| 45 |
|
| 46 |
+
Following the original paper, we use the **Set-wise Outfit Ranking Loss** (Section 3.2.2):
|
| 47 |
|
| 48 |
```
|
| 49 |
+
L_set = L_all + L_hard
|
| 50 |
```
|
| 51 |
|
| 52 |
+
Where:
|
| 53 |
+
- **L_all**: Margin-based ranking over all negatives
|
| 54 |
+
- **L_hard**: Extra penalty on the hardest negative (closest wrong answer)
|
| 55 |
|
| 56 |
+
```python
|
| 57 |
+
# L_ALL: General ranking loss
|
| 58 |
+
diff_all = pos_dist - neg_dist + margin # margin = 2.0
|
| 59 |
+
loss_all = torch.relu(diff_all).mean()
|
| 60 |
|
| 61 |
+
# L_HARD: Hardest negative focus
|
| 62 |
+
min_neg_dist = neg_dist.min(dim=1).values  # min(dim=...) returns (values, indices)
|
| 63 |
+
diff_hard = pos_dist - min_neg_dist + margin
|
| 64 |
+
loss_hard = torch.relu(diff_hard).mean()
|
| 65 |
|
| 66 |
+
total_loss = loss_all + loss_hard
|
| 67 |
+
```
|
| 68 |
|
| 69 |
+
**Why this helps:**
|
| 70 |
+
- InfoNCE treats all negatives equally via softmax
|
| 71 |
+
- Set-wise loss explicitly penalizes the hardest negative
|
| 72 |
+
- Reduces the **hubness problem**, where popular items dominate retrieval
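
The two terms above can be combined into one runnable function. This is a minimal sketch under stated assumptions: `pos_dist` is taken to have shape `(batch,)` and `neg_dist` shape `(batch, num_negatives)`, and the function name `set_wise_ranking_loss` is illustrative rather than the name used in the training code.

```python
import torch


def set_wise_ranking_loss(pos_dist, neg_dist, margin=2.0):
    """Sketch of L_set = L_all + L_hard (tensor shapes are assumptions).

    pos_dist: (batch,) distance between the prediction and the correct item.
    neg_dist: (batch, num_negatives) distances to the negative items.
    """
    # L_all: hinge on every (positive, negative) pair, averaged.
    diff_all = pos_dist.unsqueeze(1) - neg_dist + margin
    loss_all = torch.relu(diff_all).mean()

    # L_hard: hinge on the closest (hardest) negative only.
    # Tensor.min(dim=...) returns (values, indices); keep the values.
    min_neg_dist = neg_dist.min(dim=1).values
    diff_hard = pos_dist - min_neg_dist + margin
    loss_hard = torch.relu(diff_hard).mean()

    return loss_all + loss_hard


pos = torch.tensor([0.5, 1.0])
neg = torch.tensor([[3.0, 1.5], [4.0, 0.5]])
loss = set_wise_ranking_loss(pos, neg)
```

In this toy batch only the negatives that lie within the margin of the positive contribute, and the hardest negative of each row is counted a second time through `loss_hard`.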
|
| 73 |
|
| 74 |
+
## Architecture
|
| 75 |
|
| 76 |
+
```
|
| 77 |
+
┌─────────────────────────────────────────────────────────────┐
|
| 78 |
+
│ OutfitTransformerCIR │
|
| 79 |
+
├─────────────────────────────────────────────────────────────┤
|
| 80 |
+
│ │
|
| 81 |
+
│ ┌──────────────┐ ┌──────────────┐ │
|
| 82 |
+
│ │ ResNet-18 │ │ LaBSE │ │
|
| 83 |
+
│ │ (Frozen) │ │ (Frozen) │ │
|
| 84 |
+
│ │ 512-dim │ │ 768-dim │ │
|
| 85 |
+
│ └──────┬───────┘ └──────┬───────┘ │
|
| 86 |
+
│ │ │ │
|
| 87 |
+
│ ┌──────▼───────┐ ┌──────▼───────┐ │
|
| 88 |
+
│ │ Visual Proj │ │ Text Proj │ ← Trained │
|
| 89 |
+
│ │ 512 → 64 │ │ 768 → 64 │ │
|
| 90 |
+
│ └──────┬───────┘ └──────┬───────┘ │
|
| 91 |
+
│ │ │ │
|
| 92 |
+
│ └────────┬──────────┘ │
|
| 93 |
+
│ │ │
|
| 94 |
+
│ ┌──────▼──────┐ │
|
| 95 |
+
│ │ Concat │ │
|
| 96 |
+
│ │ 64+64 = 128 │ │
|
| 97 |
+
│ └──────┬──────┘ │
|
| 98 |
+
│ │ │
|
| 99 |
+
│ ┌─────────────▼─────────────┐ │
|
| 100 |
+
│ │ [QUERY] + Item Embeddings │ │
|
| 101 |
+
│ │ (Learnable Token) │ │
|
| 102 |
+
│ └─────────────┬─────────────┘ │
|
| 103 |
+
│ │ │
|
| 104 |
+
│ ┌─────────────▼─────────────┐ │
|
| 105 |
+
│ │ Transformer Encoder │ │
|
| 106 |
+
│ │ 6 layers, 16 heads │ │
|
| 107 |
+
│ │ d_model=128, ff=512 │ │
|
| 108 |
+
│ └─────────────┬─────────────┘ │
|
| 109 |
+
│ │ │
|
| 110 |
+
│ ┌─────────────▼─────────────┐ │
|
| 111 |
+
│ │ Output Projection │ │
|
| 112 |
+
│ │ + LayerNorm + L2 Norm │ │
|
| 113 |
+
│ └─────────────┬─────────────┘ │
|
| 114 |
+
│ │ │
|
| 115 |
+
│ ┌──────▼──────┐ │
|
| 116 |
+
│ │ 128-dim │ │
|
| 117 |
+
│ │ Predicted │ │
|
| 118 |
+
│ │ Embedding │ │
|
| 119 |
+
│ └─────────────┘ │
|
| 120 |
+
│ │
|
| 121 |
+
└─────────────────────────────────────────────────────────────┘
|
| 122 |
```
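
The data flow in the diagram can be expressed as a compact PyTorch module. This is an illustrative sketch only: the class name `OutfitTransformerSketch`, its constructor arguments, and the zero-initialized query token are assumptions, and the released `OutfitTransformerCIR` class may differ in detail (for example in padding masks or initialization).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OutfitTransformerSketch(nn.Module):
    """Illustrative re-creation of the diagram; not the released model code."""

    def __init__(self, img_dim=512, txt_dim=768, proj_dim=64,
                 nhead=16, num_layers=6, ff_dim=512):
        super().__init__()
        d_model = 2 * proj_dim  # 64 + 64 = 128
        self.visual_proj = nn.Linear(img_dim, proj_dim)  # 512 -> 64
        self.text_proj = nn.Linear(txt_dim, proj_dim)    # 768 -> 64
        # Learnable [QUERY] token prepended to the item sequence.
        self.query_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=ff_dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.out_proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, images, texts):
        # images: (B, N, 512) ResNet-18 features; texts: (B, N, 768) LaBSE features.
        items = torch.cat([self.visual_proj(images), self.text_proj(texts)], dim=-1)
        query = self.query_token.expand(items.size(0), -1, -1)
        hidden = self.encoder(torch.cat([query, items], dim=1))
        # Read the output at the [QUERY] position, then project and normalize.
        out = self.norm(self.out_proj(hidden[:, 0]))
        return F.normalize(out, dim=-1)


sketch = OutfitTransformerSketch().eval()
with torch.no_grad():
    pred = sketch(torch.randn(2, 3, 512), torch.randn(2, 3, 768))
```

The output is one L2-normalized 128-dim vector per outfit, so cosine similarity against item embeddings reduces to a dot product.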
|
| 123 |
|
| 124 |
+
## Benchmark Results
|
| 125 |
|
| 126 |
+
Evaluated on **Polyvore Outfits** dataset (disjoint split):
|
| 127 |
|
| 128 |
+
| Metric | Score |
|
| 129 |
+
|--------|-------|
|
| 130 |
+
| **FITB Accuracy** | 56.39% |
|
| 131 |
+
| **MRR** | 0.7447 |
|
| 132 |
+
| **Recall@1** | 56.39% |
|
| 133 |
+
| **Recall@2** | 80.86% |
|
| 134 |
+
| **Recall@3** | 93.56% |
|
| 135 |
+
| **NDCG@3** | 0.7818 |
|
| 136 |
+
| **NDCG@5** | 0.8095 |
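
For reference, these ranking metrics can all be derived from the 1-based rank of the correct item in each query. The sketch below assumes exactly one correct item per query; the function name `ranking_metrics` is hypothetical and is not taken from the evaluation script.

```python
import math


def ranking_metrics(ranks, ks=(1, 2, 3)):
    """ranks: 1-based rank of the correct item for each query."""
    n = len(ranks)
    metrics = {"mrr": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        # Recall@k: fraction of queries whose correct item is in the top k.
        metrics[f"recall@{k}"] = sum(r <= k for r in ranks) / n
        # With a single relevant item the ideal DCG is 1, so NDCG equals DCG.
        metrics[f"ndcg@{k}"] = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return metrics


example = ranking_metrics([1, 2, 1, 4])
```

Under this definition Recall@1 is the fraction of queries where the correct item is ranked first, which is why it matches FITB accuracy in the table.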
|
| 137 |
|
| 138 |
+
### Comparison with Baselines
|
|
|
|
| 139 |
|
| 140 |
+
| Model | FITB Accuracy | Notes |
|
| 141 |
+
|-------|---------------|-------|
|
| 142 |
+
| Random | 25.00% | 4-choice task |
|
| 143 |
+
| Type-Aware (Vasileva 2018) | ~53% | Category-specific spaces |
|
| 144 |
+
| **Ours (LaBSE + SetWise)** | **56.39%** | Multilingual, margin-based |
|
| 145 |
+
| Sarkar et al. (reported) | ~57% | English BERT, InfoNCE |
|
| 146 |
|
| 147 |
+
## Usage
|
| 148 |
|
| 149 |
+
### Installation
|
| 150 |
|
| 151 |
+
```bash
|
| 152 |
+
pip install torch torchvision transformers
|
|
|
|
| 153 |
```
|
| 154 |
|
| 155 |
+
### Loading the Model
|
| 156 |
|
| 157 |
+
```python
|
| 158 |
+
import torch
|
| 159 |
+
from model import OutfitTransformerCIR
|
| 160 |
|
| 161 |
+
# Load model
|
| 162 |
+
model = OutfitTransformerCIR(embedding_dim=128, nhead=16, num_layers=6)
|
| 163 |
+
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
|
| 164 |
+
model.eval()
|
| 165 |
```
|
| 166 |
|
| 167 |
+
### Inference Example
|
| 168 |
+
|
| 169 |
+
```python
|
| 170 |
+
# Assume you have pre-extracted features:
|
| 171 |
+
# context_images: (1, num_items, 512) - ResNet-18 features
|
| 172 |
+
# context_texts: (1, num_items, 768) - LaBSE embeddings
|
| 173 |
+
|
| 174 |
+
with torch.no_grad():
|
| 175 |
+
    # Predict missing item embedding
|
| 176 |
+
    predicted_embedding = model(context_images, context_texts)
|
| 177 |
+
    # predicted_embedding: (1, 128)
|
| 178 |
+
|
| 179 |
+
# Use cosine similarity to find closest items in your database
# (item_database is assumed to be an (N, 128) tensor of item embeddings)
|
| 180 |
+
similarities = torch.cosine_similarity(predicted_embedding, item_database)
|
| 181 |
+
top_matches = similarities.argsort(descending=True)[:10]
|
| 182 |
```
|
| 183 |
|
| 184 |
+
### Feature Extraction (for your own items)
|
| 185 |
+
|
| 186 |
+
```python
|
| 187 |
+
from torchvision import models, transforms
|
| 188 |
+
from transformers import AutoTokenizer, AutoModel
|
| 189 |
+
from PIL import Image
|
| 190 |
+
import torch
import torch.nn as nn
|
| 191 |
+
|
| 192 |
+
# Image encoder (ResNet-18)
|
| 193 |
+
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
|
| 194 |
+
resnet = nn.Sequential(*list(resnet.children())[:-1])
|
| 195 |
+
resnet.eval()
|
| 196 |
+
|
| 197 |
+
preprocess = transforms.Compose([
|
| 198 |
+
transforms.Resize((224, 224)),
|
| 199 |
+
transforms.ToTensor(),
|
| 200 |
+
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
|
| 201 |
+
])
|
| 202 |
+
|
| 203 |
+
# Text encoder (LaBSE)
|
| 204 |
+
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
|
| 205 |
+
labse = AutoModel.from_pretrained("sentence-transformers/LaBSE")
|
| 206 |
+
labse.eval()
|
| 207 |
+
|
| 208 |
+
def extract_features(image_path, text_description):
|
| 209 |
+
    # Image: 512-dim
|
| 210 |
+
    image = Image.open(image_path).convert('RGB')
|
| 211 |
+
    img_tensor = preprocess(image).unsqueeze(0)
|
| 212 |
+
    with torch.no_grad():
|
| 213 |
+
        img_features = resnet(img_tensor).flatten(1)  # (1, 512)
|
| 214 |
+
|
| 215 |
+
    # Text: 768-dim
|
| 216 |
+
    inputs = tokenizer(text_description, return_tensors="pt", padding=True, truncation=True)
|
| 217 |
+
    with torch.no_grad():
|
| 218 |
+
        txt_features = labse(**inputs).pooler_output  # (1, 768)
|
| 219 |
+
|
| 220 |
+
    return img_features, txt_features
|
| 221 |
```
|
| 222 |
|
| 223 |
+
## Training Details
|
| 224 |
+
|
| 225 |
+
| Hyperparameter | Value |
|
| 226 |
+
|----------------|-------|
|
| 227 |
+
| Optimizer | AdamW |
|
| 228 |
+
| Learning Rate | 1e-5 |
|
| 229 |
+
| Weight Decay | 0.01 |
|
| 230 |
+
| Batch Size | 64 |
|
| 231 |
+
| Epochs | 50 |
|
| 232 |
+
| Warmup Epochs | 2 |
|
| 233 |
+
| LR Scheduler | StepLR (step=10, gamma=0.5) |
|
| 234 |
+
| Margin (loss) | 2.0 |
|
| 235 |
+
| Num Negatives | 10 |
|
| 236 |
+
| Hard Negative Ratio | 50% (same category) |
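
The optimizer and scheduler rows of this table map directly onto standard PyTorch classes. The sketch below is an assumption about how they could be wired together, with a stand-in `nn.Linear` in place of the real model; the 2-epoch warmup is omitted because the table does not say how it is implemented.

```python
import torch

stand_in_model = torch.nn.Linear(128, 128)  # placeholder for the real model
optimizer = torch.optim.AdamW(stand_in_model.parameters(), lr=1e-5, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

lrs = []
for epoch in range(50):
    lrs.append(optimizer.param_groups[0]["lr"])
    # One epoch of training (forward, loss, backward) would run here.
    optimizer.step()
    scheduler.step()  # halves the learning rate every 10 epochs
```

With these settings the learning rate is halved four times over 50 epochs, ending at 1e-5 × 0.5⁴ = 6.25e-7.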
|
| 237 |
+
|
| 238 |
+
### Training Data
|
| 239 |
+
|
| 240 |
+
- **Dataset**: Polyvore Outfits (Maryland split, disjoint)
|
| 241 |
+
- **Train**: ~17K outfits, ~250K items
|
| 242 |
+
- **Validation**: ~2K outfits
|
| 243 |
+
- **Test**: ~3K outfits
|
| 244 |
+
|
| 245 |
+
## Limitations
|
| 246 |
+
|
| 247 |
+
1. **Fixed Item Length**: The model expects at most 8 items per outfit (shorter outfits are padded)
|
| 248 |
+
2. **Frozen Encoders**: ResNet-18 and LaBSE are frozen during training
|
| 249 |
+
3. **Hubness**: Some popular items may dominate retrieval (mitigated with CSLS, cross-domain similarity local scaling)
|
| 250 |
+
4. **Fashion Domain**: Trained on Polyvore data, so it may not generalize to other domains
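
The CSLS correction mentioned in the hubness limitation can be sketched as follows. CSLS rescales each cosine similarity by the average similarity of the query and of the item to their nearest neighbors, so items that are close to everything are penalized. The function name `csls_scores`, the neighborhood size `k`, and the tensor shapes are assumptions; how CSLS is wired into this project's retrieval is not specified in this README.

```python
import torch
import torch.nn.functional as F


def csls_scores(queries, items, k=10):
    """queries: (Q, D), items: (N, D). Returns (Q, N) CSLS scores."""
    q = F.normalize(queries, dim=-1)
    x = F.normalize(items, dim=-1)
    sim = q @ x.T  # cosine similarities, shape (Q, N)

    # Mean similarity of each query to its k nearest items.
    r_q = sim.topk(min(k, sim.size(1)), dim=1).values.mean(dim=1, keepdim=True)
    # Mean similarity of each item to its k nearest queries ("hub" items score high).
    r_x = sim.topk(min(k, sim.size(0)), dim=0).values.mean(dim=0, keepdim=True)

    return 2 * sim - r_q - r_x


torch.manual_seed(0)
scores = csls_scores(torch.randn(4, 128), torch.randn(50, 128), k=3)
top_matches = scores.argsort(dim=1, descending=True)[:, :10]
```

Ranking by `scores` instead of raw cosine similarity demotes hub items, because their high average similarity `r_x` is subtracted from every score they receive.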
|
| 251 |
+
|
| 252 |
+
## Citation
|
| 253 |
+
|
| 254 |
+
If you use this model, please cite:
|
| 255 |
+
|
| 256 |
+
```bibtex
|
| 257 |
+
@misc{outfit-cir-transformer,
|
| 258 |
+
author = {Kuyumcu, Furkan},
|
| 259 |
+
title = {Outfit Transformer CIR: Multilingual Complementary Item Retrieval},
|
| 260 |
+
year = {2026},
|
| 261 |
+
publisher = {Hugging Face},
|
| 262 |
+
url = {https://huggingface.co/fkuyumcu/outfit-cir-transformer}
|
| 263 |
}
|
| 264 |
```
|
| 265 |
|
| 266 |
+
### Original Paper Reference
|
| 267 |
|
| 268 |
+
```bibtex
|
| 269 |
+
@inproceedings{sarkar2022outfitbert,
|
| 270 |
+
title={OutfitTransformer: Learning Outfit Representations for Fashion Recommendation},
|
| 271 |
+
author={Sarkar, Rohan and others},
|
| 272 |
+
booktitle={CVPR Workshop on Computer Vision for Fashion, Art, and Design},
|
| 273 |
+
year={2022}
|
| 274 |
+
}
|
| 275 |
```
|
| 276 |
|
| 277 |
+
## License
|
| 278 |
|
| 279 |
MIT License