fkuyumcu/OutfitTransformer-labse

@@ -1,347 +1,279 @@
-# 👔 Outfit Compatibility System
-> **AI-Powered Fashion Recommendation Engine**
-> Graduation Project - January 2026
-A deep-learning-based outfit compatibility analysis and recommendation system. It recommends compatible pieces for the garments users upload and scores outfits.
 ---
-## 📋 Table of Contents
-- [Features](#-özellikler)
-- [Architecture](#-mimari)
-- [Installation](#-kurulum)
-- [Usage](#-kullanım)
-- [API Endpoints](#-api-endpoints)
-- [Model Details](#-model-detayları)
-- [Dataset](#-veri-seti)
-- [Project Structure](#-proje-yapısı)
 ---
-## ✨ Features
-| Feature | Description |
-|---------|----------|
-| **Compatibility Score** | Scores how well the selected garments go together, from 0 to 100% |
-| **Smart Recommendation** | Uses AI to recommend the products that best fit the current outfit |
-| **Outfit Completion** | Automatically fills missing categories with compatible products |
-| **Custom Item Upload** | Lets users upload and analyze a photo of their own garment |
-| **Similar Item Search** | Finds the products in the database most similar to the uploaded item |
-| **Category Filtering** | Filters by category, such as tops, bottoms, accessories, and shoes |
----
-## 🏗 Architecture
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         FRONTEND                                │
-│                    (HTML/CSS/JavaScript)                        │
-│                   http://localhost:3000                         │
-└─────────────────────────┬───────────────────────────────────────┘
-                          │
-                          ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                       API GATEWAY                               │
-│                  (FastAPI - Port 8000)                          │
-│    /api/compatibility, /api/recommend, /api/complete-outfit     │
-└───────┬─────────────────┬─────────────────┬─────────────────────┘
-        │                 │                 │
-        ▼                 ▼                 ▼
-┌───────────────┐ ┌───────────────┐ ┌───────────────┐
-│   PostgreSQL  │ │ Vector Index  │ │    Outfit     │
-│   (Port 5432) │ │  (Port 8003)  │ │  Transformer  │
-│  5000+ Items  │ │  FAISS Index  │ │  (Port 8002)  │
-└───────────────┘ └───────────────┘ └───────┬───────┘
-                                            │
-                                            ▼
-                                    ┌───────────────┐
-                                    │   Feature     │
-                                    │  Extractor    │
-                                    │  (Port 8001)  │
-                                    │ ResNet+LaBSE  │
-                                    └───────────────┘
-```
-### Services
-| Service | Port | Technology | Description |
-|--------|------|-----------|----------|
-| **API Gateway** | 8000 | FastAPI | Main API service; handles all requests |
-| **Frontend** | 3000 | HTML/JS | Web interface |
-| **PostgreSQL** | 5432 | PostgreSQL | Product database (5000+ products) |
-| **Vector Index** | 8003 | FAISS | Vector similarity search (4938 vectors) |
-| **Outfit Transformer** | 8002 | PyTorch | CIR model, compatibility scoring |
-| **Feature Extractor** | 8001 | ResNet+LaBSE | Visual and text feature extraction |
----
-## 🚀 Installation
-### Requirements
-- Python 3.10+
-- PostgreSQL 14+
-- Docker (optional)
-### 1. Install Dependencies
-```bash
-# Create a virtual environment
-python3 -m venv venv
-source venv/bin/activate
-# Install dependencies
-pip install torch torchvision
-pip install transformers sentence-transformers
-pip install fastapi uvicorn
-pip install psycopg2-binary faiss-cpu
-pip install numpy pillow requests
 ```
-### 2. Prepare the Database
-```bash
-# Start PostgreSQL
-docker-compose up -d database
-# Load the data
-python feed_db.py
 ```
-### 3. Start the Services
-```bash
-# All at once
-make start
-# or individually
-make api        # API Gateway
-make frontend   # Web interface
-```
----
-## 💻 Usage
-### Web Interface
-```bash
-make frontend
-# In the browser: http://localhost:3000/frontend/
 ```
-**Usage steps:**
-1. Click products in the catalog to add them to the outfit
-2. Get a score with "Uyumluluk Kontrolü" (Compatibility Check)
-3. See suitable products with "Öneriler Al" (Get Recommendations)
-4. Fill missing categories with "Kombini Tamamla" (Complete Outfit)
-### CLI Tool
-```bash
-# Create a random outfit
-python create_outfit.py --random
-# List products from a specific category
-python create_outfit.py --list --category Tops
-# Create an outfit with a specific product
-python create_outfit.py 0108775015 --size 4
-```
----
-## 🔌 API Endpoints
-### Product List
-```http
-GET /api/products?category=Tops&limit=20
 ```
-### Compatibility Check
-```http
-POST /api/compatibility
-Content-Type: application/json
-{
-  "outfit_items": ["0108775015", "0118458034", "0126589006"]
-}
-Response:
-{
-  "score": 0.83,
-  "interpretation": "Mükemmel Uyum",
-  "details": {
-    "items": [
-      {"name": "Strap top", "score": 0.85},
-      {"name": "Jerry jogger", "score": 0.81}
-    ]
-  }
-}
 ```
-### Get Recommendations
-```http
-POST /api/recommend
-{
-  "outfit_items": ["0108775015"],
-  "top_k": 5
-}
 ```
-### Complete Outfit
-```http
-POST /api/complete-outfit
-{
-  "partial_outfit": ["0108775015"],
-  "target_size": 4,
-  "target_categories": ["Bottoms", "Accessories"]
-}
 ```
-### Health Check
-```http
-GET /health
-Response:
-{
-  "status": "ok",
-  "services": {
-    "api": "ok",
-    "database": "ok",
-    "vector_index": "ok"
-  }
 }
 ```
----
-## 🧠 Model Details
-### CIR (Compositional Image Retrieval) Model
-A model based on the Outfit Transformer architecture, specialized for outfit compatibility.
-| Parameter | Value |
-|-----------|-------|
-| **Embedding Size** | 128 (64 visual + 64 text) |
-| **Transformer Layers** | 6 |
-| **Attention Heads** | 8 |
-| **Training Dataset** | Polyvore Outfits |
-| **Training Epochs** | 50 |
-| **Optimizer** | AdamW |
-| **Learning Rate** | 1e-4 |
-### Feature Extraction
-| Component | Model | Output |
-|---------|-------|-------|
-| **Visual** | ResNet-18 (pretrained) | 64-dim |
-| **Text** | LaBSE (sentence-transformers) | 64-dim |
-| **Combined** | Concat + Projection | 128-dim |
-### Training Results
-```
-Polyvore Disjoint Test Set:
-├── Compatibility AUC: 0.84
-├── FITB Accuracy: 0.67
-└── Top-5 Recall: 0.78
-```
----
-## 📊 Dataset
-### H&M Fashion Dataset
-- **Source**: Kaggle H&M Personalized Fashion Recommendations
-- **Number of Products**: 5,003 (in the database)
-- **Number of Vectors**: 4,938 (in the index)
-- **Categories**: Tops, Bottoms, Innerwear, Accessories, Shoes, Outerwear
-### Polyvore Outfits (Training)
-- **Number of Outfits**: 68,306
-- **Number of Items**: 365,054
-- **Usage**: Model training
----
-## 📁 Project Structure
-```
-Outfit-Compatibility/
-├── README.md                 # This file
-├── Makefile                  # Command management
-├── docker-compose.yml        # Docker services
-│
-├── frontend/
-│   └── index.html           # Web interface
-│
-├── services/
-│   ├── api/                 # API Gateway
-│   │   └── src/main.py
-│   │
-│   ├── database/            # PostgreSQL init
-│   │   └── init/01_init.sql
-│   │
-│   ├── vector-index/        # FAISS index
-│   │   └── src/main.py
-│   │
-│   ├── feature-extractor/   # ResNet + LaBSE
-│   │   └── src/encoders.py
-│   │
-│   └── outfit-transformer/  # CIR Model
-│       ├── src/models/cir_engine.py
-│       └── src/models/cir_model.py
-│
-├── data/
-│   ├── cir_model_best.pth   # Trained model
-│   ├── vectors_hm.json      # Product vectors
-│   ├── hm_dataset/          # H&M data
-│   └── polyvore_outfits/    # Polyvore data
-│
-├── colab/                   # Google Colab training scripts
-│   ├── colab_train_cir.py
-│   └── loss.py
-│
-├── create_outfit.py         # CLI tool
-└── feed_db.py              # Database loader
-```
----
-## 🛠 Makefile Commands
-```bash
-make help       # Show all commands
-make start      # Start API + Frontend
-make api        # Start only the API
-make frontend   # Start only the Frontend
-make stop       # Stop the services
-make clean      # Docker cleanup
-make db         # Start the database
 ```
----
-## 📝 License
 MIT License
----
-## 👤 Developer
-**Graduation Project** - January 2026
----
-## 🙏 Acknowledgements
-- Polyvore Outfits Dataset
-- H&M Personalized Fashion Recommendations (Kaggle)
-- Outfit Transformer Paper
-- Hugging Face Transformers

 ---
+license: mit
+language:
+  - en
+  - tr
+tags:
+  - fashion
+  - outfit-recommendation
+  - multimodal
+  - transformer
+  - image-text
+  - complementary-item-retrieval
+  - pytorch
+datasets:
+  - polyvore
+pipeline_tag: feature-extraction
 ---
+# Outfit Transformer CIR (Complementary Item Retrieval)
+A multimodal Transformer model for **fashion outfit completion** and **complementary item retrieval**. Given a partial outfit (e.g., a t-shirt and jeans), the model predicts the ideal embedding for a missing item (e.g., shoes) that would complete the outfit harmoniously.
+## Model Description
+This model follows the OutfitTransformer architecture proposed by **Sarkar et al.** (2022; see "Original Paper Reference" below), with the following modifications:
+### Differences from Original Paper
+| Aspect | Original (Sarkar et al.) | This Implementation |
+|--------|--------------------------|---------------------|
+| **Text Encoder** | BERT (768-dim) | **LaBSE** (768-dim) |
+| **Text Language** | English only | Multilingual (109 languages) |
+| **Negative Sampling** | Random | **Hard Negative Mining** (same category) |
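The same-category hard negative mining listed in the table can be sketched as follows. This is a minimal illustration, not the training code: `sample_negatives`, the `(item_id, category)` catalog layout, and filling the remainder with uniformly random items are all assumptions. The defaults of 10 negatives and a 50% same-category share are taken from the Training Details table.

```python
import random

def sample_negatives(catalog, target_id, target_category,
                     num_negatives=10, hard_ratio=0.5, rng=random):
    """Return negative item ids for one target item (hypothetical helper).

    catalog: list of (item_id, category) pairs.
    """
    # Hard negatives share the target's category, so they are harder to rule out.
    same = [i for i, c in catalog if c == target_category and i != target_id]
    num_hard = min(int(num_negatives * hard_ratio), len(same))
    hard = rng.sample(same, num_hard)

    # The rest are drawn uniformly from every remaining item (assumption).
    chosen = set(hard)
    rest = [i for i, _ in catalog if i != target_id and i not in chosen]
    easy = rng.sample(rest, min(num_negatives - num_hard, len(rest)))
    return hard + easy

# Toy catalog: 20 shoes and 20 tops.
catalog = [(f"shoe_{n}", "shoes") for n in range(20)] + \
          [(f"top_{n}", "tops") for n in range(20)]
negatives = sample_negatives(catalog, "shoe_0", "shoes", rng=random.Random(0))
print(negatives)
```

The first half of the returned list always comes from the target's category; the second half may contain items from any category.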
+### Why LaBSE instead of BERT?
+[LaBSE (Language-agnostic BERT Sentence Embedding)](https://huggingface.co/sentence-transformers/LaBSE) was chosen because:
+1. **Multilingual Support**: Works with 109 languages, enabling Turkish/English fashion descriptions
+2. **Cross-lingual Alignment**: "Mavi tişört" and "blue t-shirt" produce similar embeddings
+3. **Same Dimensionality**: Still outputs 768-dim vectors, compatible with the original architecture
+4. **Production Ready**: Handles catalogs with non-English or mixed-language product descriptions without a separate translation step
+### Loss Function: Set-wise Outfit Ranking Loss
+Following the original paper, we use the **Set-wise Outfit Ranking Loss** (Section 3.2.2):
 ```
+L_set = L_all + L_hard
 ```
+Where:
+- **L_all**: Margin-based ranking over all negatives
+- **L_hard**: Extra penalty on the hardest negative (closest wrong answer)
+```python
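+import torch
+
+# Assumed shapes: pos_dist (B,) distance to the correct item,
+# neg_dist (B, K) distances to K negatives
+margin = 2.0
+# L_ALL: hinge ranking loss averaged over all negatives
+diff_all = pos_dist.unsqueeze(1) - neg_dist + margin
+loss_all = torch.relu(diff_all).mean()
+# L_HARD: hinge on the hardest (closest) negative only
+min_neg_dist = neg_dist.min(dim=1).values  # min(dim=...) returns (values, indices)
+diff_hard = pos_dist - min_neg_dist + margin
+loss_hard = torch.relu(diff_hard).mean()
+total_loss = loss_all + loss_hard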
+```
+**Why this helps:**
+- InfoNCE treats all negatives equally via softmax
+- Set-wise loss explicitly penalizes the hardest negative
+- Reduces the **hubness problem**, where popular items dominate retrieval
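A worked example with made-up distances shows how the two terms behave. The numbers are illustrative only (one positive, three negatives, margin 2.0), and plain Python is used so the arithmetic is easy to follow.

```python
def hinge(x):
    """ReLU on a scalar."""
    return max(0.0, x)

margin = 2.0
pos_dist = 0.5                 # distance from the prediction to the correct item
neg_dists = [3.0, 1.2, 2.8]    # distances to three wrong items (made up)

# L_all: average hinge over every negative.
loss_all = sum(hinge(pos_dist - d + margin) for d in neg_dists) / len(neg_dists)

# L_hard: hinge on the closest negative only.
loss_hard = hinge(pos_dist - min(neg_dists) + margin)

total_loss = loss_all + loss_hard
print(round(loss_all, 4), round(loss_hard, 4), round(total_loss, 4))
# 0.4333 1.3 1.7333
```

Only the negative at distance 1.2 violates the margin. It is diluted to one third of its value inside L_all, while L_hard counts it again at full weight.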
+## Architecture
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    OutfitTransformerCIR                     │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  ┌──────────────┐    ┌──────────────┐                       │
+│  │ ResNet-18    │    │ LaBSE        │                       │
+│  │ (Frozen)     │    │ (Frozen)     │                       │
+│  │ 512-dim      │    │ 768-dim      │                       │
+│  └──────┬───────┘    └──────┬───────┘                       │
+│         │                   │                               │
+│  ┌──────▼───────┐    ┌──────▼───────┐                       │
+│  │ Visual Proj  │    │ Text Proj    │   ← Trained          │
+│  │ 512 → 64     │    │ 768 → 64     │                       │
+│  └──────┬───────┘    └──────┬───────┘                       │
+│         │                   │                               │
+│         └────────┬──────────┘                               │
+│                  │                                          │
+│           ┌──────▼──────┐                                   │
+│           │ Concat      │                                   │
+│           │ 64+64 = 128 │                                   │
+│           └──────┬──────┘                                   │
+│                  │                                          │
+│    ┌─────────────▼─────────────┐                            │
+│    │ [QUERY] + Item Embeddings │                            │
+│    │     (Learnable Token)     │                            │
+│    └─────────────┬─────────────┘                            │
+│                  │                                          │
+│    ┌─────────────▼─────────────┐                            │
+│    │ Transformer Encoder       │                            │
+│    │ 6 layers, 16 heads        │                            │
+│    │ d_model=128, ff=512       │                            │
+│    └─────────────┬─────────────┘                            │
+│                  │                                          │
+│    ┌─────────────▼─────────────┐                            │
+│    │ Output Projection         │                            │
+│    │ + LayerNorm + L2 Norm     │                            │
+│    └─────────────┬─────────────┘                            │
+│                  │                                          │
+│           ┌──────▼──────┐                                   │
+│           │ 128-dim     │                                   │
+│           │ Predicted   │                                   │
+│           │ Embedding   │                                   │
+│           └─────────────┘                                   │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
 ```
+## Benchmark Results
+Evaluated on the **Polyvore Outfits** dataset (disjoint split):
+| Metric | Score |
+|--------|-------|
+| **FITB Accuracy** | 56.39% |
+| **MRR** | 0.7447 |
+| **Recall@1** | 56.39% |
+| **Recall@2** | 80.86% |
+| **Recall@3** | 93.56% |
+| **NDCG@3** | 0.7818 |
+| **NDCG@5** | 0.8095 |
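Because each FITB question has one correct answer among four candidates, MRR and NDCG can be recomputed from the Recall@k rows. The check below is a sanity check, not the evaluation script. It assumes Recall@4 = 100% (the correct item is always among the four choices) and the usual single-relevant-item definitions of MRR and NDCG.

```python
import math

# Recall@k from the table above, as fractions; Recall@4 = 1.0 is an assumption.
recall_at = {1: 0.5639, 2: 0.8086, 3: 0.9356, 4: 1.0}

# Share of questions where the correct item lands exactly at rank r.
rank_prob = {r: recall_at[r] - recall_at.get(r - 1, 0.0) for r in recall_at}

def mrr(rank_prob):
    """Mean reciprocal rank of the single correct item."""
    return sum(p / r for r, p in rank_prob.items())

def ndcg_at(k, rank_prob):
    """NDCG@k with one relevant item: ideal DCG is 1, gain is 1 / log2(rank + 1)."""
    return sum(p / math.log2(r + 1) for r, p in rank_prob.items() if r <= k)

print(round(mrr(rank_prob), 4))         # 0.7447
print(round(ndcg_at(3, rank_prob), 4))  # 0.7818
print(round(ndcg_at(5, rank_prob), 4))  # 0.8095
```

Under these assumptions the recomputed values match the MRR, NDCG@3, and NDCG@5 rows, so the table is internally consistent.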
+### Comparison with Baselines
+| Model | FITB Accuracy | Notes |
+|-------|---------------|-------|
+| Random | 25.00% | 4-choice task |
+| Type-Aware (Vasileva 2018) | ~53% | Category-specific spaces |
+| **Ours (LaBSE + SetWise)** | **56.39%** | Multilingual, margin-based |
+| Sarkar et al. (reported) | ~57% | English BERT, InfoNCE |
+## Usage
+### Installation
+```bash
+pip install torch torchvision transformers
 ```
+### Loading the Model
+```python
+import torch
+from model import OutfitTransformerCIR
+# Load model
+model = OutfitTransformerCIR(embedding_dim=128, nhead=16, num_layers=6)
+model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
+model.eval()
 ```
+### Inference Example
+```python
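+# Assume you have pre-extracted features:
+# context_images: (1, num_items, 512) - ResNet-18 features
+# context_texts: (1, num_items, 768) - LaBSE embeddings
+# item_database: (N, 128) - candidate item embeddings (assumed precomputed, same space as the output)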
+with torch.no_grad():
+    # Predict missing item embedding
+    predicted_embedding = model(context_images, context_texts)
+    # predicted_embedding: (1, 128)
+    # Use cosine similarity to find closest items in your database
+    similarities = torch.cosine_similarity(predicted_embedding, item_database)
+    top_matches = similarities.argsort(descending=True)[:10]
 ```
+### Feature Extraction (for your own items)
+```python
+from torchvision import models, transforms
+from transformers import AutoTokenizer, AutoModel
+from PIL import Image
+import torch
+import torch.nn as nn
+# Image encoder (ResNet-18)
+resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
+resnet = nn.Sequential(*list(resnet.children())[:-1])
+resnet.eval()
+preprocess = transforms.Compose([
+    transforms.Resize((224, 224)),
+    transforms.ToTensor(),
+    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+])
+# Text encoder (LaBSE)
+tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
+labse = AutoModel.from_pretrained("sentence-transformers/LaBSE")
+labse.eval()
+def extract_features(image_path, text_description):
+    # Image: 512-dim
+    image = Image.open(image_path).convert('RGB')
+    img_tensor = preprocess(image).unsqueeze(0)
+    with torch.no_grad():
+        img_features = resnet(img_tensor).flatten(1)  # (1, 512)
+    # Text: 768-dim
+    inputs = tokenizer(text_description, return_tensors="pt", padding=True, truncation=True)
+    with torch.no_grad():
+        txt_features = labse(**inputs).pooler_output  # (1, 768)
+    return img_features, txt_features
 ```
+## Training Details
+| Hyperparameter | Value |
+|----------------|-------|
+| Optimizer | AdamW |
+| Learning Rate | 1e-5 |
+| Weight Decay | 0.01 |
+| Batch Size | 64 |
+| Epochs | 50 |
+| Warmup Epochs | 2 |
+| LR Scheduler | StepLR (step=10, gamma=0.5) |
+| Margin (loss) | 2.0 |
+| Num Negatives | 10 |
+| Hard Negative Ratio | 50% (same category) |
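The table does not say how the two warmup epochs combine with StepLR. The sketch below assumes a linear warmup over the first two epochs followed by step decay counted from epoch 0; `learning_rate` is a hypothetical helper, not part of the training code.

```python
def learning_rate(epoch, base_lr=1e-5, warmup_epochs=2, step=10, gamma=0.5):
    """Learning rate for a 0-indexed epoch under the assumed schedule."""
    if epoch < warmup_epochs:
        # Linear warmup: half of base_lr in epoch 0, full base_lr in epoch 1.
        return base_lr * (epoch + 1) / warmup_epochs
    # Step decay: multiply by gamma every `step` epochs.
    return base_lr * gamma ** (epoch // step)

for epoch in (0, 1, 9, 10, 20, 49):
    print(epoch, learning_rate(epoch))
```

With these settings the rate is halved four times over 50 epochs, ending at 1/16 of the base value.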
+### Training Data
+- **Dataset**: Polyvore Outfits (Maryland split, disjoint)
+- **Train**: ~17K outfits, ~250K items
+- **Validation**: ~2K outfits
+- **Test**: ~3K outfits
+## Limitations
+1. **Fixed Item Length**: Model expects max 8 items per outfit (padding applied)
+2. **Frozen Encoders**: ResNet-18 and LaBSE are frozen during training
+3. **Hubness**: Some popular items may dominate retrieval (mitigated with CSLS, Cross-domain Similarity Local Scaling)
+4. **Fashion Domain**: Trained on Polyvore data, may not generalize to other domains
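How CSLS is applied in this project is not shown here. As a rough sketch of the standard definition, the score below subtracts each side's average similarity to its nearest neighbours from twice the cosine similarity, so items that sit close to many queries are pushed down. The function names, the toy vectors, and the choice of `k` are assumptions for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_top_k(values, k):
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def csls_scores(queries, items, k=10):
    """CSLS(q, x) = 2*cos(q, x) - r(q) - r(x).

    r(.) is the mean cosine similarity to the k nearest neighbours
    on the other side (items for a query, queries for an item).
    """
    sim = [[cosine(q, x) for x in items] for q in queries]
    r_query = [mean_top_k(row, k) for row in sim]
    r_item = [mean_top_k([row[j] for row in sim], k) for j in range(len(items))]
    return [[2 * sim[i][j] - r_query[i] - r_item[j]
             for j in range(len(items))] for i in range(len(queries))]

queries = [(1.0, 0.0), (0.0, 1.0)]   # predicted embeddings (toy values)
items = [(1.0, 0.0), (1.0, 1.0)]     # items[1] is fairly close to every query
scores = csls_scores(queries, items, k=1)
print(scores)
```

In this toy case `items[1]` has the same raw cosine similarity to both queries, and its CSLS score drops below that of `items[0]` for the first query, which has a dedicated match.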
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{outfit-cir-transformer,
+  author = {Kuyumcu, Furkan},
+  title = {Outfit Transformer CIR: Multilingual Complementary Item Retrieval},
+  year = {2026},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/fkuyumcu/outfit-cir-transformer}
 }
 ```
+### Original Paper Reference
+```bibtex
+@inproceedings{sarkar2022outfitbert,
+  title={OutfitTransformer: Learning Outfit Representations for Fashion Recommendation},
+  author={Sarkar, Rohan and others},
+  booktitle={CVPR Workshop on Computer Vision for Fashion, Art, and Design},
+  year={2022}
+}
 ```
+## License
 MIT License