fkuyumcu committed
Commit ed99b9c · verified · 1 Parent(s): 0dd5ab2

Upload 4 files

Files changed (4)
  1. README.md +278 -3
  2. config.json +71 -0
  3. model.py +193 -0
  4. pytorch_model.bin +3 -0
README.md CHANGED
@@ -1,3 +1,278 @@
- ---
- license: mit
- ---

---
license: mit
language:
- en
- tr
tags:
- fashion
- outfit-recommendation
- multimodal
- transformer
- image-text
- complementary-item-retrieval
- pytorch
datasets:
- polyvore
pipeline_tag: feature-extraction
---

# Outfit Transformer CIR (Complementary Item Retrieval)

A multimodal Transformer model for **fashion outfit completion** and **complementary item retrieval**. Given a partial outfit (e.g., a t-shirt and jeans), the model predicts the ideal embedding for a missing item (e.g., shoes) that would complete the outfit harmoniously.

## Model Description

This model is based on the OutfitTransformer architecture proposed by **Sarkar et al.**, with several key modifications:

### Differences from Original Paper

| Aspect | Original (Sarkar et al.) | This Implementation |
|--------|--------------------------|---------------------|
| **Text Encoder** | BERT (768-dim) | **LaBSE** (768-dim) |
| **Text Language** | English only | Multilingual (109 languages) |
| **Loss Function** | InfoNCE | **Set-wise Outfit Ranking Loss** |
| **Negative Sampling** | Random | **Hard Negative Mining** (same category) |

### Why LaBSE instead of BERT?

[LaBSE (Language-agnostic BERT Sentence Embedding)](https://huggingface.co/sentence-transformers/LaBSE) was chosen because:

1. **Multilingual Support**: Covers 109 languages, enabling Turkish/English fashion descriptions
2. **Cross-lingual Alignment**: "Mavi tişört" and "blue t-shirt" produce similar embeddings
3. **Same Dimensionality**: Outputs 768-dim vectors, so it drops into the original architecture unchanged
4. **Production Ready**: Better suited for real-world, multilingual e-commerce catalogs

### Loss Function: Set-wise Outfit Ranking Loss

Instead of the standard InfoNCE loss, we use the **Set-wise Outfit Ranking Loss** from the paper (Section 3.2.2):

```
L_set = L_all + L_hard
```

Where:
- **L_all**: Margin-based ranking over all negatives
- **L_hard**: Extra penalty on the hardest negative (the closest wrong answer)

```python
# pos_dist: (B,)   distance to the positive item
# neg_dist: (B, K) distances to the K negative items

# L_ALL: margin ranking over all negatives
diff_all = pos_dist.unsqueeze(1) - neg_dist + margin   # margin = 2.0
loss_all = F.relu(diff_all).mean()

# L_HARD: extra focus on the hardest (closest) negative
min_neg_dist = neg_dist.min(dim=1).values
diff_hard = pos_dist - min_neg_dist + margin
loss_hard = F.relu(diff_hard).mean()

total_loss = loss_all + loss_hard
```

**Why this helps:**
- InfoNCE treats all negatives equally via the softmax
- The set-wise loss explicitly penalizes the hardest negative
- This reduces the **hubness problem**, where a few popular items dominate retrieval

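For completeness, here is a self-contained PyTorch sketch of the same loss over a batch of embeddings. The squared-Euclidean distance and the function name `setwise_outfit_ranking_loss` are illustrative assumptions, not taken from the released training code:

```python
import torch
import torch.nn.functional as F

def setwise_outfit_ranking_loss(pred, pos, negs, margin=2.0):
    """Set-wise outfit ranking loss (L_all + L_hard).

    pred: (B, D)    predicted embedding for the missing item (L2-normalized)
    pos:  (B, D)    embedding of the ground-truth item
    negs: (B, K, D) embeddings of K negative items
    """
    # Squared Euclidean distances (assumed metric; embeddings are L2-normalized)
    pos_dist = ((pred - pos) ** 2).sum(dim=-1)                # (B,)
    neg_dist = ((pred.unsqueeze(1) - negs) ** 2).sum(dim=-1)  # (B, K)

    # L_all: every negative should be at least `margin` farther than the positive
    loss_all = F.relu(pos_dist.unsqueeze(1) - neg_dist + margin).mean()

    # L_hard: extra penalty on the single closest (hardest) negative
    loss_hard = F.relu(pos_dist - neg_dist.min(dim=1).values + margin).mean()

    return loss_all + loss_hard
```
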
## Architecture

```
             OutfitTransformerCIR

  ┌──────────────┐            ┌──────────────┐
  │  ResNet-18   │            │    LaBSE     │
  │  (Frozen)    │            │  (Frozen)    │
  │   512-dim    │            │   768-dim    │
  └──────┬───────┘            └──────┬───────┘
         │                           │
  ┌──────▼───────┐            ┌──────▼───────┐
  │ Visual Proj  │            │  Text Proj   │   ← Trained
  │   512 → 64   │            │   768 → 64   │
  └──────┬───────┘            └──────┬───────┘
         │                           │
         └─────────────┬─────────────┘
                       │
               ┌───────▼───────┐
               │    Concat     │
               │ 64 + 64 = 128 │
               └───────┬───────┘
                       │
        ┌──────────────▼──────────────┐
        │ [QUERY] + Item Embeddings   │
        │     (Learnable Token)       │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │    Transformer Encoder      │
        │     6 layers, 16 heads      │
        │    d_model=128, ff=512      │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │     Output Projection       │
        │    + LayerNorm + L2 Norm    │
        └──────────────┬──────────────┘
                       │
               ┌───────▼───────┐
               │    128-dim    │
               │   Predicted   │
               │   Embedding   │
               └───────────────┘
```

## Benchmark Results

Evaluated on the **Polyvore Outfits** dataset (disjoint split):

| Metric | Score |
|--------|-------|
| **FITB Accuracy** | 56.39% |
| **MRR** | 0.7447 |
| **Recall@1** | 56.39% |
| **Recall@2** | 80.86% |
| **Recall@3** | 93.56% |
| **NDCG@3** | 0.7818 |
| **NDCG@5** | 0.8095 |

### Comparison with Baselines

| Model | FITB Accuracy | Notes |
|-------|---------------|-------|
| Random | 25.00% | 4-choice task |
| Type-Aware (Vasileva 2018) | ~53% | Category-specific spaces |
| **Ours (LaBSE + SetWise)** | **56.39%** | Multilingual, margin-based |
| Sarkar et al. (reported) | ~57% | English BERT, InfoNCE |

## Usage

### Installation

```bash
pip install torch torchvision transformers
```

### Loading the Model

```python
import torch
from model import OutfitTransformerCIR

# Load the model
model = OutfitTransformerCIR(embedding_dim=128, nhead=16, num_layers=6)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()
```

### Inference Example

```python
# Assume you have pre-extracted features:
# context_images: (1, num_items, 512) - ResNet-18 features
# context_texts:  (1, num_items, 768) - LaBSE embeddings

with torch.no_grad():
    # Predict the missing item's embedding
    predicted_embedding = model(context_images, context_texts)  # (1, 128)

# Use cosine similarity to find the closest items in your database
similarities = torch.cosine_similarity(predicted_embedding, item_database)
top_matches = similarities.argsort(descending=True)[:10]
```

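The `item_database` above is a tensor of 128-dim embeddings for your candidate items. It can be built with the model's `encode_single_item` method (defined in `model.py`), continuing the running example; `candidate_images` and `candidate_texts` are illustrative names for per-item ResNet-18 and LaBSE features you have already extracted:

```python
# candidate_images: (N, 512) ResNet-18 features for N candidate items
# candidate_texts:  (N, 768) LaBSE embeddings for the same items
with torch.no_grad():
    item_database = model.encode_single_item(candidate_images, candidate_texts)  # (N, 128)
```
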
### Feature Extraction (for your own items)

```python
from torchvision import models, transforms
from transformers import AutoTokenizer, AutoModel
from PIL import Image
import torch
import torch.nn as nn

# Image encoder (ResNet-18, classification head removed)
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet = nn.Sequential(*list(resnet.children())[:-1])
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Text encoder (LaBSE)
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
labse = AutoModel.from_pretrained("sentence-transformers/LaBSE")
labse.eval()

def extract_features(image_path, text_description):
    # Image: 512-dim
    image = Image.open(image_path).convert('RGB')
    img_tensor = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        img_features = resnet(img_tensor).flatten(1)  # (1, 512)

    # Text: 768-dim
    inputs = tokenizer(text_description, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        txt_features = labse(**inputs).pooler_output  # (1, 768)

    return img_features, txt_features
```

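To feed a partial outfit into the model, the per-item features from `extract_features` just need to be stacked along a sequence dimension. A minimal end-to-end sketch (the file paths and descriptions are placeholders):

```python
# Hypothetical partial outfit: t-shirt + jeans; the model suggests what completes it.
items = [
    ("tshirt.jpg", "blue cotton t-shirt"),
    ("jeans.jpg",  "dark slim-fit jeans"),
]

img_list, txt_list = [], []
for path, text in items:
    img_feat, txt_feat = extract_features(path, text)  # (1, 512), (1, 768)
    img_list.append(img_feat)
    txt_list.append(txt_feat)

context_images = torch.stack(img_list, dim=1)  # (1, num_items, 512)
context_texts = torch.stack(txt_list, dim=1)   # (1, num_items, 768)

with torch.no_grad():
    predicted_embedding = model(context_images, context_texts)  # (1, 128)
```
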
## Training Details

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| Learning Rate | 1e-4 |
| Weight Decay | 0.01 |
| Batch Size | 64 |
| Epochs | 30 |
| Margin (loss) | 2.0 |
| Num Negatives | 5 |
| Hard Negative Ratio | 50% (same category) |

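The *Num Negatives* and *Hard Negative Ratio* rows mean each training query is contrasted against 5 negatives, roughly half of which share the ground-truth item's category. A hedged sketch of such a sampler; the `items_by_category` index, item dictionaries, and helper name are illustrative, not part of the released code:

```python
import random

def sample_negatives(target_item, all_items, items_by_category,
                     num_negatives=5, hard_ratio=0.5):
    """Sample negatives for one query; ~hard_ratio of them share the target's category."""
    num_hard = int(num_negatives * hard_ratio)

    # Hard negatives: same category as the missing item, excluding the item itself
    same_category = [i for i in items_by_category[target_item["category"]]
                     if i["id"] != target_item["id"]]
    hard = random.sample(same_category, min(num_hard, len(same_category)))

    # Easy negatives: random items from the rest of the catalog
    hard_ids = {i["id"] for i in hard}
    easy_pool = [i for i in all_items
                 if i["id"] != target_item["id"] and i["id"] not in hard_ids]
    easy = random.sample(easy_pool, num_negatives - len(hard))

    return hard + easy
```
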
### Training Data

- **Dataset**: Polyvore Outfits (Maryland split, disjoint)
- **Train**: ~17K outfits, ~250K items
- **Validation**: ~2K outfits
- **Test**: ~3K outfits

## Limitations

1. **Fixed Item Length**: The model expects at most 8 items per outfit (shorter outfits are padded)
2. **Frozen Encoders**: ResNet-18 and LaBSE are frozen during training; only the projections and the Transformer are trained
3. **Hubness**: Some popular items may dominate retrieval (mitigated with CSLS re-scoring, sketched below)
4. **Fashion Domain**: Trained on Polyvore data; it may not generalize to other domains

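CSLS (Cross-domain Similarity Local Scaling) is not part of the model itself; it is a retrieval-time re-scoring step. A minimal sketch of one common formulation, which penalizes gallery items that are close to many queries (the value of `k` is an assumption, and the query-side term of full CSLS is omitted because it does not change per-query rankings):

```python
import torch
import torch.nn.functional as F

def csls_scores(query_emb, gallery_emb, k=10):
    """Re-score retrieval to reduce hubness.

    query_emb:   (Q, D) predicted outfit-completion embeddings
    gallery_emb: (N, D) candidate item embeddings
    Returns a (Q, N) matrix: 2*cos(q, x) minus each gallery item's mean
    similarity to its k nearest queries (its local "hubness").
    """
    q = F.normalize(query_emb, dim=-1)
    g = F.normalize(gallery_emb, dim=-1)
    sims = q @ g.t()                                    # (Q, N) cosine similarities
    # Average similarity of each gallery item to its k closest queries
    r_gallery = sims.topk(k, dim=0).values.mean(dim=0)  # (N,)
    return 2 * sims - r_gallery.unsqueeze(0)
```
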
## Citation

If you use this model, please cite:

```bibtex
@misc{outfit-cir-transformer,
  author = {Kuyumcu, Furkan},
  title = {Outfit Transformer CIR: Multilingual Complementary Item Retrieval},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/fkuyumcu/outfit-cir-transformer}
}
```

### Original Paper Reference

```bibtex
@inproceedings{sarkar2022outfitbert,
  title = {OutfitTransformer: Learning Outfit Representations for Fashion Recommendation},
  author = {Sarkar, Rohan and others},
  booktitle = {CVPR Workshop on Computer Vision for Fashion, Art, and Design},
  year = {2022}
}
```

## License

MIT License

config.json ADDED
@@ -0,0 +1,71 @@
{
  "model_type": "outfit-cir-transformer",
  "architectures": ["OutfitTransformerCIR"],

  "embedding_dim": 128,
  "nhead": 16,
  "num_layers": 6,
  "dim_feedforward": 512,
  "dropout": 0.1,
  "max_items": 8,

  "image_encoder": {
    "name": "resnet18",
    "pretrained": "torchvision",
    "output_dim": 512,
    "frozen": true
  },

  "text_encoder": {
    "name": "sentence-transformers/LaBSE",
    "output_dim": 768,
    "frozen": true
  },

  "projection": {
    "visual": {"in_features": 512, "out_features": 64},
    "text": {"in_features": 768, "out_features": 64}
  },

  "training": {
    "loss": "SetWiseOutfitRankingLoss",
    "margin": 2.0,
    "num_negatives": 5,
    "hard_negative_ratio": 0.5,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "weight_decay": 0.01,
    "batch_size": 64,
    "epochs": 30
  },

  "dataset": {
    "name": "polyvore_outfits",
    "split": "disjoint",
    "train_outfits": 17316,
    "valid_outfits": 1497,
    "test_outfits": 3076,
    "total_items": 251008
  },

  "benchmark": {
    "fitb_accuracy": 0.5639,
    "mrr": 0.7447,
    "recall_at_1": 0.5639,
    "recall_at_2": 0.8086,
    "recall_at_3": 0.9356,
    "ndcg_at_3": 0.7818,
    "ndcg_at_5": 0.8095
  },

  "base_paper": {
    "title": "OutfitTransformer: Learning Outfit Representations for Fashion Recommendation",
    "authors": "Sarkar et al.",
    "venue": "CVPR Workshop 2022",
    "modifications": [
      "Replaced BERT with LaBSE for multilingual support",
      "Replaced InfoNCE with Set-wise Outfit Ranking Loss",
      "Added hard negative mining from same category"
    ]
  }
}
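
This config is informational rather than wired into `transformers` auto classes, but its architecture fields map directly onto the `OutfitTransformerCIR` constructor in `model.py`. A small sketch of building the model from it, assuming `config.json` and `pytorch_model.bin` sit in the working directory:

```python
import json
import torch
from model import OutfitTransformerCIR

with open("config.json") as f:
    cfg = json.load(f)

model = OutfitTransformerCIR(
    embedding_dim=cfg["embedding_dim"],  # 128
    nhead=cfg["nhead"],                  # 16
    num_layers=cfg["num_layers"],        # 6
)
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()
```
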
model.py ADDED
@@ -0,0 +1,193 @@
"""
OutfitTransformerCIR - Complementary Item Retrieval Model
==========================================================

Architecture based on Sarkar et al. with modifications:
- LaBSE instead of BERT for multilingual text encoding
- Set-wise Outfit Ranking Loss instead of InfoNCE

Usage:
    from model import OutfitTransformerCIR

    model = OutfitTransformerCIR()
    model.load_state_dict(torch.load("pytorch_model.bin"))
    model.eval()

    # context_images: (B, S, 512) - ResNet-18 features
    # context_texts:  (B, S, 768) - LaBSE embeddings
    predicted = model(context_images, context_texts)
    # predicted: (B, 128) - Missing item embedding
"""

import torch
import torch.nn as nn
import torch.nn.functional as F


class OutfitTransformerCIR(nn.Module):
    """
    Complementary Item Retrieval Transformer

    Given context items (a partial outfit), predicts the embedding of a missing item
    that would complete the outfit harmoniously.

    Architecture:
        - Visual projection: 512 (ResNet-18) → 64
        - Text projection: 768 (LaBSE) → 64
        - Combined: 64 + 64 = 128-dim item embedding
        - Transformer encoder: 6 layers, 16 heads
        - Learnable [QUERY] token for missing-item prediction

    Args:
        embedding_dim (int): Final embedding dimension (default: 128)
        nhead (int): Number of attention heads (default: 16)
        num_layers (int): Number of transformer layers (default: 6)
        use_projection (bool): Whether to apply projection layers.
            - True: Input is raw features (512 + 768)
            - False: Input is pre-projected features (64 + 64)
    """

    def __init__(self, embedding_dim=128, nhead=16, num_layers=6, use_projection=True):
        super(OutfitTransformerCIR, self).__init__()

        self.use_projection = use_projection
        self.embedding_dim = embedding_dim

        # Projection layers (trained, not frozen)
        self.visual_proj = nn.Linear(512, 64)
        self.text_proj = nn.Linear(768, 64)

        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=nhead,
            dim_feedforward=512,
            batch_first=True,
            dropout=0.1
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Learnable query token (represents the missing item)
        self.query_token = nn.Parameter(torch.randn(1, 1, embedding_dim))

        # Output projection with normalization
        self.output_proj = nn.Sequential(
            nn.Linear(embedding_dim, embedding_dim),
            nn.LayerNorm(embedding_dim)
        )

    def encode_items(self, images, texts):
        """
        Encode multiple items (for context).

        Args:
            images: (B, S, D_img) where D_img=512 (raw) or 64 (projected)
            texts: (B, S, D_txt) where D_txt=768 (raw) or 64 (projected)

        Returns:
            (B, S, 128) - Unified item embeddings
        """
        if self.use_projection:
            img_emb = self.visual_proj(images)
            txt_emb = self.text_proj(texts)
        else:
            img_emb = images
            txt_emb = texts

        return torch.cat((img_emb, txt_emb), dim=-1)

    def encode_single_item(self, image, text):
        """
        Encode a single item (for candidate scoring).

        Args:
            image: (B, D_img)
            text: (B, D_txt)

        Returns:
            (B, 128) - Item embedding
        """
        if self.use_projection:
            img_emb = self.visual_proj(image)
            txt_emb = self.text_proj(text)
        else:
            img_emb = image
            txt_emb = text

        return torch.cat((img_emb, txt_emb), dim=-1)

    def forward(self, context_images, context_texts, padding_mask=None):
        """
        Predict the embedding of a missing item.

        Args:
            context_images: (B, S, 512) - ResNet-18 features of context items
            context_texts: (B, S, 768) - LaBSE embeddings of context items
            padding_mask: (B, S) - True indicates padding positions

        Returns:
            (B, 128) - Predicted embedding for the missing item

        Example:
            >>> model = OutfitTransformerCIR()
            >>> # Outfit with 3 items: t-shirt, jeans, watch
            >>> img_features = torch.randn(1, 3, 512)  # ResNet-18 outputs
            >>> txt_features = torch.randn(1, 3, 768)  # LaBSE outputs
            >>> predicted = model(img_features, txt_features)
            >>> # predicted: (1, 128) - embedding for ideal 4th item (e.g., shoes)
        """
        batch_size = context_images.size(0)
        device = context_images.device

        # 1. Encode context items
        item_embeddings = self.encode_items(context_images, context_texts)

        # 2. Prepend learnable query token
        query = self.query_token.expand(batch_size, -1, -1)
        x = torch.cat([query, item_embeddings], dim=1)

        # 3. Build attention mask (query always attends, padding positions masked)
        if padding_mask is not None:
            query_mask = torch.zeros(batch_size, 1, dtype=torch.bool, device=device)
            full_mask = torch.cat([query_mask, padding_mask], dim=1)
        else:
            full_mask = None

        # 4. Transformer forward
        out = self.transformer(x, src_key_padding_mask=full_mask)

        # 5. Extract query output (first position)
        query_out = out[:, 0, :]

        # 6. Project and L2 normalize
        predicted = self.output_proj(query_out)
        predicted = F.normalize(predicted, p=2, dim=-1)

        return predicted


# Convenience function for loading
def load_model(checkpoint_path, device="cpu"):
    """
    Load a trained OutfitTransformerCIR model.

    Args:
        checkpoint_path: Path to pytorch_model.bin
        device: "cpu" or "cuda"

    Returns:
        Loaded model in eval mode
    """
    model = OutfitTransformerCIR(
        embedding_dim=128,
        nhead=16,
        num_layers=6,
        use_projection=True
    )

    state_dict = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()

    return model
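
The `padding_mask` argument of `forward` is what allows variable-length outfits to be batched together. A small usage sketch of the file above; the feature tensors are random placeholders, whereas in practice they come from ResNet-18/LaBSE as described in the README:

```python
import torch
from model import load_model

model = load_model("pytorch_model.bin")

# Batch of 2 outfits padded to 4 items; the second outfit has only 2 real items.
images = torch.randn(2, 4, 512)
texts = torch.randn(2, 4, 768)
padding_mask = torch.tensor([
    [False, False, False, False],  # all 4 positions are real items
    [False, False, True,  True],   # last 2 positions are padding
])

with torch.no_grad():
    predicted = model(images, texts, padding_mask=padding_mask)  # (2, 128)
```
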
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bb5aad51daa663bc45c489d9711b69661ed75009df267c79be380df4b8614823
size 5184977