---
license: cc-by-sa-4.0
pipeline_tag: feature-extraction
library_name: timm
language: []
base_model: timm/convnext_base.fb_in22k_ft_in1k
embedding_dimension: 512
training_steps: 108
model_type: trendyol_arcface
tags:
- computer-vision
- image-feature-extraction
- arcface
- product-similarity
- e-commerce
- image-embeddings
- convnext
---

# E-Commerce Product Image Encoder

_ConvNeXt-based image embedding model for product unification and visual search on the Trendyol e-commerce catalogue._

## Model Details

- **Architecture**: ConvNeXt-Base backbone (224×224 input) followed by a 512-dimensional projection head (BatchNorm on the feature map, flatten, linear layer, BatchNorm, then L2 normalization)
- **Objective**: ArcFace loss with an additive angular margin (scale = 128, margin = 0.25), used during training to learn discriminative product embeddings
- **Training Data**: Large-scale Trendyol product image dataset covering diverse e-commerce categories
- **Hardware**: Multi-GPU training with PyTorch Lightning (5 epochs, 108 global steps)
- **Framework**: PyTorch Lightning 1.8.1 with mixed-precision training

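As a sanity check on the head dimensions, the arithmetic below computes the size of the flattened feature map that feeds the 512-d linear layer. It assumes ConvNeXt-Base's standard layout (1024 channels in the last stage, total stride 32). In the usage code this product is `config['backbone_features'] * config['hidden_size']`, so `hidden_size` should equal the number of spatial positions H × W. The function name `projection_input_dim` is illustrative only.

```python
def projection_input_dim(input_size: int = 224, channels: int = 1024, stride: int = 32) -> int:
    """Size of the flattened ConvNeXt feature map fed to the projection layer.

    Assumes ConvNeXt-Base defaults: 1024 output channels and a total stride of 32
    (stride-4 stem followed by three stride-2 downsampling stages).
    """
    side = input_size // stride          # 224 // 32 = 7
    spatial_positions = side * side      # 7 * 7 = 49 (the value hidden_size must match)
    return channels * spatial_positions  # 1024 * 49 = 50176


flat_dim = projection_input_dim()
print(flat_dim)              # 50176
# Parameters in the linear layer (weights + bias) for a 512-d embedding
print(flat_dim * 512 + 512)  # 25690624
```

The same arithmetic shows why the input resolution is fixed: at 256×256 the feature map becomes 8×8 and the linear layer would need 65,536 inputs instead of 50,176.
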
## Intended Use

- **Primary** – Generate embeddings for duplicate product detection ("unification"), near-duplicate search, and product similarity ranking in e-commerce applications
- **Secondary** – Feature extractor for image-based product recommendation systems and visual search
- **Downstream Tasks** – Product clustering, visual search, duplicate detection, and content-based product recommendation

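For the primary unification use case, duplicate listings can be grouped by linking any two products whose embedding cosine similarity exceeds a threshold and then taking connected components. The sketch below assumes a NumPy array of L2-normalized embeddings (as the model returns) and uses a hypothetical default threshold of 0.9; this card does not specify a threshold, so it should be tuned on labeled duplicate pairs. The function name `group_duplicates` is illustrative.

```python
import numpy as np


def group_duplicates(embeddings: np.ndarray, threshold: float = 0.9) -> list[list[int]]:
    """Group row indices whose pairwise cosine similarity is >= threshold.

    Assumes each row is L2-normalized, so the dot product equals cosine similarity.
    The default threshold is a hypothetical placeholder, not a calibrated value.
    """
    n = embeddings.shape[0]
    sims = embeddings @ embeddings.T  # (n, n) cosine similarity matrix
    parent = list(range(n))           # union-find forest over products

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every distinct pair (upper triangle, diagonal excluded) above the threshold
    rows, cols = np.triu_indices(n, k=1)
    mask = sims[rows, cols] >= threshold
    for i, j in zip(rows[mask].tolist(), cols[mask].tolist()):
        parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]  # keep only actual duplicate groups


emb = np.array([[1.0, 0.0], [0.99, 0.141], [0.0, 1.0]])
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
print(group_duplicates(emb))  # [[0, 1]]
```

Connected components make grouping transitive (if A≈B and B≈C, all three merge), which suits unification but can chain loosely related items at low thresholds.
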
## Usage

The following complete example expects `config.json` and `model.safetensors` from this repository in the working directory, loads the model, and generates embeddings:

```python
import io
import json

import requests
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from safetensors.torch import load_file

# 1. Define the model class
class TYArcFaceModel(nn.Module):
    """ConvNeXt backbone + projection head producing L2-normalized embeddings."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        # Backbone without classifier; its weights are loaded from model.safetensors below
        self.backbone = timm.create_model(
            config['backbone_name'],
            pretrained=False,
            num_classes=0,
        )
        self.bn1 = nn.BatchNorm2d(config['backbone_features'])
        # Flattened size is C * H * W, so config['hidden_size'] must equal H * W of the feature map
        self.fc11 = nn.Linear(
            config['backbone_features'] * config['hidden_size'],
            config['embedding_dim'],
        )
        self.bn11 = nn.BatchNorm1d(config['embedding_dim'])

    def forward(self, x):
        features = self.backbone.forward_features(x)  # (B, C, H, W) feature map
        features = self.bn1(features)
        features = features.flatten(start_dim=1)      # (B, C * H * W)
        features = self.fc11(features)
        features = self.bn11(features)
        return F.normalize(features, p=2, dim=1)      # unit-length embeddings

# 2. Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load configuration and weights
with open('config.json') as f:
    config = json.load(f)
model = TYArcFaceModel(config)
state_dict = load_file('model.safetensors')

# Keep only checkpoint entries whose names exist in this inference model
model_keys = set(model.state_dict().keys())
filtered_state_dict = {k: v for k, v in state_dict.items() if k in model_keys}

result = model.load_state_dict(filtered_state_dict, strict=False)
# strict=False leaves missing parameters randomly initialized, so report them
missing = [k for k in result.missing_keys if not k.endswith('num_batches_tracked')]
if missing:
    print(f"⚠️ {len(missing)} parameters not found in checkpoint, e.g. {missing[:3]}")
model.to(device)
model.eval()

print("✅ Model loaded successfully!")
print(f"📊 Ready to generate {config['embedding_dim']}-dimensional embeddings")

# 3. Define preprocessing transforms
transform = transforms.Compose([
    transforms.Resize((config['input_size'], config['input_size'])),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=config['normalization']['mean'],
        std=config['normalization']['std'],
    ),
])

# 4. Process an image and generate embeddings
def get_embeddings(image_path_or_url):
    """Return a (1, embedding_dim) L2-normalized embedding for one image path or URL."""
    if image_path_or_url.startswith(('http://', 'https://')):
        response = requests.get(image_path_or_url, timeout=30)
        response.raise_for_status()
        image = Image.open(io.BytesIO(response.content)).convert('RGB')
    else:
        image = Image.open(image_path_or_url).convert('RGB')

    # Preprocess: (3, H, W) -> (1, 3, H, W) batch on the target device
    input_tensor = transform(image).unsqueeze(0).to(device)

    with torch.no_grad():
        embeddings = model(input_tensor)

    return embeddings

# 5. Example usage
image_url = "https://example.com/product_image.jpg"  # Replace with your image
embeddings = get_embeddings(image_url)
print(f"Embedding shape: {embeddings.shape}")  # torch.Size([1, 512])

# 6. Compute similarity between two products
def compute_similarity(embedding1, embedding2):
    """Cosine similarity; for these unit-length embeddings it equals the dot product."""
    return F.cosine_similarity(embedding1, embedding2, dim=1)

# Example: Compare two products
# embedding2 = get_embeddings("path/to/another/image.jpg")
# similarity_score = compute_similarity(embeddings, embedding2)
# print(f"Product similarity: {similarity_score.item():.4f}")
```

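For visual search over a catalogue, per-image embeddings can be stacked into a gallery matrix and queried with a single matrix multiplication, because the model's outputs are L2-normalized. The sketch below is self-contained and uses random unit vectors in place of real embeddings; the function name `search_gallery` and the gallery layout are illustrative assumptions, not part of the released code.

```python
import torch
import torch.nn.functional as F


def search_gallery(query: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    """Return (scores, indices) of the k gallery items most similar to each query.

    query:   (Q, D) L2-normalized embeddings
    gallery: (N, D) L2-normalized embeddings
    For unit-length vectors the dot product equals cosine similarity.
    """
    scores = query @ gallery.T            # (Q, N) cosine similarities
    k = min(k, gallery.shape[0])          # guard against galleries smaller than k
    return torch.topk(scores, k=k, dim=1) # sorted, highest similarity first


# Stand-in data: 1,000 random 512-d unit vectors acting as a "catalogue"
torch.manual_seed(0)
gallery = F.normalize(torch.randn(1000, 512), dim=1)
# Query: a slightly perturbed copy of catalogue item 42
query = F.normalize(gallery[42:43] + 0.01 * torch.randn(1, 512), dim=1)

top_scores, top_ids = search_gallery(query, gallery, k=3)
print(top_ids[0].tolist())  # first index is 42
```
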
## Model Performance

The model was trained with ArcFace loss, which offers several advantages for product similarity tasks:

- **Improved Discriminative Power**: ArcFace adds an angular margin between each embedding and its class centre during training, encouraging larger angular separation between different products
- **Normalized Embeddings**: All output embeddings are L2-normalized, so cosine similarity reduces to a simple dot product
- **Scale Robustness**: The learned representations are intended to be robust to scale variations in product images

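For reference, the standard ArcFace loss is shown below with the scale and margin listed in this card. Here $x_i$ is the embedding of sample $i$, $W_j$ the learnable centre of product class $j$, $\theta_j$ the angle between them, $y_i$ the ground-truth class, and $N$ the batch size. This is the textbook formulation; beyond $s$ and $m$, the card does not confirm implementation details of the training code.

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}},
\qquad
\cos\theta_j = \frac{W_j^{\top}x_i}{\lVert W_j\rVert\,\lVert x_i\rVert},
\qquad s = 128,\; m = 0.25
$$

The margin $m = 0.25$ rad (about 14.3°) is added only to the target-class angle, so an embedding must be closer to its own class centre than to any other by that angular gap. Because normalized logits satisfy $s\cos\theta \in [-s, s]$, a large scale such as $s = 128$ is needed for the softmax to become confident.
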
### Training Configuration

- **Backbone**: ConvNeXt-Base pretrained on ImageNet-22k and fine-tuned on ImageNet-1k
- **Embedding Dimension**: 512
- **ArcFace Scale**: 128
- **ArcFace Margin**: 0.25
- **Input Resolution**: 224×224
- **Normalization**: ImageNet statistics
- **Training Framework**: PyTorch Lightning 1.8.1

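A minimal PyTorch sketch of an ArcFace classification head with these hyperparameters follows. It illustrates the standard technique and is not the released training code: the class name `ArcFaceHead`, the number of classes, and the weight initialization are assumptions. Such a head is only needed during training; inference uses the embedding network alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceHead(nn.Module):
    """ArcFace logits: s * cos(theta + m) for the target class, s * cos(theta) otherwise."""

    def __init__(self, embedding_dim: int = 512, num_classes: int = 1000,
                 scale: float = 128.0, margin: float = 0.25):
        super().__init__()
        self.scale = scale
        self.margin = margin
        # One learnable centre vector per product class (hypothetical initialization)
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between normalized embeddings and normalized class centres
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cosine = cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7)  # keep acos numerically safe
        theta = torch.acos(cosine)
        # Add the angular margin only at each sample's ground-truth class
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.margin), cosine)
        return logits * self.scale


# Usage with random data: 8 embeddings, 1000 hypothetical product classes
torch.manual_seed(0)
head = ArcFaceHead()
emb = torch.randn(8, 512)
labels = torch.randint(0, 1000, (8,))
logits = head(emb, labels)
loss = F.cross_entropy(logits, labels)
print(loss.item())
```

When $\theta + m$ exceeds $\pi$, $\cos(\theta + m)$ stops decreasing; common implementations add a fallback for that case, which is omitted here for brevity.
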
## Limitations

- **Domain Specificity**: Optimized for e-commerce product images; may not generalize well to other image domains
- **Image Quality**: Performance may degrade on low-quality, heavily compressed, or significantly distorted images
- **Category Bias**: Performance may vary across product categories depending on the training data distribution
- **Fixed Input Resolution**: Input images must be resized to 224×224; the projection head flattens a fixed-size feature map, so other resolutions generally produce a shape mismatch in the linear layer rather than merely degraded results

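One way to check the category-bias limitation on your own data is to measure retrieval accuracy separately per category. The sketch below computes per-category Recall@1 (whether each query's nearest gallery item has the same product ID). The inputs, label lists, and the function name `recall_at_1_by_category` are hypothetical; any labeled query/gallery split of a catalogue can be substituted.

```python
from collections import defaultdict

import numpy as np


def recall_at_1_by_category(query_emb, query_ids, query_cats, gallery_emb, gallery_ids):
    """Per-category Recall@1 for L2-normalized embeddings.

    query_emb:   (Q, D) array; query_ids and query_cats are length-Q sequences
    gallery_emb: (N, D) array; gallery_ids is a length-N sequence
    """
    sims = query_emb @ gallery_emb.T  # cosine similarity (rows are unit length)
    nearest = sims.argmax(axis=1)     # index of the top-1 gallery item per query
    hits, totals = defaultdict(int), defaultdict(int)
    for q, g in enumerate(nearest):
        cat = query_cats[q]
        totals[cat] += 1
        hits[cat] += int(gallery_ids[g] == query_ids[q])
    return {cat: hits[cat] / totals[cat] for cat in totals}


gallery = np.eye(3)  # three products with orthogonal embeddings
queries = np.array([[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.0, 0.0, 1.0]])
print(recall_at_1_by_category(queries, ["A", "B", "C"], ["shoes", "bags", "bags"],
                              gallery, ["A", "B", "C"]))
# {'shoes': 1.0, 'bags': 0.5}
```
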
## Bias Analysis

- **Dataset Bias**: The model's embeddings may reflect biases present in the e-commerce training dataset
- **Product Category Imbalance**: Some product categories may be over-represented in the training data
- **Brand and Style Bias**: The model may learn to encode brand-specific or style-specific features that could affect similarity judgments

## Environmental Impact

- **Training Hardware**: Multi-GPU setup with PyTorch Lightning
- **Training Length**: 5 epochs (108 global steps)
- **Energy Consumption**: Estimated to be moderate, owing to the relatively short training run

## Ethical Considerations

- **Commercial Use**: Designed for e-commerce applications; consider potential impacts on market competition
- **Privacy**: Ensure compliance with data protection regulations when processing product images
- **Fairness**: Monitor for biased similarity judgments across different product categories or brands

## Citation

```bibtex
@misc{trendyol2025convnextarcface,
  title={E-Commerce Product Image Encoder: High-Fidelity Image Embeddings for E-commerce Product Unification},
  author={Trendyol Data Science Team},
  year={2025},
  howpublished={\url{https://huggingface.co/Trendyol/e-commerce-product-image-encoder}}
}
```

## Model Card Authors

- Trendyol Data Science Team
- Model trained using the TYArcFace architecture with a ConvNeXt backbone

## License

This model is released by Trendyol as a source-available, non-open-source model.

### You are allowed to:

- View, download, and evaluate the model weights.
- Use the model for non-commercial research and internal testing.
- Use the model or its derivatives for commercial purposes, provided that:
  - You cite Trendyol as the original model creator.
  - You notify Trendyol in advance via cqm.datascience@trendyol.com or another designated contact.

### You are not allowed to:

- Redistribute or host the model or its derivatives on third-party platforms without prior written consent from Trendyol.
- Use the model in applications that violate ethical standards, including but not limited to surveillance, misinformation, or harm to individuals or groups.

By downloading or using this model, you agree to the terms above.

© 2025 Trendyol Group. All rights reserved.

See the [LICENSE](LICENSE) file for more details.

---

_For technical support or questions about this model, please contact the Trendyol Data Science team._