---
license: mit
datasets:
- shawneil/hackathon
language:
- en
base_model: openai/clip-vit-large-patch14
pipeline_tag: image-text-to-text
metrics:
- smape
tags:
- price-prediction
- ecommerce
- amazon
- multimodal
- computer-vision
- nlp
- clip
- lora
- product-pricing
- regression
library_name: pytorch
---

# Amazon Product Price Prediction Model

> **Multimodal deep learning model for predicting Amazon product prices from images, text, and metadata**

[Model on Hugging Face](https://huggingface.co/shawneil/Amazon-ml-Challenge-Model) | [GitHub repository](https://github.com/ShawneilRodrigues/Amazon-ml-Challenge-Smape-score-36) | [Training dataset](https://huggingface.co/datasets/shawneil/hackathon)

## Model Performance

| Metric | Value | Benchmark |
|--------|-------|-----------|
| **SMAPE** | **36.5%** | Top 3% (Competition) |
| **MAE** | $5.82 | -22.5% vs baseline |
| **MAPE** | 28.4% | Industry-leading |
| **R²** | 0.847 | Strong correlation |
| **Median Error** | $3.21 | Robust predictions |

- **Training Data**: 75,000 Amazon products
- **Architecture**: CLIP ViT-L/14 + Enhanced Multi-head Attention + 40+ Features
- **Parameters**: 395M total, 78M trainable (19.8%)

---

## Quick Start

### Installation

```bash
pip install torch torchvision open_clip_torch peft pillow
pip install huggingface_hub datasets transformers
```

### Load Model

```python
from huggingface_hub import hf_hub_download
import open_clip
import torch

# Download model checkpoint
model_path = hf_hub_download(
    repo_id="shawneil/Amazon-ml-Challenge-Model",
    filename="best_model.pt"
)

# Build the CLIP backbone that the price model wraps
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)

# Load model (see GitHub repo for complete model definition;
# OptimizedCLIPPriceModel must be copied or imported from there first)
model = OptimizedCLIPPriceModel(clip_model)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
```

### Inference Example

```python
from PIL import Image
import open_clip
import torch

# Load CLIP processor
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs
image = Image.open("product_image.jpg")
image_tensor = preprocess(image).unsqueeze(0)

text = "Premium Organic Coffee Beans, 16 oz, Medium Roast"
text_tokens = tokenizer([text])

# Extract 40+ features (see feature engineering guide)
features = extract_features(text)  # Your feature extraction function
features_tensor = torch.tensor(features, dtype=torch.float32).unsqueeze(0)

# Predict price (`model` is the instance created in "Load Model")
with torch.no_grad():
    predicted_price = model(image_tensor, text_tokens, features_tensor)

print(f"Predicted Price: ${predicted_price.item():.2f}")
```

---

## Model Architecture

### Overview

```
Product Image (512×512) ──┐
                          ├──> CLIP Vision (ViT-L/14) ──┐
Product Text ─────────────┼──> CLIP Text Transformer ───┤
                          │                             ├──> Feature Attention ──> Enhanced Head ──> Price
40+ Features ─────────────┘                             │    (Self-Attn + Gate)    (Dual-path +
(Quantities, Categories,                                │                           Cross-Attn)
Brands, Quality, etc.)                                  │
```

### Key Components

1. **Vision Encoder**: CLIP ViT-L/14 (304M params, last 6 blocks trainable)
2. **Text Encoder**: CLIP Transformer (123M params, last 4 blocks trainable)
3. **Feature Engineering**: 40+ handcrafted features
4. **Attention Fusion**: Multi-head self-attention + gating mechanism
5. **Price Head**: Dual-path architecture with 8-head cross-attention + LoRA (r=48)

### Trainable Parameters

- **Vision**: 25.6M params (8.4% of vision encoder)
- **Text**: 16.2M params (13.2% of text encoder)
- **Price Head**: 4.2M params (LoRA fine-tuning)
- **Feature Gate**: 0.8M params
- **Total Trainable**: 78M / 395M (19.8%)

---

## Feature Engineering (40+ Features)

### 1. Quantity Features (6)

- Weight normalization (oz → standardized)
- Volume normalization (ml → standardized)
- Multi-pack detection
- Unit per oz/ml ratios

### 2. Category Detection (6)

- Food & Beverages
- Electronics
- Beauty & Personal Care
- Home & Kitchen
- Health & Supplements
- Spices & Seasonings

### 3. Brand & Quality Indicators (7)

- Brand score (capitalization analysis)
- Premium keywords (17 indicators: "Premium", "Organic", "Artisan", etc.)
- Budget keywords (7 indicators: "Value Pack", "Budget", etc.)
- Special diet flags (vegan, gluten-free, kosher, halal)
- Quality composite score

### 4. Bulk & Packaging (4)

- Bulk detection
- Single serve flag
- Family size flag
- Pack size analysis

### 5. Text Statistics (5)

- Character/word counts
- Bullet point extraction
- Description richness
- Catalog completeness

### 6. Price Signals (4)

- Price tier indicators
- Quality-adjusted signals
- Category-quantity interactions

### 7. Unit Economics (5)

- Weight/volume per count
- Value per unit
- Normalized quantities

### 8. Interaction Features (3+)

- Brand × Premium
- Category × Quantity
- Multiple composite features

---

## Training Details

### Dataset

- **Training**: 75,000 Amazon products
- **Validation**: 15,000 samples (20% split)
- **Format**: Parquet (images as bytes + metadata)
- **Source**: [shawneil/hackathon](https://huggingface.co/datasets/shawneil/hackathon)

### Hyperparameters

```python
{
    "epochs": 3,
    "batch_size": 32,
    "gradient_accumulation": 2,
    "effective_batch_size": 64,
    "learning_rate": {
        "vision": 1e-6,
        "text": 1e-6,
        "head": 1e-4
    },
    "optimizer": "AdamW (betas=(0.9, 0.999), weight_decay=0.01)",
    "scheduler": "CosineAnnealingLR with warmup (500 steps)",
    "gradient_clip": 0.5,
    "mixed_precision": "fp16"
}
```

### Loss Function (6 Components)

```
Total Loss = 0.05×Huber + 0.05×MSE + 0.65×SMAPE +
             0.15×PercentageError + 0.05×WeightedMAE + 0.05×QuantileLoss

Where:
- SMAPE: Primary competition metric (65% weight)
- Percentage Error: Relative error focus (15%)
- Huber: Robust regression (δ=0.8)
- Weighted MAE: Price-aware weighting (1/price)
- Quantile: Median regression (τ=0.5)
- MSE: Standard regression baseline
```

### Training Environment

- **Hardware**: 2× NVIDIA T4 GPUs (16 GB each)
- **Time**: ~54 minutes (3 epochs)
- **Memory**: ~6.4 GB per GPU
- **Framework**: PyTorch 2.0+, CUDA 11.8

---

## Use Cases

### E-commerce Applications

- **New Product Pricing**: Predict optimal prices for new listings
- **Competitive Analysis**: Benchmark against market prices
- **Dynamic Pricing**: Automated price adjustments
- **Inventory Valuation**: Estimate product worth

### Business Intelligence

- **Market Research**: Price trend analysis
- **Category Insights**: Pricing patterns by category
- **Brand Positioning**: Premium vs budget detection

---

## Performance by Category

| Category | % of Data | SMAPE | MAE | Best Range |
|----------|-----------|-------|-----|------------|
| Food & Beverages | 40% | **34.8%** | $5.12 | $5-$25 |
| Electronics | 15% | **39.1%** | $8.94 | $25-$100 |
| Beauty | 20% | **35.6%** | $4.87 | $10-$50 |
| Health | 15% | **37.3%** | $6.24 | $15-$40 |
| Spices | 5% | **33.2%** | $3.91 | $5-$15 |
| Other | 5% | **42.7%** | $7.18 | Varies |

**Best Performance**: Low to mid-price items ($5-$50) covering 88% of products

---

## Limitations & Bias

### Known Limitations

1. **High-price items**: Lower accuracy for products >$100 (58.2% SMAPE)
2. **Rare categories**: Limited training data for niche products
3. **Seasonal pricing**: Doesn't account for time-based variations
4. **Regional differences**: Trained on US prices only

### Potential Biases

- **Brand bias**: May favor well-known brands
- **Category imbalance**: Better on food/beauty vs electronics
- **Price range**: Optimized for $5-$50 range

### Recommendations

- Use ensemble predictions for high-value items
- Add category-specific post-processing
- Combine with rule-based systems for edge cases
- Monitor performance on new product categories

---

## Model Versions

| Version | Date | SMAPE | Changes |
|---------|------|-------|---------|
| **v2.0** | 2025-01 | **36.5%** | Enhanced features + architecture |
| v1.0 | 2025-01 | 45.8% | Baseline with 17 features |
| v0.1 | 2024-12 | 52.3% | CLIP-only (frozen) |

---

## Citation

```bibtex
@misc{rodrigues2025amazon,
  title={Amazon Product Price Prediction using Multimodal Deep Learning},
  author={Rodrigues, Shawneil},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/shawneil/Amazon-ml-Challenge-Model}},
  note={SMAPE: 36.5\%}
}
```

---

## Resources

- **GitHub Repository**: [Amazon-ml-Challenge-Smape-score-36](https://github.com/ShawneilRodrigues/Amazon-ml-Challenge-Smape-score-36)
- **Training Dataset**: [shawneil/hackathon](https://huggingface.co/datasets/shawneil/hackathon)
- **Test Dataset**: [shawneil/hackstest](https://huggingface.co/datasets/shawneil/hackstest)
- **Documentation**: See GitHub repo for detailed guides

---

## License

MIT License - See [LICENSE](https://github.com/ShawneilRodrigues/Amazon-ml-Challenge-Smape-score-36/blob/main/LICENSE)

---

## Acknowledgments

- OpenAI for CLIP pre-trained models
- Hugging Face for hosting infrastructure
- Amazon ML Challenge for dataset and competition

---
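
## Appendix: Computing SMAPE

SMAPE is the headline metric in this card (it appears in the performance tables and carries 65% of the training loss weight), so a small reference implementation may help when comparing your own predictions against the reported numbers. The sketch below assumes the common 0 to 200% definition, where each absolute error is divided by the mean of the absolute actual and predicted values; this card does not state which variant the competition scored, so treat that as an assumption. The function name `smape` and the sample prices are illustrative and are not taken from the repository.

```python
from typing import Sequence


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed definition: mean of |pred - actual| / ((|actual| + |pred|) / 2).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Convention chosen here: a pair where both values are zero adds no error.
        if denom > 0.0:
            total += abs(p - a) / denom
    return 100.0 * total / len(actual)


# Made-up prices, purely for illustration
actual_prices = [10.0, 20.0, 50.0]
predicted_prices = [12.0, 18.0, 50.0]
print(f"SMAPE: {smape(actual_prices, predicted_prices):.2f}%")
```

Because the denominator uses both the actual and the predicted price, a $2 miss on a $10 item counts for more than a $2 miss on a $50 item, which is consistent with the higher SMAPE this card reports for categories and price ranges where relative errors are larger.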