---
library_name: transformers
tags:
- recommendation
- e-commerce
- sequential-modeling
- next-item-prediction
- behavioral-modeling
---

# Commerce Intent

## Model Overview

Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.

The model learns representations from multi-modal structured signals, including:

- Item ID
- Brand
- Category
- Event type (view, cart, and purchase)
- Normalized price
- Positional order within the session

It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.

---

## Model Details

### Model Description

Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.

The architecture consists of:

- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction

This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.

- **Developed by:** infinity6
- **Model type:** Sequential autoregressive transformer
- **Language(s):** Structured e-commerce interaction data (non-NLP)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch)

---

## Dependencies

This model depends on the external package:

- https://github.com/infinity6-ai/i6model_ecomm

The package contains the custom architecture required to correctly load and run the model. You must install it before using Commerce Intent.

### Installation

Clone the repository:

```bash
git clone https://github.com/infinity6-ai/i6model_ecomm.git
cd i6model_ecomm
pip install .
```

---

## Intended Use

### Direct Use

The model can be used directly for:

- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems

Example:

```python
import torch
from i6modelecomm.model import CommerceIntent

model = CommerceIntent.from_pretrained(
    "infinity6/ecomm_shop_intent_pretrained"
)

# TODO: map items and remap categories.
# TODO: freeze layers and train with your data.

model.eval()

D = "cpu"

# batch_size = 1, seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(D)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(D)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(D)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(D)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(D)

# attention mask (all three positions are valid)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(D)

with torch.no_grad():
    outputs = model(
        itms=itms,            # items
        brds=brds,            # brands
        cats=cats,            # categories
        prcs=prcs,            # prices
        evts=evts,            # events
        attention_mask=mask,
        labels=None           # inference only -- no loss computation
    )

# logits have shape (B, L-1, num_itm)
logits = outputs.logits
print("Logits shape:", logits.shape)
```

Inputs must include:

- `itms`
- `brds`
- `cats`
- `evts`
- `prcs`
- `attention_mask`

---

### Downstream Use

The model can be fine-tuned for:

- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking

---

### Out-of-Scope Use

This model is not suitable for:

- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping

---

## Bias, Risks, and Limitations

- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.

### Recommendations

- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.

---

## Training Details

### Training Data

The model was trained on large-scale anonymized e-commerce interaction logs containing:

- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)

Sessions shorter than a minimum threshold were filtered.

---

### Data Sources and Preparation

The training corpus is a unified, large-scale collection of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.

The training data combines the following sources:

| Dataset | Description | Key Statistics |
|---------|-------------|----------------|
| **E-commerce behavior data from multi category store** | Real event logs from a multi-category e-commerce platform | ~285M records |
| **E-commerce Clickstream and Transaction Dataset (Kaggle)** | Sequential event data including views and clicks | ~500K+ events |
| **E-Commerce Behavior Dataset – Agents for Data** | Product interactions from ~18k users across multiple event types | ~2M interactions |
| **Retail Rocket clickstream dataset** | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| **SIGIR 2021 / Coveo Session data challenge** | Navigation sessions with clicks, add-to-cart events, purchases, and metadata | ~30M events |
| **JDsearch dataset** | Real interactions with search queries from the JD.com platform | ~26M interactions |
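Before unification, these sources use heterogeneous schemas. A minimal sketch of the kind of unified interaction record they are normalized into (the `Interaction` class and its toy values are illustrative, not part of the released package; field names follow the signals listed in this card):

```python
from dataclasses import dataclass

# Hypothetical unified record; field names mirror the signals described
# in this model card (item, brand, category, event, price, timestamp).
@dataclass
class Interaction:
    item_id: int
    brand_id: int
    category_id: int
    event_type: str   # one of "view", "cart", "purchase"
    price: float      # raw price; normalized later via log1p + standardization
    timestamp: float  # Unix epoch seconds, used for session construction

# A toy session: view -> cart -> purchase of the same item
session = [
    Interaction(12, 3, 8, "view", 29.9, 1_700_000_000.0),
    Interaction(12, 3, 8, "cart", 29.9, 1_700_000_060.0),
    Interaction(12, 3, 8, "purchase", 29.9, 1_700_000_300.0),
]
print(len(session))  # number of events in the session
```

Sessions of such records are what the unification steps below produce before sequence truncation and padding.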
### Data Unification and Normalization

All datasets underwent a rigorous unification and normalization process:

- **Schema Alignment**: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- **Event Type Normalization**: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- **ID Harmonization**: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- **Temporal Alignment**: Unified timestamp formats and established consistent session windows
- **Price Normalization**: Applied log-normalization (log1p) followed by standardization using global statistics
- **Session Construction**: Reconstructed user sessions based on temporal proximity and interaction patterns
- **Quality Filtering**: Removed sessions below the minimum length threshold and filtered anomalous interactions

This diverse training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.

---

### Preprocessing

- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking

---

### Training Objective

Next-item autoregressive prediction using cross-entropy loss with padding ignored.

---

### Training Regime

- **Precision:** FP32
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3 with warmup
- **Gradient Clipping:** 5.0
- **Causal masking:** applied

---

## Evaluation

### Metrics

- Cross-Entropy Loss
- Perplexity
- Recall@20

### Results

On the evaluation split, the model achieved:

- **Perplexity:** 24.04
- **Recall@20:** 0.6823

These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.
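As a toy illustration of how the objective and the perplexity metric relate: cross-entropy is averaged over non-padded positions only, and perplexity is its exponential. The probabilities and `PAD_ID` below are made up for the example, not taken from the model.

```python
import math

# Average next-item cross-entropy over non-padded positions;
# perplexity = exp(loss).
PAD_ID = 0

def masked_cross_entropy(probs_per_step, targets, pad_id=PAD_ID):
    """probs_per_step: one {item_id: probability} dict per position;
    targets: next-item ids, with pad_id marking padded positions."""
    losses = [-math.log(p[t])
              for p, t in zip(probs_per_step, targets)
              if t != pad_id]
    return sum(losses) / len(losses)

# Two real steps and one padded step (the padded step is ignored).
probs = [{45: 0.5, 78: 0.5}, {45: 0.25, 78: 0.75}, {45: 1.0}]
targets = [45, 78, PAD_ID]

loss = masked_cross_entropy(probs, targets)
ppl = math.exp(loss)
print(round(loss, 4), round(ppl, 4))  # loss ≈ 0.4904, perplexity ≈ 1.633
```

A perplexity of 24.04 therefore corresponds to an average cross-entropy of about ln(24.04) ≈ 3.18 nats per predicted item.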
### Summary

The model demonstrates:

- Low predictive uncertainty (perplexity of 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)

Performance may vary depending on dataset distribution, session length, and preprocessing configuration.

---

## Environmental Impact

- **Hardware:** NVIDIA H100 NVL GPU (94 GB, PCIe 5.0)
- **Precision:** FP32
- **Training Duration:** Several hours (varies by configuration)
- **Carbon Impact:** ≈45 kg CO₂e (estimated from the energy consumption of ~30 h on an H100 GPU)

---

## Limitations

- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs