# Commerce Intent
## Model Overview
Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.
The model learns representations from multi-modal structured signals, including:
- Item ID
- Brand
- Category
- Event type (view, cart and purchase)
- Normalized price
- Positional order within the session
It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.
## Model Details
### Model Description
Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.
The architecture consists of:
- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction
This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.
- Developed by: infinity6
- Model type: Sequential autoregressive transformer
- Language(s): Structured e-commerce interaction data (non-NLP)
- License: Apache 2.0
- Finetuned from model: None (trained from scratch)
## Intended Use
### Direct Use
The model can be used directly for:
- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems
Example:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "i6-aiworks/ecomm_shop_intent_pretrained", trust_remote_code=True
)
# TODO: map items and remap categories.
# TODO: freeze layers and train with your data.

# Inference
model.eval()
device = "cpu"

# batch_size = 1 | seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(device)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(device)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(device)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(device)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(device)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(device)

with torch.no_grad():
    outputs = model(
        itms=itms,            # items
        brds=brds,            # brands
        cats=cats,            # categories
        prcs=prcs,            # prices
        evts=evts,            # events
        attention_mask=mask,
        labels=None,          # inference only -- no loss computed
    )

logits = outputs.logits  # (B, L-1, num_itm)
print("Logits shape:", logits.shape)
```
Inputs must include: `itms`, `brds`, `cats`, `evts`, `prcs`, `attention_mask`.
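The logits at the last position can be turned into a top-k candidate list for serving. A minimal sketch, using a random tensor as a stand-in for the model's `outputs.logits`:

```python
import torch

# Stand-in for model output: (batch=1, seq_len-1=2, num_items=1000)
logits = torch.randn(1, 2, 1000)

# Scores for the item following the last observed interaction
next_item_scores = logits[0, -1]            # (num_items,)

# Top-20 candidates, matching the cutoff used by the Recall@20 metric
top_scores, top_items = torch.topk(next_item_scores, k=20)
print(top_items.tolist())
```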
### Downstream Use
The model can be fine-tuned for:
- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
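A common fine-tuning pattern for these tasks is to freeze the pretrained encoder and train a small task head on the session representation. A minimal sketch with a stand-in encoder (the real model's module names and hidden size may differ):

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained encoder; in practice this would be the
# loaded Commerce Intent transformer (names here are illustrative).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the pretrained weights
for p in encoder.parameters():
    p.requires_grad = False

# New task head, e.g. binary conversion prediction
head = nn.Linear(64, 1)

x = torch.randn(8, 10, 64)       # (batch, seq_len, hidden)
with torch.no_grad():
    h = encoder(x)
logit = head(h[:, -1])           # score from the last-position representation
print(logit.shape)
```

Only `head` receives gradients, so fine-tuning stays cheap and the pretrained behavioral representations are preserved.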
### Out-of-Scope Use
This model is not suitable for:
- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping
## Bias, Risks, and Limitations
- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.
### Recommendations
- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.
## Training Details
### Training Data
The model was trained on large-scale anonymized e-commerce interaction logs containing:
- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)
Sessions shorter than a minimum threshold were filtered.
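The minimum-length filter can be sketched as follows (the actual threshold is not published; the value below is illustrative):

```python
MIN_SESSION_LEN = 3  # illustrative threshold, not the published value

sessions = [
    [101, 205, 317, 42],   # kept
    [55],                  # dropped: below threshold
    [7, 7, 9],             # kept
]

filtered = [s for s in sessions if len(s) >= MIN_SESSION_LEN]
print(len(filtered))
```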
### Data Sources and Preparation
The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.
The training data combines the following sources:
| Dataset | Description | Key Statistics |
|---|---|---|
| E-commerce behavior data from multi category store | Real event logs from a multi-category e-commerce platform | ~285M records |
| E-commerce Clickstream and Transaction Dataset (Kaggle) | Sequential event data including views and clicks | ~500K+ events |
| E-Commerce Behavior Dataset – Agents for Data | Product interactions from ~18k users across multiple event types | ~2M interactions |
| Retail Rocket clickstream dataset | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| SIGIR 2021 / Coveo Session data challenge | Navigation sessions with clicks, add-to-carts, and purchases, plus metadata | ~30M events |
| JDsearch dataset | Real interactions with search queries from the JD.com platform | ~26M interactions |
### Data Unification and Normalization
All datasets underwent a rigorous unification and normalization process:
- Schema Alignment: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- Event Type Normalization: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- ID Harmonization: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- Temporal Alignment: Unified timestamp formats and established consistent session windows
- Price Normalization: Applied log-normalization (log1p) followed by standardization using global statistics
- Session Construction: Reconstructed user sessions based on temporal proximity and interaction patterns
- Quality Filtering: Removed sessions below minimum length threshold and filtered anomalous interactions
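Two of the steps above, event type normalization and price normalization, can be sketched as follows. The mapping table is illustrative (the real cross-dataset mappings are internal), and the global statistics are placeholders:

```python
import math

# Illustrative mapping from source-specific event names to the
# unified taxonomy (view, cart, purchase)
EVENT_MAP = {
    "view": "view", "click": "view", "pageview": "view",
    "add_to_cart": "cart", "addtocart": "cart", "basket": "cart",
    "purchase": "purchase", "transaction": "purchase", "buy": "purchase",
}

def normalize_price(price, global_mean, global_std):
    """log1p transform followed by standardization with global statistics."""
    return (math.log1p(price) - global_mean) / global_std

events = ["pageview", "addtocart", "transaction"]
print([EVENT_MAP[e] for e in events])
```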
This diverse and comprehensive training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.
### Preprocessing
- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
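The truncation and padding steps can be sketched as below; `MAX_LEN` and `PAD_ID` are illustrative, not the model's actual configuration:

```python
import torch

MAX_LEN = 5   # illustrative fixed sequence length
PAD_ID = 0    # illustrative padding ID

def pad_or_truncate(session):
    """Truncate to MAX_LEN, right-pad shorter sessions, build the mask."""
    session = session[:MAX_LEN]
    mask = [1] * len(session) + [0] * (MAX_LEN - len(session))
    items = session + [PAD_ID] * (MAX_LEN - len(session))
    return torch.tensor(items), torch.tensor(mask, dtype=torch.bool)

items, mask = pad_or_truncate([12, 45, 78])
print(items.tolist())  # [12, 45, 78, 0, 0]
print(mask.tolist())   # [True, True, True, False, False]
```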
### Training Objective
Next-item autoregressive prediction using cross-entropy loss with padding ignored.
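The objective corresponds to a shifted cross-entropy loss, sketched below with random logits and an assumed padding ID of 0 (the actual padding ID is not published):

```python
import torch
import torch.nn.functional as F

PAD_ID = 0       # assumed padding ID, ignored by the loss
num_items = 100

# Input items (B=2, L=4); positions 1..L-1 are predicted from 0..L-2
items = torch.tensor([[12, 45, 78, 0],
                      [ 3,  9,  0, 0]])
logits = torch.randn(2, 3, num_items)   # (B, L-1, num_items)
targets = items[:, 1:]                  # next-item targets, shifted by one

loss = F.cross_entropy(
    logits.reshape(-1, num_items),
    targets.reshape(-1),
    ignore_index=PAD_ID,                # padded positions contribute nothing
)
print(float(loss))
```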
### Training Regime
- Precision: FP32
- Optimizer: AdamW
- Learning Rate: 1e-3 with warmup
- Gradient Clipping: 5.0
- Causal masking applied
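The optimizer setup can be reproduced as follows; the linear warmup shape and step count are assumptions, since only "1e-3 with warmup" is stated:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the transformer

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed linear warmup over the first `warmup_steps` steps
warmup_steps = 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

x, y = torch.randn(4, 16), torch.randn(4, 16)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip at 5.0
optimizer.step()
scheduler.step()
print(scheduler.get_last_lr())
```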
## Evaluation
### Metrics
- Cross-Entropy Loss
- Perplexity
- Recall@20
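Recall@20 checks whether the true next item appears among the top 20 scored items, and perplexity is the exponential of the mean cross-entropy (e.g. a mean loss of about 3.18 corresponds to the reported perplexity of ~24). A minimal sketch with random stand-in scores:

```python
import math
import torch

def recall_at_k(logits, targets, k=20):
    """Fraction of positions where the true next item is in the top-k."""
    topk = logits.topk(k, dim=-1).indices           # (N, k)
    hits = (topk == targets.unsqueeze(-1)).any(-1)  # (N,)
    return hits.float().mean().item()

logits = torch.randn(6, 1000)   # stand-in next-item scores
targets = torch.randint(0, 1000, (6,))
print(recall_at_k(logits, targets))

# Perplexity from mean cross-entropy
mean_ce = 3.18  # illustrative value
print(math.exp(mean_ce))
```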
### Results
On the evaluation split, the model achieved:
- Perplexity: 24.04
- Recall@20: 0.6823
These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.
### Summary
The model demonstrates:
- Low predictive uncertainty (Perplexity 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)
Performance may vary depending on dataset distribution, session length, and preprocessing configuration.
## Environmental Impact
- Hardware: GPU NVIDIA H100 NVL (94GB PCIe 5.0)
- Precision: FP32
- Training Duration: ≈30 hours (varies by configuration)
- Carbon Impact: ≈45 kg CO₂e (estimated from the energy consumption of ~30 h on an H100 GPU)
## Limitations
- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs