# Commerce Intent
## Model Overview
Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.
The model learns representations from multi-modal structured signals, including:
- Item ID
- Brand
- Category
- Event type (view, cart and purchase)
- Normalized price
- Positional order within the session
It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.
## Model Details
### Model Description
Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.
The architecture consists of:
- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction
This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.
- Developed by: infinity6
- Model type: Sequential autoregressive transformer
- Language(s): Structured e-commerce interaction data (non-NLP)
- License: Apache 2.0
- Finetuned from model: None (trained from scratch)
## Intended Use
### Direct Use
The model can be used directly for:
- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems
Example:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "i6-aiworks/ecomm_shop_intent_pretrained", trust_remote_code=True
)
# TODO: map items and remap categories.
# TODO: freeze layers and train with your data.

# Inference
model.eval()
device = "cpu"

# batch_size = 1 | seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(device)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(device)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(device)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(device)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(device)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(device)

with torch.no_grad():
    outputs = model(
        itms=itms,            # items
        brds=brds,            # brands
        cats=cats,            # categories
        prcs=prcs,            # prices
        evts=evts,            # events
        attention_mask=mask,
        labels=None,          # inference only -- no loss computed
    )

logits = outputs.logits  # (B, L-1, num_itm)
print("Logits shape:", logits.shape)
```
Inputs must include: `itms`, `brds`, `cats`, `evts`, `prcs`, `attention_mask`.
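The logits at the last position can be turned into a top-k candidate list for serving. A minimal sketch, using a random tensor as a stand-in for the model's `outputs.logits`:

```python
import torch

# Stand-in for model output: (batch=1, seq_len-1=2, num_items=1000)
logits = torch.randn(1, 2, 1000)

# Scores for the item following the last observed interaction
next_item_scores = logits[0, -1]            # (num_items,)

# Top-20 candidates, matching the cutoff used by the Recall@20 metric
top_scores, top_items = torch.topk(next_item_scores, k=20)
print(top_items.tolist())
```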
### Downstream Use
The model can be fine-tuned for:
- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
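A common fine-tuning pattern for these tasks is to freeze the pretrained encoder and train a small task head on the session representation. A minimal sketch with a stand-in encoder (the real model's module names and hidden size may differ):

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained encoder; in practice this would be the
# loaded Commerce Intent transformer (names here are illustrative).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the pretrained weights
for p in encoder.parameters():
    p.requires_grad = False

# New task head, e.g. binary conversion prediction
head = nn.Linear(64, 1)

x = torch.randn(8, 10, 64)       # (batch, seq_len, hidden)
with torch.no_grad():
    h = encoder(x)
logit = head(h[:, -1])           # score from the last-position representation
print(logit.shape)
```

Only `head` receives gradients, so fine-tuning stays cheap and the pretrained behavioral representations are preserved.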
### Out-of-Scope Use
This model is not suitable for:
- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping
## Bias, Risks, and Limitations
- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.
### Recommendations
- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.
## Training Details
### Training Data
The model was trained on large-scale anonymized e-commerce interaction logs containing:
- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)
Sessions shorter than a minimum threshold were filtered.
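The minimum-length filter can be sketched as follows (the actual threshold is not published; the value below is illustrative):

```python
MIN_SESSION_LEN = 3  # illustrative threshold, not the published value

sessions = [
    [101, 205, 317, 42],   # kept
    [55],                  # dropped: below threshold
    [7, 7, 9],             # kept
]

filtered = [s for s in sessions if len(s) >= MIN_SESSION_LEN]
print(len(filtered))
```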
### Data Sources and Preparation
The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.
The training data combines the following sources:
| Dataset | Description | Key Statistics |
|---|---|---|
| E-commerce behavior data from multi category store | Real event logs from a multi-category e-commerce platform | ~285M records |
| E-commerce Clickstream and Transaction Dataset (Kaggle) | Sequential event data including views and clicks | ~500K+ events |
| E-Commerce Behavior Dataset – Agents for Data | Product interactions from ~18k users across multiple event types | ~2M interactions |
| Retail Rocket clickstream dataset | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| SIGIR 2021 / Coveo Session data challenge | Navigation sessions with clicks, add-to-carts, and purchases, plus metadata | ~30M events |
| JDsearch dataset | Real interactions with search queries from the JD.com platform | ~26M interactions |
### Data Unification and Normalization
All datasets underwent a rigorous unification and normalization process:
- Schema Alignment: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- Event Type Normalization: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- ID Harmonization: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- Temporal Alignment: Unified timestamp formats and established consistent session windows
- Price Normalization: Applied log-normalization (log1p) followed by standardization using global statistics
- Session Construction: Reconstructed user sessions based on temporal proximity and interaction patterns
- Quality Filtering: Removed sessions below minimum length threshold and filtered anomalous interactions
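Two of the steps above, event type normalization and price normalization, can be sketched as follows. The mapping table is illustrative (the real cross-dataset mappings are internal), and the global statistics are placeholders:

```python
import math

# Illustrative mapping from source-specific event names to the
# unified taxonomy (view, cart, purchase)
EVENT_MAP = {
    "view": "view", "click": "view", "pageview": "view",
    "add_to_cart": "cart", "addtocart": "cart", "basket": "cart",
    "purchase": "purchase", "transaction": "purchase", "buy": "purchase",
}

def normalize_price(price, global_mean, global_std):
    """log1p transform followed by standardization with global statistics."""
    return (math.log1p(price) - global_mean) / global_std

events = ["pageview", "addtocart", "transaction"]
print([EVENT_MAP[e] for e in events])
```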
This diverse and comprehensive training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.
### Preprocessing
- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
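The truncation and padding steps can be sketched as below; `MAX_LEN` and `PAD_ID` are illustrative, not the model's actual configuration:

```python
import torch

MAX_LEN = 5   # illustrative fixed sequence length
PAD_ID = 0    # illustrative padding ID

def pad_or_truncate(session):
    """Truncate to MAX_LEN, right-pad shorter sessions, build the mask."""
    session = session[:MAX_LEN]
    mask = [1] * len(session) + [0] * (MAX_LEN - len(session))
    items = session + [PAD_ID] * (MAX_LEN - len(session))
    return torch.tensor(items), torch.tensor(mask, dtype=torch.bool)

items, mask = pad_or_truncate([12, 45, 78])
print(items.tolist())  # [12, 45, 78, 0, 0]
print(mask.tolist())   # [True, True, True, False, False]
```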
### Training Objective
Next-item autoregressive prediction using cross-entropy loss with padding ignored.
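The objective corresponds to a shifted cross-entropy loss, sketched below with random logits and an assumed padding ID of 0 (the actual padding ID is not published):

```python
import torch
import torch.nn.functional as F

PAD_ID = 0       # assumed padding ID, ignored by the loss
num_items = 100

# Input items (B=2, L=4); positions 1..L-1 are predicted from 0..L-2
items = torch.tensor([[12, 45, 78, 0],
                      [ 3,  9,  0, 0]])
logits = torch.randn(2, 3, num_items)   # (B, L-1, num_items)
targets = items[:, 1:]                  # next-item targets, shifted by one

loss = F.cross_entropy(
    logits.reshape(-1, num_items),
    targets.reshape(-1),
    ignore_index=PAD_ID,                # padded positions contribute nothing
)
print(float(loss))
```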
### Training Regime
- Precision: FP32
- Optimizer: AdamW
- Learning Rate: 1e-3 with warmup
- Gradient Clipping: 5.0
- Causal masking applied
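The optimizer setup can be reproduced as follows; the linear warmup shape and step count are assumptions, since only "1e-3 with warmup" is stated:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the transformer

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed linear warmup over the first `warmup_steps` steps
warmup_steps = 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

x, y = torch.randn(4, 16), torch.randn(4, 16)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip at 5.0
optimizer.step()
scheduler.step()
print(scheduler.get_last_lr())
```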
## Evaluation
### Metrics
- Cross-Entropy Loss
- Perplexity
- Recall@20
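Recall@20 checks whether the true next item appears among the top 20 scored items, and perplexity is the exponential of the mean cross-entropy (e.g. a mean loss of about 3.18 corresponds to the reported perplexity of ~24). A minimal sketch with random stand-in scores:

```python
import math
import torch

def recall_at_k(logits, targets, k=20):
    """Fraction of positions where the true next item is in the top-k."""
    topk = logits.topk(k, dim=-1).indices           # (N, k)
    hits = (topk == targets.unsqueeze(-1)).any(-1)  # (N,)
    return hits.float().mean().item()

logits = torch.randn(6, 1000)   # stand-in next-item scores
targets = torch.randint(0, 1000, (6,))
print(recall_at_k(logits, targets))

# Perplexity from mean cross-entropy
mean_ce = 3.18  # illustrative value
print(math.exp(mean_ce))
```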
### Results
On the evaluation split, the model achieved:
- Perplexity: 24.04
- Recall@20: 0.6823
These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.
### Summary
The model demonstrates:
- Low predictive uncertainty (Perplexity 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)
Performance may vary depending on dataset distribution, session length, and preprocessing configuration.
## Environmental Impact
- Hardware: GPU NVIDIA H100 NVL (94GB PCIe 5.0)
- Precision: FP32
- Training Duration: ≈30 hours (varies by configuration)
- Carbon Impact: ≈45 kg CO₂e (estimated from the energy consumption of ~30 h on an H100 GPU)
## Limitations
- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs