---
library_name: transformers
tags:
  - recommendation
  - e-commerce
  - sequential-modeling
  - next-item-prediction
  - behavioral-modeling
---

# Commerce Intent

## Model Overview

Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.

The model learns representations from multi-modal structured signals, including:

- Item ID
- Brand
- Category
- Event type (view, cart, and purchase)
- Normalized price
- Positional order within the session

It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.

---

## Model Details

### Model Description

Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.

The architecture consists of:

- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction

This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.
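The components listed above can be sketched roughly as follows. This is an illustrative reimplementation, not the actual `i6model_ecomm` code; all vocabulary sizes, dimensions, and layer counts are assumptions.

```python
import torch
import torch.nn as nn

class CommerceIntentSketch(nn.Module):
    """Illustrative sketch of the described architecture (NOT the real
    i6model_ecomm implementation). Sizes below are assumptions."""

    def __init__(self, num_items=1000, num_brands=100, num_cats=50,
                 num_events=4, d_model=64, n_heads=4, n_layers=2, max_len=50):
        super().__init__()
        # Multi-embedding token fusion: one embedding per categorical field
        self.item_emb = nn.Embedding(num_items, d_model)
        self.brand_emb = nn.Embedding(num_brands, d_model)
        self.cat_emb = nn.Embedding(num_cats, d_model)
        self.event_emb = nn.Embedding(num_events, d_model)
        # Continuous price projection
        self.price_proj = nn.Linear(1, d_model)
        # Learned positional encoding
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Linear head for next-item prediction
        self.head = nn.Linear(d_model, num_items)

    def forward(self, itms, brds, cats, prcs, evts):
        B, L = itms.shape
        pos = torch.arange(L, device=itms.device).unsqueeze(0)
        # Sum the fused field embeddings, price projection, and positions
        x = (self.item_emb(itms) + self.brand_emb(brds) + self.cat_emb(cats)
             + self.event_emb(evts) + self.price_proj(prcs.unsqueeze(-1))
             + self.pos_emb(pos))
        # Causal mask so position t attends only to positions <= t
        causal = nn.Transformer.generate_square_subsequent_mask(L).to(itms.device)
        h = self.encoder(x, mask=causal)
        return self.head(h)  # (B, L, num_items)
```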

- **Developed by:** infinity6
- **Model type:** Sequential autoregressive transformer
- **Language(s):** Structured e-commerce interaction data (non-NLP)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch)

---

## Dependencies

This model depends on the external package:

- https://github.com/infinity6-ai/i6model_ecomm

The package contains the custom architecture required to correctly load and run the model. You must install it before using Commerce Intent.

### Installation

Clone the repository:

```bash
git clone https://github.com/infinity6-ai/i6model_ecomm.git
cd i6model_ecomm
pip install .
```

---

## Intended Use

### Direct Use

The model can be used directly for:

- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems

Example:

```python
import torch

from i6modelecomm.model import i6modelecomm

model = i6modelecomm.CommerceIntent.from_pretrained(
    "infinity6/ecomm_shop_intent_pretrained"
)

# TODO: map items and remap categories.

# TODO: freeze layers and train with your data.

model.eval()

D = 'cpu'  # target device

# batch_size = 1, seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(D)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(D)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(D)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(D)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(D)

# attention mask (1 = real token, 0 = padding)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(D)

with torch.no_grad():
    outputs = model(
        itms=itms,      # items
        brds=brds,      # brands
        cats=cats,      # categories
        prcs=prcs,      # prices
        evts=evts,      # events
        attention_mask=mask,
        labels=None     # inference only -- no loss computation
    )

# logits have shape (B, L-1, num_itm)
logits = outputs.logits

print("Logits shape:", logits.shape)
```

Inputs must include:

- `itms`
- `brds`
- `cats`
- `evts`
- `prcs`
- `attention_mask`

---

### Downstream Use

The model can be fine-tuned for:

- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
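A common fine-tuning pattern for such tasks is to freeze the pretrained backbone and train only a small task head. The following is a minimal sketch for binary conversion prediction; the `backbone` here is a stand-in module, not the real Commerce Intent model, and the pooling choice is an assumption.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained sequential backbone (NOT the real model)
backbone = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64))

for p in backbone.parameters():
    p.requires_grad = False  # freeze pretrained weights

conversion_head = nn.Linear(64, 1)  # small trainable task head

items = torch.tensor([[12, 45, 78]])       # one session of 3 items
with torch.no_grad():
    h = backbone(items)                    # (B, L, 64) frozen representations
logits = conversion_head(h.mean(dim=1))    # mean-pool the session -> (B, 1)
prob = torch.sigmoid(logits)               # conversion probability
```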

---

### Out-of-Scope Use

This model is not suitable for:

- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping

---

## Bias, Risks, and Limitations

- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.

### Recommendations

- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.

---

## Training Details

### Training Data

The model was trained on large-scale anonymized e-commerce interaction logs containing:

- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)

Sessions shorter than a minimum threshold were filtered.

---

### Data Sources and Preparation

The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.

The training data combines the following sources:

| Dataset                                                       | Description                                                      | Key Statistics    |
|---------------------------------------------------------------|------------------------------------------------------------------|-------------------|
| **E-commerce behavior data from multi category store**        | Real event logs from a multi-category e-commerce platform        | ~285M records     |
| **E-commerce Clickstream and Transaction Dataset (Kaggle)**   | Sequential event data including views and clicks                 | ~500K+ events     |
| **E-Commerce Behavior Dataset – Agents for Data**             | Product interactions from ~18k users across multiple event types | ~2M interactions  |
| **Retail Rocket clickstream dataset**                         | Industry-standard dataset with views, carts, and purchases       | ~2.7M events      |
| **SIGIR 2021 / Coveo Session data challenge**                 | Navigation sessions with clicks, add-to-carts, and purchases, plus metadata | ~30M events       |
| **JDsearch dataset**                                          | Real interactions with search queries from the JD.com platform   | ~26M interactions |

### Data Unification and Normalization

All datasets underwent a rigorous unification and normalization process:

- **Schema Alignment**: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- **Event Type Normalization**: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- **ID Harmonization**: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- **Temporal Alignment**: Unified timestamp formats and established consistent session windows
- **Price Normalization**: Applied log-normalization (log1p) followed by standardization using global statistics
- **Session Construction**: Reconstructed user sessions based on temporal proximity and interaction patterns
- **Quality Filtering**: Removed sessions below minimum length threshold and filtered anomalous interactions

This diverse and comprehensive training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.
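The event-type normalization step might look like the sketch below. The synonym table and the fallback for unknown event names are illustrative assumptions, not the actual mapping used during training.

```python
# Hypothetical mapping from raw event names (which vary across the source
# datasets) onto the unified taxonomy: view, cart, purchase.
EVENT_TAXONOMY = {
    "view": "view", "pageview": "view", "click": "view", "detail": "view",
    "cart": "cart", "add_to_cart": "cart", "addtocart": "cart",
    "purchase": "purchase", "buy": "purchase", "transaction": "purchase",
}

def normalize_event(raw: str) -> str:
    """Map a raw event name to the unified taxonomy.

    Unknown events fall back to 'view' here -- an assumption for
    illustration, not the documented behavior.
    """
    return EVENT_TAXONOMY.get(raw.strip().lower(), "view")
```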

---

### Preprocessing

- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
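The steps above can be sketched as follows. The `PAD` id, maximum sequence length, and global price statistics are illustrative assumptions.

```python
import math

PAD_ID, MAX_LEN = 0, 5                 # assumed padding id and fixed length
PRICE_MEAN, PRICE_STD = 3.2, 1.1       # assumed global stats over log1p prices

def preprocess(items, prices):
    """Truncate, log1p-transform and standardize prices, then right-pad
    with an attention mask (1 = real token, 0 = padding)."""
    items = items[:MAX_LEN]
    prices = [(math.log1p(p) - PRICE_MEAN) / PRICE_STD
              for p in prices[:MAX_LEN]]
    mask = [1] * len(items)
    pad = MAX_LEN - len(items)
    return (items + [PAD_ID] * pad,
            prices + [0.0] * pad,
            mask + [0] * pad)
```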

---

### Training Objective

Next-item autoregressive prediction using cross-entropy loss with padding ignored.
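A minimal sketch of this objective in PyTorch, assuming that logits at position t score the item at position t+1 and that padding uses id 0 (both assumptions for illustration):

```python
import torch
import torch.nn.functional as F

B, L, V, PAD_ID = 2, 4, 10, 0            # assumed shapes and padding id
logits = torch.randn(B, L, V)            # model outputs, one per position
items = torch.tensor([[3, 5, 2, 0],      # second half of each row is padded
                      [7, 1, 0, 0]])

shift_logits = logits[:, :-1, :]         # positions 0..L-2 predict ...
shift_targets = items[:, 1:].clone()     # ... items at positions 1..L-1
shift_targets[shift_targets == PAD_ID] = -100  # exclude padding from the loss

loss = F.cross_entropy(
    shift_logits.reshape(-1, V), shift_targets.reshape(-1), ignore_index=-100
)
```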

---

### Training Regime

- **Precision:** FP32
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3 with warmup
- **Gradient Clipping:** 5.0
- **Causal masking applied**
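A sketch of this regime in PyTorch. The warmup shape and length are assumptions, since the card only states "1e-3 with warmup"; the tiny model is a placeholder.

```python
import torch

model = torch.nn.Linear(8, 8)            # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed linear warmup over the first 100 steps
warmup_steps = 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps)
)

# One illustrative training step
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip at 5.0
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```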

---

## Evaluation

### Metrics

- Cross-Entropy Loss
- Perplexity
- Recall@20
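Recall@20 is the fraction of evaluation positions whose true next item appears among the model's top-20 scored items. A minimal sketch of the computation (the random scores are for illustration only):

```python
import torch

def recall_at_k(logits, targets, k=20):
    """Fraction of rows where the true item is in the top-k scores."""
    topk = logits.topk(k, dim=-1).indices           # (N, k) candidate items
    hits = (topk == targets.unsqueeze(-1)).any(-1)  # is the true item there?
    return hits.float().mean().item()

logits = torch.randn(100, 500)            # 100 positions, 500-item vocabulary
targets = torch.randint(0, 500, (100,))
print(recall_at_k(logits, targets))
```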

### Results

On the evaluation split, the model achieved:

- **Perplexity:** 24.04
- **Recall@20:** 0.6823

These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.

### Summary

The model demonstrates:

- Low predictive uncertainty (Perplexity 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)

Performance may vary depending on dataset distribution, session length, and preprocessing configuration.

---

## Environmental Impact

- **Hardware:** NVIDIA H100 NVL GPU (94 GB, PCIe 5.0)
- **Precision:** FP32
- **Training Duration:** Several hours (varies by configuration)
- **Carbon Impact:** ≈45 kg CO₂e (estimated from roughly 30 hours of energy consumption on an H100 GPU)

---

## Limitations

- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs