---
library_name: transformers
tags:
- recommendation
- e-commerce
- sequential-modeling
- next-item-prediction
- behavioral-modeling
---
# Commerce Intent
## Model Overview
Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.
The model learns representations from multi-modal structured signals, including:
- Item ID
- Brand
- Category
- Event type (view, cart, or purchase)
- Normalized price
- Positional order within the session
It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.
---
## Model Details
### Model Description
Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.
The architecture consists of:
- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction
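The components above can be sketched as a minimal PyTorch module. This is an illustrative assumption of the architecture, not the packaged implementation; class name, dimensions, and layer choices are hypothetical:

```python
import torch
import torch.nn as nn

class CommerceIntentSketch(nn.Module):
    """Sketch (not the shipped model): multi-embedding fusion + price
    projection + positional encoding + causal transformer + next-item head."""
    def __init__(self, num_items, num_brands, num_cats, num_events,
                 d_model=128, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.brand_emb = nn.Embedding(num_brands, d_model)
        self.cat_emb = nn.Embedding(num_cats, d_model)
        self.event_emb = nn.Embedding(num_events, d_model)
        self.price_proj = nn.Linear(1, d_model)     # continuous price -> d_model
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, num_items)   # next-item logits

    def forward(self, itms, brds, cats, evts, prcs):
        B, L = itms.shape
        pos = torch.arange(L, device=itms.device).unsqueeze(0)
        # fuse all token embeddings plus the projected price and position
        x = (self.item_emb(itms) + self.brand_emb(brds) + self.cat_emb(cats)
             + self.event_emb(evts) + self.price_proj(prcs.unsqueeze(-1))
             + self.pos_emb(pos))
        # causal mask so each position only attends to earlier interactions
        causal = nn.Transformer.generate_square_subsequent_mask(L).to(itms.device)
        h = self.encoder(x, mask=causal)
        return self.head(h)                         # (B, L, num_items)
```

The sum-fusion of embeddings is one common choice; concatenation followed by a projection would serve the same role.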
This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.
- **Developed by:** infinity6
- **Model type:** Sequential autoregressive transformer
- **Language(s):** Structured e-commerce interaction data (non-NLP)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch)
---
## Dependencies
This model depends on the external package:
- https://github.com/infinity6-ai/i6model_ecomm
The package contains the custom architecture required to correctly load and run the model. You must install it before using Commerce Intent.
### Installation
Clone the repository:
```bash
git clone https://github.com/infinity6-ai/i6model_ecomm.git
cd i6model_ecomm
pip install .
```
---
## Intended Use
### Direct Use
The model can be used directly for:
- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems
Example:
```python
import torch
from i6modelecomm.model import i6modelecomm

model = i6modelecomm.CommerceIntent.from_pretrained(
    "infinity6/ecomm_shop_intent_pretrained"
)
# TODO: map items and remap categories.
# TODO: freeze layers and train with your data.
model.eval()

device = "cpu"

# batch_size = 1, seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(device)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(device)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(device)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(device)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(device)

# attention mask (all positions valid)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(device)

with torch.no_grad():
    outputs = model(
        itms=itms,            # items
        brds=brds,            # brands
        cats=cats,            # categories
        prcs=prcs,            # prices
        evts=evts,            # events
        attention_mask=mask,
        labels=None,          # inference only -- no loss computation
    )

# logits have shape (B, L-1, num_itm)
logits = outputs.logits
print("Logits shape:", logits.shape)
```
Inputs must include:
- `itms`
- `brds`
- `cats`
- `evts`
- `prcs`
- `attention_mask`
---
### Downstream Use
The model can be fine-tuned for:
- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
---
### Out-of-Scope Use
This model is not suitable for:
- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping
---
## Bias, Risks, and Limitations
- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.
### Recommendations
- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.
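One way to monitor popularity skew, as recommended above, is to track the Gini coefficient of item exposure counts over time. This is a generic sketch, not part of the i6model_ecomm package:

```python
import numpy as np

def gini(counts):
    """Gini coefficient of item exposure counts:
    0.0 = perfectly uniform exposure, values near 1.0 = heavy skew."""
    c = np.sort(np.asarray(counts, dtype=float))
    n = c.size
    cum = np.cumsum(c)
    # discrete Gini from the normalized cumulative distribution
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

print(gini([10, 10, 10, 10]))  # uniform exposure -> 0.0
print(gini([0, 0, 0, 100]))    # one item dominates -> 0.75
```

A rising Gini across retraining cycles suggests the recommender is amplifying popularity bias.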
---
## Training Details
### Training Data
The model was trained on large-scale anonymized e-commerce interaction logs containing:
- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)
Sessions shorter than a minimum threshold were filtered.
---
### Data Sources and Preparation
The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.
The training data combines the following sources:
| Dataset | Description | Key Statistics |
|---------------------------------------------------------------|------------------------------------------------------------------|-------------------|
| **E-commerce behavior data from multi category store** | Real event logs from a multi-category e-commerce platform | ~285M records |
| **E-commerce Clickstream and Transaction Dataset (Kaggle)** | Sequential event data including views and clicks | ~500K+ events |
| **E-Commerce Behavior Dataset – Agents for Data** | Product interactions from ~18k users across multiple event types | ~2M interactions |
| **Retail Rocket clickstream dataset** | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| **SIGIR 2021 / Coveo Session data challenge**                 | Navigation sessions with clicks, add-to-carts, and purchases, plus metadata | ~30M events |
| **JDsearch dataset**                                          | Real interactions with search queries from the JD.com platform   | ~26M interactions |
### Data Unification and Normalization
All datasets underwent a rigorous unification and normalization process:
- **Schema Alignment**: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- **Event Type Normalization**: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- **ID Harmonization**: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- **Temporal Alignment**: Unified timestamp formats and established consistent session windows
- **Price Normalization**: Applied log-normalization (log1p) followed by standardization using global statistics
- **Session Construction**: Reconstructed user sessions based on temporal proximity and interaction patterns
- **Quality Filtering**: Removed sessions below minimum length threshold and filtered anomalous interactions
This diverse and comprehensive training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.
---
### Preprocessing
- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
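The price transform and the truncation/padding steps above can be sketched as follows. Whether truncation keeps the most recent events is an assumption here, as are the function names:

```python
import math

def preprocess_prices(prices, mean, std):
    """log1p transform, then standardize with global statistics."""
    return [(math.log1p(p) - mean) / std for p in prices]

def pad_session(item_ids, max_len, pad_id=0):
    """Truncate to max_len (keeping the most recent events, assumed)
    and right-pad; returns (padded_ids, attention_mask)."""
    ids = item_ids[-max_len:]                      # truncation
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))    # right padding
    return ids, mask
```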
---
### Training Objective
Next-item autoregressive prediction using cross-entropy loss with padding ignored.
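This objective can be sketched as a shift-by-one cross-entropy with padding positions ignored; the helper name and the pad-handling convention are illustrative:

```python
import torch
import torch.nn.functional as F

def next_item_loss(logits, items, pad_id=0):
    """Next-item autoregressive loss.
    logits: (B, L, num_items) model outputs; items: (B, L) item ids."""
    pred = logits[:, :-1, :]          # predictions at positions 0..L-2
    target = items[:, 1:].clone()     # the item that actually came next
    target[target == pad_id] = -100   # padding excluded from the loss
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                           target.reshape(-1), ignore_index=-100)
```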
---
### Training Regime
- **Precision:** FP32
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3 with warmup
- **Gradient Clipping:** 5.0
- **Causal masking applied**
---
## Evaluation
### Metrics
- Cross-Entropy Loss
- Perplexity
- Recall@20
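The two derived metrics can be computed as below: perplexity is the exponential of the mean cross-entropy, and Recall@20 is the fraction of steps where the true next item appears in the top-20 logits. Function names are illustrative:

```python
import math
import torch

def perplexity(ce_loss):
    """Perplexity is exp(mean cross-entropy loss)."""
    return math.exp(ce_loss)

def recall_at_k(logits, targets, k=20):
    """Fraction of positions whose true next item is in the top-k logits.
    logits: (N, num_items); targets: (N,)"""
    topk = logits.topk(k, dim=-1).indices           # (N, k)
    hits = (topk == targets.unsqueeze(-1)).any(-1)  # hit per position
    return hits.float().mean().item()
```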
### Results
On the evaluation split, the model achieved:
- **Perplexity:** 24.04
- **Recall@20:** 0.6823
These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.
### Summary
The model demonstrates:
- Low predictive uncertainty (Perplexity 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)
Performance may vary depending on dataset distribution, session length, and preprocessing configuration.
---
## Environmental Impact
- **Hardware:** GPU NVIDIA H100 NVL (94GB PCIe 5.0)
- **Precision:** FP32
- **Training Duration:** ≈30 hours (varies by configuration)
- **Carbon Impact:** ≈45 kg CO₂e (estimated from ~30 h of energy consumption on an H100 GPU)
---
## Limitations
- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs