---
library_name: transformers
tags:
- recommendation
- e-commerce
- sequential-modeling
- next-item-prediction
- behavioral-modeling
---
# Commerce Intent
## Model Overview
Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on historical interaction sequences.
The model learns representations from multi-modal structured signals, including:
- Item ID
- Brand
- Category
- Event type (view, cart and purchase)
- Normalized price
- Positional order within the session
It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.
---
## Model Details
### Model Description
Commerce Intent models user behavior within a session as an autoregressive sequence modeling problem. Given a sequence of past interactions, the model predicts the next likely item.
The architecture consists of:
- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction
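The components listed above can be sketched as a minimal PyTorch module. This is an illustrative reconstruction only, not the released implementation (which lives in the `i6model_ecomm` package); all layer sizes, head counts, and vocabulary sizes below are hypothetical placeholders:

```python
import torch
import torch.nn as nn

class NextItemTransformer(nn.Module):
    """Minimal sketch of the described architecture (hypothetical sizes)."""

    def __init__(self, num_items=1000, num_brands=100, num_cats=50,
                 num_events=4, d_model=64, max_len=128):
        super().__init__()
        # Multi-embedding token fusion: one embedding table per signal
        self.item_emb = nn.Embedding(num_items, d_model)
        self.brand_emb = nn.Embedding(num_brands, d_model)
        self.cat_emb = nn.Embedding(num_cats, d_model)
        self.event_emb = nn.Embedding(num_events, d_model)
        # Continuous price projection: scalar price -> d_model
        self.price_proj = nn.Linear(1, d_model)
        # Learned positional encoding over session order
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Transformer encoder; the causal mask is supplied at forward time
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Linear head over the item vocabulary for next-item prediction
        self.head = nn.Linear(d_model, num_items)

    def forward(self, items, brands, cats, events, prices):
        B, L = items.shape
        pos = torch.arange(L, device=items.device)
        # Fuse all signals by summation into one token representation
        x = (self.item_emb(items) + self.brand_emb(brands)
             + self.cat_emb(cats) + self.event_emb(events)
             + self.price_proj(prices.unsqueeze(-1))
             + self.pos_emb(pos))
        # Causal mask so position t only attends to positions <= t
        causal = nn.Transformer.generate_square_subsequent_mask(L)
        h = self.encoder(x, mask=causal)
        return self.head(h)  # (B, L, num_items)
```

Summation-based fusion keeps the sequence length fixed regardless of how many signals are attached to each interaction; concatenation followed by a projection would be an equally plausible design.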
This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.
- **Developed by:** infinity6
- **Model type:** Sequential autoregressive transformer
- **Language(s):** Structured e-commerce interaction data (non-NLP)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch)
---
## Dependencies
This model depends on the external package:
- https://github.com/infinity6-ai/i6model_ecomm
The package contains the custom architecture required to correctly load and run the model. You must install it before using Commerce Intent.
### Installation
Clone the repository:
```bash
git clone https://github.com/infinity6-ai/i6model_ecomm.git
cd i6model_ecomm
pip install .
```
---
## Intended Use
### Direct Use
The model can be used directly for:
- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems
Example:
```python
import torch
from i6modelecomm.model import i6modelecomm

model = i6modelecomm.CommerceIntent.from_pretrained(
    "infinity6/ecomm_shop_intent_pretrained"
)
# TODO: map items and remap categories.
# TODO: freeze layers and train with your data.
model.eval()

device = "cpu"

# batch_size = 1, seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(device)
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(device)
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(device)
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(device)
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(device)
# attention mask: all three positions are real (non-padding) tokens
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(device)

with torch.no_grad():
    outputs = model(
        itms=itms,            # items
        brds=brds,            # brands
        cats=cats,            # categories
        prcs=prcs,            # prices
        evts=evts,            # events
        attention_mask=mask,
        labels=None,          # inference only -- no loss computation
    )

# logits have shape (B, L-1, num_itm)
logits = outputs.logits
print("Logits shape:", logits.shape)
```
Inputs must include:
- `itms`
- `brds`
- `cats`
- `evts`
- `prcs`
- `attention_mask`
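Since sessions have variable length, batching requires right-padding each sequence and marking real positions in `attention_mask`. A minimal sketch (the padding value and fixed length here are assumptions, not documented constants of the model):

```python
import torch

def pad_session(seq, max_len, pad_value=0):
    """Right-pad one session to max_len; return padded values and a mask
    that is True at real positions and False at padding."""
    seq = seq[:max_len]                       # truncate overly long sessions
    n = len(seq)
    values = seq + [pad_value] * (max_len - n)
    mask = [True] * n + [False] * (max_len - n)
    return values, mask

# Hypothetical 3-interaction session padded to a fixed length of 5
itms_raw = [12, 45, 78]
itms_list, mask_list = pad_session(itms_raw, max_len=5)
itms = torch.tensor([itms_list], dtype=torch.long)            # (1, 5)
attention_mask = torch.tensor([mask_list], dtype=torch.bool)  # (1, 5)
```

The same padding must be applied consistently to `brds`, `cats`, `evts`, and `prcs` so that all tensors share the `(batch, seq_len)` shape.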
---
### Downstream Use
The model can be fine-tuned for:
- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
---
### Out-of-Scope Use
This model is not suitable for:
- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping
---
## Bias, Risks, and Limitations
- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge due to item frequency distribution.
- Model performance depends on session length and interaction quality.
- Cold-start performance for unseen items is limited.
- It does not encode demographic or identity-aware fairness constraints.
### Recommendations
- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.
---
## Training Details
### Training Data
The model was trained on large-scale anonymized e-commerce interaction logs containing:
- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)
Sessions shorter than a minimum threshold were filtered.
### Data Sources and Preparation
The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.
The training data combines the following sources:
| Dataset | Description | Key Statistics |
|---------------------------------------------------------------|------------------------------------------------------------------|-------------------|
| **E-commerce behavior data from multi category store** | Real event logs from a multi-category e-commerce platform | ~285M records |
| **E-commerce Clickstream and Transaction Dataset (Kaggle)** | Sequential event data including views and clicks | ~500K+ events |
| **E-Commerce Behavior Dataset – Agents for Data** | Product interactions from ~18k users across multiple event types | ~2M interactions |
| **Retail Rocket clickstream dataset** | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| **SIGIR 2021 / Coveo Session data challenge**                 | Navigation sessions with clicks, add-to-carts, and purchases, plus metadata | ~30M events |
| **JDsearch dataset**                                          | Real interactions with search queries from the JD.com platform   | ~26M interactions |
### Data Unification and Normalization
All datasets underwent a rigorous unification and normalization process:
- **Schema Alignment**: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- **Event Type Normalization**: Mapped varied event nomenclature to a standardized taxonomy (view, cart, purchase)
- **ID Harmonization**: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- **Temporal Alignment**: Unified timestamp formats and established consistent session windows
- **Price Normalization**: Applied log-normalization (log1p) followed by standardization using global statistics
- **Session Construction**: Reconstructed user sessions based on temporal proximity and interaction patterns
- **Quality Filtering**: Removed sessions below minimum length threshold and filtered anomalous interactions
This diverse and comprehensive training corpus enables the model to learn robust representations of e-commerce behavior patterns across different platforms, markets, and interaction types, serving as a strong foundation for downstream fine-tuning tasks.
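The session-construction step described above is typically implemented as an inactivity-gap split: a user's time-ordered events are cut into sessions wherever the gap between consecutive events exceeds a threshold. A minimal sketch, assuming a 30-minute gap (the actual threshold used for this model is not documented):

```python
from datetime import datetime, timedelta

def split_sessions(events, gap=timedelta(minutes=30)):
    """Group a user's time-ordered (timestamp, item) events into sessions:
    a new session starts whenever the gap to the previous event exceeds
    the threshold."""
    sessions, current = [], []
    prev_ts = None
    for ts, item in events:
        if prev_ts is not None and ts - prev_ts > gap:
            sessions.append(current)
            current = []
        current.append(item)
        prev_ts = ts
    if current:
        sessions.append(current)
    return sessions

t0 = datetime(2024, 1, 1, 12, 0)
events = [(t0, "A"), (t0 + timedelta(minutes=5), "B"),
          (t0 + timedelta(hours=2), "C")]
print(split_sessions(events))  # → [['A', 'B'], ['C']]
```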
---
### Preprocessing
- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
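The categorical and price steps above can be sketched as follows. The reserved `UNK` index, the vocabulary contents, and the global statistics are all illustrative assumptions:

```python
import math

UNK = 0  # hypothetical reserved index for unknown/missing categorical values

def encode_categorical(value, vocab):
    """Map a raw categorical value to its vocabulary index, falling back
    to UNK for anything unseen or missing."""
    return vocab.get(value, UNK)

def normalize_price(price, mean, std):
    """log1p transform followed by standardization with global statistics
    computed over the whole training corpus."""
    return (math.log1p(price) - mean) / std

brand_vocab = {"acme": 1, "globex": 2}   # hypothetical vocabulary
print(encode_categorical("acme", brand_vocab))    # → 1
print(encode_categorical("unknown", brand_vocab)) # → 0
```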
---
### Training Objective
Next-item autoregressive prediction using cross-entropy loss with padding ignored.
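A sketch of this objective, assuming padding id 0 and the usual shift-by-one alignment in which position t predicts the item at t+1 (this matches the `(B, L-1, num_itm)` logits shape in the usage example, but the exact alignment in the released model is an assumption):

```python
import torch
import torch.nn.functional as F

PAD = 0  # hypothetical padding item id, excluded from the loss

def next_item_loss(logits, items):
    """Autoregressive next-item objective.
    logits: (B, L, V) per-position scores; items: (B, L) item ids.
    Position t is scored against the item at t+1; padding is ignored."""
    shifted_logits = logits[:, :-1, :]   # predictions for steps 1..L-1
    targets = items[:, 1:]               # the "next" item at each step
    return F.cross_entropy(
        shifted_logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=PAD,
    )

logits = torch.randn(2, 4, 10)
items = torch.tensor([[3, 5, 7, 0], [2, 4, 0, 0]])  # 0 = padding
loss = next_item_loss(logits, items)
```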
---
### Training Regime
- **Precision:** FP32
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3 with warmup
- **Gradient Clipping:** 5.0
- **Causal masking applied**
---
## Evaluation
### Metrics
- Cross-Entropy Loss
- Perplexity
- Recall@20
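Recall@20 here measures the fraction of positions whose true next item appears among the 20 highest-scoring items. A minimal sketch of the computation, assuming padding id 0 (the evaluation harness actually used is not published):

```python
import torch

def recall_at_k(logits, targets, k=20, pad=0):
    """Fraction of non-padding positions whose true next item appears in
    the top-k scored items. logits: (N, V); targets: (N,)."""
    topk = logits.topk(k, dim=-1).indices            # (N, k) item ids
    hits = (topk == targets.unsqueeze(-1)).any(-1)   # (N,) hit flags
    valid = targets != pad                           # ignore padding targets
    return hits[valid].float().mean().item()

# Toy scores over a 100-item vocabulary for two positions
logits = torch.zeros(2, 100)
logits[0, 7] = 5.0    # item 7 is ranked first for position 0
logits[1, 50] = -1.0  # item 50 is ranked last for position 1
targets = torch.tensor([7, 50])
```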
### Results
On the evaluation split, the model achieved:
- **Perplexity:** 24.04
- **Recall@20:** 0.6823
These results indicate strong next-item prediction performance in session-based e-commerce interaction modeling.
### Summary
The model demonstrates:
- Low predictive uncertainty (Perplexity 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)
Performance may vary depending on dataset distribution, session length, and preprocessing configuration.
---
## Environmental Impact
- **Hardware:** GPU NVIDIA H100 NVL (94GB PCIe 5.0)
- **Precision:** FP32
- **Training Duration:** Several hours (varies by configuration)
- **Carbon Impact:** ≈45 kg CO₂e (estimated from roughly 30 h of H100 GPU energy consumption)
---
## Limitations
- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs