---
library_name: transformers
tags:
- recommendation
- e-commerce
- sequential-modeling
- next-item-prediction
- behavioral-modeling
---

# Commerce Intent

## Model Overview

Commerce Intent is a pretrained sequential behavioral model for e-commerce session understanding. It is trained to predict the next item in a user session based on the session's historical interaction sequence.

The model learns representations from multi-modal structured signals, including:

- Item ID
- Brand
- Category
- Event type (view, cart, purchase)
- Normalized price
- Positional order within the session
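
As an illustration, one session can be pictured as parallel, position-aligned feature sequences. The field names and values below are illustrative, not the model's actual schema:

```python
# One session as parallel, position-aligned feature sequences.
# Field names and values are illustrative, not the model's actual schema.
session = {
    "item_id": [12, 45, 78],
    "brand_id": [3, 7, 2],
    "category_id": [8, 8, 15],
    "event_type": ["view", "view", "cart"],
    "price": [29.9, 35.0, 15.5],  # normalized during preprocessing
    "position": [0, 1, 2],        # order within the session
}

# Every signal must provide exactly one value per interaction.
assert len({len(v) for v in session.values()}) == 1
```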

It is designed as a foundation model for downstream recommendation and behavioral modeling tasks.

---

## Model Details

### Model Description

Commerce Intent frames user behavior within a session as an autoregressive sequence modeling problem: given a sequence of past interactions, the model predicts the most likely next item.

The architecture consists of:

- Multi-embedding token fusion (item, brand, category, event)
- Continuous price projection
- Positional encoding
- Transformer encoder with causal masking
- Linear head for next-item prediction
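
As a rough illustration, the components above could be wired together as in the sketch below. All dimensions, vocabulary sizes, layer counts, and the class name `CommerceIntentSketch` are assumptions made for the sketch, not the released configuration:

```python
import torch
import torch.nn as nn

class CommerceIntentSketch(nn.Module):
    """Rough sketch of the described components (not the released code)."""

    def __init__(self, n_items=1000, n_brands=100, n_cats=50, n_events=4,
                 d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        # Multi-embedding token fusion
        self.item_emb = nn.Embedding(n_items, d_model)
        self.brand_emb = nn.Embedding(n_brands, d_model)
        self.cat_emb = nn.Embedding(n_cats, d_model)
        self.event_emb = nn.Embedding(n_events, d_model)
        # Continuous price projection
        self.price_proj = nn.Linear(1, d_model)
        # Positional encoding (learned here, for simplicity)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # Transformer encoder; causal masking is applied in forward()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Linear head for next-item prediction
        self.head = nn.Linear(d_model, n_items)

    def forward(self, itms, brds, cats, evts, prcs):
        seq_len = itms.shape[1]
        pos = torch.arange(seq_len, device=itms.device).unsqueeze(0)
        x = (self.item_emb(itms) + self.brand_emb(brds) + self.cat_emb(cats)
             + self.event_emb(evts) + self.price_proj(prcs.unsqueeze(-1))
             + self.pos_emb(pos))
        # Causal mask: True marks positions each step may NOT attend to
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=itms.device),
            diagonal=1,
        )
        h = self.encoder(x, mask=causal)
        return self.head(h)  # (batch, seq_len, n_items) next-item logits
```

Running this sketch on a single three-interaction session yields logits of shape `(1, 3, n_items)`.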

This model is pretrained and can be fine-tuned for recommendation, ranking, or conversion modeling tasks.

- **Developed by:** infinity6
- **Model type:** Sequential autoregressive transformer
- **Modality:** Structured e-commerce interaction data (non-NLP)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch)

---

## Dependencies

This model depends on the external package:

- https://github.com/infinity6-ai/i6model_ecomm

The package contains the custom architecture required to correctly load and run the model. You must install it before using Commerce Intent.

### Installation

Clone the repository and install it:

```bash
git clone https://github.com/infinity6-ai/i6model_ecomm.git
cd i6model_ecomm
pip install .
```

---

## Intended Use

### Direct Use

The model can be used directly for:

- Next-item prediction
- Session-based recommendation
- Behavioral embedding extraction
- Purchase intent modeling
- Real-time ranking systems

Example:

```python
import torch

from i6modelecomm.model import CommerceIntent

model = CommerceIntent.from_pretrained(
    "infinity6/ecomm_shop_intent_pretrained"
)

# TODO: map your item IDs and remap categories to the model's vocabularies.
# TODO: to fine-tune, freeze layers and train with your own data.

model.eval()

device = "cpu"

# batch_size = 1, seq_len = 3
itms = torch.tensor([[12, 45, 78]], dtype=torch.long).to(device)         # items
brds = torch.tensor([[3, 7, 2]], dtype=torch.long).to(device)            # brands
cats = torch.tensor([[8, 8, 15]], dtype=torch.long).to(device)           # categories
prcs = torch.tensor([[29.9, 35.0, 15.5]], dtype=torch.float).to(device)  # prices
evts = torch.tensor([[1, 1, 2]], dtype=torch.long).to(device)            # events

# attention mask: all three positions are real interactions (no padding)
mask = torch.tensor([[1, 1, 1]], dtype=torch.bool).to(device)

with torch.no_grad():
    outputs = model(
        itms=itms,
        brds=brds,
        cats=cats,
        prcs=prcs,
        evts=evts,
        attention_mask=mask,
        labels=None,  # inference only -- no loss computation
    )

# logits have shape (B, L-1, num_itm)
logits = outputs.logits

print("Logits shape:", logits.shape)
```

Inputs must include:

- `itms`
- `brds`
- `cats`
- `evts`
- `prcs`
- `attention_mask`
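
For variable-length sessions, the input tensors are typically right-padded to a fixed length with a matching `attention_mask`. The `pad_session` helper below is an illustrative assumption, not part of the `i6model_ecomm` package:

```python
import torch

def pad_session(values, max_len, pad_value=0, dtype=torch.long):
    """Right-pad one feature sequence to max_len (illustrative helper)."""
    seq = list(values)[:max_len]
    seq = seq + [pad_value] * (max_len - len(seq))
    return torch.tensor([seq], dtype=dtype)  # add a batch dimension

max_len = 5
itms = pad_session([12, 45, 78], max_len)
brds = pad_session([3, 7, 2], max_len)
cats = pad_session([8, 8, 15], max_len)
evts = pad_session([1, 1, 2], max_len)
prcs = pad_session([29.9, 35.0, 15.5], max_len, pad_value=0.0, dtype=torch.float)

# attention_mask: True for real interactions, False for padding
attention_mask = pad_session([1, 1, 1], max_len, dtype=torch.bool)
```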

---

### Downstream Use

The model can be fine-tuned for:

- Conversion prediction
- Cart abandonment modeling
- Customer lifetime value modeling
- Cross-sell / upsell recommendation
- Personalized search ranking
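
One common fine-tuning pattern for such tasks is to freeze the pretrained encoder and train only a small task head. The sketch below uses a stand-in backbone, since the pretrained model's internal attribute names are not documented here:

```python
import torch
import torch.nn as nn

# Stand-in backbone; in practice, substitute the pretrained Commerce Intent encoder.
backbone = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64))

# Freeze the pretrained weights
for p in backbone.parameters():
    p.requires_grad = False

# Task-specific head, e.g. binary conversion prediction
head = nn.Linear(64, 1)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

items = torch.randint(0, 1000, (2, 3))  # (batch, seq_len) item IDs
features = backbone(items).mean(dim=1)  # mean-pool the session representation
logits = head(features)                 # (batch, 1) conversion logits

loss = nn.functional.binary_cross_entropy_with_logits(logits, torch.ones(2, 1))
loss.backward()
optimizer.step()
```

Only the head receives gradients; the frozen backbone keeps its pretrained behavioral representations intact.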

---

### Out-of-Scope Use

This model is not suitable for:

- Natural language tasks
- Image tasks
- Generative text modeling
- Multi-user graph modeling without adaptation
- Cold-start scenarios without item mappings and category remapping

---

## Bias, Risks, and Limitations

- The model reflects behavioral biases present in historical e-commerce data.
- Popularity bias may emerge from the item frequency distribution.
- Performance depends on session length and interaction quality.
- Cold-start performance on unseen items is limited.
- The model does not encode demographic or identity-aware fairness constraints.

### Recommendations

- Monitor recommendation fairness and popularity skew.
- Retrain periodically to reflect new item distributions.
- Apply business constraints in production systems.
- Use A/B testing before large-scale deployment.

---

## Training Details

### Training Data

The model was trained on large-scale anonymized e-commerce interaction logs containing:

- Session-based user interactions
- Item identifiers
- Brand identifiers
- Category identifiers
- Event types
- Timestamped behavioral sequences
- Price values (log-normalized and standardized)

Sessions shorter than a minimum length threshold were filtered out.

---

### Data Sources and Preparation

The model was trained on a unified, large-scale corpus of e-commerce interaction data, aggregating and normalizing multiple public datasets to create a robust foundation for sequential behavior modeling.

The training data combines the following sources:

| Dataset | Description | Key Statistics |
|---------|-------------|----------------|
| **E-commerce behavior data from multi category store** | Real event logs from a multi-category e-commerce platform | ~285M records |
| **E-commerce Clickstream and Transaction Dataset (Kaggle)** | Sequential event data including views and clicks | ~500K+ events |
| **E-Commerce Behavior Dataset – Agents for Data** | Product interactions from ~18k users across multiple event types | ~2M interactions |
| **Retail Rocket clickstream dataset** | Industry-standard dataset with views, carts, and purchases | ~2.7M events |
| **SIGIR 2021 / Coveo Session data challenge** | Navigation sessions with clicks, cart adds, and purchases, plus metadata | ~30M events |
| **JDsearch dataset** | Real interactions with search queries from the JD.com platform | ~26M interactions |

### Data Unification and Normalization

All datasets underwent a unification and normalization process:

- **Schema Alignment**: Standardized field names and types across all sources (item_id, brand_id, category_id, event_type, timestamp, price)
- **Event Type Normalization**: Mapped each source's event nomenclature to a standardized taxonomy (view, cart, purchase)
- **ID Harmonization**: Created consistent ID spaces for items, brands, and categories through cross-dataset mapping
- **Temporal Alignment**: Unified timestamp formats and established consistent session windows
- **Price Normalization**: Applied log-normalization (`log1p`) followed by standardization using global statistics
- **Session Construction**: Reconstructed user sessions based on temporal proximity and interaction patterns
- **Quality Filtering**: Removed sessions below a minimum length threshold and filtered anomalous interactions
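
For instance, the event-type normalization step can be pictured as a simple mapping onto the shared taxonomy. The source labels below are examples, not an exhaustive list of what each dataset emits:

```python
# Illustrative event-type normalization onto the shared taxonomy
# (view, cart, purchase). Source labels are examples only.
EVENT_TAXONOMY = {
    "view": "view", "pageview": "view", "click": "view",
    "cart": "cart", "add_to_cart": "cart", "addtocart": "cart",
    "purchase": "purchase", "transaction": "purchase", "buy": "purchase",
}

def normalize_event(raw: str) -> str:
    # Fallback to "view" is an assumption; real pipelines may drop unknowns.
    return EVENT_TAXONOMY.get(raw.strip().lower(), "view")

events = [normalize_event(e) for e in ["addtocart", "Transaction", "pageview"]]
# events == ["cart", "purchase", "view"]
```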

This diverse training corpus lets the model learn robust representations of e-commerce behavior across platforms, markets, and interaction types, providing a strong foundation for downstream fine-tuning tasks.

---

### Preprocessing

- Missing categorical values replaced with `UNK`
- Price values transformed via `log1p`
- Standardization using global mean and standard deviation
- Session truncation to fixed-length sequences
- Right padding with attention masking
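
A minimal sketch of these steps for one pair of features, assuming placeholder global price statistics (the card does not publish the real mean and standard deviation):

```python
import math

UNK = "UNK"
MAX_LEN = 5
# Placeholder global price statistics; not the values used in training.
PRICE_MEAN, PRICE_STD = 3.2, 1.1

def preprocess_session(brands, prices):
    # Missing categorical values -> UNK
    brands = [b if b is not None else UNK for b in brands]
    # log1p, then standardization with global statistics
    prices = [(math.log1p(p) - PRICE_MEAN) / PRICE_STD for p in prices]
    # Truncate to the fixed sequence length
    brands, prices = brands[:MAX_LEN], prices[:MAX_LEN]
    # Right-pad, with an attention mask marking real positions
    n = len(brands)
    attn = [1] * n + [0] * (MAX_LEN - n)
    brands = brands + [UNK] * (MAX_LEN - n)
    prices = prices + [0.0] * (MAX_LEN - n)
    return brands, prices, attn

brands, prices, attn = preprocess_session(["acme", None, "zenith"],
                                          [29.9, 35.0, 15.5])
```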

---

### Training Objective

Next-item autoregressive prediction using cross-entropy loss, with padded positions ignored.
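
The objective can be sketched as a shifted cross-entropy, assuming the logits at position *t* score the item at position *t + 1* and padded targets carry a reserved ignore index (the padding index here is an assumption):

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # illustrative padding index

B, L, V = 2, 4, 10
logits = torch.randn(B, L, V)        # per-position next-item scores
items = torch.randint(1, V, (B, L))  # item IDs; 0 reserved for padding
items[1, 3] = PAD_ID                 # simulate a padded position

# Shift: position t predicts the item at position t + 1
pred = logits[:, :-1, :].reshape(-1, V)
target = items[:, 1:].reshape(-1)

# Padded targets contribute nothing to the loss
loss = F.cross_entropy(pred, target, ignore_index=PAD_ID)
```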

---

### Training Regime

- **Precision:** FP32
- **Optimizer:** AdamW
- **Learning rate:** 1e-3 with warmup
- **Gradient clipping:** 5.0
- **Causal masking:** applied throughout training

---

## Evaluation

### Metrics

- Cross-entropy loss
- Perplexity
- Recall@20
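
Recall@K here means the fraction of evaluation steps where the true next item appears among the model's top-K scored items. A minimal reference implementation, using a toy 5-item catalog and K = 2 for brevity:

```python
def recall_at_k(scores, true_item, k=20):
    """1.0 if the true next item ranks in the top-k scores, else 0.0."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return 1.0 if true_item in top_k else 0.0

# Average over evaluation examples: (scores over a toy 5-item catalog, target)
examples = [
    ([0.1, 0.9, 0.3, 0.2, 0.05], 1),   # target item ranks 1st -> hit
    ([0.8, 0.1, 0.05, 0.02, 0.03], 4), # target item outside top-2 -> miss
]
recall = sum(recall_at_k(s, t, k=2) for s, t in examples) / len(examples)
# recall == 0.5 for this toy example
```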

### Results

On the evaluation split, the model achieved:

- **Perplexity:** 24.04
- **Recall@20:** 0.6823

These results indicate strong next-item prediction performance for session-based e-commerce interaction modeling.

### Summary

The model demonstrates:

- Low predictive uncertainty (perplexity of 24.04)
- High ranking quality for next-item recommendation (Recall@20 of 68.23%)

Performance may vary with dataset distribution, session length, and preprocessing configuration.

---

## Environmental Impact

- **Hardware:** NVIDIA H100 NVL GPU (94 GB, PCIe 5.0)
- **Precision:** FP32
- **Training duration:** ≈30 hours (varies by configuration)
- **Carbon impact:** ≈45 kg CO₂e (estimated from the energy consumption of ~30 h on an H100 GPU)

---

## Limitations

- No long-term user modeling beyond session scope
- Does not include user-level embeddings
- Requires predefined categorical vocabularies
- Limited generalization to unseen item IDs
|