Model Card: DreamPrice

Model Details

  • Model name: DreamPrice
  • Model type: Learned world model (RSSM with Mamba-2 backbone)
  • Architecture: DreamerV3-style three-phase training with DRAMA decoupled posteriors
  • Parameters: ~22M (full model)
  • Framework: PyTorch
  • License: CC-BY-NC-4.0
  • Trained checkpoint: step_0100000.pt (100K gradient steps)

Training Results (100K Steps, DGX Spark)

| Metric | Final Value |
|---|---|
| World Model ELBO | 22.44 |
| Reconstruction Loss | 0.001 |
| KL Divergence | 19.20 |
| Reward Prediction | 3.17 |
| Actor Return (mean) | 124.33 |
| Critic Loss | 2.50 |
| Training Time | ~2.6 hours |
| Hardware | NVIDIA DGX Spark (GB10 GPU, 128 GB unified memory) |

World Model Quality (Evaluation on Validation Set)

| Metric | h=1 | h=5 | h=10 | h=13 | h=25 |
|---|---|---|---|---|---|
| RMSE | 5.001 | 5.167 | 5.286 | 5.283 | 5.243 |
| MAE | 2.130 | 2.168 | 2.207 | 2.205 | 2.197 |
| WMAPE (%) | 71.7 | 72.3 | 72.6 | 72.6 | 72.4 |
| NDR(h) | 1.00 | 1.033 | 1.057 | 1.056 | 1.049 |
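
The card does not define NDR(h). The reported values are consistent with reading it as the RMSE at horizon h divided by the RMSE at h=1, so the sketch below checks that reading against the table. The function name `ndr` and the definition itself are assumptions inferred from the numbers, not taken from the project code.

```python
# Assumption: NDR(h) = RMSE(h) / RMSE(1). This is inferred from the table,
# not confirmed by the source.
rmse = {1: 5.001, 5: 5.167, 10: 5.286, 13: 5.283, 25: 5.243}
reported_ndr = {1: 1.00, 5: 1.033, 10: 1.057, 13: 1.056, 25: 1.049}


def ndr(rmse_by_horizon, h, base=1):
    """Ratio of the RMSE at horizon h to the RMSE at the base horizon."""
    return rmse_by_horizon[h] / rmse_by_horizon[base]


for h in sorted(rmse):
    # Print the recomputed ratio next to the value reported in the card.
    print(h, round(ndr(rmse, h), 3), reported_ndr[h])
```

Under this reading, the error at h=25 is roughly 5% above the one-step error. Differences in the third decimal place are expected, because the RMSE values in the table are themselves rounded.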

Policy Comparison

| Method | Mean Return | IQM |
|---|---|---|
| Cost-plus (25%) | 54.8 | 54.8 |
| Static XGBoost | 87.2 | 85.6 |
| Competitive Matching | 42.1 | 41.5 |
| DQN | 68.9 | 65.2 |
| PPO | 76.4 | 72.8 |
| SAC | 82.3 | 79.6 |
| DreamPrice | 124.3 | 117.4 |
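
IQM here presumably denotes the interquartile mean: the mean of the returns that remain after discarding the lowest and highest 25% of runs. A minimal sketch follows. The function name `iqm` and the sample returns are hypothetical, and the trimming rule (drop `n // 4` values from each end) is one simple convention, not necessarily the one used to produce the table.

```python
def iqm(values):
    """Interquartile mean: average of the middle ~50% of the sorted values."""
    xs = sorted(values)
    k = len(xs) // 4              # number of values trimmed from each end
    middle = xs[k:len(xs) - k]
    return sum(middle) / len(middle)


# Hypothetical per-run returns with one large outlier.
returns = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 400.0]
print("mean:", sum(returns) / len(returns))
print("IQM: ", iqm(returns))
```

An IQM below the mean, as in most rows of the table, suggests that a few high-return runs pull the mean upward.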

Intended Use

DreamPrice is a learned dynamics model for retail pricing environments. It is intended for:

  • Counterfactual demand estimation: "What would demand be if price were cut by 5%?"
  • Pricing policy optimization: Learning pricing strategies via imagination-based offline RL
  • Research: Studying learned world models in economic domains
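
As a rough illustration of the counterfactual question in the first item: if the demand decoder is log-linear in price (theta * log(price) plus a term that does not depend on price, as listed under Architecture Details), and theta is the DML-PLIV elasticity of -0.940 reported under Causal Identification, then the demand ratio for a price change depends only on theta. The sketch assumes the decoder output is log demand and that the latent state stays fixed when the price changes. The function name is hypothetical, and this is not the model's actual inference path.

```python
import math

THETA = -0.940  # DML-PLIV elasticity reported in this card


def demand_ratio(price_multiplier, theta=THETA):
    """Return new_demand / old_demand when price is scaled by price_multiplier.

    Assumes log(demand) = theta * log(price) + g(z, store_features),
    with g unchanged by the price move (an assumption of this sketch).
    """
    return math.exp(theta * math.log(price_multiplier))


ratio = demand_ratio(0.95)  # a 5% price cut
print(f"{(ratio - 1) * 100:.1f}% change in demand")
```

With an elasticity close to -1, a 5% price cut yields a demand increase of a little under 5% under these assumptions.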

DreamPrice is NOT intended for:

  • Direct deployment in production pricing systems without human oversight
  • Real-time autonomous pricing decisions
  • Categories or retail environments significantly different from the training data

Training Data

  • Dataset: Dominick's Finer Foods scanner data (Kilts Center for Marketing, University of Chicago)
  • Period: September 1989 to May 1997 (400 weeks)
  • Scope: 93 stores, ~18,000 UPCs, 29 product categories
  • Primary category: Canned soup (cso), ~581K training tuples
  • Temporal split: Train weeks 1-280, Validation weeks 281-340, Test weeks 341-400
  • HuggingFace dataset: qbz506/dreamprice-dominicks-cso
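
The temporal split can be written as a small helper. This is a sketch that assumes weeks are indexed 1 to 400 as in the list above. The function name `split_for_week` is hypothetical, and the dataset's own week column may use a different encoding.

```python
def split_for_week(week):
    """Map a 1-based week index to its split, following the ranges above."""
    if not 1 <= week <= 400:
        raise ValueError(f"week must be in 1..400, got {week}")
    if week <= 280:
        return "train"
    if week <= 340:
        return "validation"
    return "test"


# Count how many weeks fall into each split.
counts = {}
for w in range(1, 401):
    name = split_for_week(w)
    counts[name] = counts.get(name, 0) + 1
print(counts)
```

Splitting by week rather than at random keeps every validation and test observation strictly later than the training data.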

Architecture Details

| Component | Specification |
|---|---|
| Backbone | Mamba-2 SSM (d_model=512) with GRU fallback |
| Stochastic latent | 32 categorical variables x 32 classes (z_dim=1024) |
| Posterior | DRAMA-style decoupled: q(z_t \| x_t) |
| Observation decoder | 3-layer MLP: cat(h_t, z_t) -> obs_dim |
| Demand decoder | Causal: theta * log(price) + MLP(z_t, store_features) |
| Reward ensemble | 5 independent heads, twohot distributional (255 bins) |
| Continue head | Linear -> sigmoid |
| MOPO pessimism | r_pessimistic = r_mean - lambda_lcb * r_std |
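
The MOPO pessimism rule can be sketched in a few lines. The example assumes each of the five reward heads has already been reduced to a scalar expected reward (for instance, the expectation over its twohot bins), that r_std is the population standard deviation across heads, and that `lambda_lcb` defaults to 1.0. All three are assumptions of this sketch, and the function name is hypothetical.

```python
import statistics


def pessimistic_reward(head_means, lambda_lcb=1.0):
    """Lower-confidence-bound reward: r_mean - lambda_lcb * r_std.

    head_means: one scalar reward estimate per ensemble head.
    """
    r_mean = statistics.fmean(head_means)
    r_std = statistics.pstdev(head_means)  # disagreement across heads
    return r_mean - lambda_lcb * r_std


# Heads that agree incur no penalty; heads that disagree are penalized.
print(pessimistic_reward([2.0, 2.0, 2.0, 2.0, 2.0]))
print(pessimistic_reward([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The penalty grows with ensemble disagreement, so imagined states where the reward heads diverge look less attractive to the policy.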

Causal Identification

Price elasticities are estimated with a Hausman instrument inside a DML-PLIV (double machine learning, partially linear IV) estimator and then frozen into the demand decoder:

  • Instrument: Leave-one-out mean log(price) across other stores for same UPC-week
  • Method: DoubleML PLIV with random forest nuisance learners, 5-fold cross-fitting
  • DML-PLIV elasticity: -0.940 (SE=0.006, 95% CI: [-0.952, -0.928])
  • First-stage F-stat: 23,381 (far above the conventional weak-instrument rule-of-thumb threshold of 10)
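
The leave-one-out instrument described above can be sketched in plain Python. The row layout (dicts with `upc`, `week`, `store`, `price` keys) and the function name are hypothetical, and rows with no other store in the same UPC-week get `None` because the instrument is undefined there. This illustrates the construction only. It is not the project's preprocessing code.

```python
import math
from collections import defaultdict


def hausman_instrument(rows):
    """Leave-one-out mean of log(price) over other stores, per UPC-week."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["upc"], r["week"])
        sums[key] += math.log(r["price"])
        counts[key] += 1

    instrument = []
    for r in rows:
        key = (r["upc"], r["week"])
        n = counts[key]
        if n < 2:
            instrument.append(None)  # no other store to average over
        else:
            # Subtract this store's own log price, then average the rest.
            instrument.append((sums[key] - math.log(r["price"])) / (n - 1))
    return instrument


rows = [
    {"upc": "A", "week": 1, "store": 1, "price": 1.0},
    {"upc": "A", "week": 1, "store": 2, "price": math.exp(1)},
    {"upc": "A", "week": 1, "store": 3, "price": math.exp(2)},
    {"upc": "B", "week": 1, "store": 1, "price": 2.0},
]
print(hausman_instrument(rows))
```

Excluding the store's own price keeps the instrument from mechanically containing the variable it is meant to instrument.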

Limitations

  • Trained on 1989-1997 data; modern retail dynamics may differ substantially
  • Single-category evaluation (canned soup); cross-category transfer not validated
  • Offline learning only; no online fine-tuning or real-time deployment tested
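  • Causal identification relies on the Hausman instrument and the panel structure of the data; it is valid only if prices of the same UPC in other stores move with common cost shocks rather than common demand shocks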

Citation

@article{sathish2026dreamprice,
  title  = {DreamPrice: A Learned World Model for Retail Pricing via Mamba-2 Recurrence and Causal Demand Identification},
  author = {Sathish, Sharath},
  year   = {2026},
  url    = {https://github.com/SharathSPhD/dreamprice}
}