RESI miner bundle — USD ONNX + subnet feature_config (14-feature schema)
================================================================================

A) Training (must match validator: ONNX outputs USD, MAPE vs dollar price)

  1. Default train_real_estate.py exports a **single fused ONNX**: raw features
     (same order as feature_config) → StandardScaler inside the graph → trees →
     optional expm1 → USD tensor ``price_usd``. You do **not** need --no-log-target
     for a valid miner model if you keep default fusion (log1p training + expm1
     fused). Input name is ``float_input`` for fused exports.

     Alternative — train directly in dollars (no log head in ONNX):

       MPLBACKEND=Agg python train_real_estate.py \
         --data training_data.json \
         --catboost \
         --out artifacts_miner_usd \
         --no-log-target

     Legacy unfused tree-only ONNX (no scaler / no expm1 in graph):

       ... --no-onnx-fusion

     Use --all / --xgboost / --lightgbm instead of --catboost if you prefer.
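
     The fused export's final node simply undoes the log1p applied to prices at
     training time, so the graph already emits dollars. A minimal stdlib sketch
     of that round trip (illustrative numbers only):

```python
import math

# Training target: log1p compresses the dollar price before the trees fit it.
price_usd = 500_000.0
target = math.log1p(price_usd)   # what the trees learn to predict

# Fused export: an expm1 node after the trees restores dollars,
# so the ONNX output tensor is already USD.
restored = math.expm1(target)
print(restored)  # ~500000.0
```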

  2. Keep the same feature columns as this bundle: train with DEFAULT redundant
     dropping (omit --no-drop-redundant) so you have exactly the 14 features
     listed in miner_submission/feature_config.json.

     If you train with --no-drop-redundant (17 columns), regenerate the JSON:

       MPLBACKEND=Agg python train_real_estate.py ... --no-drop-redundant \
         --write-miner-feature-config miner_submission/feature_config.json
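
     A quick stdlib sanity check that the JSON you commit declares the column
     count you trained with (this assumes a top-level "features" list, as in this
     bundle's feature_config.json; the hypothetical names below are placeholders):

```python
import json

def feature_count(config_text: str) -> int:
    """Return the number of feature columns declared in a feature_config JSON."""
    config = json.loads(config_text)
    return len(config["features"])

# Hypothetical 14-column config mirroring this bundle's schema
# (real names come from miner_submission/feature_config.json).
example = json.dumps({"features": [f"col_{i}" for i in range(14)]})
print(feature_count(example))  # 14
```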

B) Export files for chain / HF repo

  3. Copy the chosen ONNX to your repo as model.onnx (e.g. from
     artifacts_miner_usd/catboost_price_model.onnx).

  4. Commit miner_submission/feature_config.json alongside the model (same
     feature order as training / encoder).

  5. ONNX input: one float tensor, shape [N, 14], columns in the exact order
     of "features" in feature_config.json. Fused models use input ``float_input``.
     Output is typically a single tensor; fused USD models name it ``price_usd``
     (take index 0 if your runner binds by position).
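
   Whatever runner you use must assemble each input row in exactly that order.
   A hedged sketch with hypothetical feature names (stdlib only; casting to
   float32 is left to the ONNX runtime binding in practice):

```python
# Hypothetical 3-feature config; the real one ships in
# miner_submission/feature_config.json with 14 entries.
feature_config = {"features": ["sqft", "beds", "baths"]}

def encode_row(record: dict, config: dict) -> list:
    """Flatten a raw record into one float row, in config order."""
    return [float(record[name]) for name in config["features"]]

row = encode_row({"beds": 3, "sqft": 1450, "baths": 2}, feature_config)
print(row)  # [1450.0, 3.0, 2.0]: order follows the config, not the record
```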

C) Validate JSON against subnet rules (optional)

  cd /home/RESI-models
  .venv/bin/python -c "
  from pathlib import Path
  from real_estate.data.config_encoder import load_feature_config
  load_feature_config(Path('/home/46/miner_submission/feature_config.json'))
  print('feature_config.json OK')
  "

D) Do not rely on target_transform.json on-chain — the validator does not apply
   expm1; the model must emit dollars.

E) You cannot change the eval system — submission-only rules

   The validator always: encodes raw API fields → float32 matrix (same order as
   your feature_config) → ONNX → treats outputs as USD for MAPE.
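
   Since outputs are treated directly as dollars, the score is plain MAPE on USD;
   a pure-Python sketch (the validator's exact aggregation/weighting is not
   specified here):

```python
def mape(pred_usd, true_usd):
    """Mean absolute percentage error over paired dollar prices."""
    errs = [abs(p - t) / t for p, t in zip(pred_usd, true_usd)]
    return sum(errs) / len(errs)

# Predictions in dollars score normally; a model that emitted
# log-prices instead of USD would be penalized catastrophically.
print(mape([480_000, 510_000], [500_000, 500_000]))  # ~0.03 (3% error)
```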

   Therefore you MUST NOT depend on any extra JSON, hooks, or server-side
   preprocessing. Everything the model needs must be inside model.onnx OR you
   must train without that preprocessing:

   β€’ Feature normalization (z-score, min-max): only valid if you fuse those ops
     into the ONNX graph ahead of the trees. Default train_real_estate.py does
     this (StandardScaler β†’ trees β†’ optional expm1). Training with a sklearn
     scaler but submitting plain tree ONNX on raw inputs = wrong.

   β€’ log1p(price) training: only valid if the ONNX output is already USD, i.e.
     expm1 is in the graph (default fused export) or you use --no-log-target.

   β€’ For gradient-boosted trees on tabular data, raw features + USD target is
     usually enough; focus on data and regularization rather than z-score unless
     you invest in ONNX fusion tools.

   Minimum viable submission: model.onnx (raw in → USD out) + feature_config.json
   matching column order; no other files required by default.