RESI miner bundle — USD ONNX + subnet feature_config (14-feature schema)
================================================================================

A) Training (must match validator: ONNX outputs USD, MAPE vs dollar price)

  1. Default train_real_estate.py exports a **single fused ONNX**: raw features
     (same order as feature_config) → StandardScaler inside the graph → trees →
     optional expm1 → USD tensor ``price_usd``. You do **not** need --no-log-target
     for a valid miner model if you keep default fusion (log1p training + expm1
     fused). Input name is ``float_input`` for fused exports.

     Alternative — train directly in dollars (no log head in ONNX):

       MPLBACKEND=Agg python train_real_estate.py \
         --data training_data.json \
         --catboost \
         --out artifacts_miner_usd \
         --no-log-target

     Legacy unfused tree-only ONNX (no scaler / no expm1 in graph):

       ... --no-onnx-fusion

     Use --all / --xgboost / --lightgbm instead of --catboost if you prefer.
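
     The fused export's final node simply undoes the log1p applied to prices at
     training time, so the graph already emits dollars. A minimal stdlib sketch
     of that round trip (illustrative numbers only):

```python
import math

# Training target: log1p compresses the dollar price before the trees fit it.
price_usd = 500_000.0
target = math.log1p(price_usd)   # what the trees learn to predict

# Fused export: an expm1 node after the trees restores dollars,
# so the ONNX output tensor is already USD.
restored = math.expm1(target)
print(restored)  # ~500000.0
```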

  2. Keep the same feature columns as this bundle: train with DEFAULT redundant
     dropping (omit --no-drop-redundant) so you have exactly the 14 features
     listed in miner_submission/feature_config.json.

     If you train with --no-drop-redundant (17 columns), regenerate the JSON:

       MPLBACKEND=Agg python train_real_estate.py ... --no-drop-redundant \
         --write-miner-feature-config miner_submission/feature_config.json
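
     A quick stdlib sanity check that the JSON you commit declares the column
     count you trained with (this assumes a top-level "features" list, as in this
     bundle's feature_config.json; the hypothetical names below are placeholders):

```python
import json

def feature_count(config_text: str) -> int:
    """Return the number of feature columns declared in a feature_config JSON."""
    config = json.loads(config_text)
    return len(config["features"])

# Hypothetical 14-column config mirroring this bundle's schema
# (real names come from miner_submission/feature_config.json).
example = json.dumps({"features": [f"col_{i}" for i in range(14)]})
print(feature_count(example))  # 14
```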

B) Export files for chain / HF repo

  3. Copy the chosen ONNX to your repo as model.onnx (e.g. from
     artifacts_miner_usd/catboost_price_model.onnx).

  4. Commit miner_submission/feature_config.json alongside the model (same
     feature order as training / encoder).

  5. ONNX input: one float tensor, shape [N, 14], columns in the exact order
     of "features" in feature_config.json. Fused models use input ``float_input``.
     Output is typically a single tensor; fused USD models name it ``price_usd``
     (take index 0 if your runner binds by position).
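
   Whatever runner you use must assemble each input row in exactly that order.
   A hedged sketch with hypothetical feature names (stdlib only; casting to
   float32 is left to the ONNX runtime binding in practice):

```python
# Hypothetical 3-feature config; the real one ships in
# miner_submission/feature_config.json with 14 entries.
feature_config = {"features": ["sqft", "beds", "baths"]}

def encode_row(record: dict, config: dict) -> list:
    """Flatten a raw record into one float row, in config order."""
    return [float(record[name]) for name in config["features"]]

row = encode_row({"beds": 3, "sqft": 1450, "baths": 2}, feature_config)
print(row)  # [1450.0, 3.0, 2.0]: order follows the config, not the record
```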

C) Validate JSON against subnet rules (optional)

  cd /home/RESI-models
  .venv/bin/python -c "
  from pathlib import Path
  from real_estate.data.config_encoder import load_feature_config
  load_feature_config(Path('/home/46/miner_submission/feature_config.json'))
  print('feature_config.json OK')
  "

D) Do not rely on target_transform.json on-chain — the validator does not apply
   expm1; the model must emit dollars.

E) You cannot change the eval system — submission-only rules

   The validator always: encodes raw API fields → float32 matrix (same order as
   your feature_config) → ONNX → treats outputs as USD for MAPE.
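
   Since outputs are treated directly as dollars, the score is plain MAPE on USD;
   a pure-Python sketch (the validator's exact aggregation/weighting is not
   specified here):

```python
def mape(pred_usd, true_usd):
    """Mean absolute percentage error over paired dollar prices."""
    errs = [abs(p - t) / t for p, t in zip(pred_usd, true_usd)]
    return sum(errs) / len(errs)

# Predictions in dollars score normally; a model that emitted
# log-prices instead of USD would be penalized catastrophically.
print(mape([480_000, 510_000], [500_000, 500_000]))  # ~0.03 (3% error)
```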

   Therefore you MUST NOT depend on any extra JSON, hooks, or server-side
   preprocessing. Everything the model needs must be inside model.onnx OR you
   must train without that preprocessing:

   β€’ Feature normalization (z-score, min-max): only valid if you fuse those ops
     into the ONNX graph ahead of the trees. Default train_real_estate.py does
     this (StandardScaler β†’ trees β†’ optional expm1). Training with a sklearn
     scaler but submitting plain tree ONNX on raw inputs = wrong.

   β€’ log1p(price) training: only valid if the ONNX output is already USD, i.e.
     expm1 is in the graph (default fused export) or you use --no-log-target.

   β€’ For gradient-boosted trees on tabular data, raw features + USD target is
     usually enough; focus on data and regularization rather than z-score unless
     you invest in ONNX fusion tools.

   Minimum viable submission: model.onnx (raw in → USD out) + feature_config.json
   matching column order; no other files required by default.