RESI miner bundle: USD ONNX + subnet feature_config (14-feature schema)
================================================================================
A) Training (must match validator: ONNX outputs USD, MAPE vs dollar price)
1. Default train_real_estate.py exports a **single fused ONNX**: raw features
(same order as feature_config) → StandardScaler inside the graph → trees →
optional expm1 → USD tensor ``price_usd``. You do **not** need --no-log-target
for a valid miner model if you keep default fusion (log1p training + expm1
fused). Input name is ``float_input`` for fused exports.
Alternative: train directly in dollars (no log head in ONNX):
MPLBACKEND=Agg python train_real_estate.py \
    --data training_data.json \
    --catboost \
    --out artifacts_miner_usd \
    --no-log-target
Legacy unfused tree-only ONNX (no scaler / no expm1 in graph):
... --no-onnx-fusion
Use --all / --xgboost / --lightgbm instead of --catboost if you prefer.
2. Keep the same feature columns as this bundle: train with DEFAULT redundant
dropping (omit --no-drop-redundant) so you have exactly the 14 features
listed in miner_submission/feature_config.json.
If you train with --no-drop-redundant (17 columns), regenerate the JSON:
MPLBACKEND=Agg python train_real_estate.py ... --no-drop-redundant \
    --write-miner-feature-config miner_submission/feature_config.json
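The log1p / expm1 pairing described in item 1 can be illustrated with plain NumPy. This is a minimal sketch, not the code inside train_real_estate.py; ``to_log_target`` and ``to_usd`` are hypothetical helper names and the prices are made up.

```python
import numpy as np

def to_log_target(price_usd):
    """Training-side transform: log1p of the dollar price."""
    return np.log1p(np.asarray(price_usd, dtype=np.float64))

def to_usd(log_pred):
    """Inverse transform; a fused export performs this step inside the graph."""
    return np.expm1(np.asarray(log_pred, dtype=np.float64))

prices = np.array([250_000.0, 480_000.0, 1_200_000.0])  # made-up example prices
log_target = to_log_target(prices)   # what the trees are fitted on
recovered = to_usd(log_target)       # what the ONNX output must contain

# The round trip returns dollars, which is what the validator scores.
print(np.allclose(recovered, prices))
```

If expm1 is left out of the graph, the model emits values on the ``log_target`` scale and the validator still reads them as dollars.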
B) Export files for chain / HF repo
3. Copy the chosen ONNX to your repo as model.onnx (e.g. from
artifacts_miner_usd/catboost_price_model.onnx).
4. Commit miner_submission/feature_config.json alongside the model (same
feature order as training / encoder).
5. ONNX input: one float tensor, shape [N, 14], columns in the exact order
of "features" in feature_config.json. Fused models use input ``float_input``.
Output is typically a single tensor; fused USD models name it ``price_usd``
(take index 0 if your runner binds by position).
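A local smoke test for items 3 to 5 might look like the sketch below. It assumes the "features" key in feature_config.json holds a list of plain column-name strings (adapt it if your file stores objects), that onnxruntime is installed, and that rows are already numeric. The helper names and the three demo column names are hypothetical; a real bundle has 14 columns.

```python
import json
from pathlib import Path

import numpy as np

def load_feature_order(config_path):
    """Read the column order. Assumption: "features" is a list of name strings."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return list(config["features"])

def build_input(rows, feature_order):
    """Stack dict rows into a float32 [N, len(feature_order)] matrix in config order."""
    return np.array(
        [[row[name] for name in feature_order] for row in rows],
        dtype=np.float32,
    )

def predict_usd(model_path, X):
    """Run the ONNX model, binding the input by position and taking output index 0."""
    import onnxruntime as ort  # lazy import: the helpers above work without it

    session = ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # "float_input" for fused exports
    outputs = session.run(None, {input_name: X})
    return np.asarray(outputs[0]).reshape(-1)

# Hypothetical three-column demo; dict key order does not matter, config order does.
feature_order = ["living_area_sqft", "bedrooms", "bathrooms"]
rows = [
    {"bedrooms": 3, "bathrooms": 2, "living_area_sqft": 1500},
    {"living_area_sqft": 2200, "bathrooms": 3, "bedrooms": 4},
]
X = build_input(rows, feature_order)
print(X.shape, X.dtype)
```

With a real bundle, ``predict_usd("model.onnx", X)`` should return one dollar value per row once ``X`` has all 14 columns.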
C) Validate JSON against subnet rules (optional)
cd /home/RESI-models
.venv/bin/python -c "
from pathlib import Path
from real_estate.data.config_encoder import load_feature_config
load_feature_config(Path('/home/46/miner_submission/feature_config.json'))
print('feature_config.json OK')
"
D) Do not rely on target_transform.json on-chain: the validator does not apply
expm1; the model must emit dollars.
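To see why this matters, the sketch below scores the same predictions twice: once as dollars and once left in log space. It uses the standard MAPE definition as a fraction; the validator's exact implementation is not shown in this checklist, so treat the function as an assumption. All prices are made up.

```python
import numpy as np

def mape(y_true_usd, y_pred):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    y_true_usd = np.asarray(y_true_usd, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean(np.abs(y_pred - y_true_usd) / y_true_usd))

true_usd = np.array([300_000.0, 500_000.0])  # made-up sale prices
pred_usd = np.array([315_000.0, 475_000.0])  # model that emits dollars
pred_log = np.log1p(pred_usd)                # same model with expm1 missing from the graph

usd_score = mape(true_usd, pred_usd)  # each row is off by 5%
log_score = mape(true_usd, pred_log)  # log values read as dollars: almost 100% error
print(usd_score, log_score)
```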
E) You cannot change the eval system: submission-only rules
The validator always: encodes raw API fields → float32 matrix (same order as
your feature_config) → ONNX → treats outputs as USD for MAPE.
Therefore you MUST NOT depend on any extra JSON, hooks, or server-side
preprocessing. Everything the model needs must be inside model.onnx OR you
must train without that preprocessing:
• Feature normalization (z-score, min-max): only valid if you fuse those ops
into the ONNX graph ahead of the trees. Default train_real_estate.py does
this (StandardScaler → trees → optional expm1). Training with a sklearn
scaler but submitting a plain tree ONNX that receives raw inputs is wrong.
• log1p(price) training: only valid if the ONNX output is already USD, i.e.
expm1 is in the graph (default fused export) or you use --no-log-target.
• For gradient-boosted trees on tabular data, raw features + USD target is
usually enough; focus on data and regularization rather than z-score unless
you invest in ONNX fusion tools.
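The normalization bullet above can be made concrete with a toy example. This is an illustrative sketch only: ``stump`` is a hypothetical one-split stand-in for a tree model trained on z-scored input, and the feature values are made up.

```python
import numpy as np

# Made-up raw feature (e.g. square feet), all values positive.
X_raw = np.linspace(800.0, 3200.0, 25).reshape(-1, 1)

# Training-side z-score statistics, i.e. what a StandardScaler would learn.
mean, std = X_raw.mean(axis=0), X_raw.std(axis=0)

def stump(x_scaled):
    """Toy one-split 'tree' fitted on z-scored input: split at z = 0."""
    return np.where(x_scaled[:, 0] > 0.0, 600_000.0, 300_000.0)

def fused(x_raw):
    """Scaler applied inside the model, so raw validator input is valid."""
    return stump((x_raw - mean) / std)

def unfused(x_raw):
    """Tree alone fed raw input: every positive value falls on the same side."""
    return stump(x_raw)

fused_leaves = np.unique(fused(X_raw))      # both leaves are reached
unfused_leaves = np.unique(unfused(X_raw))  # only the high-price leaf is reached
print(fused_leaves, unfused_leaves)
```

The unfused variant still runs without error, which is why this mistake tends to show up only as a poor MAPE.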
Minimum viable submission: model.onnx (raw in → USD out) + feature_config.json
matching column order; no other files required by default.
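Before uploading, a quick pre-flight check of the two files can catch the most common slips. The sketch below uses only the standard library; ``check_bundle`` is a hypothetical helper, it assumes "features" is a top-level list in feature_config.json, and it does not replace load_feature_config from section C.

```python
import json
import tempfile
from pathlib import Path

def check_bundle(model_path, config_path, expected_features=14):
    """Return a list of problems found (an empty list means the basics look fine)."""
    problems = []
    if not Path(model_path).is_file():
        problems.append("model.onnx is missing")
    config_file = Path(config_path)
    if not config_file.is_file():
        problems.append("feature_config.json is missing")
        return problems
    data = json.loads(config_file.read_text(encoding="utf-8"))
    features = data.get("features") if isinstance(data, dict) else None
    if not isinstance(features, list):
        problems.append('feature_config.json has no "features" list')
    elif len(features) != expected_features:
        problems.append(f"expected {expected_features} features, found {len(features)}")
    return problems

# Demo on placeholder files in a temporary folder (not a real model).
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.onnx"
    config = Path(tmp) / "feature_config.json"
    model.write_bytes(b"")  # empty placeholder, only existence is checked
    config.write_text(json.dumps({"features": [f"f{i}" for i in range(14)]}))
    ok = check_bundle(model, config)
    config.write_text(json.dumps({"features": [f"f{i}" for i in range(17)]}))
    bad = check_bundle(model, config)

print(ok, bad)
```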