	RESI miner bundle — USD ONNX + subnet feature_config (14-feature schema)
	================================================================================

	A) Training (must match validator: ONNX outputs USD, MAPE vs dollar price)

	1. Default train_real_estate.py exports a single fused ONNX: raw features
	(same order as feature_config) → StandardScaler inside the graph → trees →
	optional expm1 → USD tensor ``price_usd``. You do not need --no-log-target
	for a valid miner model if you keep default fusion (log1p training + expm1
	fused). Input name is ``float_input`` for fused exports.
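
To make the log-target round trip concrete, here is a minimal sketch in plain Python of the arithmetic the fused graph performs around the trees. The helper names (standardize, to_log_target, to_usd) and the scaler statistics are hypothetical and chosen only for illustration; they are not taken from train_real_estate.py.

```python
import math


def standardize(row, means, stds):
    """Z-score each raw feature, as a scaler node inside the graph would."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def to_log_target(price_usd):
    """Training-side transform: the trees are fitted on log1p(price)."""
    return math.log1p(price_usd)


def to_usd(log_prediction):
    """Export-side inverse: expm1 maps the tree output back to dollars."""
    return math.expm1(log_prediction)


if __name__ == "__main__":
    # Made-up statistics for two features (square footage, bedrooms).
    scaled = standardize([1500.0, 3.0], means=[1800.0, 3.0], stds=[600.0, 1.0])
    print("scaled features:", scaled)

    # A perfect log-space prediction must come back out as the dollar price.
    log_pred = to_log_target(250_000.0)
    print("log-space value:", log_pred)
    print("dollars after expm1:", to_usd(log_pred))
```

If the expm1 step is missing from the exported graph, the model emits the log-space value (about 12.4 in this example) and the validator would read it as a price in dollars.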

	Alternative — train directly in dollars (no log head in ONNX):

	MPLBACKEND=Agg python train_real_estate.py \
	    --data training_data.json \
	    --catboost \
	    --out artifacts_miner_usd \
	    --no-log-target

	Legacy unfused tree-only ONNX (no scaler / no expm1 in graph):

	... --no-onnx-fusion

	Use --all / --xgboost / --lightgbm instead of --catboost if you prefer.

	2. Keep the same feature columns as this bundle: train with the default
	redundant-column dropping (that is, omit --no-drop-redundant) so that you end
	up with exactly the 14 features listed in miner_submission/feature_config.json.

	If you train with --no-drop-redundant (17 columns), regenerate the JSON:

	MPLBACKEND=Agg python train_real_estate.py ... --no-drop-redundant \
	    --write-miner-feature-config miner_submission/feature_config.json
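
A quick local check of column count and order can catch a mismatch before upload. The sketch below assumes feature_config.json has a top-level "features" list whose entries can be compared directly with your training column names; that layout is an assumption, and check_feature_order is a hypothetical helper, not part of the subnet code.

```python
import json


def check_feature_order(config, training_columns, expected_count=14):
    """Return a list of problems; an empty list means count and order match."""
    features = config.get("features", [])
    problems = []
    if len(features) != expected_count:
        problems.append(
            f"expected {expected_count} features, found {len(features)}"
        )
    # Compare position by position, because the validator relies on order.
    for i, (cfg, col) in enumerate(zip(features, training_columns)):
        if cfg != col:
            problems.append(
                f"position {i}: config has {cfg!r}, training used {col!r}"
            )
    if len(features) != len(training_columns):
        problems.append(
            f"config lists {len(features)} features, "
            f"training used {len(training_columns)}"
        )
    return problems


if __name__ == "__main__":
    # Placeholder column names; the real schema lives in feature_config.json.
    cols = [f"feature_{i}" for i in range(14)]
    config = json.loads(json.dumps({"features": cols}))
    print(check_feature_order(config, cols))

    swapped = [cols[1], cols[0]] + cols[2:]
    for problem in check_feature_order(config, swapped):
        print(problem)
```

With a real bundle you would load the dict with json.load on miner_submission/feature_config.json and pass the column list your training run actually used.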

	B) Export files for chain / HF repo

	3. Copy the chosen ONNX to your repo as model.onnx (e.g. from
	artifacts_miner_usd/catboost_price_model.onnx).

	4. Commit miner_submission/feature_config.json alongside the model (same
	feature order as training / encoder).

	5. ONNX input: one float tensor, shape [N, 14], columns in the exact order
	of "features" in feature_config.json. Fused models use input ``float_input``.
	Output is typically a single tensor; fused USD models name it ``price_usd``
	(take index 0 if your runner binds by position).
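
A local smoke test of the input and output contract might look like the sketch below. It assumes the onnxruntime and numpy packages are installed and that model.onnx is in the working directory; validate_batch and predict_usd are hypothetical helper names. The input name is read from the session rather than hard-coded, so the same code works for fused and unfused exports.

```python
def validate_batch(rows, n_features=14):
    """Check that every row has exactly n_features values; return them as floats."""
    batch = []
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(
                f"row {i} has {len(row)} values, expected {n_features}"
            )
        batch.append([float(v) for v in row])
    return batch


def predict_usd(model_path, rows, n_features=14):
    """Run the ONNX model on raw feature rows; return output 0 as a flat list."""
    # Imported lazily so validate_batch stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    x = np.asarray(validate_batch(rows, n_features), dtype=np.float32)  # [N, 14]
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # "float_input" for fused exports
    outputs = session.run(None, {input_name: x})  # None requests all outputs
    return np.asarray(outputs[0]).reshape(-1).tolist()  # bind output by position


if __name__ == "__main__":
    # Shape check only; call predict_usd("model.onnx", rows) once a model exists.
    rows = [[0] * 14, [1] * 14]
    print(len(validate_batch(rows)), "rows pass the shape check")
```

Predictions that come back around 10 to 15 instead of in the hundreds of thousands are a strong hint that the graph is emitting log1p(price) rather than dollars.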

	C) Validate JSON against subnet rules (optional)

	cd /home/RESI-models
	.venv/bin/python -c "
	from pathlib import Path
	from real_estate.data.config_encoder import load_feature_config
	load_feature_config(Path('/home/46/miner_submission/feature_config.json'))
	print('feature_config.json OK')
	"

	D) Do not rely on target_transform.json on-chain — the validator does not apply
	expm1; the model must emit dollars.

	E) You cannot change the eval system — submission-only rules

	The validator always: encodes raw API fields → float32 matrix (same order as
	your feature_config) → ONNX → treats outputs as USD for MAPE.
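
The effect of this pipeline on scoring can be sketched with the standard definition of MAPE. The validator's exact formula is not reproduced in this checklist, so treat the function below as an assumption; the prices are made up.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true price is zero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    true_prices = [200_000.0, 400_000.0]

    # A model that emits dollars and is off by 5% on each sale.
    usd_preds = [210_000.0, 380_000.0]
    print("USD output:", mape(true_prices, usd_preds))

    # The same sales scored against log1p values read as dollars.
    log_preds = [math.log1p(p) for p in true_prices]
    print("log output read as USD:", mape(true_prices, log_preds))
```

A model that emits log-space values scores close to 100% error however well it was trained, because every prediction is read as a price of a few dollars.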

	Therefore you MUST NOT depend on any extra JSON, hooks, or server-side
	preprocessing. Everything the model needs must be inside model.onnx OR you
	must train without that preprocessing:

	• Feature normalization (z-score, min-max): only valid if you fuse those ops
	into the ONNX graph ahead of the trees. Default train_real_estate.py does
	this (StandardScaler → trees → optional expm1). Training with a sklearn
	scaler but submitting plain tree ONNX on raw inputs = wrong.

	• log1p(price) training: only valid if the ONNX output is already USD, i.e.
	expm1 is in the graph (default fused export) or you use --no-log-target.

	• For gradient-boosted trees on tabular data, raw features + USD target is
	usually enough; focus on data and regularization rather than z-scoring, unless
	you are prepared to fuse the scaler into the ONNX graph.
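
The first bullet can be illustrated with a toy one-split tree: a threshold learned on z-scored inputs is meaningless when the same tree is fed raw inputs. All numbers and function names below are hypothetical.

```python
def stump_predict(x, threshold, low_value, high_value):
    """One-split tree: low_value if x <= threshold, otherwise high_value."""
    return low_value if x <= threshold else high_value


MEAN_SQFT, STD_SQFT = 1800.0, 600.0  # made-up scaler statistics
THRESHOLD_SCALED = 0.0               # split learned on z-scored square footage


def predict_fused(raw_sqft):
    """Scaler applied before the split, as when it is fused into the graph."""
    z = (raw_sqft - MEAN_SQFT) / STD_SQFT
    return stump_predict(z, THRESHOLD_SCALED, 200_000.0, 450_000.0)


def predict_unfused(raw_sqft):
    """Plain tree fed raw input while still using the scaled threshold."""
    return stump_predict(raw_sqft, THRESHOLD_SCALED, 200_000.0, 450_000.0)


if __name__ == "__main__":
    for sqft in (900.0, 2400.0):
        print(sqft, predict_fused(sqft), predict_unfused(sqft))
```

In this toy case every raw square footage is above the scaled threshold of 0, so the unfused tree always takes the high branch and overprices the smaller home.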

	Minimum viable submission: model.onnx (raw in → USD out) + feature_config.json
	matching column order; no other files required by default.