---
license: mit
library_name: xgboost
pipeline_tag: tabular-regression
tags:
- tabular-regression
- ens
- ethereum
- web3
- domain-names
- price-prediction
- nft
datasets:
- quantumly/ens-appraiser-data
base_model: sentence-transformers/all-mpnet-base-v2
metrics:
- r_squared
- median_ape
- rmse
model-index:
- name: ENS Appraiser v0.2
  results:
  - task:
      type: tabular-regression
      name: ENS Domain Price Prediction
    dataset:
      name: ENS Appraiser Multi-source Training Data
      type: quantumly/ens-appraiser-data
    metrics:
    - type: r_squared
      value: 0.3081
      name: R² (log USD, test)
    - type: median_ape
      value: 1.383
      name: Median APE (test)
    - type: rmse
      value: 1.5469
      name: RMSE (log USD, test)
---
# ENS Appraiser v0.2
A gradient-boosted regressor that predicts the USD sale price of an
ENS (`.eth`) domain name from on-chain history, semantic embeddings of the
label, and macro-market context.
This is the **v0 baseline** — handcrafted features + mpnet PCA + KNN
comparable-sale aggregates. Built to establish an honest, leakage-free
floor that future versions improve on.
## Quick numbers
Trained on ~265k ENS secondary sales (Jan 2022 – Sep 2023), evaluated on
2,744 sales in **Q1–Q2 2024** (held out by date, never seen during training):
| Split | n | R² (log USD) | RMSE (log USD) | Median APE | Bias |
|-------|--------|--------------|----------------|------------|-------|
| Train | 265,240 | 0.7700 | 0.7744 | 32.5% | +0.000 |
| Val | 3,545 | 0.6602 | 1.0678 | 57.0% | +0.203 |
| Test | 2,744 | **0.3081** | 1.5469 | 138.3% | +0.732 |
**Plain-English read:** for a typical mid-tier name in test, the model is
within ~2× of the actual sale price. The long tail — celebrity names,
3-letter premiums, regime shifts — is where it misses, often by 100×+ in
either direction.
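Log-space RMSE translates directly into a typical multiplicative miss. A minimal sketch, assuming the log target is a natural log (the base is not stated on this card):

```python
import math

def log_rmse_to_factor(rmse_log: float) -> float:
    """Convert an RMSE measured in natural-log USD into the typical
    multiplicative error factor: predictions are 'off by' about this many x."""
    return math.exp(rmse_log)

# The test-set RMSE of 1.5469 log-USD corresponds to a typical miss of ~4.7x;
# the ~2x figure above applies to mid-tier names, not the whole distribution.
print(round(log_rmse_to_factor(1.5469), 2))  # 4.7
```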
## What's good
- **Mid-tier names, $50–$5,000 range:** usually within 2× of actual.
- **Length and character composition:** strong signals captured well.
The model knows 3-letter ASCII names are premium and 12-letter random
handles are cheap.
- **Wordlist hits:** matches against Wikipedia, GeoNames, US first names,
stock tickers, and SEC EDGAR are picked up correctly. `paris.eth` is
flagged as a city, `nike.eth` as a brand.
- **Comparable-sale anchoring:** the top two features are `knn_mean_log`
and `knn_p90_log` — the model leans heavily on "what did similar names
sell for recently?" which is the right intuition for valuation.
## What's not
- **Celebrity / brand premium:** a name's value to a known buyer
(Coinbase wanting `coinbase.eth`, a luxury brand wanting their mark)
is invisible to this model. It can detect that `nike.eth` is a brand
word, but not that the sale price reflects Nike's interest specifically.
- **3-letter premium tail:** names like `mph.eth`, `uma.eth` sold for
$20k–$40k in test; the model predicted $100–$200. The training set
underweights short premiums because most sales there are 5+ letters.
- **Regime shift on test:** test set median price is ~4× higher than
training median due to the 2023 → 2024 ENS market shift. Recency-weighted
training (1-year half-life) helps but doesn't fully close the gap.
- **Bidirectional errors:** worst predictions split roughly evenly
  between under-prediction (hot names the model didn't recognize) and
  over-prediction (cold names that just didn't move). A 138% median APE is
  honest but uncomfortable.
## How it's built
| Component | Detail |
|---|---|
| Algorithm | XGBoost regressor (170 boosted trees, max_depth=7) |
| Target | `log(sale_price_usd)` |
| Features | 146 total |
| Training data | 265,240 sales, Jan 2022 – Sep 2023 |
| Training time | ~10 min on a single A100 |
| Model size | 3.3 MB |
### Feature breakdown
- **Handcrafted (15):** length, n_digits, n_letters, n_special, palindrome,
is_all_digits, is_all_letters, is_ascii, has_unicode, starts/ends_digit,
max_char_run, n_unique_chars
- **Wordlist hits (8):** Wikipedia titles, GeoNames cities, US first names,
ISO 3166 countries, stock tickers, SEC EDGAR companies, Wiktionary EN,
plus a `wordlist_hits` total
- **Grails clubs (~45):** binary membership in each curated `.eth` club
(`999club`, `pre-punks`, `palindromes`, `pokemon_gen1`, etc.)
- **Trademark conflict (1):** active USPTO mark in Nice classes 9, 35, 36,
38, 41, 42, 45 with matching `mark_text_norm`
- **Holder behavior (2):** `name_age_days`, `prior_transfer_count`
(leakage-safe — only counts transfers strictly before the sale block)
- **Macro context (5):** Fear & Greed Index, ETH chain TVL, ETH stablecoin
market cap, ETH DEX volume, total NFT marketplace fees on the sale day
- **mpnet PCA (64):** 768-dim `all-mpnet-base-v2` embeddings of the label,
PCA-reduced to 64 dims (95% explained variance)
- **KNN comparable sales (8):** for each label, FAISS-retrieve top-50
semantic neighbors (HNSW index), filter near-duplicates (sim > 0.999),
take the most-recent prior sale of each, aggregate as `knn_count`,
`knn_mean_log`, `knn_median_log`, `knn_p90_log`, `knn_max_sim`,
`knn_min_sim`, `knn_log_max`, `knn_log_min`. **Strict leakage prevention:**
only neighbors with sales **before** the current sale's date count.
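The KNN-comp aggregation above can be sketched in plain Python. The retrieval itself (FAISS/HNSW) is replaced here by a pre-sorted neighbor list, and the function name and tuple layout are illustrative; only the top-50 cutoff, the `sim > 0.999` near-duplicate filter, the strictly-prior-sale rule, and the aggregate names come from this card:

```python
import math
from statistics import mean, median

def knn_comp_features(neighbors, sale_date, sim_cutoff=0.999, k=50):
    """Aggregate prior sales of semantic neighbors into comp features.

    `neighbors` is a list of (similarity, last_sale_date, last_sale_usd)
    tuples already sorted by similarity. Leakage rule: a neighbor counts
    only if its sale happened strictly before `sale_date` (ISO date strings
    compare correctly as plain strings). Near-duplicates are dropped.
    """
    prior = [
        (sim, math.log(usd))
        for sim, d, usd in neighbors[:k]
        if d < sale_date and sim <= sim_cutoff
    ]
    if not prior:
        return {"knn_count": 0}
    sims = [s for s, _ in prior]
    logs = sorted(l for _, l in prior)
    p90 = logs[min(len(logs) - 1, int(0.9 * len(logs)))]  # crude p90 by index
    return {
        "knn_count": len(prior),
        "knn_mean_log": mean(logs),
        "knn_median_log": median(logs),
        "knn_p90_log": p90,
        "knn_max_sim": max(sims),
        "knn_min_sim": min(sims),
        "knn_log_max": max(logs),
        "knn_log_min": min(logs),
    }

comps = knn_comp_features(
    [(0.9999, "2023-01-05", 500.0),   # near-duplicate: excluded
     (0.91, "2023-02-01", 1000.0),
     (0.88, "2023-06-01", 100.0),     # sold after this sale: excluded
     (0.85, "2022-12-01", 10.0)],
    sale_date="2023-03-01",
)
print(comps["knn_count"])  # 2 leakage-safe comps survive
```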
### Top 10 features by gain
| Rank | Feature | Gain |
|---:|---|---:|
| 1 | `knn_mean_log` | 1,714 |
| 2 | `knn_p90_log` | 1,613 |
| 3 | `len` | 1,364 |
| 4 | `in_wikipedia` | 1,052 |
| 5 | `is_all_digits` | 944 |
| 6 | `knn_median_log` | 604 |
| 7 | `n_digits` | 338 |
| 8 | `pca_000` | 289 |
| 9 | `n_clubs` | 282 |
| 10 | `ends_digit` | 277 |
Four of the top ten are KNN-comp or PCA features, which means the
embedding pipeline is doing real work — it's not just paying for itself,
it's the dominant signal alongside length.
## Training data + leakage controls
Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
dataset:
- **Sales labels:** Alchemy `getNFTSales` for ENS BaseRegistrar + NameWrapper
contracts. Wei amounts converted to USD via CoinGecko hourly OHLC at
the sale's block timestamp. **Coverage gap:** Alchemy `getNFTSales` v2
truncates at block 19,768,978 (May 2024) and does not index Blur
marketplace sales. v0 ships with this gap; closing it is a v1 priority.
- **Registrations + transfers:** The Graph's [ENS subgraph](https://thegraph.com/explorer/subgraphs/5XqPmWe6gjyrJtFn9cLy237i4cWw2j9HcUJEXsP5qGtH).
- **Wordlists:** Wiktionary dumps, Wikipedia EN article titles, GeoNames
`cities500`, US Census baby names, NASDAQ Trader ticker dumps,
SEC EDGAR company tickers, ISO 3166 country list.
- **Macro:** alternative.me Fear & Greed Index, DefiLlama (TVL, stablecoin
mcap, DEX volume, NFT marketplace fees).
- **Trademarks:** USPTO Trademark Case Files Dataset (annual research dump).
- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`, encoded once
for all 3.5M ENS labels in the dataset.
### Leakage controls
The first version of this model accidentally leaked future information
through `lifetime_transfer_count` (it counted *all* transfers ever for a
labelhash, including transfers that happened *after* the sale being
predicted). The leaky model showed **train R² 0.81 / test R² −0.29** — the
classic catastrophic-overfit signature where the model collapses to
predicting the population mean on held-out data.
The current model uses `prior_transfer_count`, which counts only transfers
where `transfer_block < sale_block` for each row. The safe version dropped
to rank #11 in feature importance (the leaky version ranked #1, with 3.3×
the gain of the runner-up). KNN comparable-sale features carry a similar
safeguard: a neighbor's sale only counts if it happened strictly before
the sale being predicted.
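The per-row rule is a one-liner; the function name here is illustrative:

```python
def prior_transfer_count(transfer_blocks, sale_block):
    """Leakage-safe transfer count: only transfers mined strictly before
    the sale's block are visible at prediction time."""
    return sum(1 for b in transfer_blocks if b < sale_block)

# A labelhash transferred at blocks [100, 250, 900]; for a sale at block 500
# only the first two transfers exist yet. The leaky lifetime count was 3.
print(prior_transfer_count([100, 250, 900], sale_block=500))  # 2
```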
### Train/Val/Test split
Fixed-window temporal split:
- **Train:** sales with `sale_date < 2023-10-01`
- **Val:** sales 2023-10-01 → 2023-12-31
- **Test:** sales 2024-01-01 onwards
This prevents the v0.1 mistake of training on 2022 prices and asking the
model to extrapolate to a 2024 market regime that's ~4× more expensive
on average. Val and test are in the same regime so val RMSE is a
meaningful proxy for test.
Training rows are weighted with an exponential recency decay (1-year
half-life, normalized to mean=1.0) so the model leans on 2023 dynamics
without throwing away the older data entirely.
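The recency weighting amounts to a two-line transform. Only the 1-year half-life and the mean-1.0 normalization come from this card; the function name and ages are illustrative:

```python
import numpy as np

HALF_LIFE_DAYS = 365.0  # 1-year half-life

def recency_weights(sale_ages_days: np.ndarray) -> np.ndarray:
    """Exponential recency decay, normalized to mean 1.0 so the effective
    training-set size (and the loss scale) is unchanged."""
    w = 0.5 ** (sale_ages_days / HALF_LIFE_DAYS)
    return w / w.mean()

ages = np.array([0.0, 365.0, 730.0])  # fresh, 1 year old, 2 years old
w = recency_weights(ages)
print(w.round(3))  # a 1-year-old sale carries half the raw weight of a fresh one
```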
## Intended use
This model is intended for **research and analytics**, not as a price
oracle and not for live trading.
**Reasonable uses:**
- Bulk valuation of mid-tier ENS portfolios for tax/accounting purposes
- Identifying obviously over- or under-listed names on secondary markets
- Sanity-checking a listing price before posting
- Producing comparable-sale ranges for negotiation context
**Out of scope:**
- Pricing 3-letter, 1-2 letter, or otherwise-premium names with confidence
- Pricing celebrity / known-brand names where the buyer pool is concentrated
- Predicting prices for names in the post-May-2024 marketplace mix
(Blur dominance, marketplace fee changes)
- Any high-stakes financial decision based on a single point estimate
## Limitations
- **Sales coverage**: Jan 2022 – May 2024 only, no Blur. ~2 years of recent
sales (mid-2024 onwards) are missing entirely from training. Closing
this gap requires either a new sales source (Reservoir/SimpleHash both
defunct as of 2024–2025) or direct `eth_getLogs` decoding of Seaport,
Blur, X2Y2, LooksRare events, planned for v1.
- **Celebrity premium**: there's no feature here for "is this a famous
person/place/thing?" beyond Wikipedia-title matching. v1 adds
LLM-derived structured features (`fame_score`, `name_kind`,
`crypto_relevance`, `brand_collision_risk`) which should close most
of this gap.
- **Out-of-distribution labels**: pure-digit labels (`0001`),
punycode/emoji, and l33tspeak get less benefit from mpnet embeddings
since they're out of distribution for the pretrained model. Length and
charset features partially compensate.
- **Time drift**: the ENS market shifts noticeably every 6–12 months as
marketplace dominance, fee structures, and DAO actions move. Predictions
on names sold "right now" will lag any regime shift since the training
cutoff.
- **Test-set thinness**: only 2,744 sales meet the $10 floor and post-Jan-2024
cutoff. The reported test R² has roughly ±0.08 95% CI — useful as a
ballpark, not a precise number.
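The ±0.08 figure can be reproduced in spirit with a percentile bootstrap. The data below is synthetic, generated at roughly the reported R² level and test-set size, so the exact interval will differ from the real one:

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_r2_ci(y, yhat, n_boot=2000, seed=42):
    """Percentile-bootstrap 95% CI for R^2 over resampled test rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        stats[i] = r2(y[idx], yhat[idx])
    return np.percentile(stats, [2.5, 97.5])

# Synthetic stand-in for 2,744 test rows: residual variance 0.7 against
# unit target variance gives R^2 near the reported 0.3.
rng = np.random.default_rng(0)
y = rng.normal(6.0, 1.0, 2744)
yhat = y + rng.normal(0.0, 0.7 ** 0.5, 2744)
lo, hi = bootstrap_r2_ci(y, yhat)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```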
## How to use
```python
from huggingface_hub import hf_hub_download
import xgboost as xgb
import pickle

model_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_appraiser_xgb.json",
)
pca_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_pca_mpnet.pkl",
)

booster = xgb.Booster()
booster.load_model(model_path)
with open(pca_path, "rb") as f:
    pca = pickle.load(f)

# Inference also requires:
# 1. mpnet embedding for the label (sentence-transformers/all-mpnet-base-v2)
# 2. Handcrafted/wordlist/club/trademark/holder/macro features
# 3. KNN comp lookup against the dataset repo's FAISS index
#
# A self-contained inference notebook is planned in the dataset repo.
```
The 146 features expected by the booster are listed in `v0_metadata.json`
under `feature_cols`, in the exact order required by `xgb.DMatrix`.
## Reproducibility
The training notebook ([`v0_appraiser_v2.ipynb`](https://huggingface.co/datasets/quantumly/ens-appraiser-data/blob/main/notebooks/v0_appraiser_v2.ipynb))
runs end-to-end on a Colab A100 high-RAM instance in ~25 minutes:
1. Downloads all source parquets from the dataset repo
2. Reconstructs USD prices via CoinGecko hourly OHLC join
3. Resolves labels for both BaseRegistrar and NameWrapper sales
4. Computes all features
5. Builds HNSW index for KNN
6. Trains XGBoost with early stopping
7. Saves model + metadata + diagnostics
8. Uploads to this model repo
All randomness is seeded (`seed=42` for XGBoost, PCA, sample weights).
## Roadmap
**v1 priorities** (in expected R² delta order):
1. **LLM-derived features** — Llama 3.1 8B local inference over all 3.5M
labels, extracting `fame_score`, `name_kind`, `cultural_origin`,
`crypto_relevance`, `brand_collision_risk`, plus a description-embedding.
Expected delta: +0.05–0.10 test R².
2. **Recent sales backfill** via direct `eth_getLogs` decoding of
Seaport / Blur / Wyvern / X2Y2 / LooksRare events. Closes the
May 2024 → present coverage gap and adds Blur. Expected delta:
+0.03–0.06 test R² and a much bigger test set.
3. **Multi-embedding ensemble** — concatenate mpnet with `bge-base-en-v1.5`
and `e5-base-v2`, PCA the joint space. Expected delta: +0.02–0.04.
4. **Cross-encoder reranker** for KNN comps. Expected delta: +0.02–0.03.
5. **Contrastive fine-tuning** of mpnet on price-similarity triplets.
Expected delta: +0.03–0.05.
## Citation
```bibtex
@misc{ens_appraiser_2026,
author = {Drobnič, Nejc},
title = {ENS Appraiser v0.2},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/quantumly/ens-appraiser}
}
```
## License + contact
MIT. Questions, corrections, pull requests: nejc@nejc.dev