---
license: mit
library_name: xgboost
pipeline_tag: tabular-regression
tags:
- tabular-regression
- ens
- ethereum
- web3
- domain-names
- price-prediction
- nft
datasets:
- quantumly/ens-appraiser-data
base_model: sentence-transformers/all-mpnet-base-v2
metrics:
- r_squared
- median_ape
- rmse
model-index:
- name: ENS Appraiser v0.2
  results:
  - task:
      type: tabular-regression
      name: ENS Domain Price Prediction
    dataset:
      name: ENS Appraiser Multi-source Training Data
      type: quantumly/ens-appraiser-data
    metrics:
    - type: r_squared
      value: 0.3081
      name: R² (log USD, test)
    - type: median_ape
      value: 1.383
      name: Median APE (test)
    - type: rmse
      value: 1.5469
      name: RMSE (log USD, test)
---
# ENS Appraiser v0.2
A gradient-boosted regressor that predicts the USD sale price of an
ENS (`.eth`) domain name from on-chain history, semantic embeddings of the
label, and macro-market context.
This is the **v0 baseline** — handcrafted features + mpnet PCA + KNN
comparable-sale aggregates. Built to establish an honest, leakage-free
floor that future versions improve on.
## Quick numbers
Trained on ~265k ENS secondary sales (Jan 2022 – Sep 2023), evaluated on
2,744 sales in **Q1–Q2 2024** (held out by date, never seen during training):
| Split | n | R² (log USD) | RMSE (log USD) | Median APE | Bias |
|-------|--------|--------------|----------------|------------|-------|
| Train | 265,240 | 0.7700 | 0.7744 | 32.5% | +0.000 |
| Val | 3,545 | 0.6602 | 1.0678 | 57.0% | +0.203 |
| Test | 2,744 | **0.3081** | 1.5469 | 138.3% | +0.732 |
**Plain-English read:** for a typical mid-tier name in test, the model is
within ~2× of the actual sale price. The long tail — celebrity names,
3-letter premiums, regime shifts — is where it misses, often by 100×+ in
either direction.
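Log-space RMSE translates directly into a typical multiplicative miss. A minimal sketch, assuming the log target is a natural log (the base is not stated on this card):

```python
import math

def log_rmse_to_factor(rmse_log: float) -> float:
    """Convert an RMSE measured in natural-log USD into the typical
    multiplicative error factor: predictions are 'off by' about this many x."""
    return math.exp(rmse_log)

# The test-set RMSE of 1.5469 log-USD corresponds to a typical miss of ~4.7x;
# the ~2x figure above applies to mid-tier names, not the whole distribution.
print(round(log_rmse_to_factor(1.5469), 2))  # 4.7
```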
## What's good
- **Mid-tier names, $50–$5,000 range:** usually within 2× of actual.
- **Length and character composition:** strong signals captured well.
The model knows 3-letter ASCII names are premium and 12-letter random
handles are cheap.
- **Wordlist hits:** matches against Wikipedia, GeoNames, US first names,
stock tickers, and SEC EDGAR are picked up correctly. `paris.eth` is
flagged as a city, `nike.eth` as a brand.
- **Comparable-sale anchoring:** the top two features are `knn_mean_log`
and `knn_p90_log` — the model leans heavily on "what did similar names
sell for recently?" which is the right intuition for valuation.
## What's not
- **Celebrity / brand premium:** a name's value to a known buyer
(Coinbase wanting `coinbase.eth`, a luxury brand wanting their mark)
is invisible to this model. It can detect that `nike.eth` is a brand
word, but not that the sale price reflects Nike's interest specifically.
- **3-letter premium tail:** names like `mph.eth`, `uma.eth` sold for
$20k–$40k in test; the model predicted $100–$200. The training set
underweights short premiums because most sales there are 5+ letters.
- **Regime shift on test:** test set median price is ~4× higher than
training median due to the 2023 → 2024 ENS market shift. Recency-weighted
training (1-year half-life) helps but doesn't fully close the gap.
- **Bidirectional errors:** worst predictions split roughly evenly
  between under-prediction (hot names the model didn't recognize) and
  over-prediction (cold names that just didn't move). A 138% median APE is
  honest but uncomfortable.
## How it's built
| Component | Detail |
|---|---|
| Algorithm | XGBoost regressor (170 boosted trees, max_depth=7) |
| Target | `log(sale_price_usd)` |
| Features | 146 total |
| Training data | 265,240 sales, Jan 2022 – Sep 2023 |
| Training time | ~10 min on a single A100 |
| Model size | 3.3 MB |
### Feature breakdown
- **Handcrafted (15):** length, n_digits, n_letters, n_special, palindrome,
is_all_digits, is_all_letters, is_ascii, has_unicode, starts/ends_digit,
max_char_run, n_unique_chars
- **Wordlist hits (8):** Wikipedia titles, GeoNames cities, US first names,
ISO 3166 countries, stock tickers, SEC EDGAR companies, Wiktionary EN,
plus a `wordlist_hits` total
- **Grails clubs (~45):** binary membership in each curated `.eth` club
(`999club`, `pre-punks`, `palindromes`, `pokemon_gen1`, etc.)
- **Trademark conflict (1):** active USPTO mark in Nice classes 9, 35, 36,
38, 41, 42, 45 with matching `mark_text_norm`
- **Holder behavior (2):** `name_age_days`, `prior_transfer_count`
(leakage-safe — only counts transfers strictly before the sale block)
- **Macro context (5):** Fear & Greed Index, ETH chain TVL, ETH stablecoin
market cap, ETH DEX volume, total NFT marketplace fees on the sale day
- **mpnet PCA (64):** 768-dim `all-mpnet-base-v2` embeddings of the label,
PCA-reduced to 64 dims (95% explained variance)
- **KNN comparable sales (8):** for each label, FAISS-retrieve top-50
semantic neighbors (HNSW index), filter near-duplicates (sim > 0.999),
take the most-recent prior sale of each, aggregate as `knn_count`,
`knn_mean_log`, `knn_median_log`, `knn_p90_log`, `knn_max_sim`,
`knn_min_sim`, `knn_log_max`, `knn_log_min`. **Strict leakage prevention:**
only neighbors with sales **before** the current sale's date count.
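The KNN-comp aggregation above can be sketched in plain Python. The retrieval itself (FAISS/HNSW) is replaced here by a pre-sorted neighbor list, and the function name and tuple layout are illustrative; only the top-50 cutoff, the `sim > 0.999` near-duplicate filter, the strictly-prior-sale rule, and the aggregate names come from this card:

```python
import math
from statistics import mean, median

def knn_comp_features(neighbors, sale_date, sim_cutoff=0.999, k=50):
    """Aggregate prior sales of semantic neighbors into comp features.

    `neighbors` is a list of (similarity, last_sale_date, last_sale_usd)
    tuples already sorted by similarity. Leakage rule: a neighbor counts
    only if its sale happened strictly before `sale_date` (ISO date strings
    compare correctly as plain strings). Near-duplicates are dropped.
    """
    prior = [
        (sim, math.log(usd))
        for sim, d, usd in neighbors[:k]
        if d < sale_date and sim <= sim_cutoff
    ]
    if not prior:
        return {"knn_count": 0}
    sims = [s for s, _ in prior]
    logs = sorted(l for _, l in prior)
    p90 = logs[min(len(logs) - 1, int(0.9 * len(logs)))]  # crude p90 by index
    return {
        "knn_count": len(prior),
        "knn_mean_log": mean(logs),
        "knn_median_log": median(logs),
        "knn_p90_log": p90,
        "knn_max_sim": max(sims),
        "knn_min_sim": min(sims),
        "knn_log_max": max(logs),
        "knn_log_min": min(logs),
    }

comps = knn_comp_features(
    [(0.9999, "2023-01-05", 500.0),   # near-duplicate: excluded
     (0.91, "2023-02-01", 1000.0),
     (0.88, "2023-06-01", 100.0),     # sold after this sale: excluded
     (0.85, "2022-12-01", 10.0)],
    sale_date="2023-03-01",
)
print(comps["knn_count"])  # 2 leakage-safe comps survive
```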
### Top 10 features by gain
| Rank | Feature | Gain |
|---:|---|---:|
| 1 | `knn_mean_log` | 1,714 |
| 2 | `knn_p90_log` | 1,613 |
| 3 | `len` | 1,364 |
| 4 | `in_wikipedia` | 1,052 |
| 5 | `is_all_digits` | 944 |
| 6 | `knn_median_log` | 604 |
| 7 | `n_digits` | 338 |
| 8 | `pca_000` | 289 |
| 9 | `n_clubs` | 282 |
| 10 | `ends_digit` | 277 |
Four of the top ten are KNN-comp or PCA features, which means the
embedding pipeline is doing real work — it's not just paying for itself,
it's the dominant signal alongside length.
## Training data + leakage controls
Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
dataset:
- **Sales labels:** Alchemy `getNFTSales` for ENS BaseRegistrar + NameWrapper
contracts. Wei amounts converted to USD via CoinGecko hourly OHLC at
the sale's block timestamp. **Coverage gap:** Alchemy `getNFTSales` v2
truncates at block 19,768,978 (May 2024) and does not index Blur
marketplace sales. v0 ships with this gap; closing it is a v1 priority.
- **Registrations + transfers:** The Graph's [ENS subgraph](https://thegraph.com/explorer/subgraphs/5XqPmWe6gjyrJtFn9cLy237i4cWw2j9HcUJEXsP5qGtH).
- **Wordlists:** Wiktionary dumps, Wikipedia EN article titles, GeoNames
`cities500`, US Census baby names, NASDAQ Trader ticker dumps,
SEC EDGAR company tickers, ISO 3166 country list.
- **Macro:** alternative.me Fear & Greed Index, DefiLlama (TVL, stablecoin
mcap, DEX volume, NFT marketplace fees).
- **Trademarks:** USPTO Trademark Case Files Dataset (annual research dump).
- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`, encoded once
for all 3.5M ENS labels in the dataset.
### Leakage controls
The first version of this model accidentally leaked future information
through `lifetime_transfer_count` (it counted *all* transfers ever for a
labelhash, including transfers that happened *after* the sale being
predicted). The leaky model showed **train R² 0.81 / test R² −0.29** — the
classic catastrophic-overfit signature where the model collapses to
predicting the population mean on held-out data.
The current model uses `prior_transfer_count`, which counts only transfers
where `transfer_block < sale_block` for each row. The safe version dropped
to rank #11 in feature importance (the leaky version ranked #1, with 3.3×
the gain of the runner-up). KNN comparable-sale features carry a similar
safeguard: a neighbor's sale only counts if it happened strictly before
the sale being predicted.
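The per-row rule is a one-liner; the function name here is illustrative:

```python
def prior_transfer_count(transfer_blocks, sale_block):
    """Leakage-safe transfer count: only transfers mined strictly before
    the sale's block are visible at prediction time."""
    return sum(1 for b in transfer_blocks if b < sale_block)

# A labelhash transferred at blocks [100, 250, 900]; for a sale at block 500
# only the first two transfers exist yet. The leaky lifetime count was 3.
print(prior_transfer_count([100, 250, 900], sale_block=500))  # 2
```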
### Train/Val/Test split
Fixed-window temporal split:
- **Train:** sales with `sale_date < 2023-10-01`
- **Val:** sales 2023-10-01 → 2023-12-31
- **Test:** sales 2024-01-01 onwards
This prevents the v0.1 mistake of training on 2022 prices and asking the
model to extrapolate to a 2024 market regime that's ~4× more expensive
on average. Val and test are in the same regime so val RMSE is a
meaningful proxy for test.
Training rows are weighted with an exponential recency decay (1-year
half-life, normalized to mean=1.0) so the model leans on 2023 dynamics
without throwing away the older data entirely.
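The recency weighting amounts to a two-line transform. Only the 1-year half-life and the mean-1.0 normalization come from this card; the function name and ages are illustrative:

```python
import numpy as np

HALF_LIFE_DAYS = 365.0  # 1-year half-life

def recency_weights(sale_ages_days: np.ndarray) -> np.ndarray:
    """Exponential recency decay, normalized to mean 1.0 so the effective
    training-set size (and the loss scale) is unchanged."""
    w = 0.5 ** (sale_ages_days / HALF_LIFE_DAYS)
    return w / w.mean()

ages = np.array([0.0, 365.0, 730.0])  # fresh, 1 year old, 2 years old
w = recency_weights(ages)
print(w.round(3))  # a 1-year-old sale carries half the raw weight of a fresh one
```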
## Intended use
This model is intended for **research and analytics**, not as a price
oracle and not for live trading.
**Reasonable uses:**
- Bulk valuation of mid-tier ENS portfolios for tax/accounting purposes
- Identifying obviously over- or under-listed names on secondary markets
- Sanity-checking a listing price before posting
- Producing comparable-sale ranges for negotiation context
**Out of scope:**
- Pricing 3-letter, 1-2 letter, or otherwise-premium names with confidence
- Pricing celebrity / known-brand names where the buyer pool is concentrated
- Predicting prices for names in the post-May-2024 marketplace mix
(Blur dominance, marketplace fee changes)
- Any high-stakes financial decision based on a single point estimate
## Limitations
- **Sales coverage**: Jan 2022 – May 2024 only, no Blur. ~2 years of recent
sales (mid-2024 onwards) are missing entirely from training. Closing
this gap requires either a new sales source (Reservoir/SimpleHash both
defunct as of 2024–2025) or direct `eth_getLogs` decoding of Seaport,
Blur, X2Y2, LooksRare events, planned for v1.
- **Celebrity premium**: there's no feature here for "is this a famous
person/place/thing?" beyond Wikipedia-title matching. v1 adds
LLM-derived structured features (`fame_score`, `name_kind`,
`crypto_relevance`, `brand_collision_risk`) which should close most
of this gap.
- **Out-of-distribution labels**: pure-digit labels (`0001`),
punycode/emoji, and l33tspeak get less benefit from mpnet embeddings
since they're out of distribution for the pretrained model. Length and
charset features partially compensate.
- **Time drift**: the ENS market shifts noticeably every 6–12 months as
marketplace dominance, fee structures, and DAO actions move. Predictions
on names sold "right now" will lag any regime shift since the training
cutoff.
- **Test-set thinness**: only 2,744 sales meet the $10 floor and post-Jan-2024
cutoff. The reported test R² has roughly ±0.08 95% CI — useful as a
ballpark, not a precise number.
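The ±0.08 figure can be reproduced in spirit with a percentile bootstrap. The data below is synthetic, generated at roughly the reported R² level and test-set size, so the exact interval will differ from the real one:

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_r2_ci(y, yhat, n_boot=2000, seed=42):
    """Percentile-bootstrap 95% CI for R^2 over resampled test rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        stats[i] = r2(y[idx], yhat[idx])
    return np.percentile(stats, [2.5, 97.5])

# Synthetic stand-in for 2,744 test rows: residual variance 0.7 against
# unit target variance gives R^2 near the reported 0.3.
rng = np.random.default_rng(0)
y = rng.normal(6.0, 1.0, 2744)
yhat = y + rng.normal(0.0, 0.7 ** 0.5, 2744)
lo, hi = bootstrap_r2_ci(y, yhat)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```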
## How to use
```python
from huggingface_hub import hf_hub_download
import xgboost as xgb
import pickle

model_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_appraiser_xgb.json",
)
pca_path = hf_hub_download(
    repo_id="quantumly/ens-appraiser",
    filename="v0_pca_mpnet.pkl",
)

booster = xgb.Booster()
booster.load_model(model_path)
with open(pca_path, "rb") as f:
    pca = pickle.load(f)

# Inference also requires:
# 1. mpnet embedding for the label (sentence-transformers/all-mpnet-base-v2)
# 2. Handcrafted/wordlist/club/trademark/holder/macro features
# 3. KNN comp lookup against the dataset repo's FAISS index
#
# A self-contained inference notebook is planned in the dataset repo.
```
The 146 features expected by the booster are listed in `v0_metadata.json`
under `feature_cols`, in the exact order required by `xgb.DMatrix`.
## Reproducibility
The training notebook ([`v0_appraiser_v2.ipynb`](https://huggingface.co/datasets/quantumly/ens-appraiser-data/blob/main/notebooks/v0_appraiser_v2.ipynb))
runs end-to-end on a Colab A100 high-RAM instance in ~25 minutes:
1. Downloads all source parquets from the dataset repo
2. Reconstructs USD prices via CoinGecko hourly OHLC join
3. Resolves labels for both BaseRegistrar and NameWrapper sales
4. Computes all features
5. Builds HNSW index for KNN
6. Trains XGBoost with early stopping
7. Saves model + metadata + diagnostics
8. Uploads to this model repo
All randomness is seeded (`seed=42` for XGBoost, PCA, sample weights).
## Roadmap
**v1 priorities** (in expected R² delta order):
1. **LLM-derived features** — Llama 3.1 8B local inference over all 3.5M
labels, extracting `fame_score`, `name_kind`, `cultural_origin`,
`crypto_relevance`, `brand_collision_risk`, plus a description-embedding.
Expected delta: +0.05–0.10 test R².
2. **Recent sales backfill** via direct `eth_getLogs` decoding of
Seaport / Blur / Wyvern / X2Y2 / LooksRare events. Closes the
May 2024 → present coverage gap and adds Blur. Expected delta:
+0.03–0.06 test R² and a much bigger test set.
3. **Multi-embedding ensemble** — concatenate mpnet with `bge-base-en-v1.5`
and `e5-base-v2`, PCA the joint space. Expected delta: +0.02–0.04.
4. **Cross-encoder reranker** for KNN comps. Expected delta: +0.02–0.03.
5. **Contrastive fine-tuning** of mpnet on price-similarity triplets.
Expected delta: +0.03–0.05.
## Citation
```bibtex
@misc{ens_appraiser_2026,
author = {Drobnič, Nejc},
title = {ENS Appraiser v0.2},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/quantumly/ens-appraiser}
}
```
## License + contact
MIT. Questions, corrections, pull requests: nejc@nejc.dev