@@ -1,6 +1,7 @@
 ---
 license: mit
 library_name: xgboost
 tags:
   - tabular-regression
   - ens
@@ -8,13 +9,16 @@ tags:
   - web3
   - domain-names
   - price-prediction
 datasets:
   - quantumly/ens-appraiser-data
 metrics:
   - r_squared
   - mape
 model-index:
-  - name: ENS Appraiser v0
     results:
       - task:
           type: tabular-regression
@@ -24,92 +28,230 @@ model-index:
           type: quantumly/ens-appraiser-data
         metrics:
           - type: r_squared
-            value: TODO
-            name: R² (log USD)
           - type: median_ape
-            value: TODO
-            name: Median APE
           - type: rmse
-            value: TODO
-            name: RMSE (log USD)
 ---
-# ENS Appraiser
-A gradient-boosted regression model that predicts the USD sale price of an
-ENS (`.eth`) domain name. This is the v0 baseline — handcrafted features +
-mpnet semantic embeddings + KNN comparable-sale aggregates.
-> ⚠️ Numeric values in the YAML frontmatter (`TODO`) and the **Evaluation**
-> table below should be filled in with the values from the training
-> notebook's `=== v0 SUMMARY ===` block. The notebook prints exact
-> R²/RMSE/MAPE for train/val/test — copy them here before merging.
-## Model Details
-- **Architecture**: XGBoost regressor on `log(sale_price_usd)`
-- **Features**: ~150 total
-  - 15 handcrafted (length, character composition, palindrome/repetition flags)
-  - 8 wordlist hits (Wikipedia, GeoNames, US firstnames, ISO 3166, stock tickers, SEC EDGAR, Wiktionary EN)
-  - ~45 grails club memberships (binary per club)
-  - 1 trademark conflict flag (active USPTO marks in Nice classes 9/35/36/38/41/42/45)
-  - 3 holder behavior (name age, registrant portfolio size, lifetime transfer count)
-  - 5 macro context (Fear & Greed, ETH TVL, ETH stablecoin mcap, ETH DEX volume, NFT marketplace fees)
-  - 64 PCA-reduced mpnet embedding dims (from `sentence-transformers/all-mpnet-base-v2`)
-  - 8 KNN comparable-sale aggregates (count, mean/median/p90 log price of nearest neighbors with prior sales)
-- **Training data**: ENS secondary sales, Jan 2022 — May 2024 (~384k events)
-- **Validation**: temporal split (80/10/10 by sale date, no shuffle to prevent KNN-comp leakage)
-## Evaluation
-| Split | R² (log USD) | RMSE (log USD) | Median APE |
-|---|---|---|---|
-| Train | TODO | TODO | TODO |
-| Val   | TODO | TODO | TODO |
-| Test  | TODO | TODO | TODO |
-## Intended Use
-This model predicts sale prices for ENS `.eth` domain names. It's intended
-for **research and analytics**, not for live trading or as a price oracle.
-**Use cases it handles well:**
-- Bulk valuation of mid-tier names ($50–$5,000 range)
-- Identifying obviously over- or under-priced listings
-- Portfolio-level mark-to-market for ENS holdings
-- Sanity-checking listing prices
-**Use cases where it's weak:**
-- Celebrity/brand-name premium tail ($50k+ sales) — the model lacks fame data
-- Future names not in training distribution (post-May 2024)
-- Names registered through pathways the subgraph doesn't index
-- Blur-marketplace sales — Alchemy `getNFTSales` v2 doesn't index Blur for ENS,
-  so the training data has a marketplace coverage gap
 ## Limitations
-- **Sales coverage limitation**: Training data covers Jan 2022 — May 2024 only.
-  Alchemy's `getNFTSales` v2 endpoint truncates ENS coverage at block 19768978
-  (~May 2024) and doesn't index Blur sales.
-- **Celebrity tail**: Names with significant out-of-band brand value
-  (`coinbase.eth`, `vault.eth`) will be systematically underpriced because
-  the model lacks features for "is this a famous person/brand."
-- **Out-of-distribution labels**: Pure-digit labels (`0001`), punycode/emoji,
-  and l33tspeak get less benefit from mpnet embeddings since they were
-  out-of-distribution for the pretrained model.
-- **Time drift**: ENS market regime shifts in 2024-2025 are not captured.
-  Predictions for current names will lag those regime shifts.
-## How to Use
 ```python
 from huggingface_hub import hf_hub_download
 import xgboost as xgb
 import pickle
-# Download model artifacts
 model_path = hf_hub_download(
     repo_id="quantumly/ens-appraiser",
     filename="v0_appraiser_xgb.json",
@@ -119,48 +261,68 @@ pca_path = hf_hub_download(
     filename="v0_pca_mpnet.pkl",
 )
-# Load
 booster = xgb.Booster()
 booster.load_model(model_path)
 with open(pca_path, "rb") as f:
     pca = pickle.load(f)
-# To make predictions you'll also need:
-# 1. The mpnet embedding for the label (run sentence-transformers all-mpnet-base-v2)
-# 2. The handcrafted features, wordlist lookups, club memberships, trademark check
-# 3. Macro context for the prediction date (ETH price, Fear & Greed, etc.)
-# 4. KNN comp lookup against the FAISS index from the dataset repo
 #
-# See the inference notebook in the dataset repo for the full pipeline.
 ```
-## Training Data
-Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
-dataset, which assembles:
-- ENS on-chain registrations, renewals, transfers (The Graph subgraph)
-- ENS secondary sales (Alchemy `getNFTSales`)
-- CoinGecko hourly OHLC for label denomination
-- Discourse forums for governance signal
-- DefiLlama for macro signals (TVL, stablecoin mcap, DEX volume, NFT marketplace fees)
-- USPTO trademark registry for brand-conflict flags
-- Grails club memberships
-- Wiktionary, Wikipedia, GeoNames, US Census, SEC EDGAR for wordlist hits
-- `sentence-transformers/all-mpnet-base-v2` for semantic embeddings
 ## Citation
 ```bibtex
 @misc{ens_appraiser_2026,
-  author = {Drobnič, Nejc},
-  title  = {ENS Appraiser},
-  year   = {2026},
   publisher = {Hugging Face},
-  url    = {https://huggingface.co/quantumly/ens-appraiser}
 }
 ```
-## Contact
-nejc@nejc.dev

 ---
 license: mit
 library_name: xgboost
+pipeline_tag: tabular-regression
 tags:
   - tabular-regression
   - ens
   - web3
   - domain-names
   - price-prediction
+  - nft
 datasets:
   - quantumly/ens-appraiser-data
+base_model: sentence-transformers/all-mpnet-base-v2
 metrics:
   - r_squared
   - mape
+  - rmse
 model-index:
+  - name: ENS Appraiser v0.2
     results:
       - task:
           type: tabular-regression
           type: quantumly/ens-appraiser-data
         metrics:
           - type: r_squared
+            value: 0.3081
+            name: R² (log USD, test)
           - type: median_ape
+            value: 1.383
+            name: Median APE (test)
           - type: rmse
+            value: 1.5469
+            name: RMSE (log USD, test)
 ---
+# ENS Appraiser v0.2
+A gradient-boosted regressor that predicts the USD sale price of an
+ENS (`.eth`) domain name from on-chain history, semantic embeddings of the
+label, and macro-market context.
+This is the **v0 baseline** — handcrafted features + mpnet PCA + KNN
+comparable-sale aggregates. Built to establish an honest, leakage-free
+floor that future versions improve on.
+## Quick numbers
+Trained on ~265k ENS secondary sales (Jan 2022 – Sep 2023), evaluated on
+2,744 sales from **Jan–May 2024** (held out by date, never seen during training):
+| Split | n      | R² (log USD) | RMSE (log USD) | Median APE | Bias  |
+|-------|--------|--------------|----------------|------------|-------|
+| Train | 265,240 | 0.7700      | 0.7744         | 32.5%      | +0.000 |
+| Val   | 3,545   | 0.6602      | 1.0678         | 57.0%      | +0.203 |
+| Test  | 2,744   | **0.3081**  | 1.5469         | 138.3%     | +0.732 |
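+**Plain-English read:** for a typical mid-tier name, the model is usually
+within ~2× of the actual sale price. Across the whole test set, though, the
+median APE is 138.3%. Because an under-prediction can never exceed 100% APE,
+at least half of test predictions overshoot the actual price by more than 2×.
+The long tail (celebrity names, 3-letter premiums, regime shifts) is where it
+misses worst, often by 100×+ in either direction.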
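The metric columns can be computed from log-scale targets and predictions roughly as shown below. This is a minimal sketch, not the training notebook's code. It assumes a natural-log target and defines bias as mean(predicted − actual) in log space, a sign convention the card does not state. APE is computed after exponentiating back to USD. `median_ape` is returned as a fraction, so 1.383 in the YAML corresponds to the 138.3% in the table.

```python
import math
from statistics import median


def eval_log_regression(y_true_log, y_pred_log):
    """R², RMSE and bias in log space; median APE in USD space.

    Assumptions (not confirmed by the card): natural log target,
    bias = mean(pred - actual), APE measured after exp().
    """
    n = len(y_true_log)
    mean_true = sum(y_true_log) / n
    resid = [p - t for t, p in zip(y_true_log, y_pred_log)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true_log)
    ape = [
        abs(math.exp(p) - math.exp(t)) / math.exp(t)
        for t, p in zip(y_true_log, y_pred_log)
    ]
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(resid) / n,
        "median_ape": median(ape),  # fraction, e.g. 1.383 == 138.3%
    }
```

For example, predicting $200 for a $100 sale and $500 for a $1,000 sale gives zero bias, an RMSE of ln 2, and a median APE of 0.75 (75%).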
+## What's good
+- **Mid-tier names, $50–$5,000 range:** usually within 2× of actual.
+- **Length and character composition:** strong signals captured well.
+  The model knows 3-letter ASCII names are premium and 12-letter random
+  handles are cheap.
+- **Wordlist hits:** matches against Wikipedia, GeoNames, US first names,
+  stock tickers, and SEC EDGAR are picked up correctly. `paris.eth` is
+  flagged as a city, `nike.eth` as a brand.
+- **Comparable-sale anchoring:** the top two features are `knn_mean_log`
+  and `knn_p90_log` — the model leans heavily on "what did similar names
+  sell for recently?" which is the right intuition for valuation.
+## What's not
+- **Celebrity / brand premium:** a name's value to a known buyer
+  (Coinbase wanting `coinbase.eth`, a luxury brand wanting their mark)
+  is invisible to this model. It can detect that `nike.eth` is a brand
+  word, but not that the sale price reflects Nike's interest specifically.
+- **3-letter premium tail:** names like `mph.eth` and `uma.eth` sold for
+  $20k–$40k in test; the model predicted $100–$200. Short premium names are
+  underrepresented in training because most training sales are of names
+  with 5+ letters.
+- **Regime shift on test:** test set median price is ~4× higher than
+  training median due to the 2023 → 2024 ENS market shift. Recency-weighted
+  training (1-year half-life) helps but doesn't fully close the gap.
+- **Bidirectional errors:** the worst predictions split roughly evenly
+  between under-prediction (hot names the model didn't recognize) and
+  over-prediction (cold names that just didn't move). A 138% median APE is
+  honest but uncomfortable.
+## How it's built
+| Component | Detail |
+|---|---|
+| Algorithm | XGBoost regressor (170 boosted trees, max_depth=7) |
+| Target | `log(sale_price_usd)` |
+| Features | 146 total |
+| Training data | 265,240 sales, Jan 2022 – Sep 2023 |
+| Training time | ~10 min on a single A100 |
+| Model size | 3.3 MB |
+### Feature breakdown
+- **Handcrafted (15):** length, n_digits, n_letters, n_special, palindrome,
+  is_all_digits, is_all_letters, is_ascii, has_unicode, starts/ends_digit,
+  max_char_run, n_unique_chars
+- **Wordlist hits (8):** Wikipedia titles, GeoNames cities, US first names,
+  ISO 3166 countries, stock tickers, SEC EDGAR companies, Wiktionary EN,
+  plus a `wordlist_hits` total
+- **Grails clubs (~45):** binary membership in each curated `.eth` club
+  (`999club`, `pre-punks`, `palindromes`, `pokemon_gen1`, etc.)
+- **Trademark conflict (1):** active USPTO mark in Nice classes 9, 35, 36,
+  38, 41, 42, 45 with matching `mark_text_norm`
+- **Holder behavior (2):** `name_age_days`, `prior_transfer_count`
+  (leakage-safe — only counts transfers strictly before the sale block)
+- **Macro context (5):** Fear & Greed Index, ETH chain TVL, ETH stablecoin
+  market cap, ETH DEX volume, total NFT marketplace fees on the sale day
+- **mpnet PCA (64):** 768-dim `all-mpnet-base-v2` embeddings of the label,
+  PCA-reduced to 64 dims (95% explained variance)
+- **KNN comparable sales (8):** for each label, retrieve the top-50
+  semantic neighbors from a FAISS HNSW index, drop near-duplicates
+  (sim > 0.999), take the most recent prior sale of each remaining
+  neighbor, and aggregate into `knn_count`, `knn_mean_log`,
+  `knn_median_log`, `knn_p90_log`, `knn_max_sim`, `knn_min_sim`,
+  `knn_log_max`, `knn_log_min`. **Strict leakage prevention:** only
+  neighbor sales dated **before** the current sale count.
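The KNN aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code. A brute-force cosine search in NumPy stands in for the FAISS HNSW index, and the `sales` structure (one list of `(date, log_price_usd)` pairs per candidate label) is hypothetical. The card does not specify what happens when no neighbor qualifies, so returning NaN aggregates in that case is also an assumption.

```python
from datetime import date

import numpy as np

AGG_NAMES = ["knn_mean_log", "knn_median_log", "knn_p90_log", "knn_max_sim",
             "knn_min_sim", "knn_log_max", "knn_log_min"]


def knn_comp_features(query_vec, label_vecs, sales, sale_date: date,
                      k=50, dup_threshold=0.999):
    """Leakage-safe comparable-sale aggregates (illustrative sketch).

    query_vec: (d,) embedding of the label being priced.
    label_vecs: (n, d) embeddings of candidate labels.
    sales: sales[i] is a list of (date, log_price_usd) for label i.
    """
    q = query_vec / np.linalg.norm(query_vec)
    m = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sims = m @ q                                  # cosine similarity
    top = np.argsort(-sims)[:k]                   # top-k neighbors
    comp_sims, comp_prices = [], []
    for i in top:
        if sims[i] > dup_threshold:               # near-duplicate label
            continue
        prior = [(d, p) for d, p in sales[i] if d < sale_date]  # strictly earlier
        if not prior:
            continue
        _, latest = max(prior, key=lambda s: s[0])  # most recent prior sale
        comp_sims.append(float(sims[i]))
        comp_prices.append(latest)
    if not comp_prices:
        return {"knn_count": 0, **{n: float("nan") for n in AGG_NAMES}}
    prices = np.array(comp_prices)
    return {
        "knn_count": len(comp_prices),
        "knn_mean_log": float(prices.mean()),
        "knn_median_log": float(np.median(prices)),
        "knn_p90_log": float(np.percentile(prices, 90)),
        "knn_max_sim": max(comp_sims),
        "knn_min_sim": min(comp_sims),
        "knn_log_max": float(prices.max()),
        "knn_log_min": float(prices.min()),
    }
```

The date filter is applied per row. A neighbor whose only sales come after the sale being priced contributes nothing, even if it is the closest match.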
+### Top 10 features by gain
+| Rank | Feature | Gain |
+|---:|---|---:|
+| 1 | `knn_mean_log` | 1,714 |
+| 2 | `knn_p90_log` | 1,613 |
+| 3 | `len` | 1,364 |
+| 4 | `in_wikipedia` | 1,052 |
+| 5 | `is_all_digits` | 944 |
+| 6 | `knn_median_log` | 604 |
+| 7 | `n_digits` | 338 |
+| 8 | `pca_000` | 289 |
+| 9 | `n_clubs` | 282 |
+| 10 | `ends_digit` | 277 |
+Four of the top ten are KNN-comp or PCA features, including the top two,
+which means the embedding pipeline is doing real work: it's not just paying
+for itself, it's the dominant signal alongside length.
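To check feature-family counts like these against a trained booster, the top features can be grouped by name prefix. This is a sketch that assumes the `knn_` and `pca_` prefixes in the table identify the families. It takes a `{feature: gain}` dict such as the one returned by xgboost's `Booster.get_score(importance_type="gain")`. The card doesn't say whether its table uses per-split `gain` or `total_gain`.

```python
TOP10_GAIN = {  # values copied from the table above
    "knn_mean_log": 1714, "knn_p90_log": 1613, "len": 1364,
    "in_wikipedia": 1052, "is_all_digits": 944, "knn_median_log": 604,
    "n_digits": 338, "pca_000": 289, "n_clubs": 282, "ends_digit": 277,
}


def family_counts(gain, top_n=10):
    """Count feature families among the top-N features by gain."""
    top = sorted(gain, key=gain.get, reverse=True)[:top_n]
    counts = {}
    for name in top:
        if name.startswith("knn_"):
            fam = "knn"
        elif name.startswith("pca_"):
            fam = "pca"
        else:
            fam = "other"
        counts[fam] = counts.get(fam, 0) + 1
    return counts
```

On the ten rows above this returns 3 KNN features, 1 PCA feature and 6 others.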
+## Training data + leakage controls
+Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
+dataset:
+- **Sales labels:** Alchemy `getNFTSales` for ENS BaseRegistrar + NameWrapper
+  contracts. Wei amounts converted to USD via CoinGecko hourly OHLC at
+  the sale's block timestamp. **Coverage gap:** Alchemy `getNFTSales` v2
+  truncates at block 19,768,978 (May 2024) and does not index Blur
+  marketplace sales. v0 ships with this gap; closing it is a v1 priority.
+- **Registrations + transfers:** The Graph's [ENS subgraph](https://thegraph.com/explorer/subgraphs/5XqPmWe6gjyrJtFn9cLy237i4cWw2j9HcUJEXsP5qGtH).
+- **Wordlists:** Wiktionary dumps, Wikipedia EN article titles, GeoNames
+  `cities500`, US Census baby names, NASDAQ Trader ticker dumps,
+  SEC EDGAR company tickers, ISO 3166 country list.
+- **Macro:** alternative.me Fear & Greed Index, DefiLlama (TVL, stablecoin
+  mcap, DEX volume, NFT marketplace fees).
+- **Trademarks:** USPTO Trademark Case Files Dataset (annual research dump).
+- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`, encoded once
+  for all 3.5M ENS labels in the dataset.
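The wei-to-USD step for sale labels amounts to bucketing the block timestamp into its hour and scaling by that hour's ETH price. The sketch below uses a hypothetical `hourly_price` dict keyed by hour-start Unix seconds. The card does not say which OHLC field (open, close, etc.) is used, so a single price per hour is an assumption. A sale whose hour is missing from the series raises `KeyError` instead of being silently priced.

```python
WEI_PER_ETH = 10**18


def wei_to_usd(amount_wei, block_ts, hourly_price):
    """Convert a wei amount to USD at the sale's hour.

    block_ts: block timestamp in Unix seconds.
    hourly_price: {hour_start_unix_seconds: usd_per_eth} (hypothetical layout).
    """
    hour_start = block_ts - block_ts % 3600  # floor to the hour bucket
    return amount_wei / WEI_PER_ETH * hourly_price[hour_start]
```

For example, 1.5 ETH sold at timestamp 1,700,000,123 falls into the 1,699,999,200 hour bucket. At $2,000/ETH that gives $3,000.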
+### Leakage controls
+The first version of this model accidentally leaked future information
+through `lifetime_transfer_count` (it counted *all* transfers ever for a
+labelhash, including transfers that happened *after* the sale being
+predicted). The leaky model showed **train R² 0.81 / test R² −0.29**, the
+classic signature of a leaked feature: a strong in-sample fit, and on
+held-out data a negative R², i.e. worse than always predicting the
+test-set mean.
+The current model uses `prior_transfer_count`, which only counts transfers
+where `transfer_block < sale_block` per row. It dropped to rank #11 in
+feature importance (the leaky version was #1, by a 3.3× margin). KNN
+comparable-sale features have a similar safeguard: a neighbor's sale only
+counts if it happened strictly before the sale being predicted.
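The per-row `transfer_block < sale_block` rule can be sketched with a binary search over one labelhash's sorted transfer blocks. This is an illustration, not the dataset's code. `bisect_left` returns the number of elements strictly less than the key, which is exactly the strict inequality above. It presumably also excludes the transfer emitted by the sale itself in the same block.

```python
from bisect import bisect_left


def prior_transfer_counts(transfer_blocks, sale_blocks):
    """For each sale, count transfers in blocks strictly before the sale block."""
    blocks = sorted(transfer_blocks)
    return [bisect_left(blocks, s) for s in sale_blocks]
```

With transfers at blocks 100, 200, 200 and 350, sales at blocks 50, 200 and 400 get counts 0, 1 and 4. The leaky lifetime count would have been 4 for every row.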
+### Train/Val/Test split
+Fixed-window temporal split:
+- **Train:** sales with `sale_date < 2023-10-01`
+- **Val:** sales 2023-10-01 → 2023-12-31
+- **Test:** sales 2024-01-01 onwards
+This prevents the v0.1 mistake of training on 2022 prices and asking the
+model to extrapolate to a 2024 market regime that's ~4× more expensive
+on average. Val and test are meant to sit in the same regime, so val RMSE
+should be a meaningful proxy for test; the gap between val and test R²
+(0.66 vs 0.31) shows that the proxy is still imperfect.
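The fixed-window split reduces to two date boundaries, each inclusive on the later side. This is a minimal sketch and the constant names are illustrative:

```python
from datetime import date

VAL_START = date(2023, 10, 1)   # first day of the val window
TEST_START = date(2024, 1, 1)   # first day of the test window


def assign_split(sale_date: date) -> str:
    """Map a sale date to train / val / test by fixed calendar windows."""
    if sale_date < VAL_START:
        return "train"
    if sale_date < TEST_START:
        return "val"
    return "test"
```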
+Training rows are weighted with an exponential recency decay (1-year
+half-life, normalized to mean=1.0) so the model leans on 2023 dynamics
+without throwing away the older data entirely.
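Such weights can be sketched as follows, assuming age is measured in days from a reference date. Here `reference_day` is hypothetical; it could be, for example, the newest training sale. A sale one half-life older gets half the weight, and the array is rescaled so its mean is 1.0.

```python
import numpy as np


def recency_weights(sale_days, reference_day, half_life_days=365.0):
    """Exponential recency weights normalized to mean 1.0.

    sale_days / reference_day: day numbers (e.g. days since the epoch).
    """
    age = reference_day - np.asarray(sale_days, dtype=float)
    w = 0.5 ** (age / half_life_days)  # halves every half_life_days
    return w / w.mean()
```

The resulting array could be passed as the `weight` argument of `xgb.DMatrix`. The card doesn't state exactly how the notebook wires it in.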
+## Intended use
+This model is intended for **research and analytics**, not as a price
+oracle and not for live trading.
+**Reasonable uses:**
+- Bulk valuation of mid-tier ENS portfolios for tax/accounting purposes
+- Identifying obviously over- or under-listed names on secondary markets
+- Sanity-checking a listing price before posting
+- Producing comparable-sale ranges for negotiation context
+**Out of scope:**
+- Pricing 3-letter, 1-2 letter, or otherwise-premium names with confidence
+- Pricing celebrity / known-brand names where the buyer pool is concentrated
+- Predicting prices for names in the post-May-2024 marketplace mix
+  (Blur dominance, marketplace fee changes)
+- Any high-stakes financial decision based on a single point estimate
 ## Limitations
+- **Sales coverage**: the sales data covers Jan 2022 – May 2024 only, with
+  no Blur. The ~2 years of sales since mid-2024 are missing from the dataset
+  entirely. Closing this gap requires either a new sales source
+  (Reservoir/SimpleHash both defunct as of 2024–2025) or direct
+  `eth_getLogs` decoding of Seaport, Blur, X2Y2, and LooksRare events,
+  which is planned for v1.
+- **Celebrity premium**: there's no feature here for "is this a famous
+  person/place/thing?" beyond Wikipedia-title matching. v1 is planned to add
+  LLM-derived structured features (`fame_score`, `name_kind`,
+  `crypto_relevance`, `brand_collision_risk`), which should close most
+  of this gap.
+- **Out-of-distribution labels**: pure-digit labels (`0001`),
+  punycode/emoji, and l33tspeak get less benefit from mpnet embeddings
+  since they're out of distribution for the pretrained model. Length and
+  charset features partially compensate.
+- **Time drift**: the ENS market shifts noticeably every 6–12 months as
+  marketplace dominance, fee structures, and DAO actions move. Predictions
+  on names sold "right now" will lag any regime shift since the training
+  cutoff.
+- **Test-set thinness**: only 2,744 sales meet the $10 floor and the
+  post-Jan-2024 cutoff. The reported test R² has a 95% CI of roughly ±0.08,
+  so treat it as a ballpark, not a precise number.
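The card doesn't say how the ±0.08 interval was computed. One common approach is a percentile bootstrap over test rows. The sketch below implements it under that assumption. Plain row resampling ignores any temporal correlation between sales, so it may understate the uncertainty.

```python
import numpy as np


def bootstrap_r2_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap CI for R², resampling test rows with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap resample
        t, p = y_true[idx], y_pred[idx]
        ss_res = np.sum((t - p) ** 2)
        ss_tot = np.sum((t - t.mean()) ** 2)
        stats[b] = 1.0 - ss_res / ss_tot
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```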
+## How to use
 ```python
 from huggingface_hub import hf_hub_download
 import xgboost as xgb
 import pickle
 model_path = hf_hub_download(
     repo_id="quantumly/ens-appraiser",
     filename="v0_appraiser_xgb.json",
 )
 pca_path = hf_hub_download(
     repo_id="quantumly/ens-appraiser",
     filename="v0_pca_mpnet.pkl",
 )
 booster = xgb.Booster()
 booster.load_model(model_path)
 with open(pca_path, "rb") as f:
     pca = pickle.load(f)
+# Inference also requires:
+#  1. mpnet embedding for the label (sentence-transformers/all-mpnet-base-v2)
+#  2. Handcrafted/wordlist/club/trademark/holder/macro features
+#  3. KNN comp lookup against the dataset repo's FAISS index
 #
+# A self-contained inference notebook is planned in the dataset repo.
 ```
+The 146 features expected by the booster are listed in `v0_metadata.json`
+under `feature_cols`, in the exact column order the booster was trained on;
+build the `xgb.DMatrix` with its columns in that order.
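A small helper can guard that ordering. The sketch below, with hypothetical helper names, turns a `{name: value}` feature dict into a row in `feature_cols` order. It fails loudly on any missing feature and maps the booster's log prediction back to USD. It assumes the target is the natural log (as `np.log` would produce), which the card implies but does not state explicitly. The resulting row would then go into something like `xgb.DMatrix(np.array([row]), feature_names=feature_cols)` before `booster.predict`.

```python
import json
import math


def load_feature_cols(metadata_path):
    """Read the ordered feature list from v0_metadata.json."""
    with open(metadata_path) as f:
        return json.load(f)["feature_cols"]


def to_feature_row(features, feature_cols):
    """Order a {name: value} dict into the column order the booster expects."""
    missing = [c for c in feature_cols if c not in features]
    if missing:
        raise KeyError(f"missing features: {missing[:5]}")
    return [float(features[c]) for c in feature_cols]


def log_pred_to_usd(log_pred):
    # Assumes a natural-log target; use 10 ** log_pred if it were log10.
    return math.exp(log_pred)
```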
+## Reproducibility
+The training notebook ([`v0_appraiser_v2.ipynb`](https://huggingface.co/datasets/quantumly/ens-appraiser-data/blob/main/notebooks/v0_appraiser_v2.ipynb))
+runs end-to-end on a Colab A100 high-RAM instance in ~25 minutes:
+1. Downloads all source parquets from the dataset repo
+2. Reconstructs USD prices via CoinGecko hourly OHLC join
+3. Resolves labels for both BaseRegistrar and NameWrapper sales
+4. Computes all features
+5. Builds HNSW index for KNN
+6. Trains XGBoost with early stopping
+7. Saves model + metadata + diagnostics
+8. Uploads to this model repo
+All randomness is seeded (`seed=42` for XGBoost, PCA, sample weights).
+## Roadmap
+**v1 priorities** (in expected R² delta order):
+1. **LLM-derived features:** Llama 3.1 8B local inference over all 3.5M
+   labels, extracting `fame_score`, `name_kind`, `cultural_origin`,
+   `crypto_relevance`, `brand_collision_risk`, plus a description embedding.
+   Expected delta: +0.05–0.10 test R².
+2. **Recent sales backfill** via direct `eth_getLogs` decoding of
+   Seaport / Blur / Wyvern / X2Y2 / LooksRare events. Closes the
+   May 2024 → present coverage gap and adds Blur. Expected delta:
+   +0.03–0.06 test R² and a much bigger test set.
+3. **Contrastive fine-tuning** of mpnet on price-similarity triplets.
+   Expected delta: +0.03–0.05.
+4. **Multi-embedding ensemble:** concatenate mpnet with `bge-base-en-v1.5`
+   and `e5-base-v2`, PCA the joint space. Expected delta: +0.02–0.04.
+5. **Cross-encoder reranker** for KNN comps. Expected delta: +0.02–0.03.
 ## Citation
 ```bibtex
 @misc{ens_appraiser_2026,
+  author    = {Drobnič, Nejc},
+  title     = {ENS Appraiser v0.2},
+  year      = {2026},
   publisher = {Hugging Face},
+  url       = {https://huggingface.co/quantumly/ens-appraiser}
 }
 ```
+## License + contact
+MIT. Questions, corrections, pull requests: nejc@nejc.dev