quantumly committed on
Commit
4553391
·
verified ·
1 Parent(s): 1841752

Update README.md

  1. README.md +166 -3
README.md CHANGED
@@ -1,3 +1,166 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ library_name: xgboost
4
+ tags:
5
+ - tabular-regression
6
+ - ens
7
+ - ethereum
8
+ - web3
9
+ - domain-names
10
+ - price-prediction
11
+ datasets:
12
+ - quantumly/ens-appraiser-data
13
+ metrics:
14
+ - r_squared
15
+ - mape
16
+ model-index:
17
+ - name: ENS Appraiser v0
18
+   results:
19
+   - task:
20
+       type: tabular-regression
21
+       name: ENS Domain Price Prediction
22
+     dataset:
23
+       name: ENS Appraiser Multi-source Training Data
24
+       type: quantumly/ens-appraiser-data
25
+     metrics:
26
+     - type: r_squared
27
+       value: TODO
28
+       name: R² (log USD)
29
+     - type: median_ape
30
+       value: TODO
31
+       name: Median APE
32
+     - type: rmse
33
+       value: TODO
34
+       name: RMSE (log USD)
35
+ ---
36
+
37
+ # ENS Appraiser
38
+
39
+ A gradient-boosted regression model that predicts the USD sale price of an
40
+ ENS (`.eth`) domain name. This is the v0 baseline, which combines handcrafted
41
+ features, mpnet semantic embeddings, and KNN comparable-sale aggregates.
42
+
43
+ > ⚠️ Numeric values in the YAML frontmatter (`TODO`) and the **Evaluation**
44
+ > table below should be filled in with the values from the training
45
+ > notebook's `=== v0 SUMMARY ===` block. The notebook prints exact
46
+ > R²/RMSE/MAPE for train/val/test — copy them here before merging.
47
+
48
+ ## Model Details
49
+
50
+ - **Architecture**: XGBoost regressor on `log(sale_price_usd)`
51
+ - **Features**: ~150 total
52
+   - 15 handcrafted (length, character composition, palindrome/repetition flags)
53
+   - 8 wordlist hits (Wikipedia, GeoNames, US firstnames, ISO 3166, stock tickers, SEC EDGAR, Wiktionary EN)
54
+   - ~45 grails club memberships (binary per club)
55
+   - 1 trademark conflict flag (active USPTO marks in Nice classes 9/35/36/38/41/42/45)
56
+   - 3 holder behavior (name age, registrant portfolio size, lifetime transfer count)
57
+   - 5 macro context (Fear & Greed, ETH TVL, ETH stablecoin mcap, ETH DEX volume, NFT marketplace fees)
58
+   - 64 PCA-reduced mpnet embedding dims (from `sentence-transformers/all-mpnet-base-v2`)
59
+   - 8 KNN comparable-sale aggregates (count, mean/median/p90 log price of nearest neighbors with prior sales)
60
+ - **Training data**: ENS secondary sales, Jan 2022 to May 2024 (~384k events)
61
+ - **Validation**: temporal split (80/10/10 by sale date, no shuffle to prevent KNN-comp leakage)
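
A minimal sketch of the temporal split described above, assuming each sale is a record carrying a sale date. The `temporal_split` helper and the toy `sales` list are hypothetical and are not taken from the training notebook. Sorting by date before cutting means every validation and test sale happens after every training sale, so comparable-sale features built from prior sales cannot see the future.

```python
from datetime import date


def temporal_split(rows, key, train_frac=0.8, val_frac=0.1):
    """Split rows chronologically into train/val/test without shuffling."""
    ordered = sorted(rows, key=key)
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical toy sales as (label, sale date), deliberately out of order.
sales = [(f"name{i}", date(2022, 1, 1 + i)) for i in reversed(range(10))]

train, val, test = temporal_split(sales, key=lambda row: row[1])
print(len(train), len(val), len(test))
```

Any remainder from rounding falls into the test split, which keeps the most recent sales held out.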
62
+
63
+ ## Evaluation
64
+
65
+ | Split | R² (log USD) | RMSE (log USD) | Median APE |
66
+ |---|---|---|---|
67
+ | Train | TODO | TODO | TODO |
68
+ | Val | TODO | TODO | TODO |
69
+ | Test | TODO | TODO | TODO |
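
For reference, the three metrics in the table could be computed roughly as follows. This is a sketch under assumptions: the target is taken to be the natural log of the USD price, and Median APE is assumed to be measured in USD after exponentiating both the prediction and the target. The function names and toy prices are illustrative, and the training notebook may define the metrics differently.

```python
import math
from statistics import median


def rmse(y_true, y_pred):
    """Root mean squared error, here applied to log-USD values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_ape(y_true_log, y_pred_log):
    """Median absolute percentage error in USD (assumes natural-log targets)."""
    apes = [
        abs(math.exp(p) - math.exp(t)) / math.exp(t)
        for t, p in zip(y_true_log, y_pred_log)
    ]
    return median(apes)


# Hypothetical toy prices in USD, converted to the log space the model is trained in.
true_usd = [100.0, 200.0, 400.0]
pred_usd = [110.0, 180.0, 400.0]
y_true = [math.log(v) for v in true_usd]
y_pred = [math.log(v) for v in pred_usd]

print(r_squared(y_true, y_pred), rmse(y_true, y_pred), median_ape(y_true, y_pred))
```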
70
+
71
+ ## Intended Use
72
+
73
+ This model predicts sale prices for ENS `.eth` domain names. It's intended
74
+ for **research and analytics**, not for live trading or as a price oracle.
75
+
76
+ **Use cases it handles well:**
77
+
78
+ - Bulk valuation of mid-tier names ($50–$5,000 range)
79
+ - Identifying obviously over- or under-priced listings
80
+ - Portfolio-level mark-to-market for ENS holdings
81
+ - Sanity-checking listing prices
82
+
83
+ **Use cases where it's weak:**
84
+
85
+ - Celebrity/brand-name premium tail ($50k+ sales) — the model lacks fame data
86
+ - Names and sales from after the training window (post-May 2024)
87
+ - Names registered through pathways the subgraph doesn't index
88
+ - Blur-marketplace sales — Alchemy `getNFTSales` v2 doesn't index Blur for ENS,
89
+ so the training data has a marketplace coverage gap
90
+
91
+ ## Limitations
92
+
93
+ - **Sales coverage limitation**: Training data covers Jan 2022 to May 2024 only.
94
+ Alchemy's `getNFTSales` v2 endpoint truncates ENS coverage at block 19768978
95
+ (~May 2024) and doesn't index Blur sales.
96
+ - **Celebrity tail**: Names with significant out-of-band brand value
97
+ (`coinbase.eth`, `vault.eth`) will be systematically underpriced because
98
+ the model lacks features for "is this a famous person/brand."
99
+ - **Out-of-distribution labels**: Pure-digit labels (`0001`), punycode/emoji,
100
+ and l33tspeak get less benefit from mpnet embeddings since they were
101
+ out-of-distribution for the pretrained model.
102
+ - **Time drift**: ENS market regime shifts in 2024-2025 are not captured.
103
+ Predictions for current names will lag those regime shifts.
104
+
105
+ ## How to Use
106
+
107
+ ```python
108
+ from huggingface_hub import hf_hub_download
109
+ import xgboost as xgb
110
+ import pickle
111
+
112
+ # Download model artifacts
113
+ model_path = hf_hub_download(
114
+     repo_id="quantumly/ens-appraiser",
115
+     filename="v0_appraiser_xgb.json",
116
+ )
117
+ pca_path = hf_hub_download(
118
+     repo_id="quantumly/ens-appraiser",
119
+     filename="v0_pca_mpnet.pkl",
120
+ )
121
+
122
+ # Load
123
+ booster = xgb.Booster()
124
+ booster.load_model(model_path)
125
+ with open(pca_path, "rb") as f:
126
+     pca = pickle.load(f)
127
+
128
+ # To make predictions you'll also need:
129
+ # 1. The mpnet embedding for the label (run sentence-transformers all-mpnet-base-v2)
130
+ # 2. The handcrafted features, wordlist lookups, club memberships, trademark check
131
+ # 3. Macro context for the prediction date (Fear & Greed, ETH TVL, etc.)
132
+ # 4. KNN comp lookup against the FAISS index from the dataset repo
133
+ #
134
+ # See the inference notebook in the dataset repo for the full pipeline.
135
+ ```
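
The handcrafted part of step 2 in the comments above might look roughly like the sketch below. The `handcrafted_features` function and its feature names are hypothetical: they illustrate the kinds of features the model card lists (length, character composition, palindrome/repetition flags) and do not reproduce the 15 features or the column order the trained model expects.

```python
def handcrafted_features(label: str) -> dict:
    """Compute a few illustrative string features for an ENS label (no '.eth' suffix)."""
    n = len(label)
    digits = sum(ch.isdigit() for ch in label)
    letters = sum(ch.isalpha() for ch in label)

    # Length of the longest run of one repeated character, e.g. 3 for "aaab".
    best = run = 0
    prev = None
    for ch in label:
        run = run + 1 if ch == prev else 1
        best = max(best, run)
        prev = ch

    return {
        "length": n,
        "digit_frac": digits / n if n else 0.0,
        "letter_frac": letters / n if n else 0.0,
        "is_all_digits": int(n > 0 and digits == n),
        "has_hyphen": int("-" in label),
        "is_palindrome": int(n > 1 and label == label[::-1]),
        "max_char_run": best,
        "n_unique_chars": len(set(label)),
    }


print(handcrafted_features("0110"))
print(handcrafted_features("vault"))
```

For real predictions, the feature vector has to match the training pipeline exactly, so the inference notebook in the dataset repo remains the reference.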
136
+
137
+ ## Training Data
138
+
139
+ Built from the [`quantumly/ens-appraiser-data`](https://huggingface.co/datasets/quantumly/ens-appraiser-data)
140
+ dataset, which assembles:
141
+
142
+ - ENS on-chain registrations, renewals, transfers (The Graph subgraph)
143
+ - ENS secondary sales (Alchemy `getNFTSales`)
144
+ - CoinGecko hourly OHLC for converting sale prices (the regression target) to USD
145
+ - Discourse forums for governance signal
146
+ - DefiLlama for macro signals (TVL, stablecoin mcap, DEX volume, NFT marketplace fees)
147
+ - USPTO trademark registry for brand-conflict flags
148
+ - Grails club memberships
149
+ - Wiktionary, Wikipedia, GeoNames, US Census, SEC EDGAR for wordlist hits
150
+ - `sentence-transformers/all-mpnet-base-v2` for semantic embeddings
151
+
152
+ ## Citation
153
+
154
+ ```bibtex
155
+ @misc{ens_appraiser_2026,
156
+ author = {Drobnič, Nejc},
157
+ title = {ENS Appraiser},
158
+ year = {2026},
159
+ publisher = {Hugging Face},
160
+ url = {https://huggingface.co/quantumly/ens-appraiser}
161
+ }
162
+ ```
163
+
164
+ ## Contact
165
+
166
+ nejc@nejc.dev