AbstractPhil committed fa015c1 (verified)
1 parent: 185899c

Update README.md

Files changed (1): README.md +30 -0
README.md CHANGED
@@ -233,6 +233,36 @@ if __name__ == "__main__":
  print(f"{row['D']:5d} {row['avg_cv']:8.4f} {row['in_band_pct']:5.1f}% {row['status']}")
  ```
 
+ ## Large Vocabulary Ablation
+
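+ The CV is consistent with the earlier findings: with D fixed at 32, it stays between 0.2427 and 0.2745 from V=32 up to V=13,000,000, with no trend in vocabulary size. Sampling from all embeddings instead of a pool capped at 512 lands in the same range, so the CV can be measured reliably from a fixed-size sample rather than the full vocabulary.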
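A minimal sketch of the capped-versus-uncapped pool comparison, assuming NumPy. This diff does not say which statistic the CV is computed over or how the pool is drawn, so the sketch substitutes a hypothetical stand-in (the coefficient of variation of Euclidean distances between random row pairs) on random Gaussian embeddings; its values will not match the table. The names `sample_pool`, `distance_cv`, and `vocab_ablation` are illustrative, not taken from the README.

```python
from typing import Optional

import numpy as np


def sample_pool(embeddings, cap: Optional[int], rng):
    """Draw rows without replacement: min(V, cap) rows, or all V rows when cap is None."""
    v = embeddings.shape[0]
    size = v if cap is None else min(v, cap)
    idx = rng.choice(v, size=size, replace=False)
    return embeddings[idx]


def distance_cv(pool, n_pairs, rng):
    """HYPOTHETICAL stand-in statistic: std / mean of Euclidean distances between random row pairs."""
    i = rng.integers(0, len(pool), size=n_pairs)
    j = rng.integers(0, len(pool), size=n_pairs)
    keep = i != j  # drop self-pairs, whose distance is always zero
    d = np.linalg.norm(pool[i[keep]] - pool[j[keep]], axis=1)
    return float(d.std() / d.mean())


def vocab_ablation(vocab_sizes, dim=32, cap: Optional[int] = 512, n_pairs=20_000, seed=0):
    """Return (V, pool size, stand-in CV) for random Gaussian V x dim embedding tables."""
    rng = np.random.default_rng(seed)
    results = []
    for v in vocab_sizes:
        # Random, untrained embeddings (an assumption of this sketch).
        emb = rng.standard_normal((v, dim), dtype=np.float32)
        pool = sample_pool(emb, cap, rng)
        results.append((v, len(pool), distance_cv(pool, n_pairs, rng)))
    return results


for v, pool_size, cv in vocab_ablation([32, 512, 8_192, 65_536]):
    print(f"V={v:>10,} D=32 CV={cv:.4f} pool={pool_size}")
```

Because the rows are drawn independently, a pool of 512 rows samples the same distribution whatever V is, which is why the stand-in CV in this sketch does not trend with vocabulary size. Passing `cap=None` reproduces the "uncapped pool" setting, where the pool is every row.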
+
+ ```
+ D=32 fixed. CV across vocab sizes.
+ Pool capped at 512 for fair comparison.
+ ============================================================
+ V=        32 D=32 CV=0.2578 0.1s    0MB
+ V=       512 D=32 CV=0.2615 0.0s    0MB
+ V=     8,192 D=32 CV=0.2578 0.0s    1MB
+ V=    65,536 D=32 CV=0.2663 0.0s    8MB
+ V=   131,072 D=32 CV=0.2590 0.0s   17MB
+ V=   500,000 D=32 CV=0.2745 0.1s   64MB
+ V= 1,000,000 D=32 CV=0.2645 0.2s  128MB
+ V= 4,000,000 D=32 CV=0.2541 0.9s  512MB
+ V=13,000,000 D=32 CV=0.2681 2.9s 1664MB
+
+ ============================================================
+ Now uncapped pool (sample from ALL embeddings):
+ ============================================================
+ V=       512 D=32 CV=0.2591 pool=512
+ V=     8,192 D=32 CV=0.2427 pool=8192
+ V=    65,536 D=32 CV=0.2684 pool=65536
+ V=   500,000 D=32 CV=0.2562 pool=500000
+ ```
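The MB column appears to be the size of a dense V × D table of 4-byte (float32) values in decimal megabytes: for V=13,000,000, 13,000,000 × 32 × 4 = 1,664,000,000 bytes, or 1664MB. The check below assumes that storage format, which the README does not state; `embedding_megabytes` is a hypothetical helper. Rounded to whole megabytes it gives 0, 0, 1, 8, 17, 64, 128, 512 and 1664, matching the capped-pool table.

```python
def embedding_megabytes(vocab_size: int, dim: int = 32, bytes_per_value: int = 4) -> float:
    """Size of a dense vocab_size x dim table in decimal MB (1 MB = 10**6 bytes).

    bytes_per_value=4 assumes float32 storage; the README does not state the dtype.
    """
    return vocab_size * dim * bytes_per_value / 1e6


# Vocabulary sizes from the capped-pool table above.
VOCAB_SIZES = [32, 512, 8_192, 65_536, 131_072, 500_000, 1_000_000, 4_000_000, 13_000_000]

for v in VOCAB_SIZES:
    print(f"V={v:>10,} D=32 {embedding_megabytes(v):.0f}MB")
```

Memory grows linearly in V while the CV stays flat, so the cost of the largest vocabularies comes from storing the table, not from measuring it.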
+
+
+
+
  ## Implications for Architecture Design
 
  The band is not a training outcome. It is a geometric property of dimensionality. This means: