---
license: apache-2.0
---

# GeoLIP Deep Embedding Analysis

The first sweep battery is complete. The JSON lists every (vocabulary, dimension) embedding configuration whose coefficient of variation falls inside the CV band.

Parse [cv_sweep.json](https://huggingface.co/AbstractPhil/geolip-deep-embedding-analysis/resolve/main/cv_sweep.json) and look up the attention bands that correspond to your embedding spaces. It serves as a differentiation utility: it indicates how much downstream capacity is needed to compensate for a given embedding, how far the embedding can be reduced, how many layers your embeddings can propagate through, and the effective geometric range of all of these together.

## What This Measures

The Cayley-Menger determinant computes the squared volume of a 4-simplex (pentachoron) formed by 5 randomly sampled embedding vectors. The coefficient of variation (CV) of these volumes across many random samples reveals the geometric operating regime of the embedding space.

- **CV > 0.30**: Volatile — simplex volumes vary wildly, geometric measurements are unstable
- **0.13 < CV < 0.30**: Band-valid — volumes carry discriminative structural information
- **CV < 0.13**: Degenerate — all simplices look identical, the measurement is blind

The band exists as a function of embedding dimension only. Vocabulary size is irrelevant. Training signal does not move CV — it is a property of the ambient dimensionality.
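
These regimes can be encoded as a small helper for interpreting measured CVs (a minimal sketch; the thresholds are the band bounds stated above, and `classify_cv` is a name introduced here, not part of the sweep tooling):

```python
BAND_LO, BAND_HI = 0.13, 0.30  # band bounds from the sweep

def classify_cv(cv: float) -> str:
    """Map a measured coefficient of variation to its geometric regime."""
    if cv > BAND_HI:
        return "volatile"      # simplex volumes vary wildly
    if cv < BAND_LO:
        return "degenerate"    # all simplices look identical
    return "band-valid"        # volumes carry discriminative structure

print(classify_cv(0.257))  # band-valid (the D=32 average from the table below)
```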

## Key Findings

| Dimension | Avg CV | Band Status |
|-----------|--------|-------------|
| D=8 | 0.605 | Above band — volatile |
| D=16 | 0.383 | Above band — entering |
| D=24 | 0.304 | **Phase boundary — binding constant 0.29154** |
| **D=32** | **0.257** | **Center of band** |
| **D=40** | **0.229** | **Center of band** |
| **D=48** | **0.207** | **Center of band** |
| **D=56** | **0.192** | **In band** |
| **D=64** | **0.180** | **In band** |
| **D=72** | **0.168** | **In band** |
| **D=80** | **0.159** | **In band** |
| **D=88** | **0.152** | **In band** |
| **D=96** | **0.144** | **In band** |
| **D=104** | **0.139** | **In band** |
| **D=112** | **0.134** | **In band** |
| D=120 | 0.129 | Below band — exiting |
| D=128 | 0.125 | Below band |
| D=256 | 0.088 | Degenerate |
| D=512 | 0.063 | Degenerate |
| D=768 | 0.051 | Degenerate |

The standard MHA convention of 64 dims per head sits inside the band. This may be a direct causal relationship: the scaled dot-product convention in attention operates at the dimensionality where simplex geometry remains discriminative.
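
As a sanity check on the table, the measured averages closely track CV ≈ √(2/D). This closed form is a curve fit observed here, not a claim made by the sweep itself, but it places the band edges at D ≈ 22 (CV = 0.30) and D ≈ 118 (CV = 0.13), bracketing the observed D=24 to D=120 band:

```python
import math

# Measured average CVs copied from the table above
measured = {32: 0.257, 64: 0.180, 128: 0.125, 256: 0.088, 512: 0.063, 768: 0.051}

for d, cv in measured.items():
    print(f"D={d:4d}  measured={cv:.3f}  sqrt(2/D)={math.sqrt(2 / d):.3f}")

# Band edges implied by the fit: D = 2 / CV^2
print(round(2 / 0.30 ** 2, 1))       # 22.2 -> near the D=24 phase boundary
print(round(2 / 0.13 ** 2, 1))       # 118.3 -> near the D=120 exit
print(round(math.sqrt(2 / 24), 4))   # 0.2887, near the 0.29154 binding constant
```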

## Sweep Data

```json
{
  "sweep": {"step": 8, "low": 8, "high": 2048},
  "band": {"lo": 0.13, "hi": 0.30},
  "band_results": [... 3014 entries sorted by CV ...],
  "all_results": [... 65536 entries ...]
}
```

Each entry: `{"V": vocab_size, "D": dim, "CV": value, "in_band": bool}`

## Download and Nearest Dimensional Lookup

```python
import json
import urllib.request

URL = "https://huggingface.co/AbstractPhil/geolip-deep-embedding-analysis/resolve/main/cv_sweep.json"

def load_sweep(path=None):
    """Load sweep from local path or download from HF."""
    if path:
        with open(path) as f:
            return json.load(f)
    with urllib.request.urlopen(URL) as r:
        return json.loads(r.read().decode())

def nearest_band_dim(target_dim, sweep=None):
    """Find the nearest band-valid dimension to your model's embedding dim.
    
    Returns the closest D where CV is in band, plus the expected CV range.
    Use this to determine compartment size for patchwork decomposition.
    
    Example: Your model uses D=768. This tells you to decompose into
    compartments of D=32 (24 compartments) or D=64 (12 compartments).
    """
    if sweep is None:
        sweep = load_sweep()

    # Build D -> CV stats from band_results
    by_dim = {}
    for r in sweep["band_results"]:
        d = r["D"]
        if d not in by_dim:
            by_dim[d] = []
        by_dim[d].append(r["CV"])

    band_dims = sorted(by_dim.keys())
    if not band_dims:
        return None

    # Find nearest
    nearest = min(band_dims, key=lambda d: abs(d - target_dim))

    # Also find best decompositions of target_dim
    decompositions = []
    for d in band_dims:
        if target_dim % d == 0:
            n_compartments = target_dim // d
            cvs = by_dim[d]
            decompositions.append({
                "compartment_dim": d,
                "n_compartments": n_compartments,
                "cv_min": round(min(cvs), 4),
                "cv_max": round(max(cvs), 4),
                "cv_avg": round(sum(cvs) / len(cvs), 4),
            })

    cvs = by_dim[nearest]
    return {
        "target_dim": target_dim,
        "nearest_band_dim": nearest,
        "cv_range": [round(min(cvs), 4), round(max(cvs), 4)],
        "cv_avg": round(sum(cvs) / len(cvs), 4),
        "valid_decompositions": sorted(decompositions, key=lambda x: x["compartment_dim"]),
    }


# ── Usage ──

if __name__ == "__main__":
    for model_dim in [768, 1024, 512, 384, 256, 128]:
        result = nearest_band_dim(model_dim)
        print(f"\n{'='*50}")
        print(f"Model dim: {model_dim}")
        print(f"Nearest band dim: D={result['nearest_band_dim']}  CV={result['cv_avg']:.4f}")
        if result["valid_decompositions"]:
            print(f"Valid decompositions:")
            for dec in result["valid_decompositions"]:
                print(f"  {dec['n_compartments']:3d} × D={dec['compartment_dim']:3d}  "
                      f"CV={dec['cv_avg']:.4f} [{dec['cv_min']:.4f}-{dec['cv_max']:.4f}]")
        else:
            print("  No exact decompositions — consider padding or truncating")
```

## Parse and Filter

```python
import json

with open("cv_sweep.json") as f:
    data = json.load(f)

# Filter for any CV range β€” example: binding constant region
lo, hi = 0.290, 0.292
hits = [e for e in data["band_results"] if lo <= e["CV"] <= hi]
hits.sort(key=lambda x: x["CV"])

print(f"CV in [{lo}, {hi}]: {len(hits)} entries")
for h in hits:
    print(f"  V={h['V']:6d}  D={h['D']:4d}  CV={h['CV']:.4f}")

# Group by D
dims = {}
for h in hits:
    dims.setdefault(h["D"], []).append(h)
for d in sorted(dims):
    entries = dims[d]
    print(f"  D={d:3d}: {len(entries)} entries  "
          f"CV={min(e['CV'] for e in entries):.4f}-{max(e['CV'] for e in entries):.4f}")
```

## Rescale and Sort

```python
def rescale_sort(sweep=None, group_by="dim"):
    """Sort and group sweep results for analysis.

    group_by: 'dim' groups by embedding dimension (recommended)
              'cv'  groups into cv quartiles within band
              'ratio' groups by V/D ratio
    """
    if sweep is None:
        sweep = load_sweep()

    band_lo = sweep["band"]["lo"]
    band_hi = sweep["band"]["hi"]
    results = [r for r in sweep["all_results"] if r["CV"] is not None]

    if group_by == "dim":
        # Group by D, show band status and CV statistics
        by_dim = {}
        for r in results:
            d = r["D"]
            if d not in by_dim:
                by_dim[d] = {"in_band": [], "below": [], "above": []}
            if r["CV"] > band_hi:
                by_dim[d]["above"].append(r["CV"])
            elif r["CV"] < band_lo:
                by_dim[d]["below"].append(r["CV"])
            else:
                by_dim[d]["in_band"].append(r["CV"])

        table = []
        for d in sorted(by_dim.keys()):
            g = by_dim[d]
            all_cvs = g["in_band"] + g["below"] + g["above"]
            avg = sum(all_cvs) / len(all_cvs)
            table.append({
                "D": d,
                "avg_cv": round(avg, 4),
                "in_band_pct": round(100 * len(g["in_band"]) / len(all_cvs), 1),
                "n_total": len(all_cvs),
                "n_in_band": len(g["in_band"]),
                "status": "IN_BAND" if band_lo < avg < band_hi else
                          "ABOVE" if avg >= band_hi else "BELOW",
            })
        return table

    elif group_by == "cv":
        # Quartile analysis within band
        band = [r for r in results if band_lo < r["CV"] < band_hi]
        if not band:
            return []
        band.sort(key=lambda r: r["CV"])
        n = len(band)
        return {
            "total_in_band": n,
            "q1_low": band[:n//4],
            "q2_mid_low": band[n//4:n//2],
            "q3_mid_high": band[n//2:3*n//4],
            "q4_high": band[3*n//4:],
            "q1_cv_range": [round(band[0]["CV"], 4), round(band[n//4-1]["CV"], 4)],
            "q2_cv_range": [round(band[n//4]["CV"], 4), round(band[n//2-1]["CV"], 4)],
            "q3_cv_range": [round(band[n//2]["CV"], 4), round(band[3*n//4-1]["CV"], 4)],
            "q4_cv_range": [round(band[3*n//4]["CV"], 4), round(band[-1]["CV"], 4)],
        }

    elif group_by == "ratio":
        # Group by V/D ratio — demonstrates V irrelevance
        band = [r for r in results if band_lo < r["CV"] < band_hi]
        by_ratio = {}
        for r in band:
            ratio = round(r["V"] / r["D"], 1)
            if ratio not in by_ratio:
                by_ratio[ratio] = []
            by_ratio[ratio].append(r)
        return {k: {"count": len(v), "dims": sorted(set(r["D"] for r in v))}
                for k, v in sorted(by_ratio.items())}


# ── Usage ──

if __name__ == "__main__":
    table = rescale_sort(group_by="dim")
    print(f"{'D':>5}  {'Avg CV':>8}  {'Band%':>6}  {'Status'}")
    print("-" * 40)
    for row in table:
        if row["D"] <= 256:
            print(f"{row['D']:5d}  {row['avg_cv']:8.4f}  {row['in_band_pct']:5.1f}%  {row['status']}")
```

## The Binding Constant is D=24

Filtering the sweep for CV in [0.290, 0.292] (the region around the empirically observed binding constant 0.29154) returns 12 entries:

| V | D | CV |
|---|---|-----|
| 24 | 16 | 0.2900 |
| 368 | 32 | 0.2903 |
| 1632 | 24 | 0.2906 |
| 208 | 24 | 0.2908 |
| 1096 | 24 | 0.2911 |
| 1992 | 24 | 0.2911 |
| 200 | 24 | 0.2914 |
| 1024 | 24 | 0.2916 |
| 760 | 24 | 0.2917 |
| 1232 | 24 | 0.2917 |
| 776 | 24 | 0.2919 |
| 904 | 24 | 0.2920 |

10 of 12 entries are D=24. The binding constant 0.29154 is the native CV of a 24-dimensional embedding space. It is not a learned value. It is not an empirical coincidence. It is the geometric fingerprint of D=24.

## The Computational Boundary

D=24 is also the exact dimension where custom SVD kernels hit an 8x performance cliff and eigendecomposition (eigh) collapses. The binding constant marks a dual boundary:

- **Geometric**: the phase transition between volatile simplex volumes (above 0.30) and discriminative geometry (below 0.30)
- **Computational**: the resolution limit of compact spectral decomposition kernels

Every time the constant 0.29154 appeared across 17+ pretrained models, the system was measuring the dimensional fingerprint of its own computational ceiling. The constellation encoded this ceiling as a structural constant because it could not compute past it.

D=32 is the first dimension past this wall that remains in band (CV ~0.257). Operating there requires `torch.linalg.det` on a 6×6 CM matrix, which compiles regardless of embedding dimension, because the CM matrix is always 6×6 for five-point simplices. The pairwise distances are computed via the Gram matrix (batched matmul, which compiles perfectly). Only the `det` call touches linalg, and 6×6 is well within kernel range.
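
A minimal check of that claim, assuming nothing beyond `torch`: the CM matrix for five points is 6×6 whatever the ambient dimension, so only the pairwise-distance step scales with D (`cm_matrix` is an illustrative helper, not part of the sweep code):

```python
import torch

def cm_matrix(points: torch.Tensor) -> torch.Tensor:
    """Cayley-Menger matrix for N points: always (N+1, N+1), independent of D."""
    n = points.shape[0]
    d2 = torch.cdist(points, points) ** 2  # pairwise squared distances, (N, N)
    cm = torch.zeros(n + 1, n + 1)
    cm[0, 1:] = 1
    cm[1:, 0] = 1
    cm[1:, 1:] = d2
    return cm

for D in (32, 768):
    cm = cm_matrix(torch.randn(5, D))
    print(D, tuple(cm.shape))  # (6, 6) in both cases
```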

## MHA Activation Geometry

Measuring CV on per-head Q/K/V **activations** (not weights) after training reveals head_dim-dependent geometric behavior:

| head_dim | Q activation CV | K activation CV | V activation CV |
|----------|----------------|----------------|----------------|
| 64 | ~0.32 | ~0.42 | ~0.41 |
| 32 | ~0.38 | ~0.45 | ~0.43 |
| 16 | ~0.48 | ~0.70 | ~0.53 |
| 8 | ~0.65 | ~0.77 | ~0.63 |

Key observations:

- **Embedding activations are always in band** (CV 0.19–0.30) regardless of nominal D — training compresses effective dimensionality into band
- **K activations are asymmetrically volatile** — keys spread further than queries to make attention discriminative
- **Q activations track head_dim** following the same curve as the embedding sweep — the 64-dim convention keeps Q near the band edge
- **The Q/K ratio** measures selectivity pressure: too high = brittle attention, too close to 1.0 = uniform attention

These ratios can be used as a zero-cost diagnostic on any pretrained transformer: forward one batch, measure per-head activation CV, and immediately identify which heads are geometrically healthy vs collapsing.
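
A sketch of that diagnostic. The volume metric mirrors `cayley_menger_vol2` from the Reproducing section below; the activations here are stand-in random tensors, since no specific model is assumed (with a real transformer you would capture per-head Q/K/V via forward hooks):

```python
import math
import torch
import torch.nn.functional as F

def cayley_menger_vol2(points):
    """points: (B, 5, D) -> squared 4-simplex volumes, shape (B,)."""
    B, N, D = points.shape
    gram = torch.bmm(points, points.transpose(1, 2))
    norms = torch.diagonal(gram, dim1=1, dim2=2)
    d2 = F.relu(norms.unsqueeze(2) + norms.unsqueeze(1) - 2 * gram)
    cm = torch.zeros(B, N + 1, N + 1, dtype=points.dtype)
    cm[:, 0, 1:] = 1; cm[:, 1:, 0] = 1; cm[:, 1:, 1:] = d2
    k = N - 1
    return ((-1) ** (k + 1)) * torch.linalg.det(cm.float()) / ((2 ** k) * math.factorial(k) ** 2)

def per_head_cv(acts, n_samples=300):
    """acts: (tokens, n_heads, head_dim) activations -> CV per head."""
    tokens, n_heads, _ = acts.shape
    cvs = []
    for h in range(n_heads):
        idx = torch.stack([torch.randperm(tokens)[:5] for _ in range(n_samples)])
        vol2 = cayley_menger_vol2(acts[:, h][idx])
        vols = vol2[vol2 > 1e-20].sqrt()
        cvs.append((vols.std() / (vols.mean() + 1e-8)).item())
    return cvs

# Illustrative stand-in for captured Q activations: 512 tokens, 8 heads, head_dim 64
q_acts = torch.randn(512, 8, 64)
print([round(cv, 3) for cv in per_head_cv(q_acts)])
```

Heads whose CV drifts below 0.13 (degenerate) or far above 0.30 (volatile) are the ones to flag.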

## Vocabulary Independence

CV at D=32 was verified from V=32 to V=13,000,000. The result is invariant:

```
V=        32  D=32  CV=0.2578
V=       512  D=32  CV=0.2615
V=     8,192  D=32  CV=0.2578
V=    65,536  D=32  CV=0.2663
V=   131,072  D=32  CV=0.2590
V=   500,000  D=32  CV=0.2745
V= 1,000,000  D=32  CV=0.2645
V= 4,000,000  D=32  CV=0.2541
V=13,000,000  D=32  CV=0.2681
```

Vocabulary size does not gate band membership. The CM determinant samples 5 points — the distribution of simplex volumes depends on ambient dimensionality, not on the number of points in the space.
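
This is straightforward to spot-check with random Gaussian embeddings, which, per the note above that training does not move CV, should behave like trained ones. A minimal sketch: `cv_at` is introduced here, and the 400-sample estimate is noisy at the percent level:

```python
import math
import torch

def cv_at(V, D, n=400):
    """CV of 4-simplex volumes for a random (V, D) Gaussian embedding."""
    w = torch.randn(V, D)
    idx = torch.stack([torch.randperm(V)[:5] for _ in range(n)])
    p = w[idx]                           # (n, 5, D) sampled simplices
    d2 = torch.cdist(p, p) ** 2          # pairwise squared distances, (n, 5, 5)
    cm = torch.zeros(n, 6, 6)
    cm[:, 0, 1:] = 1; cm[:, 1:, 0] = 1; cm[:, 1:, 1:] = d2
    # vol^2 = (-1)^(k+1) det(CM) / (2^k (k!)^2) with k = 4
    vol2 = -torch.linalg.det(cm) / (2 ** 4 * math.factorial(4) ** 2)
    vols = vol2.clamp_min(0).sqrt()
    return (vols.std() / vols.mean()).item()

for V in (32, 1024, 65536):
    print(f"V={V:6d}  D=32  CV={cv_at(V, 32):.3f}")
```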

## Implications for Architecture Design

The band is not a training outcome. It is a geometric property of dimensionality. This means:

1. **Embedding compartments must be D=32 to D=64** for Cayley-Menger volumes to carry discriminative information
2. **A 768-dim model** should decompose into 24×32 or 12×64 compartments, not operate as a monolithic vector
3. **The standard 64-dim attention head** may exist precisely because it sits inside this geometric band
4. **Scaling** comes from composing band-valid units with geometric linkages, not from widening dimensions beyond the band
5. **D=24 (CV=0.29154)** is the phase boundary — any component pushed above this threshold has crossed from structured into volatile geometry
6. **The 6×6 CM determinant compiles** at any embedding dimension — the computational bottleneck was in spectral decomposition, not in the geometric measurement itself

## Reproducing

```python
# Metric core of the sweep script that generated this data
# Requires: torch

import math
import torch
import torch.nn.functional as F

def cayley_menger_vol2(points):
    B, N, D = points.shape
    gram = torch.bmm(points, points.transpose(1, 2))
    norms = torch.diagonal(gram, dim1=1, dim2=2)
    d2 = F.relu(norms.unsqueeze(2) + norms.unsqueeze(1) - 2 * gram)
    cm = torch.zeros(B, N+1, N+1, device=points.device, dtype=points.dtype)
    cm[:, 0, 1:] = 1; cm[:, 1:, 0] = 1; cm[:, 1:, 1:] = d2
    k = N - 1
    return ((-1)**(k+1)) * torch.linalg.det(cm.float()).to(points.dtype) / ((2**k) * (math.factorial(k)**2))

def cv_metric(weight, n_samples=300):
    V, D = weight.shape
    pool = min(V, 512)
    idx = torch.stack([torch.randperm(pool)[:5] for _ in range(n_samples)])
    vol2 = cayley_menger_vol2(weight[:pool][idx])
    valid = vol2 > 1e-20
    if valid.sum() < 10: return None
    vols = vol2[valid].sqrt()
    return (vols.std() / (vols.mean() + 1e-8)).item()


# ── Usage ──

if __name__ == "__main__":
    torch.manual_seed(0)
    for d in (24, 32, 64, 768):
        print(f"D={d:4d}  CV={cv_metric(torch.randn(4096, d)):.4f}")
```

## Citation

Part of the [GeoLIP](https://huggingface.co/AbstractPhil) geometric deep learning research.