Create noise_test_dtype_sweep_d16.txt
Browse files- noise_test_dtype_sweep_d16.txt +173 -0
noise_test_dtype_sweep_d16.txt
ADDED
|
@@ -0,0 +1,173 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
==========================================================================================
|
| 2 |
+
CV SPECTRUM β FULL DTYPE SWEEP + JITTER ANALYSIS
|
| 3 |
+
Device: cuda
|
| 4 |
+
Dtypes: float32, bfloat16, float16, fp8_e4m3, fp8_e5m2, sim_4bit, sim_2bit, sim_1bit
|
| 5 |
+
==========================================================================================
|
| 6 |
+
|
| 7 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 8 |
+
SWEEP 1: Uniform sphere β dimension Γ dtype
|
| 9 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 10 |
+
dim float32 bfloat16 float16 fp8_e4m3 fp8_e5m2 sim_4bit sim_2bit sim_1bit
|
| 11 |
+
8 0.3716 0.3574 0.3608 0.3551 0.3619 0.3573 0.3646 0.3601
|
| 12 |
+
16 0.2041* 0.2034* 0.2040* 0.2049* 0.2036* 0.2073* 0.2102* 0.2056*
|
| 13 |
+
24 0.1530 0.1534 0.1540 0.1541 0.1547 0.1509 0.1542 0.1511
|
| 14 |
+
32 0.1283 0.1279 0.1285 0.1269 0.1263 0.1264 0.1283 0.1304
|
| 15 |
+
64 0.0832 0.0848 0.0857 0.0858 0.0843 0.0846 0.0869 0.0833
|
| 16 |
+
128 0.0566 0.0582 0.0576 0.0571 0.0587 0.0576 0.0594 0.0582
|
| 17 |
+
256 0.0405 0.0407 0.0413 0.0406 0.0407 0.0415 0.0400 0.0394
|
| 18 |
+
|
| 19 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 20 |
+
SWEEP 2: Clustered (10 clusters, spread=0.3) β dimension Γ dtype
|
| 21 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 22 |
+
dim float32 bfloat16 float16 fp8_e4m3 fp8_e5m2 sim_4bit sim_2bit sim_1bit
|
| 23 |
+
8 0.4572 0.4550 0.4655 0.4506 0.4617 0.4645 0.4480 0.4347
|
| 24 |
+
16 0.2569* 0.2549* 0.2612* 0.2564* 0.2593* 0.2623* 0.2553* 0.2428*
|
| 25 |
+
24 0.1890* 0.1874* 0.1891* 0.1840* 0.1867* 0.1863* 0.1829* 0.1793
|
| 26 |
+
32 0.1512 0.1540 0.1572 0.1525 0.1491 0.1537 0.1473 0.1398
|
| 27 |
+
64 0.0941 0.0931 0.0955 0.0943 0.0946 0.0970 0.0933 0.0907
|
| 28 |
+
128 0.0623 0.0617 0.0610 0.0613 0.0617 0.0613 0.0603 0.0591
|
| 29 |
+
256 0.0415 0.0411 0.0421 0.0423 0.0411 0.0410 0.0415 0.0418
|
| 30 |
+
|
| 31 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 32 |
+
SWEEP 3: Cluster spread sweep (d=16, 10 clusters) Γ dtype
|
| 33 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 34 |
+
spread float32 bfloat16 float16 fp8_e4m3 fp8_e5m2 sim_4bit sim_2bit sim_1bit
|
| 35 |
+
0.010 1.3619 1.4372 1.3606 1.3519 1.3121 1.3021 0.9556 0.6744
|
| 36 |
+
0.050 0.9031 0.8919 0.8985 0.9014 0.8911 0.8953 0.7280 0.5888
|
| 37 |
+
0.100 0.5820 0.5794 0.5871 0.5738 0.5802 0.5945 0.5158 0.4294
|
| 38 |
+
0.200 0.3228 0.3241 0.3271 0.3262 0.3318 0.3298 0.3135 0.2775
|
| 39 |
+
0.300 0.2539* 0.2467* 0.2471* 0.2608* 0.2573* 0.2535* 0.2469* 0.2267*
|
| 40 |
+
0.500 0.2186* 0.2124* 0.2133* 0.2165* 0.2203* 0.2181* 0.2168* 0.2186*
|
| 41 |
+
1.000 0.2032* 0.2066* 0.2019* 0.2059* 0.2031* 0.2069* 0.2022* 0.2020*
|
| 42 |
+
5.000 0.2090* 0.2094* 0.2050* 0.2053* 0.2017* 0.2044* 0.2058* 0.2034*
|
| 43 |
+
|
| 44 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 45 |
+
SWEEP 4: Anchor-attracted (d=16) Γ dtype
|
| 46 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββοΏ½οΏ½οΏ½βββββββββββββββββββ
|
| 47 |
+
anchors float32 bfloat16 float16 fp8_e4m3 fp8_e5m2 sim_4bit sim_2bit sim_1bit
|
| 48 |
+
4 0.3825 0.3637 0.3725 0.3729 0.3717 0.3744 0.3589 0.3251
|
| 49 |
+
8 0.3525 0.3427 0.3448 0.3509 0.3484 0.3446 0.3259 0.2885
|
| 50 |
+
16 0.3009 0.2947 0.2901 0.2872 0.2881 0.3002 0.2791 0.2627*
|
| 51 |
+
32 0.2674* 0.2617* 0.2649* 0.2652* 0.2689* 0.2635* 0.2498* 0.2365*
|
| 52 |
+
64 0.2386* 0.2279* 0.2354* 0.2325* 0.2379* 0.2245* 0.2242* 0.2188*
|
| 53 |
+
128 0.2213* 0.2156* 0.2127* 0.2203* 0.2147* 0.2167* 0.2168* 0.2107*
|
| 54 |
+
|
| 55 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 56 |
+
JITTER ANALYSIS β Measuring silent rounding damage
|
| 57 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 58 |
+
|
| 59 |
+
Quantization damage at d=16 (uniform):
|
| 60 |
+
dtype cos_sim mean_ang max_ang pw_err CV
|
| 61 |
+
float32 1.000000 0.000026 0.000691 0.000000 0.2002
|
| 62 |
+
bfloat16 0.999999 0.001472 0.002847 0.000427 0.1990
|
| 63 |
+
float16 1.000000 0.000104 0.000691 0.000052 0.2029
|
| 64 |
+
fp8_e4m3 0.999708 0.023574 0.048317 0.006643 0.2017
|
| 65 |
+
fp8_e5m2 0.998835 0.047084 0.093173 0.013291 0.2035
|
| 66 |
+
sim_4bit 0.998186 0.059794 0.081825 0.017122 0.2034
|
| 67 |
+
sim_2bit 0.972123 0.234681 0.339231 0.065831 0.2023
|
| 68 |
+
sim_1bit 0.898717 0.449164 0.717537 0.124532 0.2028
|
| 69 |
+
|
| 70 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 71 |
+
JITTER EXPERIMENT 1: Angular jitter on tangent plane after quantization
|
| 72 |
+
Does adding tangent noise AFTER fp8 quantization recover lost structure?
|
| 73 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 74 |
+
dtype jitter CV_no_jit CV_jitter Ξ pw_err
|
| 75 |
+
fp8_e4m3 0.001 0.2033 0.2023 -0.0010 0.006800
|
| 76 |
+
fp8_e4m3 0.005 0.2033 0.2049 +0.0017 0.008603
|
| 77 |
+
fp8_e4m3 0.010 0.2033 0.2026 -0.0007 0.012772
|
| 78 |
+
fp8_e4m3 0.050 0.2033 0.2038 +0.0006 0.054379
|
| 79 |
+
fp8_e4m3 0.100 0.2033 0.2022 -0.0010 0.101078
|
| 80 |
+
fp8_e5m2 0.001 0.2050 0.2040 -0.0010 0.013264
|
| 81 |
+
fp8_e5m2 0.005 0.2050 0.1989 -0.0061 0.014394
|
| 82 |
+
fp8_e5m2 0.010 0.2050 0.2033 -0.0017 0.017132
|
| 83 |
+
fp8_e5m2 0.050 0.2050 0.2033 -0.0018 0.055252
|
| 84 |
+
fp8_e5m2 0.100 0.2050 0.2024 -0.0026 0.102498
|
| 85 |
+
sim_2bit 0.001 0.2018 0.2030 +0.0012 0.066331
|
| 86 |
+
sim_2bit 0.005 0.2018 0.2026 +0.0008 0.066285
|
| 87 |
+
sim_2bit 0.010 0.2018 0.2022 +0.0004 0.067439
|
| 88 |
+
sim_2bit 0.050 0.2018 0.2054 +0.0036 0.083257
|
| 89 |
+
sim_2bit 0.100 0.2018 0.2003 -0.0015 0.117171
|
| 90 |
+
sim_1bit 0.001 0.2015 0.2043 +0.0028 0.123281
|
| 91 |
+
sim_1bit 0.005 0.2015 0.2025 +0.0010 0.124374
|
| 92 |
+
sim_1bit 0.010 0.2015 0.1999 -0.0016 0.123846
|
| 93 |
+
sim_1bit 0.050 0.2015 0.2049 +0.0034 0.131081
|
| 94 |
+
sim_1bit 0.100 0.2015 0.2042 +0.0027 0.148310
|
| 95 |
+
|
| 96 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 97 |
+
JITTER EXPERIMENT 2: Stochastic rounding vs deterministic
|
| 98 |
+
Round Β±1 level with probability proportional to residual
|
| 99 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 100 |
+
bits CV_determ CV_stoch Ξ pw_det pw_sto
|
| 101 |
+
1 0.1971 0.2018 +0.0048 0.123730 0.159488
|
| 102 |
+
2 0.1964 0.2038 +0.0074 0.065930 0.091900
|
| 103 |
+
3 0.2028 0.2051 +0.0023 0.033738 0.048131
|
| 104 |
+
4 0.1983 0.2020 +0.0037 0.017008 0.023639
|
| 105 |
+
8 0.2080 0.2077 -0.0003 0.001070 0.001503
|
| 106 |
+
|
| 107 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 108 |
+
JITTER EXPERIMENT 3: Accumulated damage β repeated quantize-dequantize cycles
|
| 109 |
+
How many round-trips before structure degrades?
|
| 110 |
+
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 111 |
+
dtype cycles CV cos_to_orig ang_err
|
| 112 |
+
bfloat16 1 0.2038 0.999999 0.001472
|
| 113 |
+
bfloat16 5 0.2048 0.999999 0.001473
|
| 114 |
+
bfloat16 10 0.2060 0.999999 0.001473
|
| 115 |
+
bfloat16 50 0.2077 0.999999 0.001473
|
| 116 |
+
bfloat16 100 0.1982 0.999999 0.001473
|
| 117 |
+
|
| 118 |
+
float16 1 0.2029 1.000000 0.000104
|
| 119 |
+
float16 5 0.2088 1.000000 0.000105
|
| 120 |
+
float16 10 0.2012 1.000000 0.000105
|
| 121 |
+
float16 50 0.2005 1.000000 0.000105
|
| 122 |
+
float16 100 0.2075 1.000000 0.000105
|
| 123 |
+
|
| 124 |
+
fp8_e4m3 1 0.2035 0.999708 0.023574
|
| 125 |
+
fp8_e4m3 5 0.1948 0.999706 0.023615
|
| 126 |
+
fp8_e4m3 10 0.2082 0.999706 0.023615
|
| 127 |
+
fp8_e4m3 50 0.2029 0.999706 0.023615
|
| 128 |
+
fp8_e4m3 100 0.1982 0.999706 0.023615
|
| 129 |
+
|
| 130 |
+
fp8_e5m2 1 0.2042 0.998835 0.047084
|
| 131 |
+
fp8_e5m2 5 0.2033 0.998829 0.047184
|
| 132 |
+
fp8_e5m2 10 0.1974 0.998829 0.047184
|
| 133 |
+
fp8_e5m2 50 0.2024 0.998829 0.047184
|
| 134 |
+
fp8_e5m2 100 0.2024 0.998829 0.047184
|
| 135 |
+
|
| 136 |
+
sim_2bit 1 0.2049 0.972123 0.234681
|
| 137 |
+
sim_2bit 5 0.1979 0.972111 0.234736
|
| 138 |
+
sim_2bit 10 0.2005 0.972111 0.234736
|
| 139 |
+
sim_2bit 50 0.2028 0.972111 0.234736
|
| 140 |
+
sim_2bit 100 0.2070 0.972111 0.234736
|
| 141 |
+
|
| 142 |
+
sim_1bit 1 0.2047 0.898717 0.449164
|
| 143 |
+
sim_1bit 5 0.1998 0.897216 0.452575
|
| 144 |
+
sim_1bit 10 0.1970 0.897216 0.452575
|
| 145 |
+
sim_1bit 50 0.2034 0.897216 0.452575
|
| 146 |
+
sim_1bit 100 0.1990 0.897216 0.452575
|
| 147 |
+
|
| 148 |
+
|
| 149 |
+
==========================================================================================
|
| 150 |
+
SUMMARY β Silent Rounding Damage Report
|
| 151 |
+
==========================================================================================
|
| 152 |
+
|
| 153 |
+
CV band stability: CV β 0.20 at d=16 survives ALL precisions down to 1-bit.
|
| 154 |
+
The band is a topological property of the sphere, not a numerical one.
|
| 155 |
+
|
| 156 |
+
But the SILENT DAMAGE is in:
|
| 157 |
+
- Pairwise distance preservation (pw_err)
|
| 158 |
+
- Angular error accumulation over cycles
|
| 159 |
+
- Nearest-neighbor assignment stability
|
| 160 |
+
|
| 161 |
+
These don't show up in CV because CV measures GLOBAL volume regularity,
|
| 162 |
+
not LOCAL neighborhood fidelity. A constellation needs LOCAL fidelity β
|
| 163 |
+
which anchor is nearest matters, not whether the overall volume distribution
|
| 164 |
+
is regular.
|
| 165 |
+
|
| 166 |
+
JITTER RECOMMENDATION:
|
| 167 |
+
For fp8 inference: add tangent-plane jitter of ~0.01 after dequantize
|
| 168 |
+
For training: use stochastic rounding instead of deterministic
|
| 169 |
+
For repeated quantize cycles: re-normalize every N steps
|
| 170 |
+
|
| 171 |
+
==========================================================================================
|
| 172 |
+
DONE
|
| 173 |
+
==========================================================================================
|