---
license: apache-2.0
---

I **could not** speed up the algorithm **today**; however, I found out something more crucial.

The cuSOLVER variant... is actually **quite inaccurate** compared to what the traditional math says the results should be. It loses accuracy across many spectra to regressive deflation, and I've put together a specific algorithm that compensates for exactly this behavior to improve accuracy.
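
For reference, "traditional math" here means the defining identities of a symmetric eigendecomposition $A = V \Lambda V^\top$, with eigenvectors $v_i$ as the columns of $V$ and eigenvalues $\lambda_i$ on the diagonal of $\Lambda$. Judging by the property names in the MATHEMATICAL PURITY log, each row measures how far one of these identities is from holding. The exact norms and batch reductions are an assumption, since the log does not state them:

$$
\begin{aligned}
\text{eigenpair:}\quad & A v_i = \lambda_i v_i \\
\text{orthogonality:}\quad & V^\top V = I \\
\text{reconstruction:}\quad & V \Lambda V^\top = A \\
\text{trace:}\quad & \textstyle\sum_i \lambda_i = \operatorname{tr}(A) \\
\text{determinant:}\quad & \textstyle\prod_i \lambda_i = \det(A) \\
\text{char poly:}\quad & \det(A - \lambda_i I) = 0
\end{aligned}
$$

None of these needs a reference solver, which is why cuSOLVER and FL can both be scored against the definitions alone.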

Faster? Probably not. I can't improve the speed: cuSOLVER is hardware-optimized to a T, and that won't change unless I write CUDA kernels directly.

**Accuracy**, however, measured directly against the traditional math and beating the PyTorch variant: that we can do.

So what we have here is a more accurate fp64 variant of `eigh` THAT COMPILES.

Yes, it compiles full-graph and runs at around 75% of cuSOLVER's speed (0.72× in the run below), with improved, occasionally greatly improved, accuracy.

At a **FRACTION** of the VRAM cost (32.3MB vs. 1098.7MB in the run below).

```
========================================================================
FL Hybrid Eigh - Algebraic + Geometric Optimal
========================================================================
NVIDIA RTX PRO 6000 Blackwell Server Edition | PyTorch 2.10.0+cu128

========================================================================
MATHEMATICAL PURITY (no reference impl, only definitions)
========================================================================

n=3 B=2048: FL wins 12/12, cuSOLVER wins 0/12
  Property              cuSOLVER        FL
  Eigenpair max          6.3e-07   1.8e-07  FL ✓
  Eigenpair mean         9.6e-08   3.5e-08  FL ✓
  Orthogonality max      1.1e-06   2.2e-07  FL ✓
  Orthogonality mean     3.6e-07   8.3e-08  FL ✓
  Reconstruction max     1.1e-06   3.1e-07  FL ✓
  Reconstruction mean    2.8e-07   9.3e-08  FL ✓
  Trace max              1.7e-06   8.9e-07  FL ✓
  Determinant max        5.2e-04   9.5e-05  FL ✓
  Char poly max          2.3e-05   1.7e-05  FL ✓
  Char poly mean         6.3e-07   3.3e-07  FL ✓

n=5 B=2048: FL wins 12/12, cuSOLVER wins 0/12
  Property              cuSOLVER        FL
  Eigenpair max          5.7e-07   1.8e-07  FL ✓
  Eigenpair mean         1.2e-07   3.1e-08  FL ✓
  Orthogonality max      1.9e-06   5.7e-07  FL ✓
  Orthogonality mean     7.2e-07   1.3e-07  FL ✓
  Reconstruction max     1.1e-06   3.0e-07  FL ✓
  Reconstruction mean    4.3e-07   1.1e-07  FL ✓
  Trace max              2.5e-06   9.8e-07  FL ✓
  Determinant max        1.8e-03   8.8e-05  FL ✓
  Char poly max          1.1e-03   5.1e-04  FL ✓
  Char poly mean         1.0e-05   4.4e-06  FL ✓

n=6 B=2048: FL wins 12/12, cuSOLVER wins 0/12
  Property              cuSOLVER        FL
  Eigenpair max          5.4e-07   1.8e-07  FL ✓
  Eigenpair mean         1.3e-07   2.9e-08  FL ✓
  Orthogonality max      2.1e-06   3.1e-07  FL ✓
  Orthogonality mean     9.0e-07   1.5e-07  FL ✓
  Reconstruction max     1.3e-06   3.0e-07  FL ✓
  Reconstruction mean    4.9e-07   1.1e-07  FL ✓
  Trace max              3.0e-06   1.2e-06  FL ✓
  Determinant max        3.4e-03   3.7e-04  FL ✓
  Char poly max          6.6e-03   2.1e-03  FL ✓
  Char poly mean         4.3e-05   1.9e-05  FL ✓

n=8 B=2048: FL wins 12/12, cuSOLVER wins 0/12
  Property              cuSOLVER        FL
  Eigenpair max          6.7e-07   2.1e-07  FL ✓
  Eigenpair mean         1.3e-07   2.6e-08  FL ✓
  Orthogonality max      2.5e-06   2.0e-06  FL ✓
  Orthogonality mean     1.2e-06   1.9e-07  FL ✓
  Reconstruction max     1.6e-06   7.8e-07  FL ✓
  Reconstruction mean    5.9e-07   1.1e-07  FL ✓
  Trace max              3.8e-06   1.6e-06  FL ✓
  Determinant max        3.0e-03   2.6e-04  FL ✓
  Char poly max          2.7e-01   1.4e-01  FL ✓
  Char poly mean         1.2e-03   4.6e-04  FL ✓

n=10 B=1024: FL wins 12/12, cuSOLVER wins 0/12
  Property              cuSOLVER        FL
  Eigenpair max          9.5e-07   1.4e-07  FL ✓
  Eigenpair mean         1.3e-07   2.5e-08  FL ✓
  Orthogonality max      2.8e-06   4.4e-07  FL ✓
  Orthogonality mean     1.4e-06   2.2e-07  FL ✓
  Reconstruction max     1.3e-06   3.0e-07  FL ✓
  Reconstruction mean    6.2e-07   1.2e-07  FL ✓
  Trace max              7.0e-06   1.7e-06  FL ✓
  Determinant max        2.3e-04   1.3e-05  FL ✓
  Char poly max          3.1e+01   6.5e+00  FL ✓
  Char poly mean         3.9e-02   1.4e-02  FL ✓

n=12 B=1024: FL wins 10/12, cuSOLVER wins 2/12
  Property              cuSOLVER        FL
  Eigenpair max          8.5e-07   8.3e-07  FL ✓
  Eigenpair mean         1.2e-07   2.3e-08  FL ✓
  Orthogonality max      2.9e-06   4.5e-04  cuS ►
  Orthogonality mean     1.6e-06   7.8e-07  FL ✓
  Reconstruction max     1.4e-06   1.5e-04  cuS ►
  Reconstruction mean    6.3e-07   2.8e-07  FL ✓
  Trace max              9.0e-06   2.0e-06  FL ✓
  Determinant max        3.0e-04   5.5e-05  FL ✓
  Char poly max          9.9e+02   1.7e+02  FL ✓
  Char poly mean         1.3e+00   4.7e-01  FL ✓

========================================================================
ACCURACY PASS/FAIL
========================================================================
  [OK] n= 3  val_diff=1.7e-06  align=1.000000
  [OK] n= 4  val_diff=1.9e-06  align=0.999999
  [OK] n= 5  val_diff=1.9e-06  align=0.999999
  [OK] n= 6  val_diff=2.4e-06  align=0.999998
  [OK] n= 8  val_diff=3.3e-06  align=0.999999
  [OK] n=10  val_diff=2.0e-05  align=0.999910
  [OK] n=12  val_diff=8.1e-06  align=0.999997
  [OK] n=16  val_diff=3.3e-05  align=0.999701

========================================================================
THROUGHPUT (n=6 B=4096)
========================================================================
  cuSOLVER:     243.4µs
  FL eager:     8.11ms   (0.03×)
  Compiling... /usr/local/lib/python3.12/dist-packages/torch/_inductor/lowering.py:1904: FutureWarning: `torch._prims_common.check` is deprecated and will be removed in the future. Please use `torch._check*` functions instead.
  check(
done.
  FL compiled:  338.0µs  (0.72×)

MEMORY
  cuSOLVER    1098.7MB
  FL            32.3MB

========================================================================
  All pass: True
  Compiled: 0.72× vs cuSOLVER
========================================================================
```
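
A minimal NumPy sketch of how those definition-only checks, plus the `val_diff`/`align` comparison from the PASS/FAIL section, could be computed for a batch. This is not the benchmark's code: `definition_checks` and `compare_to_reference` are hypothetical names, and the reductions (per-matrix max absolute entry, then max/mean over the batch; `align` as the mean absolute dot product between matched eigenvectors, because eigenvectors are only defined up to sign) are assumptions.

```python
import numpy as np


def definition_checks(A, evals, evecs):
    """Check an eigendecomposition against its defining identities only.

    A: (B, n, n) symmetric batch, evals: (B, n), evecs: (B, n, n) with
    eigenvectors as columns. Returns {property: (batch max, batch mean)}.
    The per-matrix reductions here are assumptions, not the benchmark's.
    """
    n = A.shape[-1]
    eye = np.eye(n)
    Vt = np.swapaxes(evecs, -1, -2)
    V_lam = evecs * evals[:, None, :]          # column i scaled by lambda_i

    errors = {
        # A v_i - lambda_i v_i should vanish for every eigenpair
        "eigenpair": np.abs(A @ evecs - V_lam).max(axis=(1, 2)),
        # V^T V should be the identity
        "orthogonality": np.abs(Vt @ evecs - eye).max(axis=(1, 2)),
        # V diag(lambda) V^T should rebuild A
        "reconstruction": np.abs(V_lam @ Vt - A).max(axis=(1, 2)),
        # sum of eigenvalues equals the trace
        "trace": np.abs(evals.sum(axis=1) - np.trace(A, axis1=1, axis2=2)),
        # product of eigenvalues equals the determinant
        "determinant": np.abs(evals.prod(axis=1) - np.linalg.det(A)),
        # det(A - lambda_i I) should vanish for each eigenvalue
        "char_poly": np.abs(
            np.linalg.det(A[:, None] - evals[:, :, None, None] * eye)
        ).max(axis=1),
    }
    return {name: (err.max(), err.mean()) for name, err in errors.items()}


def compare_to_reference(evals, evecs, ref_evals, ref_evecs):
    """val_diff: worst eigenvalue gap; align: mean |<v_i, v_ref_i>| (sign-invariant)."""
    val_diff = np.abs(evals - ref_evals).max()
    align = np.abs((evecs * ref_evecs).sum(axis=1)).mean()
    return val_diff, align


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 6, 6))
    A = (X + np.swapaxes(X, 1, 2)) / 2          # symmetric batch
    w, V = np.linalg.eigh(A)                    # fp64 LAPACK decomposition
    for name, (mx, mean) in definition_checks(A, w, V).items():
        print(f"{name:15s} max={mx:.1e} mean={mean:.1e}")
    print(compare_to_reference(w, -V, w, V))    # sign flips do not hurt align
```

Running the checks on a CPU fp64 decomposition gives a floor to compare GPU results against; swapping in the outputs of any other `eigh` implementation only requires matching the `(evals, evecs)` layout.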
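
Testing the barrage setup reveals a lot about parallel processing capacity. In this run (B=512 matrices, N=6), only Newton + Aberth (Test 6) converges, reaching max_err=1.05e-06 by iteration 14; every other variant either collapses root estimates onto each other (`min_gap` drops to zero) or diverges.
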
```
==============================================================================
Diagnostic: Parallel Root-Finding
==============================================================================
B=512 N=6
True eigenvalue range: [-2.106, 2.099]
Diagonal init range: [-1.561, 1.722]

--- Test 1: Pure Laguerre (no Aberth) ---
PurL it= 0 max_err=1.75e+00 min_gap=1.73e-06 |p(z)|=3.93e+00
PurL it= 1 max_err=1.84e+00 min_gap=1.67e-16 |p(z)|=5.39e-01
PurL it= 2 max_err=1.89e+00 min_gap=0.00e+00 |p(z)|=2.01e-02
PurL it= 3 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=1.73e-04
PurL it= 4 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=7.12e-07
PurL it= 9 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15
PurL it=14 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15
PurL it=19 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15

--- Test 2: Laguerre + Aberth (full) ---
LA-F it= 0 max_err=1.03e+02 min_gap=5.69e-07 |p(z)|=1.33e+12
LA-F it= 1 max_err=3.70e+02 min_gap=1.71e-06 |p(z)|=2.62e+15
LA-F it= 2 max_err=4.63e+02 min_gap=5.12e-06 |p(z)|=1.00e+16
LA-F it= 3 max_err=1.58e+03 min_gap=1.54e-05 |p(z)|=1.59e+19
LA-F it= 4 max_err=1.98e+03 min_gap=4.61e-05 |p(z)|=6.05e+19
LA-F it= 9 max_err=6.05e+03 min_gap=5.26e-04 |p(z)|=4.89e+22
LA-F it=14 max_err=1.85e+04 min_gap=8.50e-04 |p(z)|=3.95e+25
LA-F it=19 max_err=5.63e+04 min_gap=1.92e-02 |p(z)|=3.19e+28

--- Test 3: Laguerre + weak Aberth (0.1x) ---
LA.1 it= 0 max_err=2.89e+01 min_gap=5.69e-05 |p(z)|=7.50e+08
LA.1 it= 1 max_err=2.09e+01 min_gap=2.84e-06 |p(z)|=1.23e+08
LA.1 it= 2 max_err=1.35e+01 min_gap=6.74e-07 |p(z)|=1.06e+07
LA.1 it= 3 max_err=6.44e+00 min_gap=4.80e-08 |p(z)|=2.25e+05
LA.1 it= 4 max_err=1.89e+00 min_gap=4.16e-09 |p(z)|=1.99e+01
LA.1 it= 9 max_err=1.90e+00 min_gap=1.45e-14 |p(z)|=4.90e-03
LA.1 it=14 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=6.48e-04
LA.1 it=19 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=8.53e-05

--- Test 4: Pure Laguerre + re-sort ---
PL+S it= 0 max_err=1.75e+00 min_gap=1.73e-06 |p(z)|=3.93e+00
PL+S it= 1 max_err=1.84e+00 min_gap=1.67e-16 |p(z)|=5.39e-01
PL+S it= 2 max_err=1.89e+00 min_gap=0.00e+00 |p(z)|=2.01e-02
PL+S it= 3 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=1.73e-04
PL+S it= 4 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=7.12e-07
PL+S it= 9 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15
PL+S it=14 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15
PL+S it=19 max_err=1.90e+00 min_gap=0.00e+00 |p(z)|=2.88e-15

--- Test 5: Laguerre + Aberth damped (0.1 → 1.0) ---
LADa it= 0 max_err=2.89e+01 min_gap=5.69e-05 |p(z)|=7.50e+08
LADa it= 1 max_err=3.72e+02 min_gap=3.82e-06 |p(z)|=2.69e+15
LADa it= 2 max_err=1.13e+03 min_gap=4.82e-06 |p(z)|=2.08e+18
LADa it= 3 max_err=2.26e+03 min_gap=1.90e-06 |p(z)|=1.34e+20
LADa it= 4 max_err=3.77e+03 min_gap=9.02e-07 |p(z)|=2.88e+21
LADa it= 9 max_err=1.70e+04 min_gap=2.34e-05 |p(z)|=2.38e+25
LADa it=14 max_err=5.17e+04 min_gap=2.25e-03 |p(z)|=1.91e+28
LADa it=19 max_err=1.58e+05 min_gap=8.72e-03 |p(z)|=1.54e+31

--- Test 6: Newton + Aberth ---
NwAb it= 0 max_err=4.35e+02 min_gap=3.29e-05 |p(z)|=6.91e+15
NwAb it= 1 max_err=1.57e+01 min_gap=9.86e-05 |p(z)|=2.43e+07
NwAb it= 2 max_err=5.28e+01 min_gap=1.70e-05 |p(z)|=2.54e+10
NwAb it= 3 max_err=5.37e+01 min_gap=5.22e-05 |p(z)|=2.75e+10
NwAb it= 4 max_err=3.34e+02 min_gap=1.91e-04 |p(z)|=1.41e+15
NwAb it= 9 max_err=2.02e+00 min_gap=1.92e-02 |p(z)|=6.78e+02
NwAb it=14 max_err=1.05e-06 min_gap=1.92e-02 |p(z)|=1.24e-14
NwAb it=19 max_err=1.05e-06 min_gap=1.92e-02 |p(z)|=1.24e-14

--- Test 7: Pure Newton ---
PurN it= 0 max_err=3.51e+02 min_gap=6.24e-06 |p(z)|=1.93e+15
PurN it= 1 max_err=2.93e+02 min_gap=1.69e-09 |p(z)|=6.46e+14
PurN it= 2 max_err=2.44e+02 min_gap=0.00e+00 |p(z)|=2.16e+14
PurN it= 3 max_err=2.03e+02 min_gap=0.00e+00 |p(z)|=7.25e+13
PurN it= 4 max_err=1.69e+02 min_gap=0.00e+00 |p(z)|=2.43e+13
PurN it= 9 max_err=6.69e+01 min_gap=0.00e+00 |p(z)|=1.02e+11
PurN it=14 max_err=2.60e+01 min_gap=0.00e+00 |p(z)|=4.31e+08
PurN it=19 max_err=9.63e+00 min_gap=0.00e+00 |p(z)|=1.81e+06
==============================================================================
```
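
The one combination that converges above, Newton + Aberth, matches the classical Aberth-Ehrlich iteration. Each root estimate takes a Newton step $w_k = p(z_k)/p'(z_k)$, corrected by a repulsion term that keeps estimates from collapsing onto the same root: $z_k \leftarrow z_k - w_k / \bigl(1 - w_k \sum_{j \ne k} 1/(z_k - z_j)\bigr)$. The sketch below is a batched NumPy version under my own assumptions, not the FL code. It builds $\det(\lambda I - A)$ with the Faddeev-LeVerrier recurrence and starts from points on a circle around $\operatorname{tr}(A)/n$, whereas the diagnostic above uses a diagonal initialization. The function names are hypothetical.

```python
import numpy as np


def charpoly_coeffs(A):
    """det(lambda*I - A) for a batch (B, n, n), highest degree first (Faddeev-LeVerrier)."""
    B, n, _ = A.shape
    coeffs = np.zeros((B, n + 1))
    coeffs[:, 0] = 1.0                                          # leading coefficient c_n = 1
    eye = np.eye(n)
    M = np.zeros_like(A)
    for k in range(1, n + 1):
        M = A @ M + coeffs[:, k - 1, None, None] * eye          # M_k = A M_{k-1} + c_{n-k+1} I
        coeffs[:, k] = -np.trace(A @ M, axis1=1, axis2=2) / k   # c_{n-k} = -tr(A M_k) / k
    return coeffs


def horner(coeffs, z):
    """p(z) and p'(z) for every estimate at once; coeffs (B, n+1), z (B, n) complex."""
    p = np.broadcast_to(coeffs[:, :1], z.shape).astype(complex)
    dp = np.zeros_like(z)
    for c in coeffs[:, 1:].T:
        dp = dp * z + p                                         # derivative uses the old p
        p = p * z + c[:, None]
    return p, dp


def aberth_eigvals(A, iters=60):
    """Eigenvalues of symmetric A (B, n, n) as roots of its characteristic polynomial."""
    B, n, _ = A.shape
    coeffs = charpoly_coeffs(A)
    center = np.trace(A, axis1=1, axis2=2) / n                  # mean of the roots
    # Frobenius norm of (A - center*I) bounds |lambda - center|, so the circle encloses the roots
    radius = np.linalg.norm(A - center[:, None, None] * np.eye(n), axis=(1, 2))
    # the 0.4 offset keeps the start set from being symmetric about the real axis
    angles = 2 * np.pi * np.arange(n) / n + 0.4
    z = center[:, None] + radius[:, None] * np.exp(1j * angles)
    idx = np.arange(n)
    for _ in range(iters):
        p, dp = horner(coeffs, z)
        w = p / dp                                              # Newton step
        diff = z[:, :, None] - z[:, None, :]
        diff[:, idx, idx] = 1.0                                 # placeholder, avoids 1/0
        inv = 1.0 / diff
        inv[:, idx, idx] = 0.0                                  # sum only over j != k
        z = z - w / (1.0 - w * inv.sum(axis=2))                 # Aberth correction
    return np.sort(z.real, axis=1)                              # roots are real for symmetric A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((512, 6, 6))
    A = (X + np.swapaxes(X, 1, 2)) / 2
    err = np.abs(aberth_eigvals(A) - np.linalg.eigvalsh(A)).max()
    print(f"max_err vs eigvalsh: {err:.1e}")
```

The Faddeev-LeVerrier recurrence is convenient for small matrices like the N=6 case here, but it is known to lose accuracy quickly as n grows, so treat this as an illustration of the Newton + Aberth update rather than a drop-in solver.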