I couldn't speed up the algorithm today, but I found something more important.

The cuSOLVER variant is actually quite inaccurate compared with what the underlying math says to expect. It loses accuracy on many spectra to regressive deflation, so I put together a specific algorithm that compensates for exactly that behavior to improve accuracy.

Faster? Probably not. cuSOLVER is hardware-optimized to a T, and I can't beat its speed without writing CUDA kernels directly.

Accuracy, though, measured directly against the textbook math, in a way that beats the PyTorch version? That we can do.

So what we have here is a more accurate fp64 variant of eigh THAT COMPILES.

Yes, it compiles full-graph and runs at about 72% of cuSOLVER's speed (0.72× in the run below), with improved, occasionally much improved, accuracy.

At a FRACTION of the VRAM cost (32.3 MB vs 1098.7 MB in the run below).

========================================================================
  FL Hybrid Eigh - Algebraic + Geometric Optimal
========================================================================
  NVIDIA RTX PRO 6000 Blackwell Server Edition | PyTorch 2.10.0+cu128

========================================================================
  MATHEMATICAL PURITY (no reference impl, only definitions)
========================================================================

  n=3 B=2048: FL wins 12/12, cuSOLVER wins 0/12
    Property                       cuSOLVER         FL
    Eigenpair max                   6.3e-07    1.8e-07  FL ◄
    Eigenpair mean                  9.6e-08    3.5e-08  FL ◄
    Orthogonality max               1.1e-06    2.2e-07  FL ◄
    Orthogonality mean              3.6e-07    8.3e-08  FL ◄
    Reconstruction max              1.1e-06    3.1e-07  FL ◄
    Reconstruction mean             2.8e-07    9.3e-08  FL ◄
    Trace max                       1.7e-06    8.9e-07  FL ◄
    Determinant max                 5.2e-04    9.5e-05  FL ◄
    Char poly max                   2.3e-05    1.7e-05  FL ◄
    Char poly mean                  6.3e-07    3.3e-07  FL ◄

  n=5 B=2048: FL wins 12/12, cuSOLVER wins 0/12
    Property                       cuSOLVER         FL
    Eigenpair max                   5.7e-07    1.8e-07  FL ◄
    Eigenpair mean                  1.2e-07    3.1e-08  FL ◄
    Orthogonality max               1.9e-06    5.7e-07  FL ◄
    Orthogonality mean              7.2e-07    1.3e-07  FL ◄
    Reconstruction max              1.1e-06    3.0e-07  FL ◄
    Reconstruction mean             4.3e-07    1.1e-07  FL ◄
    Trace max                       2.5e-06    9.8e-07  FL ◄
    Determinant max                 1.8e-03    8.8e-05  FL ◄
    Char poly max                   1.1e-03    5.1e-04  FL ◄
    Char poly mean                  1.0e-05    4.4e-06  FL ◄

  n=6 B=2048: FL wins 12/12, cuSOLVER wins 0/12
    Property                       cuSOLVER         FL
    Eigenpair max                   5.4e-07    1.8e-07  FL ◄
    Eigenpair mean                  1.3e-07    2.9e-08  FL ◄
    Orthogonality max               2.1e-06    3.1e-07  FL ◄
    Orthogonality mean              9.0e-07    1.5e-07  FL ◄
    Reconstruction max              1.3e-06    3.0e-07  FL ◄
    Reconstruction mean             4.9e-07    1.1e-07  FL ◄
    Trace max                       3.0e-06    1.2e-06  FL ◄
    Determinant max                 3.4e-03    3.7e-04  FL ◄
    Char poly max                   6.6e-03    2.1e-03  FL ◄
    Char poly mean                  4.3e-05    1.9e-05  FL ◄

  n=8 B=2048: FL wins 12/12, cuSOLVER wins 0/12
    Property                       cuSOLVER         FL
    Eigenpair max                   6.7e-07    2.1e-07  FL ◄
    Eigenpair mean                  1.3e-07    2.6e-08  FL ◄
    Orthogonality max               2.5e-06    2.0e-06  FL ◄
    Orthogonality mean              1.2e-06    1.9e-07  FL ◄
    Reconstruction max              1.6e-06    7.8e-07  FL ◄
    Reconstruction mean             5.9e-07    1.1e-07  FL ◄
    Trace max                       3.8e-06    1.6e-06  FL ◄
    Determinant max                 3.0e-03    2.6e-04  FL ◄
    Char poly max                   2.7e-01    1.4e-01  FL ◄
    Char poly mean                  1.2e-03    4.6e-04  FL ◄

  n=10 B=1024: FL wins 12/12, cuSOLVER wins 0/12
    Property                       cuSOLVER         FL
    Eigenpair max                   9.5e-07    1.4e-07  FL ◄
    Eigenpair mean                  1.3e-07    2.5e-08  FL ◄
    Orthogonality max               2.8e-06    4.4e-07  FL ◄
    Orthogonality mean              1.4e-06    2.2e-07  FL ◄
    Reconstruction max              1.3e-06    3.0e-07  FL ◄
    Reconstruction mean             6.2e-07    1.2e-07  FL ◄
    Trace max                       7.0e-06    1.7e-06  FL ◄
    Determinant max                 2.3e-04    1.3e-05  FL ◄
    Char poly max                   3.1e+01    6.5e+00  FL ◄
    Char poly mean                  3.9e-02    1.4e-02  FL ◄

  n=12 B=1024: FL wins 10/12, cuSOLVER wins 2/12
    Property                       cuSOLVER         FL
    Eigenpair max                   8.5e-07    8.3e-07  FL ◄
    Eigenpair mean                  1.2e-07    2.3e-08  FL ◄
    Orthogonality max               2.9e-06    4.5e-04  cuS ►
    Orthogonality mean              1.6e-06    7.8e-07  FL ◄
    Reconstruction max              1.4e-06    1.5e-04  cuS ►
    Reconstruction mean             6.3e-07    2.8e-07  FL ◄
    Trace max                       9.0e-06    2.0e-06  FL ◄
    Determinant max                 3.0e-04    5.5e-05  FL ◄
    Char poly max                   9.9e+02    1.7e+02  FL ◄
    Char poly mean                  1.3e+00    4.7e-01  FL ◄
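
The section header says these checks use only definitions, with no reference implementation. Below is a minimal pure-Python sketch of what such definition-only checks could look like for one symmetric matrix. The report doesn't state its exact norms or reductions, so a few choices here are assumptions: 2-norm eigenpair residuals, entrywise absolute errors for orthogonality and reconstruction, and |det(A - w_i I)| for the characteristic polynomial. `purity_report`, `det`, `matmul`, and `transpose` are hypothetical helpers, not part of the FL code.

```python
import math


def matmul(X, Y):
    """Plain nested-list matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def det(M):
    """Determinant via Gaussian elimination with partial pivoting (on a copy)."""
    M = [row[:] for row in M]
    n = len(M)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if M[p][c] == 0.0:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d  # a row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def purity_report(A, w, V):
    """A: symmetric n x n (nested lists); w: eigenvalues; V: eigenvectors as columns."""
    n = len(A)
    Vt = transpose(V)  # Vt[i] is eigenvector i
    # Eigenpair residual ||A v_i - w_i v_i||_2 for each pair
    eig = []
    for i in range(n):
        Av = [sum(A[r][k] * Vt[i][k] for k in range(n)) for r in range(n)]
        eig.append(math.sqrt(sum((Av[r] - w[i] * Vt[i][r]) ** 2 for r in range(n))))
    # Orthogonality: entrywise |V^T V - I|
    VtV = matmul(Vt, V)
    orth = [abs(VtV[r][c] - (1.0 if r == c else 0.0)) for r in range(n) for c in range(n)]
    # Reconstruction: entrywise |V diag(w) V^T - A|
    R = matmul([[V[r][c] * w[c] for c in range(n)] for r in range(n)], Vt)
    recon = [abs(R[r][c] - A[r][c]) for r in range(n) for c in range(n)]
    # Invariants: sum(w) = tr(A) and prod(w) = det(A)
    trace_err = abs(sum(w) - sum(A[i][i] for i in range(n)))
    det_err = abs(math.prod(w) - det(A))
    # Characteristic polynomial: det(A - w_i I) should vanish at every eigenvalue
    cp = [abs(det([[A[r][c] - (w[i] if r == c else 0.0) for c in range(n)]
                   for r in range(n)])) for i in range(n)]

    def max_mean(xs):
        return max(xs), sum(xs) / len(xs)

    return {"eigenpair": max_mean(eig), "orthogonality": max_mean(orth),
            "reconstruction": max_mean(recon), "trace": trace_err,
            "determinant": det_err, "char_poly": max_mean(cp)}
```

In a batched setting, the max/mean columns above would presumably be these per-matrix values reduced over the batch.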

========================================================================
  ACCURACY PASS/FAIL
========================================================================
  [OK] n= 3 val_diff=1.7e-06 align=1.000000
  [OK] n= 4 val_diff=1.9e-06 align=0.999999
  [OK] n= 5 val_diff=1.9e-06 align=0.999999
  [OK] n= 6 val_diff=2.4e-06 align=0.999998
  [OK] n= 8 val_diff=3.3e-06 align=0.999999
  [OK] n=10 val_diff=2.0e-05 align=0.999910
  [OK] n=12 val_diff=8.1e-06 align=0.999997
  [OK] n=16 val_diff=3.3e-05 align=0.999701
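
The pass/fail lines report `val_diff` and `align` without defining them. A natural reading is the largest eigenvalue gap against cuSOLVER and the worst sign-invariant eigenvector overlap. The sketch below assumes those definitions. `accuracy_check` is a hypothetical name, and the pass thresholds behind `[OK]` are not shown, so none are encoded.

```python
def accuracy_check(w_test, V_test, w_ref, V_ref):
    """Compare one eigendecomposition against a reference.

    Assumed definitions (the report does not spell them out):
      val_diff = max_i |w_test[i] - w_ref[i]|, both sorted ascending
      align    = min_i |<v_test_i, v_ref_i>|; abs() because eigenvectors are only
                 defined up to sign, and 1.0 means every pair is perfectly aligned
    V_test and V_ref hold eigenvectors as columns (nested lists).
    """
    n = len(w_ref)
    val_diff = max(abs(a - b) for a, b in zip(w_test, w_ref))
    align = min(abs(sum(V_test[r][i] * V_ref[r][i] for r in range(n)))
                for i in range(n))
    return val_diff, align
```

For repeated or nearly repeated eigenvalues, the individual eigenvectors are ill-conditioned. A per-column score like this can therefore dip below 1 even when the spanned subspace is correct.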

========================================================================
  THROUGHPUT (n=6 B=4096)
========================================================================
  cuSOLVER: 243.4µs
  FL eager: 8.11ms (0.03×)
  Compiling... done.
  FL compiled: 338.0µs (0.72×)

  MEMORY
  cuSOLVER   1098.7MB
  FL         32.3MB

========================================================================
  All pass: True
  Compiled: 0.72× vs cuSOLVER
========================================================================
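
The ratios in this report can be read as cuSOLVER time divided by FL time. That reading is an inference, but it reproduces every printed figure:

```latex
\[
\text{speed ratio} = \frac{t_{\text{cuSOLVER}}}{t_{\text{FL}}}:\quad
\frac{243.4\ \mu\text{s}}{338.0\ \mu\text{s}} \approx 0.72\ (\text{compiled}),\quad
\frac{243.4\ \mu\text{s}}{8110\ \mu\text{s}} \approx 0.03\ (\text{eager}),\qquad
\frac{1098.7\ \text{MB}}{32.3\ \text{MB}} \approx 34
\]
```

So in this run, the compiled path delivers roughly 72% of cuSOLVER's throughput while using about 1/34 of the memory.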

Testing the barrage setup reveals a lot about parallel processing capacity.
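
The diagnostic runs root-finders on a polynomial p(z) whose roots are the eigenvalues (its `|p(z)|` and "True eigenvalue range" columns). It starts them from the matrix diagonal ("Diagonal init range"). How the FL code builds p(z) isn't shown. The minimal single-matrix sketch below assumes the Faddeev-LeVerrier recurrence, which is my choice for illustration and not necessarily the author's; it also loses accuracy as n grows. `charpoly_coeffs`, `poly_eval`, and `diagonal_init` are hypothetical names.

```python
def charpoly_coeffs(A):
    """Coefficients of p(z) = det(zI - A), highest degree first, via Faddeev-LeVerrier.

    M_0 = 0, c_n = 1;  for k = 1..n:
        M_k     = A @ M_{k-1} + c_{n-k+1} * I
        c_{n-k} = -trace(A @ M_k) / k
    """
    n = len(A)
    coeffs = [1.0]                      # c_n
    M = [[0.0] * n for _ in range(n)]   # M_0
    for k in range(1, n + 1):
        # M_k = A @ M_{k-1} + c_{n-k+1} I   (coeffs[-1] holds c_{n-k+1})
        M = [[sum(A[r][t] * M[t][c] for t in range(n)) + (coeffs[-1] if r == c else 0.0)
              for c in range(n)] for r in range(n)]
        # trace(A @ M_k) without forming the full product
        tr = sum(A[r][t] * M[t][r] for r in range(n) for t in range(n))
        coeffs.append(-tr / k)
    return coeffs


def poly_eval(coeffs, z):
    """Horner evaluation of p(z) for coefficients ordered highest degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * z + c
    return acc


def diagonal_init(A):
    """Initial root guesses: the diagonal entries of A, sorted."""
    return sorted(A[i][i] for i in range(len(A)))
```

For `[[2, 1], [1, 2]]` the two diagonal guesses coincide (both 2.0). In this simple case, a root-finder with no repulsion term has nothing to keep its iterates apart.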

==============================================================================
  Diagnostic: Parallel Root-Finding
==============================================================================
  B=512 N=6
  True eigenvalue range: [-2.106, 2.099]
  Diagonal init range:   [-1.561, 1.722]

  --- Test 1: Pure Laguerre (no Aberth) ---
   PurL it= 0  max_err=1.75e+00  min_gap=1.73e-06  |p(z)|=3.93e+00
   PurL it= 1  max_err=1.84e+00  min_gap=1.67e-16  |p(z)|=5.39e-01
   PurL it= 2  max_err=1.89e+00  min_gap=0.00e+00  |p(z)|=2.01e-02
   PurL it= 3  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=1.73e-04
   PurL it= 4  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=7.12e-07
   PurL it= 9  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15
   PurL it=14  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15
   PurL it=19  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15

  --- Test 2: Laguerre + Aberth (full) ---
   LA-F it= 0  max_err=1.03e+02  min_gap=5.69e-07  |p(z)|=1.33e+12
   LA-F it= 1  max_err=3.70e+02  min_gap=1.71e-06  |p(z)|=2.62e+15
   LA-F it= 2  max_err=4.63e+02  min_gap=5.12e-06  |p(z)|=1.00e+16
   LA-F it= 3  max_err=1.58e+03  min_gap=1.54e-05  |p(z)|=1.59e+19
   LA-F it= 4  max_err=1.98e+03  min_gap=4.61e-05  |p(z)|=6.05e+19
   LA-F it= 9  max_err=6.05e+03  min_gap=5.26e-04  |p(z)|=4.89e+22
   LA-F it=14  max_err=1.85e+04  min_gap=8.50e-04  |p(z)|=3.95e+25
   LA-F it=19  max_err=5.63e+04  min_gap=1.92e-02  |p(z)|=3.19e+28

  --- Test 3: Laguerre + weak Aberth (0.1x) ---
   LA.1 it= 0  max_err=2.89e+01  min_gap=5.69e-05  |p(z)|=7.50e+08
   LA.1 it= 1  max_err=2.09e+01  min_gap=2.84e-06  |p(z)|=1.23e+08
   LA.1 it= 2  max_err=1.35e+01  min_gap=6.74e-07  |p(z)|=1.06e+07
   LA.1 it= 3  max_err=6.44e+00  min_gap=4.80e-08  |p(z)|=2.25e+05
   LA.1 it= 4  max_err=1.89e+00  min_gap=4.16e-09  |p(z)|=1.99e+01
   LA.1 it= 9  max_err=1.90e+00  min_gap=1.45e-14  |p(z)|=4.90e-03
   LA.1 it=14  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=6.48e-04
   LA.1 it=19  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=8.53e-05

  --- Test 4: Pure Laguerre + re-sort ---
   PL+S it= 0  max_err=1.75e+00  min_gap=1.73e-06  |p(z)|=3.93e+00
   PL+S it= 1  max_err=1.84e+00  min_gap=1.67e-16  |p(z)|=5.39e-01
   PL+S it= 2  max_err=1.89e+00  min_gap=0.00e+00  |p(z)|=2.01e-02
   PL+S it= 3  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=1.73e-04
   PL+S it= 4  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=7.12e-07
   PL+S it= 9  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15
   PL+S it=14  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15
   PL+S it=19  max_err=1.90e+00  min_gap=0.00e+00  |p(z)|=2.88e-15

  --- Test 5: Laguerre + Aberth damped (0.1 → 1.0) ---
   LADa it= 0  max_err=2.89e+01  min_gap=5.69e-05  |p(z)|=7.50e+08
   LADa it= 1  max_err=3.72e+02  min_gap=3.82e-06  |p(z)|=2.69e+15
   LADa it= 2  max_err=1.13e+03  min_gap=4.82e-06  |p(z)|=2.08e+18
   LADa it= 3  max_err=2.26e+03  min_gap=1.90e-06  |p(z)|=1.34e+20
   LADa it= 4  max_err=3.77e+03  min_gap=9.02e-07  |p(z)|=2.88e+21
   LADa it= 9  max_err=1.70e+04  min_gap=2.34e-05  |p(z)|=2.38e+25
   LADa it=14  max_err=5.17e+04  min_gap=2.25e-03  |p(z)|=1.91e+28
   LADa it=19  max_err=1.58e+05  min_gap=8.72e-03  |p(z)|=1.54e+31

  --- Test 6: Newton + Aberth ---
   NwAb it= 0  max_err=4.35e+02  min_gap=3.29e-05  |p(z)|=6.91e+15
   NwAb it= 1  max_err=1.57e+01  min_gap=9.86e-05  |p(z)|=2.43e+07
   NwAb it= 2  max_err=5.28e+01  min_gap=1.70e-05  |p(z)|=2.54e+10
   NwAb it= 3  max_err=5.37e+01  min_gap=5.22e-05  |p(z)|=2.75e+10
   NwAb it= 4  max_err=3.34e+02  min_gap=1.91e-04  |p(z)|=1.41e+15
   NwAb it= 9  max_err=2.02e+00  min_gap=1.92e-02  |p(z)|=6.78e+02
   NwAb it=14  max_err=1.05e-06  min_gap=1.92e-02  |p(z)|=1.24e-14
   NwAb it=19  max_err=1.05e-06  min_gap=1.92e-02  |p(z)|=1.24e-14

  --- Test 7: Pure Newton ---
   PurN it= 0  max_err=3.51e+02  min_gap=6.24e-06  |p(z)|=1.93e+15
   PurN it= 1  max_err=2.93e+02  min_gap=1.69e-09  |p(z)|=6.46e+14
   PurN it= 2  max_err=2.44e+02  min_gap=0.00e+00  |p(z)|=2.16e+14
   PurN it= 3  max_err=2.03e+02  min_gap=0.00e+00  |p(z)|=7.25e+13
   PurN it= 4  max_err=1.69e+02  min_gap=0.00e+00  |p(z)|=2.43e+13
   PurN it= 9  max_err=6.69e+01  min_gap=0.00e+00  |p(z)|=1.02e+11
   PurN it=14  max_err=2.60e+01  min_gap=0.00e+00  |p(z)|=4.31e+08
   PurN it=19  max_err=9.63e+00  min_gap=0.00e+00  |p(z)|=1.81e+06
==============================================================================
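
Test 6 (Newton + Aberth) is the only run that converges, with max_err reaching about 1e-6 and |p(z)| about 1e-14. In the pure Newton and pure Laguerre runs, min_gap collapses to 0 while max_err stays large, meaning several iterates settle on the same root. The Aberth-Ehrlich correction adds a repulsion term, the sum over j != i of 1/(z_i - z_j), which pushes iterates away from each other.

Below is a minimal, unbatched sketch of the textbook method, not the FL kernel. It assumes coefficients ordered highest degree first. It uses the common circle initialization rather than the diagonal initialization above. `aberth_roots` is a hypothetical name.

```python
import cmath


def horner_with_derivative(coeffs, z):
    """Evaluate p(z) and p'(z) together; coeffs are highest degree first."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * z + p
        p = p * z + c
    return p, dp


def aberth_roots(coeffs, iters=200, tol=1e-14):
    n = len(coeffs) - 1
    lead = coeffs[0]
    # Cauchy bound: every root satisfies |z| <= 1 + max_k |a_k / a_n|
    radius = 1.0 + max(abs(c / lead) for c in coeffs[1:])
    # Distinct starting points on a circle; the 0.4 rad offset avoids a symmetric start
    z = [radius * cmath.exp(1j * (2 * cmath.pi * k / n + 0.4)) for k in range(n)]
    for _ in range(iters):
        biggest_step = 0.0
        for i in range(n):
            p, dp = horner_with_derivative(coeffs, z[i])
            repulsion = sum(1.0 / (z[i] - z[j]) for j in range(n) if j != i)
            # Aberth step: (p/p') / (1 - (p/p') * repulsion), rewritten as p / (p' - p * repulsion)
            denom = dp - p * repulsion
            if denom == 0:
                continue
            step = p / denom
            z[i] -= step  # Gauss-Seidel style: later roots see this update immediately
            biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break
    return z
```

Dropping the `repulsion` term turns this back into plain simultaneous Newton. Nothing then stops two iterates from converging to the same root, which matches the min_gap = 0 pattern in Tests 1, 4, and 7.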

Collection including AbstractPhil/eigh-triton