neullabs (Neul Labs)

posted an update about 2 hours ago

LLM-generated GPU kernels pass the standard correctness test and are still wrong.

The industry oracle is one line: torch.allclose at one shape, one dtype, one seed. Every modern kernel benchmark uses it. It is blind to whole bug classes.

So I built the receipts:
- a 26-op corpus of correct and LLM-buggy kernels
- a differential fuzzer against an fp64 reference that catches what allclose misses
- a live demo you can click
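
To make the gap concrete, here is a minimal pure-Python sketch (no GPU or torch needed); it is not the corpus or the fuzzer from the repos. Every name in it (`BLOCK`, `buggy_blocked_sum`, `single_point_oracle`, `differential_fuzz`) is hypothetical. It assumes one illustrative bug class, a tiled reduction that drops the ragged tail, and `close` mirrors the documented torch.allclose formula and default tolerances on scalars.

```python
import math
import random

BLOCK = 256  # hypothetical tile width of the stand-in "kernel"


def reference_sum(xs):
    """fp64 reference: Python floats are IEEE doubles, fsum rounds once."""
    return math.fsum(xs)


def buggy_blocked_sum(xs):
    """Stand-in for a generated kernel: sums full tiles, silently drops the tail."""
    full = (len(xs) // BLOCK) * BLOCK
    total = 0.0
    for start in range(0, full, BLOCK):
        total += sum(xs[start:start + BLOCK])
    return total


def close(actual, expected, rtol=1e-5, atol=1e-8):
    """Scalar version of the allclose rule: |a - b| <= atol + rtol * |b|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def single_point_oracle(kernel):
    """One shape (1024), one seed: the check the post argues is too weak."""
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
    return close(kernel(xs), reference_sum(xs))


def differential_fuzz(kernel, trials=200, seed=0):
    """Compare kernel and reference over random lengths and value scales.

    Returns the input lengths at which the two disagree.
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        n = rng.randint(1, 2048)
        scale = 10.0 ** rng.randint(-3, 3)
        xs = [rng.uniform(-scale, scale) for _ in range(n)]
        if not close(kernel(xs), reference_sum(xs)):
            failures.append(n)
    return failures


if __name__ == "__main__":
    print("single-point oracle passes:", single_point_oracle(buggy_blocked_sum))
    bad = differential_fuzz(buggy_blocked_sum)
    print("fuzz disagreements:", len(bad), "of 200 trials")
```

The single-point check passes because 1024 happens to be a multiple of the tile width; the fuzzer only finds the bug because it varies the shape. A real harness would presumably also vary dtype, strides, and special values, which this sketch leaves out.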

The Correctness Illusion in LLM-Generated GPU Kernels (2606.20128)
dipankarsarkar/gpuemu-corpus
dipankarsarkar/the-correctness-illusion

What is your team's actual correctness oracle for generated kernels?
