AI & ML interests

Applied AI. Agent infrastructure, local inference, and policy gates you can actually run.

dipankarsarkar posted an update
LLM-generated GPU kernels pass the standard correctness test and are still wrong.

The de facto industry oracle is a single line: torch.allclose at one shape, one dtype, one seed. Every modern kernel benchmark uses it. It is blind to whole classes of bugs that only surface at other shapes, dtypes, or inputs.
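A minimal sketch of how a single-shape check can pass a broken kernel. Everything here is hypothetical and not taken from the corpus: `buggy_sum` is an invented tiled reduction that silently drops the last partial tile, `BLOCK` is an assumed tile size, and `allclose` is a pure-Python stand-in that applies the same tolerance rule as torch.allclose (|a - b| <= atol + rtol * |b|, with torch's defaults rtol=1e-5, atol=1e-8) so the example runs without a GPU or PyTorch.

```python
import random

BLOCK = 256  # hypothetical tile size for the illustrative kernel


def buggy_sum(xs):
    """Hypothetical buggy kernel: a tiled reduction that skips the final
    partial tile whenever len(xs) is not a multiple of BLOCK."""
    total = 0.0
    full_tiles = len(xs) // BLOCK
    for t in range(full_tiles):
        total += sum(xs[t * BLOCK:(t + 1) * BLOCK])
    return total  # xs[full_tiles * BLOCK:] is silently ignored


def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Scalar stand-in for torch.allclose's rule: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def single_shape_oracle(kernel, n=1024, seed=0):
    """The one-line oracle: one shape, one seed, one comparison."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]  # values in [0, 1)
    return allclose(kernel(xs), sum(xs))


print(single_shape_oracle(buggy_sum))          # True: 1024 is a multiple of BLOCK
print(single_shape_oracle(buggy_sum, n=1000))  # False: 232 elements are dropped
```

Same seed, same dtype; only the shape changed. A benchmark that happens to pick a tile-aligned shape never sees the bug.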

So I built the receipts:
- a 26-op corpus of correct and LLM-buggy kernels
- a differential fuzzer that checks kernels against an fp64 reference and catches what allclose misses
- a live demo you can click
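The differential fuzz idea can be sketched in the same hedged spirit. This is not the paper's harness: `fuzz`, `f32_sum`, `tiled_f32_sum_buggy`, the size list, and the tolerances are all assumptions for illustration. fp32 is emulated by rounding through `struct`, the fp64 reference is `math.fsum`, and the sizes deliberately straddle the assumed tile size instead of testing one shape.

```python
import math
import random
import struct


def to_f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_sum(xs):
    """Correct kernel, emulated in fp32: round the accumulator after each add."""
    acc = 0.0
    for x in xs:
        acc = to_f32(acc + x)
    return acc


def tiled_f32_sum_buggy(xs, block=256):
    """Hypothetical buggy kernel: drops elements past the last full tile."""
    acc = 0.0
    for i in range((len(xs) // block) * block):
        acc = to_f32(acc + xs[i])
    return acc


def fuzz(kernel, reference, sizes, seeds, rtol=1e-4, atol=1e-6):
    """Differential fuzz: compare kernel against an fp64 reference over many
    (size, seed) pairs and return every disagreeing case."""
    failures = []
    for n in sizes:
        for seed in seeds:
            rng = random.Random(seed)
            # Nonnegative fp32 inputs keep the sum well-conditioned, so a
            # correct fp32 accumulation stays inside rtol for these sizes.
            xs = [to_f32(rng.random()) for _ in range(n)]
            got, want = kernel(xs), reference(xs)
            if not abs(got - want) <= atol + rtol * abs(want):
                failures.append((n, seed, got, want))
    return failures


SIZES = [1, 7, 255, 256, 257, 1000, 1024]  # odd sizes plus tile boundaries
print(len(fuzz(f32_sum, math.fsum, SIZES, range(3))))
print(sorted({n for n, *_ in fuzz(tiled_f32_sum_buggy, math.fsum, SIZES, range(3))}))
```

Under these assumptions the correct fp32 kernel should report no failures, while the tail-dropping kernel is flagged at sizes that leave a partial tile. The tile-aligned sizes (256, 1024) still pass, which is exactly the blind spot a single-shape allclose check inherits.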

The Correctness Illusion in LLM-Generated GPU Kernels (2606.20128)
dipankarsarkar/gpuemu-corpus
dipankarsarkar/the-correctness-illusion

What is your team's actual correctness oracle for generated kernels?