AscendKernelGen committed on
Commit 73cbbc9 · verified · 1 Parent(s): af44bc1

Create README.md

Files changed (1)
  1. README.md +91 -0
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ datasets:
+ - AscendKernelGen/Ascend-COT-v2-json
+ ---
+
+ # AscendKernelGen/KernelGen-LM-32B
+
+ ![License](https://img.shields.io/badge/License-Apache-yellow)
+ [![arXiv](https://img.shields.io/badge/arXiv-2601.07160-b31b1b.svg)](https://arxiv.org/abs/2601.07160)
+
+ ## Overview
+
+ **KernelGen-LM-32B** is a domain-adaptive large language model for low-level NPU kernel generation, targeting Huawei Ascend hardware and the AscendC programming language.
+
+ Built on the **Qwen3-Coder-30B (Mixture-of-Experts, MoE) backbone**, the model is specialized through domain-adaptive post-training on the Ascend-CoT dataset, followed by reinforcement learning with execution feedback.
+
+ On Level-2 tasks, it raises compilation success from 0% (baseline) to **96.5% (Pass@10)** and reaches **40.5% functional correctness**, where baseline models fail entirely.
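
The Pass@10 figure above is a pass@k metric. This README does not state how it is computed; the sketch below assumes the widely used unbiased estimator, in which `n` kernels are sampled per task and `c` of them pass the check (compilation or functional correctness). The function names are illustrative and not taken from the AscendKernelGen code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task.

    Probability that at least one of k samples, drawn without replacement
    from n generations of which c pass, is a passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (n, c) for one task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With `n = k = 10`, a task counts as solved as soon as any one of its ten samples passes, so the benchmark-level score reduces to the fraction of tasks with at least one passing kernel.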
+
+ ---
+
+ ## Links
+
+ - **Paper:** https://huggingface.co/papers/2601.07160
+ - **Code:** https://github.com/weich97/NPUKernelBench
+ - **Datasets:** https://huggingface.co/AscendKernelGen/datasets
+
+ ---
+
+ ## Introduction
+
+ Our framework, **AscendKernelGen (AKGen)**, bridges the gap between general-purpose code generation and hardware-specific programming through a closed-loop pipeline of data construction, training, and evaluation.
+
+ ### Ascend-CoT Dataset
+
+ A domain-specific dataset enriched with **Chain-of-Thought (CoT)** reasoning. It integrates:
+
+ - Documentation-grounded reasoning
+ - Code-centric reasoning from real-world kernel implementations
+ - General structured reasoning chains
+
+ Together, these sources let the model capture the intricate logic required for low-level NPU kernel development.
+
+ ---
+
+ ### Domain-Adaptive Post-Training
+
+ We introduce a two-stage optimization pipeline to obtain **KernelGen-LM**:
+
+ - **Supervised Fine-Tuning (SFT):**
+ Uses error-derived supervision to correct API misuse and numerical inaccuracies.
+
+ - **Reinforcement Learning (DPO):**
+ Applies Direct Preference Optimization (DPO), guided by execution-based correctness and performance feedback.
+
+ The combination improves both syntactic validity and runtime reliability.
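
As a rough illustration of the second stage, the sketch below shows the standard per-pair DPO objective together with one hypothetical way to turn execution feedback into a preference pair. The ranking rule (correct kernels first, then lower latency), the tuple layout, and the default `beta` are assumptions for illustration; this README does not describe how the authors build pairs or set hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) == log(1 + exp(-m)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pick_pair(candidates):
    """Hypothetical pair selection from execution feedback.

    candidates: list of (code, passed, latency_ms) tuples for one task.
    Returns (chosen_code, rejected_code), or None when no candidate passed
    or when the best and worst candidates are indistinguishable.
    """
    # Passing kernels sort first; ties are broken by lower latency.
    ranked = sorted(candidates, key=lambda c: (not c[1], c[2]))
    best, worst = ranked[0], ranked[-1]
    if not best[1] or (best[1], best[2]) == (worst[1], worst[2]):
        return None
    return best[0], worst[0]
```

The loss equals `log(2)` when the policy and reference models agree, and drops below that as the policy assigns relatively more probability to the chosen kernel than the reference does.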
+
+ ---
+
+ ### Hardware-Grounded Evaluation
+
+ We validate performance with **NPUKernelBench**, a benchmark that measures:
+
+ - Compilation success
+ - Functional correctness
+ - Runtime performance (latency)
+
+ All evaluations run on real Ascend hardware across tasks of varying complexity.
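
The three metrics above can be aggregated from per-task results roughly as follows. This is a minimal sketch with a hypothetical record layout (`KernelResult` and its fields are not part of NPUKernelBench); it assumes a kernel can only be functionally correct if it compiled, and that latency is reported only for correct kernels.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class KernelResult:
    """Hypothetical outcome of evaluating one generated kernel."""
    task_id: str
    compiled: bool
    correct: bool
    latency_ms: Optional[float] = None  # measured only when the kernel ran


def summarize(results: list[KernelResult]) -> dict:
    """Aggregate compilation rate, correctness rate, and mean latency."""
    if not results:
        raise ValueError("results must not be empty")
    total = len(results)
    # A kernel counts as correct only if it also compiled.
    passing = [r for r in results if r.compiled and r.correct]
    latencies = [r.latency_ms for r in passing if r.latency_ms is not None]
    return {
        "compile_rate": sum(r.compiled for r in results) / total,
        "correct_rate": len(passing) / total,
        "mean_latency_ms": mean(latencies) if latencies else None,
    }
```

Reporting latency only over correct kernels keeps fast but wrong kernels from improving the runtime figure.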
+
+ ---
+
+ ### Performance Highlights
+
+ KernelGen-LM shows substantial gains on complex Level-2 kernel generation tasks, solving problems on which general-purpose LLMs (e.g., Qwen3, Llama3.1) fail completely.
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @article{cao2026ascendkernelgen,
+   title={AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units},
+   author={Xinzi Cao and Jianyang Zhai and Pengfei Li and Zhiheng Hu and Cen Yan and Bingxu Mu and Guanghuan Fang and Bin She and Jiayu Li and Yihan Su and Dongyang Tao and Xiansong Huang and Fan Xu and Feidiao Yang and Yao Lu and Chang-Dong Wang and Yutong Lu and Weicheng Xue and Bin Zhou and Yonghong Tian},
+   journal={arXiv preprint arXiv:2601.07160},
+   year={2026},
+   url={https://arxiv.org/abs/2601.07160}
+ }