nielsr (HF Staff) committed f80448b (verified) · 1 parent: 009d33b

Add metadata and improve model card


Hi! I'm Niels from the Hugging Face community science team. I've opened this PR to improve the model card for KernelGen-LM-1.7B.

The changes include:
- Adding `pipeline_tag: text-generation` to the YAML metadata.
- Specifying `license: apache-2.0`.
- Adding `library_name: transformers`, since the model files indicate compatibility with the Transformers library.
- Adding a link to the official GitHub repository for the evaluation framework.
- Formatting the citation as a BibTeX block.

These updates will make the model more discoverable and useful to the community!
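To show what the three added front-matter keys look like to a consumer, here is a minimal sketch in Python that reads the simple `key: value` / `- item` subset of YAML used in this card's header and reports which of `license`, `library_name`, and `pipeline_tag` are missing. The helpers `parse_front_matter` and `missing_keys` are hypothetical and only handle this tiny subset. This is not how the Hub parses metadata; real tooling (for example the `huggingface_hub` library's `ModelCard` class) uses a full YAML parser and should be preferred.

```python
REQUIRED_KEYS = ("license", "library_name", "pipeline_tag")


def parse_front_matter(text: str) -> dict:
    """Parse a `---`-delimited header made only of `key: value` and `- item` lines.

    Hypothetical helper: a tiny YAML subset, not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the front matter block
        if stripped.startswith("- ") and current_key is not None:
            # A list item belongs to the most recently seen key.
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value (e.g. `language:`) starts a list.
            meta[current_key] = value.strip() or []
    return meta


def missing_keys(meta: dict) -> list:
    """Return the required keys that are absent from the parsed metadata."""
    return [key for key in REQUIRED_KEYS if key not in meta]


# Illustrative header containing only the `language` key.
LANGUAGE_ONLY = "---\nlanguage:\n- en\n---\n"

# Header as it reads after this PR.
NEW_HEADER = """---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
"""

print(missing_keys(parse_front_matter(LANGUAGE_ONLY)))  # ['license', 'library_name', 'pipeline_tag']
print(missing_keys(parse_front_matter(NEW_HEADER)))     # []
```

Treating an empty value as the start of a list keeps the parser to a single pass, which is enough for headers shaped like this one but not for nested YAML.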

Files changed (1)
  1. README.md +13 -7
README.md CHANGED
@@ -1,7 +1,9 @@
 ---
 language:
 - en
-...
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
 
 # AscendKernelGen/KernelGen-LM-1.7B
@@ -11,9 +13,10 @@ language:
 
 KernelGen-LM-1.7B is a state-of-the-art domain-adaptive large language model specialized for low-level NPU kernel generation, specifically for the Huawei Ascend architecture using the AscendC programming language. Built upon the Qwen3-1.7B backbone, it is trained on the Ascend-CoT dataset and refined via reinforcement learning with execution feedback.
 
-**Other artifacts:**
-* The **AscendKernelGen Technical Report** is published at https://arxiv.org/abs/2601.07160.
-* The **NPUKernelBench** evaluation framework is published at https://git.openi.org.cn/PCL-Benchmark/NPUKernelBench.
+**Resources:**
+* **Paper:** [AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units](https://arxiv.org/abs/2601.07160)
+* **GitHub Code:** [weich97/NPUKernelBench](https://github.com/weich97/NPUKernelBench)
+* **Evaluation Framework:** [NPUKernelBench (OpenI)](https://git.openi.org.cn/PCL-Benchmark/NPUKernelBench)
 
 ## Introduction
 
@@ -22,13 +25,16 @@ Our framework, **AscendKernelGen (AKGen)**, bridges the gap between general-purp
 * **Ascend-CoT Dataset:** A high-quality, domain-specific dataset incorporating **Chain-of-Thought (CoT)** reasoning. It combines documentation-based reasoning, code-centric reasoning derived from real-world kernel implementations, and general reasoning chains to capture the structured logic required for low-level NPU programming.
 * **Domain-Adaptive Post-Training:** A two-stage optimization process that yields **KernelGen-LM**. We first employ **Supervised Fine-Tuning (SFT)** with error-derived supervision (correcting API misuse and numerical errors). This is followed by **Reinforcement Learning (RL)** using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
 * **Hardware-Grounded Evaluation:** Validated using **NPUKernelBench**, a comprehensive benchmark that assesses compilation success, functional correctness, and performance (latency) on real Ascend hardware across varying complexity levels.
-* **Performance:** The model demonstrates siginificant improvement on complex Level-2 kernels compared to baselines, and effectively solving tasks where general-purpose models (like Qwen3, Llama3.1) fail completely.
+* **Performance:** The model demonstrates significant improvement on complex Level-2 kernels compared to baselines, effectively solving tasks where general-purpose models (like Qwen3, Llama3.1) fail completely.
 
 ## Citation
+
+```bibtex
 @article{cao2026ascendkernelgen,
 title={AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units},
 author={Xinzi Cao and Jianyang Zhai and Pengfei Li and Zhiheng Hu and Cen Yan and Bingxu Mu and Guanghuan Fang and Bin She and Jiayu Li and Yihan Su and Dongyang Tao and Xiansong Huang and Fan Xu and Feidiao Yang and Yao Lu and Chang-Dong Wang and Yutong Lu and Weicheng Xue and Bin Zhou and Yonghong Tian},
 journal={arXiv preprint arXiv:2601.07160},
 year={2026},
-url=https://arxiv.org/abs/2601.07160
-}
+url={https://arxiv.org/abs/2601.07160}
+}
+```
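The citation change also wraps the `url` value in braces. In BibTeX, a field value is normally delimited by braces or double quotes; a bare token is interpreted as a number or as the name of a `@string` macro rather than as literal text, so the unbraced URL would not be stored as written. The sketch below uses a hypothetical helper, `unbraced_fields`, built only on Python's `re` module, to flag fields whose value starts with anything other than `{`, `"`, or a digit. It is deliberately crude: it looks only at the first character of each value, so legitimate macro references such as `month = jan` are flagged too. `BEFORE` is an abbreviated copy of the entry.

```python
import re

# `name = value` at the start of a line; group 2 is the first character of the value.
FIELD = re.compile(r"^\s*(\w+)\s*=\s*(\S)", re.MULTILINE)


def unbraced_fields(entry: str) -> list:
    """Return field names whose value is not wrapped in braces/quotes and is not a number.

    Hypothetical helper: inspects only the first character of each value.
    """
    return [
        name
        for name, first in FIELD.findall(entry)
        if first not in '{"' and not first.isdigit()
    ]


# Abbreviated copy of the entry before the fix (author field omitted).
BEFORE = """@article{cao2026ascendkernelgen,
title={AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units},
journal={arXiv preprint arXiv:2601.07160},
year={2026},
url=https://arxiv.org/abs/2601.07160
}"""

# Same entry with the value braced, as in the updated README.
AFTER = BEFORE.replace(
    "url=https://arxiv.org/abs/2601.07160",
    "url={https://arxiv.org/abs/2601.07160}",
)

print(unbraced_fields(BEFORE))  # ['url']
print(unbraced_fields(AFTER))   # []
```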