ByteDance-Seed
/

cudaLLM-8B

Text Generation

Model card Files Files and versions

cudaLLM-8B / README.md

preminstrel's picture

Update README.md

844c7c0 verified 5 months ago

|

history blame contribute delete

2.46 kB

	---
	license: apache-2.0
	datasets:
	- ByteDance-Seed/cudaLLM-data
	base_model:
	- Qwen/Qwen3-8B
	pipeline_tag: text-generation
	tags:
	- code
	- CUDA
	---

	## CudaLLM: A Language Model for High-Performance CUDA Kernel Generation

	### Model Description
	cudaLLM-8B is a language model for generating high-performance and syntactically correct CUDA kernels. It is based on the Qwen3-8B model and has undergone a two-stage training process to master the complexities of parallel programming for GPUs.

	Performance on KernelBench:
	\| \| Bo1 \| Bo2 \| Bo4 \| Bo8 \| Bo16 \|
	\|---------\|-------\|-----\|-----\|-----\|------\|
	\| Level-1 \| 79.75 \| 83 \| 84 \| 86 \| 87 \|
	\| Level-2 \| 67.30 \| 70 \| 71 \| 72 \| 73 \|
	\| Level-3 \| 20.83 \| 26 \| 30 \| 34 \| 36 \|

	### Training Procedure
	The model was trained using the verl library. The model was trained and evaluated on:
	- SFT Dataset: A high-quality dataset of CUDA problem-solution pairs ([sft_cuda_llm_r1.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)), originally generated by DeepSeek R1, DeepSeel Coder-7B, and Qwen2-32B.
	- RL Dataset: A refined dataset ([rl_cuda_llm_0424.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)) used to provide performance-based rewards during the RL stage.
	- Evaluation Dataset: The model's performance was benchmarked against the KernelBench dataset.

	### Intended Use and Limitations
	#### Intended Use
	The primary use of CudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for:
	- Accelerating scientific computing and machine learning workloads.
	- As a co-pilot or productivity tool for HPC and CUDA developers.
	- Research into AI-driven code generation and optimization.

	#### Limitations and Bias
	- Correctness is Not Guaranteed: While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems.
	- Security Risks: The generated code is not guaranteed to be secure. Never run model-generated code from an untrusted source without careful inspection.
	- Performance Variability: Kernel performance can vary significantly depending on the target GPU architecture, input data sizes, and compiler version. The generated code may require further manual tuning.
	- Specialized Domain: This model is highly specialized for CUDA code generation. Its performance on general-purpose programming tasks or natural language conversation will be limited.