|
|
--- |
|
|
language: |
|
|
- en |
|
|
base_model: Qwen/Qwen3-8B |
|
|
tags: |
|
|
- chat |
|
|
library_name: transformers |
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis |
|
|
|
|
|
<div align="center"> |
|
|
<a href="https://huggingface.co/wyt2000/CodeV-SVA-14B"><img src="https://img.shields.io/static/v1?label=Model&message=HuggingFace&color=yellow"></a>   |
|
|
<a href="https://github.com/wyt2000/CodeV-SVA"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue"></a>   |
|
|
</div> |
|
|
|
|
|
## Introduction |
|
|
|
|
|
We introduce CodeV-SVA, a family of large language models designed to translate natural-language verification properties into SystemVerilog Assertions (SVAs). |
|
|
|
|
|
Open-Source Plan: |
|
|
|
|
|
- Model ✓ |
|
|
- Evaluation code ✓ |
|
|
- Paper |
|
|
- Dataset |
|
|
- Data synthesis and training code |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
| Model | | NL2SVA-Human | | | | NL2SVA-Machine | | |
|
|
| :-------------------- | :---------: | :----------: | :---------: | ---- | :---------: | :------------: | :---------: | |
|
|
| | Func.@1 | Func.@16 | Func.@32 | | Func.@1 | Func.@16 | Func.@32 | |
|
|
| | | | | | | | | |
|
|
| DeepSeek-R1-671B | <u>74.6</u> | **90.3** | <u>90.4</u> | | 81.0 | 93.3 | 94.3 | |
|
|
| GPT-5 | 71.8 | <u>90.2</u> | **92.7** | | 81.8 | 93.2 | 94.3 | |
|
|
| DeepSeek-V3.1-671B | 63.1 | 81.4 | 84.9 | | <u>83.8</u> | 92.9 | 93.6 | |
|
|
| GPT-4o | 64.1 | 75.2 | 78.1 | | 68.5 | 81.3 | 83.7 | |
|
|
| | | | | | | | | |
|
|
| RTLCoder-DS-v1.1-6.7B | 25.9 | 58.8 | 65.8 | | 21.7 | 54.8 | 60.8 | |
|
|
| CodeV-R1-Qwen-7B | 25.2 | 55.8 | 61.6 | | 37.4 | 76.6 | 83.0 | |
|
|
| | | | | | | | | |
|
|
| Qwen3-8B | 32.3 | 71.6 | 74.0 | | 46.1 | 88.0 | 90.5 | |
|
|
| Qwen3-14B | 61.6 | 86.1 | 87.7 | | 75.3 | 92.7 | 94.3 | |
|
|
| | | | | | | | | |
|
|
| CodeV-SVA-no-think-8B | 65.8 | 84.4 | 86.3 | | 78.7 | 90.9 | 91.9 |
|
|
| CodeV-SVA-8B | 72.0 | 88.8 | <u>90.4</u> | | 83.5 | **96.3** | **97.2** |
|
|
| CodeV-SVA-14B | **75.8** | 89.4 | <u>90.4</u> | | **84.0** | <u>94.9</u> | <u>95.8</u> |
|
|
|
|
|
See our GitHub [repo](https://github.com/wyt2000/CodeV-SVA/?tab=readme-ov-file#evaluation) for reproduction. |
|
|
|
|
|
## Models |
|
|
|
|
|
| Model | Download | |
|
|
| -------- | -------- | |
|
|
| CodeV-SVA-8B | [🤗HuggingFace](https://huggingface.co/wyt2000/CodeV-SVA-8B) | |
|
|
| CodeV-SVA-no-think-8B | [🤗HuggingFace](https://huggingface.co/wyt2000/CodeV-SVA-no-think-8B) | |
|
|
| CodeV-SVA-14B | [🤗HuggingFace](https://huggingface.co/wyt2000/CodeV-SVA-14B) | |
|
|
|
|
|
## Usage |
|
|
|
|
|
````python |
|
|
from vllm import LLM, SamplingParams |
|
|
from transformers import AutoTokenizer |
|
|
|
|
|
# Local checkpoint directory; point this at the downloaded CodeV-SVA model weights.
MODEL_DIR = "path/to/CodeV-SVA/"
|
|
|
|
|
def get_nl2sva_prompt(testbench: str, problem: str, reset_signal=None) -> str:
    """Assemble the NL2SVA user prompt for a single translation request.

    Args:
        testbench: SystemVerilog testbench source the assertion must refer to.
        problem: Natural-language description of the property to check.
        reset_signal: Optional signal name to use as the ``disable iff``
            condition; when ``None`` the example assertion omits it.

    Returns:
        The fully formatted prompt string, ending with ``Answer:``.
    """
    # Testbench context followed by the question itself.
    header = (
        "Here is the testbench to perform your translation:\n"
        + testbench
        + "\nQuestion: Create a SVA assertion that checks: "
        + problem
        + "\n"
    )

    # Choose the few-shot example (and an extra instruction) based on whether
    # a disable-condition signal was requested.
    if reset_signal is None:
        reset_note = ""
        example = '''asrt: assert property (@(posedge clk)
    (a && b) != 1'b1
);'''
    else:
        reset_note = f"You should use `{reset_signal}` as the disable condition signal. "
        example = f'''asrt: assert property (@(posedge clk) disable iff ({reset_signal})
    (a && b) != 1'b1
);'''

    # Output-format instructions shared by both variants.
    instructions = f"""Do not add code to output an error message string.
Enclose your SVA code with ```systemverilog and ```. Only output the code snippet and do NOT output anything else.
For example,
```systemverilog
{example}
```
Answer:"""

    return header + reset_note + instructions
|
|
|
|
|
if __name__ == "__main__":
    # Task framing: translate a natural-language property into a concrete SVA.
    system_prompt = "You are an AI assistant tasked with formal verification of register transfer level (RTL) designs.\nYour job is to translate a description of an assertion to concrete SystemVerilog Assertion (SVA) implementation.\n"

    # Example RTL context: a 1-read/1-write FIFO testbench the assertion refers to.
    testbench = "module fifo_1r1w_tb (\n clk,\n reset_,\n wr_vld,\n wr_data,\n wr_ready,\n rd_vld,\n rd_data,\n rd_ready\n );\n \n parameter FIFO_DEPTH = 4;\n parameter DATA_WIDTH = 1;\n\nlocalparam FIFO_DEPTH_log2 = $clog2(FIFO_DEPTH); \nlocalparam DATA_WIDTH_log2 = $clog2(DATA_WIDTH); \n\n input clk;\n input reset_;\n input wr_vld;\n input [DATA_WIDTH-1:0] wr_data;\n input wr_ready;\n input rd_vld;\n input [DATA_WIDTH-1:0] rd_data;\n input rd_ready;\n\nwire wr_push;\nwire rd_pop;\n\nwire tb_reset;\nassign tb_reset = (reset_ == 1'b0);\n\nwire fifo_full;\nassign wr_push = wr_vld && wr_ready;\nassign rd_pop = rd_vld && rd_ready; \n\nreg [DATA_WIDTH-1:0] fifo_array [FIFO_DEPTH-1:0]; //fifo array - shift register\nreg [FIFO_DEPTH_log2-1:0] fifo_rd_ptr; //fifo array - rd_ptr\nwire actual_fifo_pop; // actual pop == pop\nreg fifo_empty; // fifo empty\nwire [DATA_WIDTH-1:0] fifo_out_data; // dout\n\n// ---- shift register code start ----\nalways @(posedge clk) begin\n if (!reset_) fifo_array[0] <= 'd0;\n else if (wr_push) begin\n fifo_array[0] <= wr_data;\n end else fifo_array[0] <= fifo_array[0];\nend\nfor (genvar i = 1; i < FIFO_DEPTH; i++ ) begin : loop_id\n always @(posedge clk) begin\n if (!reset_) fifo_array[i] <= 'd0;\n else if (wr_push) begin\n fifo_array[i] <= fifo_array[i-1];\n end else fifo_array[i] <= fifo_array[i];\n end\nend\n\n// ---- read pointer/fifo empty code start ----\nalways @(posedge clk) begin\n if (!reset_) begin\n fifo_rd_ptr <= 'd0;\n end else if (wr_push && fifo_empty) begin\n fifo_rd_ptr <= 'd0;\n end else if (rd_pop && !fifo_empty && (fifo_rd_ptr == 'd0)) begin\n fifo_rd_ptr <= 'd0;\n end else begin\n fifo_rd_ptr <= fifo_rd_ptr + wr_push - rd_pop;\n end\n if (!reset_) begin\n fifo_empty <= 'd1;\n end else if (rd_pop && !fifo_empty && (fifo_rd_ptr == 'd0) && !wr_push) begin\n fifo_empty <= 'd1;\n end else if ((fifo_rd_ptr != 'd0) || wr_push && !rd_pop) begin\n fifo_empty <= 'd0;\n end\nend\n// ---- fifo full and dout code start ----\nassign fifo_full = (fifo_rd_ptr == (FIFO_DEPTH - 1)) && !fifo_empty;\nassign fifo_out_data = fifo_array[fifo_rd_ptr];\n\nendmodule"

    # Natural-language property to translate, and the disable-condition signal.
    problem = "When there is a write push to the FIFO, data is eventually popped. Use the signals 'rd_pop' and 'wr_push'."
    reset_signal = "tb_reset"
    user_prompt = get_nl2sva_prompt(testbench, problem, reset_signal)

    dialog = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

    # Render the chat into the model's prompt format without tokenizing,
    # leaving the assistant turn open for generation.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    prompt = tokenizer.apply_chat_template(dialog, tokenize=False, add_generation_prompt=True)
    print(f"### Prompt:\n{prompt}")

    model = LLM(
        MODEL_DIR,
        tensor_parallel_size=4,  # adjust to the number of GPUs available
    )

    sampling_params = SamplingParams(
        temperature=0.8,
        top_p=0.95,
        max_tokens=32768,  # generous budget for long reasoning outputs
        n=1,  # samples per prompt; increase for pass@k-style evaluation
    )

    # generate() returns one RequestOutput per input prompt; print each
    # sampled completion for the single prompt we submitted.
    responses = model.generate(prompt, sampling_params)
    for output in responses[0].outputs:
        print(f"### Response:\n{output.text}")
|
|
```` |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
|
|
@misc{CodeV-SVA, |
|
|
title={CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis}, |
|
|
author={Yutong Wu and Chenrui Cao and Pengwei Jin and Di Huang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu}, |
|
|
year={2025}, |
|
|
howpublished={\url{https://huggingface.co/wyt2000/CodeV-SVA-14B}}, |
|
|
} |
|
|
``` |