RTL-PPA: RTL Power, Performance, Area Prediction via Chain-of-Thought Fine-Tuning

Model Overview

This model predicts Power, Performance (delay), and Area (PPA) of RTL designs synthesized for Skywater 130nm technology. Given a Verilog module, it performs step-by-step chain-of-thought reasoning about gate-level synthesis and outputs structured PPA estimates with [area], [delay], and [static_power] tags directly usable as reinforcement learning reward signals.

Model Details

  • Developed by: Zhu Wenlong
  • Model type: Qwen3-8B fine-tuned with LoRA (rank=256, alpha=512), then refined with reinforcement learning (GRPO)
  • License: MIT
  • Finetuned from: Qwen/Qwen3-8B

Uses

Direct Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/merged/model"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

SYSTEM_PROMPT = (
    "Your task is to estimate area, delay, and static power for RTL designs in Skywater 130nm technology node.\n"
    "For the given RTL design, reason about the number and type of gates that would be present after synthesis, "
    "then output all four tags:\n"
    "<synth> ... </synth>\n"
    "<area> ... [area]value[/area] </area>\n"
    "<delay> ... [delay]value[/delay] </delay>\n"
    "<static_power> ... [static_power]value[/static_power] </static_power>"
)

rtl_code = "module top_module (...); ..."
messages = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": rtl_code}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=16384, do_sample=False)
result = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# Extract: [area]VALUE[/area], [delay]VALUE[/delay], [static_power]VALUE[/static_power]
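The bracketed values can be pulled out of the decoded text with a small helper; a minimal sketch (the `extract_metrics` name is illustrative, tag names are those defined in the system prompt above):

```python
import re

def extract_metrics(text: str) -> dict:
    """Parse the numeric PPA values from the model's tagged output."""
    metrics = {}
    for tag in ("area", "delay", "static_power"):
        # Non-greedy match between [tag] ... [/tag]
        m = re.search(rf"\[{tag}\](.*?)\[/{tag}\]", text, re.DOTALL)
        if m:
            metrics[tag] = float(m.group(1).strip())
    return metrics

print(extract_metrics(
    "[area]123.4[/area] [delay]0.95[/delay] [static_power]1.2e-05[/static_power]"
))
```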

Out-of-Scope Use

  • Not suitable for non-Verilog hardware description languages
  • Not suitable for RTL targeting technology nodes other than Skywater 130nm

Training Details

Training Data

  • Base dataset: scale-lab/MetRex (MetRex benchmark)
  • Augmentation: Semantic-preserving RTL transformations (signal renaming, constant base conversion, declaration shuffling, whitespace randomization, begin/end insertion, module renaming)
  • Format: Alpaca format with verbose chain-of-thought output tags
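As an illustration of what a semantic-preserving transformation looks like, the sketch below renames signals in Verilog source via word-boundary regex substitution. This is a simplified stand-in for the augmentation pipeline, not its actual implementation; `rename_signals` and the `sig_` prefix are assumptions:

```python
import re
import random
import string

def rename_signals(verilog: str, signals: list[str]) -> str:
    """Replace each listed identifier with a fresh random name.
    Word boundaries (\\b) keep substrings of other identifiers untouched."""
    renamed = verilog
    for old in signals:
        new = "sig_" + "".join(random.choices(string.ascii_lowercase, k=6))
        renamed = re.sub(rf"\b{re.escape(old)}\b", new, renamed)
    return renamed

src = "module top(input a, input b, output y); assign y = a & b; endmodule"
print(rename_signals(src, ["a", "b"]))
```

Because the circuit's semantics are unchanged, the synthesis results (and thus the PPA labels) remain valid for the transformed sample.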

Training Procedure

Stage 1: Base SFT

Fine-tuned on 23,816 samples (small circuits, <1000 gates) to establish fundamental CoT reasoning capability.

SFT Training Configuration:

| Parameter | Value |
|---|---|
| Base Model | Qwen3-8B |
| LoRA Rank | 256 |
| LoRA Alpha | 512 |
| LoRA Target | all |
| Learning Rate | 5e-5 |
| Batch Size | 64 (8 GPUs × 1 × 8 grad accum) |
| Cutoff Length | 8192 |
| Epochs | 3 |
| Precision | BF16 + DeepSpeed ZeRO-3 |
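The adapter settings in the table map onto a PEFT configuration roughly as follows. This is a sketch: the exact target-module spelling is an assumption (`"all-linear"` is PEFT's shorthand for targeting all linear layers, matching "LoRA Target: all" above):

```python
from peft import LoraConfig

# Stage-1 SFT adapter configuration per the table above (sketch).
lora_config = LoraConfig(
    r=256,                        # LoRA rank
    lora_alpha=512,               # scaling factor alpha
    target_modules="all-linear",  # assumption: "all" = every linear layer
    task_type="CAUSAL_LM",
)
```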

Stage 2: Reinforcement Learning (GRPO)

Refined via Group Relative Policy Optimization (GRPO) using the verl framework. The reward signal is computed from MAPE on [area], [delay], and [static_power] tags extracted from model outputs, enabling iterative improvement on hard circuits.
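The core of GRPO is the group-relative advantage: each prompt's n rollouts (n=4 here) form a group, and every rollout's reward is normalized by its group's statistics, removing the need for a learned critic. A minimal sketch (verl's actual implementation differs in detail):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: center and scale each rollout's reward
    by the mean and std of its group (the n samples from one prompt)."""
    mean = group_rewards.mean()
    std = group_rewards.std()
    return (group_rewards - mean) / (std + eps)

print(grpo_advantages(np.array([0.0, 1.0, 2.0, 3.0])))
```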

RL Training Configuration:

| Parameter | Value |
|---|---|
| Algorithm | GRPO (Group Relative Policy Optimization) |
| Framework | verl |
| Train Batch Size | 256 |
| Max Prompt Length | 3072 |
| Max Response Length | 4096 |
| Actor Learning Rate | 1e-6 |
| PPO Mini Batch Size | 32 |
| PPO Micro Batch Size per GPU | 2 |
| Rollout Samples (n) | 4 |
| KL Loss Coefficient | 0.0 (disabled) |
| Entropy Coefficient | 0 |
| Gradient Checkpointing | Enabled |
| Precision | BF16 |
| Rollout Engine | vLLM |
| Total Epochs | 5 |

RL Launch Command:

HF_HUB_OFFLINE=1 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/path/to/train.parquet \
    data.val_files=/path/to/val.parquet \
    data.train_batch_size=256 \
    data.max_prompt_length=3072 \
    data.max_response_length=4096 \
    actor_rollout_ref.model.path=/path/to/metrex_merged_full \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.actor.use_kl_loss=False \
    actor_rollout_ref.actor.kl_loss_coef=0.0 \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.rollout.temperature=1.0 \
    actor_rollout_ref.rollout.top_p=1.0 \
    actor_rollout_ref.rollout.val_kwargs.temperature=1.0 \
    actor_rollout_ref.rollout.val_kwargs.top_p=0.7 \
    actor_rollout_ref.rollout.val_kwargs.do_sample=True \
    actor_rollout_ref.rollout.val_kwargs.n=1 \
    reward_model.reward_manager=metrex \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.project_name=MetRex-RL \
    trainer.experiment_name=GRPO-Qwen3-8B-MetRex \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.val_before_train=True \
    trainer.test_freq=5 \
    trainer.save_freq=10 \
    trainer.total_epochs=5

Technical Specifications

Model Architecture

  • Architecture: Qwen3ForCausalLM with LoRA adapters (rank=256, target=all modules)
  • RL model: Qwen3-8B refined with GRPO; reward = format_score × result_score
  • Reward calculation:
    • Format reward (0-1): 0.25 per correctly formatted tag out of 4 tags (<synth>, <area>, <delay>, <static_power>)
    • Result reward (0-3): for each of the 3 metrics (area, delay, static_power), an accuracy term derived from its MAPE, each capped at 1.0
    • Total reward = format_score × result_score, range [0, 3]
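The reward described above can be sketched as follows. The exact mapping from MAPE to a per-metric score is an assumption (here `max(0, 1 - MAPE)`, which respects the stated 1.0 cap); `compute_reward` and its signature are illustrative, not verl's reward-manager API:

```python
import re

TAGS = ("synth", "area", "delay", "static_power")
METRICS = ("area", "delay", "static_power")

def compute_reward(output: str, truth: dict) -> float:
    """Format score (0.25 per well-formed tag, max 1.0) multiplied by the
    summed per-metric accuracy (each metric contributes at most 1.0)."""
    format_score = sum(
        0.25 for t in TAGS if re.search(rf"<{t}>.*?</{t}>", output, re.DOTALL)
    )
    result_score = 0.0
    for m in METRICS:
        match = re.search(rf"\[{m}\](.*?)\[/{m}\]", output, re.DOTALL)
        if match:
            try:
                pred = float(match.group(1).strip())
            except ValueError:
                continue
            ape = abs(pred - truth[m]) / max(abs(truth[m]), 1e-9)
            result_score += max(0.0, 1.0 - ape)  # assumption: 1 - MAPE, floored at 0
    return format_score * result_score
```

A fully formatted, exactly correct response thus scores 1.0 × 3.0 = 3.0, while a malformed response is driven toward zero by the multiplicative format term.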

Compute Infrastructure

  • SFT Training: 8× NVIDIA GPU, DeepSpeed ZeRO-3, BF16 precision
  • RL Training: 8× NVIDIA GPU, FSDP with gradient/optimizer offloading, BF16
  • Inference: Single GPU sufficient (16GB VRAM for merged model)

References

- [MetRex: A Benchmark for RTL Code Generation with LLMs](https://github.com/scale-lab/MetRex/tree/main) — Chain-of-thought PPA prediction baseline
- [ChipGPT: How Far Are We From Natural Language Hardware Design](https://arxiv.org/abs/2305.14019) — LLM-assisted hardware design framework
- [Data is All You Need: Finetuning LLMs for Chip Design via Automated Design-Data Augmentation](https://arxiv.org/abs/2403.11202) — Automated RTL data augmentation framework
- [verl: Versatile RL Framework](https://github.com/verl-project/verl) — GRPO training framework