Ellora Recipe #7: Self-Verifying Code Generation LoRA

This LoRA adapter applies the self-verification approach from DeepSeekMath-V2 to code generation.

Overview

The model has been trained with three internalized skills:

Verifier: Predict whether code passes tests
Meta-Verifier: Assess reliability of verification predictions
Self-Verification Generation: Generate code with built-in verification in <think> tags

Training Curriculum

Phase 1: Verifier SFT (20 samples)
Phase 2: Meta-Verifier SFT (10 samples)
Phase 3: Self-Verification Generation SFT (15 samples)
Phase 4: GRPO with test execution (5 samples)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(base_model, "codelion/Qwen3-4B-Instruct-2507-self-verify-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

prompt = """<start_of_turn>user
<problem>
Given a number n, return the sum of all numbers from 1 to n.
</problem>

Solve this problem. Show your reasoning in <think> tags, verify your solution, then provide the final code.<end_of_turn>
<start_of_turn>model
"""

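# Move the tokenized prompt to the model's device before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# The decoded text contains the prompt followed by the completion
print(tokenizer.decode(outputs[0]))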
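
The completion is expected to contain reasoning in <think> tags followed by the final code. A minimal sketch of separating the two is shown here. The helper name split_response is hypothetical, and the assumption that the final code may arrive inside a Markdown code fence is not confirmed by this card, so adjust the pattern to the output you actually observe.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built at runtime

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
CODE_RE = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def split_response(completion: str) -> tuple[str, str]:
    """Return (reasoning, code) from a decoded completion.

    Assumes at most one <think>...</think> block, with the answer after it.
    """
    think = THINK_RE.search(completion)
    reasoning = think.group(1).strip() if think else ""
    # Everything after the closing tag is treated as the answer.
    answer = completion[think.end():] if think else completion
    blocks = CODE_RE.findall(answer)
    # Prefer the last fenced block; fall back to the raw answer text.
    code = blocks[-1].strip() if blocks else answer.strip()
    return reasoning, code
```

Pass only the newly generated text to this helper, for example tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True). The prompt above itself contains the literal string <think>, so parsing the full decoded sequence would match from the prompt's tag.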

Configuration

LoRA rank: 64
LoRA alpha: 128
Target modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
Max sequence length: 2048
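
As a rough sketch, the values above map onto peft's LoraConfig arguments as follows. The task_type value is an assumption (the card does not state it), and settings not listed above, such as dropout, are left at library defaults.

```python
# LoRA hyperparameters copied from the Configuration section.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in this card
}

# With standard LoRA scaling, adapter updates are multiplied by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(scaling)  # 2.0

# With peft installed, the same values can be passed straight through:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```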

Key Design Decisions

No truncation: Training uses only complete examples that fit within the context window
Streaming dataset: CodeContests is streamed, so it does not need to be stored on disk
Single LoRA: All three skills are trained into one adapter via multi-task training
Test execution reward: GRPO uses actual test results as the reward signal
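
To illustrate the test execution reward, here is a minimal sketch that runs a candidate program against stdin/stdout test cases and returns the fraction passed. The function names, the fractional (rather than all-or-nothing) reward, and the 5-second timeout are assumptions for illustration, not the recipe's actual implementation. It also runs code in a plain subprocess with no sandboxing.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_case(code: str, stdin_text: str, timeout: float = 5.0) -> Optional[str]:
    """Run `code` as a Python script; return stdout, or None on error/timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def pass_rate_reward(code: str, cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (stdin, expected_stdout) cases the program passes."""
    if not cases:
        return 0.0
    passed = 0
    for stdin_text, expected in cases:
        out = run_case(code, stdin_text)
        # Compare with surrounding whitespace ignored.
        if out is not None and out.strip() == expected.strip():
            passed += 1
    return passed / len(cases)
```

For example, a correct solution to the sum-from-1-to-n problem in the Usage section, such as one that reads n and prints n * (n + 1) // 2, would score 1.0 on cases like ("3\n", "6\n") and ("10\n", "55\n").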

🤖 Generated with Ellora