Ellora Recipe #7: Self-Verifying Code Generation LoRA

This LoRA adapter for Qwen/Qwen3-4B-Instruct-2507 applies the self-verification approach from DeepSeekMath-V2 to code generation.

Overview

The adapter trains three skills into the model:

  1. Verifier: predicts whether a piece of code passes its tests
  2. Meta-Verifier: assesses how reliable a verification prediction is
  3. Self-Verification Generation: generates code with built-in verification inside <think> tags

Training Curriculum

  • Phase 1: Verifier SFT (20 samples)
  • Phase 2: Meta-Verifier SFT (10 samples)
  • Phase 3: Self-Verification Generation SFT (15 samples)
  • Phase 4: GRPO with test execution (5 samples)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in its native precision, then attach the LoRA adapter on top
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507", torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, "codelion/Qwen3-4B-Instruct-2507-self-verify-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

prompt = """<start_of_turn>user
<problem>
Given a number n, return the sum of all numbers from 1 to n.
</problem>

Solve this problem. Show your reasoning in <think> tags, verify your solution, then provide the final code.<end_of_turn>
<start_of_turn>model
"""

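# Tokenize the prompt and place the tensors on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))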
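The prompt asks for reasoning in <think> tags followed by the final code. To separate the two programmatically, a helper along these lines may work. It is a minimal sketch that assumes the reply closes its reasoning with </think> and then gives the solution in a Markdown code fence; `split_response` and the output layout are hypothetical and not part of Ellora, transformers, or peft.

```python
import re

# Build the Markdown fence marker programmatically (three backticks).
FENCE = "`" * 3

# Assumed layout: <think>...</think> followed by a fenced (optionally "python") code block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
CODE_RE = re.compile(re.escape(FENCE) + r"(?:python)?\n(.*?)" + re.escape(FENCE), re.DOTALL)


def split_response(text: str) -> tuple[str, str]:
    """Hypothetical helper: return (reasoning, final_code); either is '' if absent."""
    think = THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else ""
    # Only search after </think> so code quoted during reasoning is ignored.
    tail = text[think.end():] if think else text
    blocks = CODE_RE.findall(tail)
    final_code = blocks[-1].strip() if blocks else ""
    return reasoning, final_code


sample = (
    "<think>Use n*(n+1)//2. Check: n=3 gives 6.</think>\n"
    + FENCE + "python\ndef f(n):\n    return n * (n + 1) // 2\n" + FENCE
)
reasoning, code = split_response(sample)
print(code)
```

Run it on the generated tokens only: the prompt itself contains the literal string "<think>", so decoding the full sequence would make the regex start matching inside the prompt.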

Configuration

  • LoRA rank: 64
  • LoRA alpha: 128
  • Target modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
  • Max sequence length: 2048
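As a rough sense of scale, the rank and alpha above fix the LoRA scaling factor (alpha / r in standard LoRA) and, together with the shapes of the target projections, the number of trainable parameters: each adapted d_out × d_in weight gains an A matrix (r × d_in) and a B matrix (d_out × r). The projection shapes and layer count below are assumptions about Qwen3-4B (check the base model's config.json), so treat the totals as an estimate.

```python
# LoRA hyperparameters from the Configuration list.
RANK = 64
ALPHA = 128

# ASSUMED Qwen3-4B dimensions; verify against the base model's config.json.
HIDDEN, INTERMEDIATE, Q_DIM, KV_DIM, NUM_LAYERS = 2560, 9728, 4096, 1024, 36

# (d_in, d_out) for each target module in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = ALPHA / RANK  # standard LoRA multiplies the B @ A update by alpha / r
per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"scaling={scaling}, per_layer={per_layer:,}, total={total:,}")
```

Under these assumed shapes the adapter adds about 132M trainable parameters with a scaling factor of 2.0; different dimensions change the count linearly.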

Key Design Decisions

  • No truncation: training uses only complete examples that fit within the 2048-token context window
  • Streaming dataset: CodeContests is streamed, so it never has to be downloaded to disk
  • Single LoRA: all three skills live in one adapter, trained with multi-task training
  • Test execution reward: GRPO uses the results of actually running the tests as its reward signal

🤖 Generated with Ellora
