Ellora Recipe #7: Self-Verifying Code Generation LoRA
This adapter implements the DeepSeekMath-V2 self-verification approach for code generation.
Overview
The model has been trained with three internalized skills:
- Verifier: Predict whether code passes tests
- Meta-Verifier: Assess reliability of verification predictions
- Self-Verification Generation: Generate code with built-in verification in <think> tags
Training Curriculum
- Phase 1: Verifier SFT (20 samples)
- Phase 2: Meta-Verifier SFT (10 samples)
- Phase 3: Self-Verification Generation SFT (15 samples)
- Phase 4: GRPO with test execution (5 samples)
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(base_model, "codelion/Qwen3-4B-Instruct-2507-self-verify-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
prompt = """<start_of_turn>user
<problem>
Given a number n, return the sum of all numbers from 1 to n.
</problem>
Solve this problem. Show your reasoning in <think> tags, verify your solution, then provide the final code.<end_of_turn>
<start_of_turn>model
"""
# Keep the input tensors on the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# The decoded text contains the prompt followed by the completion.
print(tokenizer.decode(outputs[0]))
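The generated text mixes reasoning and final code, so a small post-processing step can help. The following is a minimal sketch, assuming the completion contains at most one `<think>...</think>` block followed by the answer; the helper name `split_think_and_answer` is hypothetical and not part of the adapter or any library.

```python
import re


def split_think_and_answer(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes at most one <think>...</think> block. If none is found,
    the reasoning is empty and the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Illustrative completion, not real model output.
sample = (
    "<think>Use n*(n+1)//2; check n=3 gives 6.</think>\n"
    "def total(n):\n"
    "    return n * (n + 1) // 2"
)
reasoning, answer = split_think_and_answer(sample)
```

Pass only the newly generated text to this helper. The prompt itself mentions `<think>`, so running the pattern over the full decoded sequence could match from the prompt into the completion.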
Configuration
- LoRA rank: 64
- LoRA alpha: 128
- Target modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
- Max sequence length: 2048
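For readers who want to reproduce a similar setup, the values above map onto a `peft` `LoraConfig` roughly as follows. This is a sketch, not the recipe's actual training script: `task_type` is an assumption, and settings the card does not state (such as dropout) are left at library defaults.

```python
# Values taken from the configuration list above.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# With standard LoRA scaling, the update is multiplied by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    # peft is not installed; the plain dict above still documents the setup.
    lora_config = None
```

Targeting all attention and MLP projections with rank 64 gives a scaling factor of 2 under the standard `lora_alpha / r` rule.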
Key Design Decisions
- No truncation: Training uses only complete examples that fit within the context window
- Streaming dataset: CodeContests is streamed, so the dataset is never stored on disk
- Single LoRA: All skills live in one adapter, trained with multi-task training
- Test execution reward: GRPO uses actual test results as the reward signal
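To make the test execution reward concrete, here is one way such a signal could be computed. It is a sketch under assumptions: the card does not specify the reward shape, so the pass fraction, the stdin/stdout test format, and the function name `pass_rate` are all illustrative choices rather than the recipe's actual implementation.

```python
import subprocess
import sys


def pass_rate(code: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> float:
    """Return the fraction of (stdin, expected_stdout) tests the program passes.

    Runs the candidate in a fresh Python process per test. Untrusted code
    should additionally be sandboxed; that is out of scope for this sketch.
    """
    if not tests:
        return 0.0
    passed = 0
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed test
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(tests)


# Illustrative candidate for the "sum from 1 to n" problem.
candidate = "n = int(input())\nprint(n * (n + 1) // 2)"
reward = pass_rate(candidate, [("3\n", "6"), ("10\n", "55")])
```

A fractional reward gives partial credit for partly correct programs; an all-or-nothing reward is an equally plausible choice.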
🤖 Generated with Ellora