---
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
- ellora
- lora
- code-execution
- execution-tracing
- world-model
- cwm
- grpo
- thinking
- code-understanding
- peft
- qwen
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
inference: true
model_type: qwen3
datasets:
- codelion/execution-world-model-dataset
---

# codelion/Qwen3-4B-execution-world-model-lora

## Execution-Aware World Model LoRA

This LoRA adapter adds **execution awareness** to Qwen/Qwen3-4B-Thinking-2507. Inspired by Meta's CWM (Code World Model) research, it enables the model to predict and understand program execution step by step.

## Key Features

- **Step-by-Step Execution Prediction**: Predicts variable states at each line
- **Dynamic World Model**: Understands how code behaves at runtime
- **Execution Tracing**: Generates detailed execution traces with variable states
- **Debugging Support**: Can identify and explain execution behavior
- **GRPO-Trained**: Uses preference learning with real execution feedback

## Performance Metrics

- **Base Model**: Qwen/Qwen3-4B-Thinking-2507
- **Training Method**: GRPO (Group Relative Policy Optimization) with real execution traces
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
- **Training Samples**: 298
- **Evaluation Samples**: 323
- **Execution Prediction Accuracy**: 20.0%
- **Mean State Accuracy**: 33.3%

## Usage

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Load execution world model LoRA
model = PeftModel.from_pretrained(model, "codelion/Qwen3-4B-execution-world-model-lora")

# Analyze code execution
prompt = """Analyze this code and predict its execution trace:

```python
x = 10
y = x * 2
z = x + y
```

Show variable states at each line."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
````

## Example Output

```
<execution_trace>
Line 1: State: {x=10}
Line 2: State: {x=10, y=20}
Line 3: State: {x=10, y=20, z=30}
</execution_trace>
```
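
Output in this format is straightforward to post-process. Below is a minimal sketch of parsing a trace back into per-line variable-state dictionaries; the helper name and regular expression are illustrative (not part of the adapter's output contract), and integer-valued variables are assumed:

```python
import re

def parse_execution_trace(text):
    """Parse '<execution_trace>' output into {line_no: {var: value}} dicts."""
    states = {}
    for line_no, body in re.findall(r"Line (\d+): State: \{([^}]*)\}", text):
        state = {}
        for pair in body.split(","):
            name, _, value = pair.partition("=")
            state[name.strip()] = int(value.strip())  # assumes integer values
        states[int(line_no)] = state
    return states

trace = """<execution_trace>
Line 1: State: {x=10}
Line 2: State: {x=10, y=20}
Line 3: State: {x=10, y=20, z=30}
</execution_trace>"""
print(parse_execution_trace(trace)[3])  # {'x': 10, 'y': 20, 'z': 30}
```

Parsed states in this form can then be compared line-by-line against a ground-truth trace.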

## Training Details

- **Method**: GRPO (Group Relative Policy Optimization)
- **Data**: Self-generated code with real execution traces
- **Epochs**: 3
- **Reward**: Gradual scoring (0.0-1.0) based on execution accuracy
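
A gradual reward of this kind can be pictured as the mean per-line fraction of correctly predicted variables. The function below is an illustrative sketch of that idea, not the actual training code:

```python
def state_accuracy_reward(predicted, ground_truth):
    """Score a predicted trace against ground truth on a 0.0-1.0 scale.

    Both arguments map line numbers to {variable: value} dicts; the reward
    is the mean per-line fraction of correctly predicted variables.
    """
    if not ground_truth:
        return 0.0
    line_scores = []
    for line_no, true_state in ground_truth.items():
        pred_state = predicted.get(line_no, {})
        correct = sum(1 for var, val in true_state.items()
                      if pred_state.get(var) == val)
        line_scores.append(correct / len(true_state))
    return sum(line_scores) / len(line_scores)

truth = {1: {"x": 10}, 2: {"x": 10, "y": 20}}
pred  = {1: {"x": 10}, 2: {"x": 10, "y": 99}}
print(state_accuracy_reward(pred, truth))  # 0.75
```

A smooth reward like this gives GRPO a learning signal even when a prediction is only partially correct, unlike a binary exact-match reward.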

## Dataset

[codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)

- Python code (3-20 lines)
- Real execution traces via `sys.settrace()`
- Ground truth variable states
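
For reference, ground-truth states of this kind can be captured with the standard-library tracing hook mentioned above. A minimal sketch (the helper name is illustrative; note that `line` events fire *before* a line executes, so the `return` event supplies the final state):

```python
import sys

def record_trace(code_str):
    """Execute code_str, snapshotting variable states via sys.settrace."""
    states = []

    def tracer(frame, event, arg):
        # Only trace the frame running our compiled snippet
        if frame.f_code.co_filename != "<trace>":
            return tracer
        if event in ("line", "return"):
            snap = {k: v for k, v in frame.f_locals.items()
                    if not k.startswith("__")}
            states.append((frame.f_lineno, snap))
        return tracer

    compiled = compile(code_str, "<trace>", "exec")
    sys.settrace(tracer)
    try:
        exec(compiled, {})
    finally:
        sys.settrace(None)  # always remove the hook
    return states

trace = record_trace("x = 10\ny = x * 2\nz = x + y")
print(trace[-1])  # (3, {'x': 10, 'y': 20, 'z': 30})
```

Each `line` event records the state just before that line runs, and the final `return` event records the state after the last line completes.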

## Related

- **Dataset**: [codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)
- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Project**: [Ellora Recipes](https://github.com/codelion/ellora)

---

*Part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities.*