---
base_model: Qwen/Qwen3-4B-Thinking-2507
tags:
- ellora
- lora
- code-execution
- execution-tracing
- world-model
- cwm
- grpo
- thinking
- code-understanding
- peft
- qwen
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
inference: true
model_type: qwen3
datasets:
- codelion/execution-world-model-dataset
---

# codelion/Qwen3-4B-execution-world-model-lora

## 🌍 Execution-Aware World Model LoRA

This LoRA adapter adds **execution awareness** to Qwen/Qwen3-4B-Thinking-2507. Inspired by Meta's CWM (Code World Model) research, it enables the model to predict and understand program execution step by step.

## 🎯 Key Features

- **Step-by-Step Execution Prediction**: Predicts variable states at each line
- **Dynamic World Model**: Understands how code behaves at runtime
- **Execution Tracing**: Generates detailed execution traces with variable states
- **Debugging Support**: Can identify and explain execution behavior
- **GRPO-Trained**: Uses preference learning with real execution feedback

## 📊 Performance Metrics

- **Base Model**: Qwen/Qwen3-4B-Thinking-2507
- **Training Method**: GRPO (Group Relative Policy Optimization) with real execution traces
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
- **Training Samples**: 298
- **Evaluation Samples**: 323
- **Execution Prediction Accuracy**: 20.0%
- **Mean State Accuracy**: 33.3%

## 🔧 Usage

````python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Load the execution world model LoRA
model = PeftModel.from_pretrained(model, "codelion/Qwen3-4B-execution-world-model-lora")

# Ask the model to predict an execution trace
prompt = """Analyze this code and predict its execution trace:

```python
x = 10
y = x * 2
z = x + y
```

Show variable states at each line."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
````

## 📈 Example Output

```
Line 1: State: {x=10}
Line 2: State: {x=10, y=20}
Line 3: State: {x=10, y=20, z=30}
```

## 🧪 Training Details

- **Method**: GRPO (Group Relative Policy Optimization)
- **Data**: Self-generated code with real execution traces
- **Epochs**: 3
- **Reward**: Gradual scoring (0.0-1.0) based on execution accuracy

## 📚 Dataset

[codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)

- Python code (3-20 lines)
- Real execution traces via `sys.settrace()`
- Ground-truth variable states

## 🏷️ Related

- **Dataset**: [codelion/execution-world-model-dataset](https://huggingface.co/datasets/codelion/execution-world-model-dataset)
- **Base Model**: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- **Project**: [Ellora Recipes](https://github.com/codelion/ellora)

---

*Part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities.*
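
## 🔍 Appendix: Trace Collection Sketch

The dataset description above notes that ground-truth traces are captured with `sys.settrace()`. As a rough illustration of how such line-by-line variable states can be recorded (a minimal sketch, not the project's actual tooling; `trace_execution` is a hypothetical helper name):

```python
import sys

def trace_execution(code_str):
    """Run a snippet and record variable states after each executed line,
    in the spirit of the dataset's sys.settrace()-based traces."""
    states = []
    pending = None  # line number whose post-execution state we still owe

    def tracer(frame, event, arg):
        nonlocal pending
        if frame.f_code.co_filename != "<trace>":
            return tracer  # ignore frames other than the traced snippet
        if event in ("line", "return"):
            # A "line" event fires *before* a line runs, so the snapshot
            # taken here is the post-state of the previously pending line.
            snapshot = {k: v for k, v in frame.f_locals.items()
                        if k != "__builtins__"}
            if pending is not None:
                states.append((pending, snapshot))
            pending = frame.f_lineno if event == "line" else None
        return tracer

    sys.settrace(tracer)
    try:
        exec(compile(code_str, "<trace>", "exec"), {})
    finally:
        sys.settrace(None)  # always detach the tracer
    return states

trace = trace_execution("x = 10\ny = x * 2\nz = x + y\n")
# [(1, {'x': 10}), (2, {'x': 10, 'y': 20}), (3, {'x': 10, 'y': 20, 'z': 30})]
```

This matches the `Line N: State: {...}` format shown in the example output above, which is what the GRPO reward compares predictions against.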