---
tags:
- code
- python
- code-generation
- bug-injection
- education
license: mit
---

# Squash Code Corruptor Model

A T5-based model that generates realistic bugs in Python code for educational purposes.

## Model Description

This model is trained to introduce realistic bugs into Python code, including:
- Logic errors (operator swaps, off-by-one errors, wrong variables)
- Syntax errors (missing colons, indentation issues)

It was trained on 1,500 correct/buggy example pairs:
- 1,000 syntax error pairs
- 500 logic error pairs (7 categories)
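
To make the logic-error categories above concrete, here is a small hand-written illustration of an off-by-one error and an operator swap. These functions are hypothetical examples written for this card; they are not output from the model and are not taken from the training set.

```python
# Hand-written illustration of two logic-error categories; not model output.

def total_correct(values):
    """Reference implementation: sum every element."""
    result = 0
    for i in range(len(values)):
        result += values[i]
    return result


def total_off_by_one(values):
    """Off-by-one bug: the loop stops one element early."""
    result = 0
    for i in range(len(values) - 1):
        result += values[i]
    return result


def total_operator_swap(values):
    """Operator swap bug: '-=' replaces '+='."""
    result = 0
    for i in range(len(values)):
        result -= values[i]
    return result


sample = [1, 2, 3]
print(total_correct(sample))        # 6
print(total_off_by_one(sample))     # 3
print(total_operator_swap(sample))  # -6
```

Both buggy variants still run without raising, which is what separates a logic error from a syntax error: the student has to compare the result with the expected value to notice the problem.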

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")

# Corrupt code
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
# do_sample=True is needed for temperature to have any effect
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrupted)
```
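
Once a corrupted snippet comes back, it can help to know whether the injected bug is a syntax error or a logic error, and to see what changed. The following is a minimal sketch using only the standard library. The helper names `has_syntax_error` and `show_changes` are hypothetical, and `corrupted_example` is a hand-written stand-in for a model output, which may look different in practice.

```python
import ast
import difflib


def has_syntax_error(source: str) -> bool:
    """Return True if `source` does not parse as Python."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return True
    return False


def show_changes(original: str, corrupted: str) -> str:
    """Return a unified diff between the original and corrupted code."""
    diff = difflib.unified_diff(
        original.splitlines(),
        corrupted.splitlines(),
        fromfile="original",
        tofile="corrupted",
        lineterm="",
    )
    return "\n".join(diff)


original = "def add(a, b):\n    return a + b"
# Stand-in for a model output (missing colon); a real output may differ.
corrupted_example = "def add(a, b)\n    return a + b"

print(has_syntax_error(original))           # False
print(has_syntax_error(corrupted_example))  # True
print(show_changes(original, corrupted_example))
```

A snippet that parses cleanly but differs from the original is a candidate logic error. Confirming that it actually misbehaves would require running it against test inputs.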

## Training Data

A custom dataset of Python code pairs (correct → buggy) that focuses on common programming mistakes for beginner and intermediate learners.

## Intended Use

An educational tool for the Squash app that helps students learn Python by fixing intentionally buggy code.

## Limitations

- Trained specifically on Python code
- May not work well with very long or complex code snippets
- Best for code snippets under 50 lines
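
Given the length guidance above, a caller might screen snippets before sending them to the model. This is a minimal sketch, assuming a hypothetical helper `is_within_limits`. Counting only non-blank lines is a choice made for this example, not something the model card specifies.

```python
MAX_LINES = 50  # taken from the "under 50 lines" guidance above


def is_within_limits(code: str, max_lines: int = MAX_LINES) -> bool:
    """Return True if `code` has fewer than `max_lines` non-blank lines."""
    non_blank = [line for line in code.splitlines() if line.strip()]
    return len(non_blank) < max_lines


short_snippet = "def add(a, b):\n    return a + b"
long_snippet = "x = 1\n" * 60

print(is_within_limits(short_snippet))  # True
print(is_within_limits(long_snippet))   # False
```

A line count is only a rough proxy. The usage example also truncates input at 512 tokens, so a snippet with few but very long lines could still be cut off, and a token count from the tokenizer would be a stricter check.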

## Citation

```
@misc{squash-code-corruptor,
  author = {Mao Abel},
  title = {Squash Code Corruptor},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}
```