---
tags:
- code
- python
- code-generation
- bug-injection
- education
license: mit
---
# Squash Code Corruptor Model
A T5-based model that generates realistic bugs in Python code for educational purposes.
## Model Description
This model is trained to introduce realistic bugs into Python code, including:
- Logic errors (operator swaps, off-by-one errors, wrong variables)
- Syntax errors (missing colons, indentation issues)
Trained on 1500 examples:
- 1000 syntax error pairs
- 500 logic error pairs (7 different categories)
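To make the "operator swap" category concrete, here is a minimal hand-written sketch of that kind of bug. It is a rule-based illustration built on the standard `ast` module, not the model's method and not how the training pairs were produced; the names `SwapAddToSub` and `inject_operator_swap` are hypothetical.

```python
import ast


class SwapAddToSub(ast.NodeTransformer):
    """Turn one `+` into `-` (hypothetical stand-in for an operator-swap bug)."""

    def __init__(self):
        self.swapped = False

    def visit_BinOp(self, node):
        # Visit children first so nested expressions are handled too.
        self.generic_visit(node)
        if not self.swapped and isinstance(node.op, ast.Add):
            node.op = ast.Sub()
            self.swapped = True
        return node


def inject_operator_swap(source: str) -> str:
    """Return `source` with a single `+` replaced by `-`."""
    tree = SwapAddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


if __name__ == "__main__":
    print(inject_operator_swap("def add(a, b):\n    return a + b"))
```

A learned model can vary the bug type and placement, which a fixed rule like this cannot.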
## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")
# Corrupt a short snippet
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrupted)
```
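Since the model can produce both syntax and logic errors, it may help to check which kind a given output contains. The sketch below is one possible approach using only the standard library; the function name `classify_corruption` and its return labels are assumptions for illustration, not part of the model or the Squash app.

```python
import ast


def classify_corruption(original: str, corrupted: str) -> str:
    """Roughly label a model output relative to the original snippet.

    Returns "unchanged", "syntax", or "possible-logic" (hypothetical labels).
    """
    if corrupted.strip() == original.strip():
        return "unchanged"
    try:
        ast.parse(corrupted)
    except SyntaxError:  # also covers IndentationError
        return "syntax"
    # The code still parses but differs from the original, so any bug
    # is likely a logic error (or the change may be cosmetic).
    return "possible-logic"


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b"
    print(classify_corruption(original, "def add(a, b)\n    return a + b"))
    print(classify_corruption(original, "def add(a, b):\n    return a - b"))
```

An "unchanged" result is a signal to sample again, since no bug was introduced.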
## Training Data
A custom dataset of Python code pairs (correct → buggy) covering mistakes commonly made by beginner and intermediate learners.
## Intended Use
An educational tool for the Squash app: students learn Python by fixing intentionally buggy code.
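When preparing an exercise, it can be useful to see exactly what the model changed, for example to build an answer key. The following sketch uses `difflib` from the standard library; the helper name `bug_diff` and the file labels are hypothetical, and nothing here reflects how the Squash app itself works.

```python
import difflib


def bug_diff(original: str, corrupted: str) -> str:
    """Return a unified diff between the correct and the buggy snippet."""
    diff = difflib.unified_diff(
        original.splitlines(),
        corrupted.splitlines(),
        fromfile="original.py",   # illustrative labels only
        tofile="corrupted.py",
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    print(bug_diff(
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return a - b",
    ))
```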
## Limitations
- Trained specifically on Python code
- May not work well with very long or complex code snippets
- Best for code snippets under 50 lines
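Given the 50-line guideline above, one way to prepare a longer file is to pull out its short top-level functions and corrupt them one at a time. This is a sketch under that assumption; `short_function_snippets` is a hypothetical helper, and the threshold simply mirrors the limit stated in the list.

```python
import ast

MAX_LINES = 50  # mirrors the "under 50 lines" guideline


def short_function_snippets(source: str, max_lines: int = MAX_LINES) -> list:
    """Return the source of each top-level function shorter than `max_lines`."""
    snippets = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Source text of the function, starting at its `def` line.
            segment = ast.get_source_segment(source, node)
            if segment is not None and len(segment.splitlines()) < max_lines:
                snippets.append(segment)
    return snippets


if __name__ == "__main__":
    example = "def short():\n    return 1\n\ndef other(x):\n    return x * 2\n"
    for snippet in short_function_snippets(example):
        print(snippet)
        print("---")
```

Line count is only a proxy: the usage example truncates input at 512 tokens, so a dense snippet under 50 lines could still be cut off.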
## Citation
```
@misc{squash-code-corruptor,
author = {Mao Abel},
title = {Squash Code Corruptor},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}
```