---
tags:
- code
- python
- code-generation
- bug-injection
- education
license: mit
---

# Squash Code Corruptor Model

T5-based model that generates realistic bugs in Python code for educational purposes.

## Model Description

This model is trained to introduce realistic bugs into Python code, including:

- Logic errors (operator swaps, off-by-one errors, wrong variables)
- Syntax errors (missing colons, indentation issues)

Trained on 1500 examples:

- 1000 syntax error pairs
- 500 logic error pairs (7 different categories)

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("onegaiosu/squash-code-corruptor")
tokenizer = AutoTokenizer.from_pretrained("onegaiosu/squash-code-corruptor")

# Corrupt code
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", max_length=512, truncation=True)

# do_sample=True is required for temperature to take effect;
# without it, generate() decodes greedily and ignores temperature.
outputs = model.generate(**inputs, max_length=512, do_sample=True, temperature=0.8)
corrupted = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Data

Custom dataset of Python code pairs (correct → buggy) focusing on common programming mistakes made by beginner and intermediate learners.

## Intended Use

Educational tool for the Squash app, where students learn Python by fixing intentionally buggy code.

## Limitations

- Trained specifically on Python code
- May not work well with very long or complex code snippets
- Best for code snippets under 50 lines

## Citation

```
@misc{squash-code-corruptor,
  author = {Mao Abel},
  title = {Squash Code Corruptor},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/onegaiosu/squash-code-corruptor}}
}
```
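
## Example: Checking the Kind of Bug in an Output

The model description distinguishes syntax errors from logic errors. The sketch below is one possible way to tell the two apart in a generated snippet, using only the standard-library `ast` module. The helper name `classify_corruption` and its return labels are hypothetical and not part of this model or the Squash app. It assumes the original snippet is valid Python, and `"possible-logic"` only means the output still parses but differs structurally from the original, not that a real logic bug was confirmed.

```python
import ast


def classify_corruption(original: str, corrupted: str) -> str:
    """Roughly classify a corrupted snippet (hypothetical helper).

    Returns "syntax" if the corrupted code does not parse, "unchanged" if it
    parses to the same AST as the original, and "possible-logic" otherwise.
    """
    try:
        corrupted_tree = ast.parse(corrupted)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return "syntax"

    # Assumption: the original snippet is valid Python.
    original_tree = ast.parse(original)

    # ast.dump omits line/column info by default, so formatting-only
    # differences compare as equal.
    if ast.dump(original_tree) == ast.dump(corrupted_tree):
        return "unchanged"
    return "possible-logic"


original = "def add(a, b):\n    return a + b"
print(classify_corruption(original, "def add(a, b):\n    return a - b"))
print(classify_corruption(original, "def add(a, b)\n    return a + b"))
```

A check like this could be used to filter out generations that leave the code unchanged before presenting an exercise to a student.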