---

language: 
- code

license: mit
tags:
- code-generation
- bug-fixing
- code-repair
- codet5
- debugging
datasets:
- custom
metrics:
- accuracy
- exact-match
library_name: transformers
pipeline_tag: text2text-generation
---


# brainbug

## Model Description

This is a fine-tuned **CodeT5** model for automatic bug detection and code repair. It has been trained to identify and fix several categories of programming errors in Python code.

## Supported Error Types

- **WVAV**: Wrong Variable Used in Variable Assignment
- **MLAC**: Missing Line After Call
- **WPFV**: Wrong Parameter in Function/Method Call
- And more... (see the example below)
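
For illustration, a hypothetical WVAV bug (not taken from the training data) assigns the wrong variable:

```python
def invoice_total(price: float, quantity: int, rate: float) -> float:
    subtotal = price * quantity
    discounted = subtotal * (1 - rate)
    total = subtotal  # WVAV bug: wrong variable assigned; should be `discounted`
    return total
```

The repaired version replaces the right-hand side: `total = discounted`.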

## Model Details

- **Base Model**: `Salesforce/codet5-base`
- **Fine-tuned on**: Custom bug-fix dataset
- **Task**: Code-to-Code generation (bug fixing)
- **Language**: Python
- **Model Size**: 220M parameters

## Usage

```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input: prefix the snippet with the error type
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix with beam search
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fixed_code)
```
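
Alternatively, the high-level `pipeline` API can be used. This is a minimal sketch; the `"Fix <ERROR_TYPE>: <code>"` prompt format is carried over from the example above:

```python
from transformers import pipeline

# text2text-generation matches the pipeline_tag declared in this card
fixer = pipeline("text2text-generation", model="Sagar123x/brainbug")

buggy = "def area(w, h):\n    result = w * w\n    return result"  # hypothetical WVAV snippet
result = fixer(f"Fix WVAV: {buggy}", max_length=256, num_beams=5)
print(result[0]["generated_text"])
```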

## Training Details

- **Training Epochs**: 10
- **Batch Size**: 1 (with gradient accumulation)
- **Learning Rate**: 3e-5
- **Optimizer**: AdamW
- **Hardware**: NVIDIA RTX 4050 (6GB)
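
Expressed as Hugging Face `Seq2SeqTrainingArguments`, the setup would look roughly like the sketch below; `output_dir`, `gradient_accumulation_steps`, and `fp16` are assumptions, since the card only states the values listed above:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="brainbug-checkpoints",  # hypothetical path
    num_train_epochs=10,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,      # assumed; card only says "with gradient accumulation"
    learning_rate=3e-5,
    optim="adamw_torch",                # AdamW, as listed above
    predict_with_generate=True,
    fp16=True,                          # assumed; common on a 6 GB GPU
)
```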

## Performance Metrics

- **Exact Match Accuracy**: 2.60%
- **Token-Level Accuracy**: 28.52%
- **Average Similarity**: 76.75%
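
The card does not state how these metrics were computed; the following are plausible minimal definitions (assumptions, not the actual evaluation code behind the numbers above):

```python
import difflib

def exact_match(pred: str, ref: str) -> bool:
    # Correct only if the prediction reproduces the reference verbatim
    return pred.strip() == ref.strip()

def token_accuracy(pred: str, ref: str) -> float:
    # Fraction of reference tokens matched position by position
    p, r = pred.split(), ref.split()
    return sum(a == b for a, b in zip(p, r)) / max(len(r), 1)

def similarity(pred: str, ref: str) -> float:
    # difflib's ratio is one common string-similarity choice (assumed)
    return difflib.SequenceMatcher(None, pred, ref).ratio()
```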


## Limitations

- Trained primarily on Python code
- Best performance on error types seen during training
- Inputs longer than 256 tokens are truncated, so very long snippets may lose context (a length check is sketched below)
- Requires error type specification for optimal results
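
As a practical guard against the 256-token limit, check the tokenized length before generating (reusing the tokenizer from the Usage section):

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

input_text = "Fix WVAV: ..."  # your prompt here
n_tokens = len(tokenizer(input_text)["input_ids"])
if n_tokens > 256:
    print(f"Warning: {n_tokens} tokens; input will be truncated to 256.")
```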

## Citation

```bibtex
@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}
```

## License

MIT License

## Contact

For questions or issues, please open an issue on the model repository.