---
language:
- code
license: mit
tags:
- code-generation
- bug-fixing
- code-repair
- codet5
- debugging
datasets:
- custom
metrics:
- accuracy
- exact-match
library_name: transformers
pipeline_tag: text2text-generation
---
# brainbug
## Model Description
This is a fine-tuned **CodeT5** model for automatic bug detection and repair. It has been trained to identify and fix several common classes of programming errors in Python code.
## Supported Error Types
- **WVAV**: Wrong Variable Used in Variable Assignment
- **MLAC**: Missing Line After Call
- **WPFV**: Wrong Parameter in Function/Method Call
- And more...
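
To make the taxonomy concrete, here is a hypothetical WVAV bug/fix pair. The function and variable names are illustrative only and are not drawn from the training data:

```python
# Hypothetical WVAV (Wrong Variable Used in Variable Assignment) example.
def apply_discount_buggy(base_price, discount):
    discounted = base_price - discount
    final_price = base_price  # WVAV: the assignment should read `discounted`
    return final_price

def apply_discount_fixed(base_price, discount):
    discounted = base_price - discount
    final_price = discounted  # repaired: assigns the intended variable
    return final_price
```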
## Model Details
- **Base Model**: `Salesforce/codet5-base`
- **Fine-tuned on**: Custom bug-fix dataset
- **Task**: Code-to-Code generation (bug fixing)
- **Language**: Python
- **Model Size**: 220M parameters
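
The parameter count can be checked directly from the checkpoint; a quick sketch (the printed value for a `codet5-base` derivative should be roughly 220M):

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # roughly 220M for codet5-base
```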
## Usage
```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer
# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")
# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""
# Prepare input
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)
# Generate fix
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fixed_code)
```
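
Since the card declares `pipeline_tag: text2text-generation`, the checkpoint can also be driven through the high-level `pipeline` API. A minimal sketch, reusing the `Fix <ERROR_TYPE>: ...` prompt format from the example above:

```python
from transformers import pipeline

# Wrap the checkpoint in a text2text-generation pipeline.
fixer = pipeline("text2text-generation", model="Sagar123x/brainbug")

# Same prompt format as above: error-type prefix followed by the buggy code.
result = fixer("Fix WVAV: final_price = base_price", max_length=256, num_beams=5)
print(result[0]["generated_text"])
```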
## Training Details
- **Training Epochs**: 10
- **Batch Size**: 1 (with gradient accumulation)
- **Learning Rate**: 3e-5
- **Optimizer**: AdamW
- **Hardware**: NVIDIA RTX 4050 (6GB)
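
The training script itself is not published. As a reference point, here is a minimal `Seq2SeqTrainingArguments` sketch that mirrors the hyperparameters above; the output path, gradient accumulation factor, and `fp16` flag are assumptions, not published values:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./brainbug-checkpoints",  # assumed path
    num_train_epochs=10,                  # Training Epochs: 10
    per_device_train_batch_size=1,        # Batch Size: 1
    gradient_accumulation_steps=8,        # assumption: effective batch of 8
    learning_rate=3e-5,                   # Learning Rate: 3e-5
    optim="adamw_torch",                  # Optimizer: AdamW
    fp16=True,                            # assumption: helps fit a 6GB GPU
)
```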
## Performance Metrics
- **Exact Match Accuracy**: 2.60%
- **Token-Level Accuracy**: 28.52%
- **Average Similarity**: 76.75%
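
The card does not specify how these metrics are computed. One plausible reading, sketched below, treats exact match as a whole-string comparison, token-level accuracy as per-position agreement of tokenized outputs, and similarity as a `difflib` ratio; all three definitions are assumptions:

```python
import difflib

def exact_match(pred: str, ref: str) -> bool:
    # Assumed definition: strings identical after stripping whitespace.
    return pred.strip() == ref.strip()

def token_accuracy(pred_tokens: list, ref_tokens: list) -> float:
    # Assumed definition: per-position agreement, normalized by the
    # reference length.
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / max(len(ref_tokens), 1)

def similarity(pred: str, ref: str) -> float:
    # Assumed definition: difflib sequence-matcher ratio in [0, 1].
    return difflib.SequenceMatcher(None, pred, ref).ratio()
```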
## Limitations
- Trained primarily on Python code
- Best performance on error types seen during training
- May not handle very long code snippets (>256 tokens)
- Requires error type specification for optimal results
## Citation
```bibtex
@misc{brainbug-codet5,
  author       = {Sagar123x},
  title        = {BrainBug: CodeT5 for Automatic Bug Repair},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}
```
## License
MIT License
## Contact
For questions or issues, please open an issue on the model repository.