---
language:
- code
license: mit
tags:
- code-generation
- bug-fixing
- code-repair
- codet5
- debugging
datasets:
- custom
metrics:
- accuracy
- exact-match
library_name: transformers
pipeline_tag: text2text-generation
---

# brainbug

## Model Description

BrainBug is a fine-tuned **CodeT5** model for automatic bug detection and code repair in Python. It takes a faulty code snippet prefixed with an error-type label and generates a corrected version of the snippet.

## Supported Error Types

- **WVAV**: Wrong Value Assigned to Variable
- **MLAC**: Missing AND Clause in a branch condition
- **WPFV**: Wrong Variable Used in Parameter of Function Call
- Additional error types from the training data (not listed here)

## Model Details

- **Base Model**: `Salesforce/codet5-base`
- **Fine-tuned on**: Custom bug-fix dataset
- **Task**: Code-to-Code generation (bug fixing)
- **Language**: Python
- **Model Size**: 220M parameters

## Usage

```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input: prefix the snippet with the error type to fix
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix with beam search
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(fixed_code)
```
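
The `Fix WVAV:` prefix in the example above suggests a general `Fix <ERROR_TYPE>: <code>` prompt pattern. The following is a minimal sketch of a helper that builds such prompts. It assumes the same pattern holds for the other error types, which this card does not confirm, and `build_prompt` is a hypothetical name that is not part of the model repository.

```python
def build_prompt(error_type: str, code: str) -> str:
    """Format a repair prompt as 'Fix <ERROR_TYPE>: <code>'.

    Assumption: every error type uses the same prefix pattern as the
    WVAV example; only that one prompt is shown in this card.
    """
    label = error_type.strip().upper()
    if not label:
        raise ValueError("error_type must be a non-empty label such as 'WVAV'")
    # The code is passed through unchanged, as in the usage example.
    return f"Fix {label}: {code}"


prompt = build_prompt("wpfv", "result = compute(a, a)")
print(prompt)  # Fix WPFV: result = compute(a, a)
```

The returned string can be passed to `tokenizer(...)` in place of `input_text`.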

## Training Details

- **Training Epochs**: 10
- **Batch Size**: 1 (with gradient accumulation)
- **Learning Rate**: 3e-5
- **Optimizer**: AdamW
- **Hardware**: NVIDIA RTX 4050 (6GB)
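
A batch size of 1 with gradient accumulation is a common way to fit training into limited GPU memory. The toy sketch below illustrates the idea on a one-parameter model: gradients from several size-1 micro-batches are averaged before a single update, which matches one step on the full batch. It is not the training code for this model. It uses plain gradient descent rather than AdamW, and the four accumulation steps are an assumption, since the card does not state the actual value.

```python
def grad_squared_error(w: float, x: float, y: float) -> float:
    """Gradient of (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def accumulated_step(w: float, micro_batches, lr: float) -> float:
    """One update built from several size-1 micro-batches."""
    accumulated = 0.0
    for x, y in micro_batches:
        # Scale each micro-batch gradient so the sum equals the batch mean.
        accumulated += grad_squared_error(w, x, y) / len(micro_batches)
    return w - lr * accumulated


def full_batch_step(w: float, batch, lr: float) -> float:
    """One update computed from the mean gradient over the whole batch."""
    mean_grad = sum(grad_squared_error(w, x, y) for x, y in batch) / len(batch)
    return w - lr * mean_grad


# Four samples, processed either one at a time or as a single batch.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accum = accumulated_step(0.0, samples, lr=0.01)
w_full = full_batch_step(0.0, samples, lr=0.01)
print(abs(w_accum - w_full) < 1e-12)  # True
```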

## Performance Metrics

- **Exact Match Accuracy**: 2.60%
- **Token-Level Accuracy**: 28.52%
- **Average Similarity**: 76.75%
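
This card does not state how the three metrics were computed. The sketch below shows one plausible set of definitions, using only the Python standard library: whitespace tokenization for token-level accuracy and `difflib.SequenceMatcher` for similarity. All three function names and definitions are assumptions for illustration and may differ from the evaluation actually used.

```python
from difflib import SequenceMatcher


def exact_match(pred: str, ref: str) -> bool:
    """True when prediction and reference are identical after stripping."""
    return pred.strip() == ref.strip()


def token_accuracy(pred: str, ref: str) -> float:
    """Position-wise token matches divided by the longer sequence length."""
    pred_tokens, ref_tokens = pred.split(), ref.split()
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0  # two empty strings agree trivially
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / longest


def similarity(pred: str, ref: str) -> float:
    """Character-level similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, pred, ref).ratio()


pred = "total = a + b"
ref = "total = a - b"
print(exact_match(pred, ref))     # False
print(token_accuracy(pred, ref))  # 0.8
print(similarity(pred, ref))
```

Under definitions like these, a prediction that differs from the reference by a single token scores zero on exact match while still scoring high on similarity, which may help explain the gap between the figures above.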

## Limitations

- Trained primarily on Python code
- Performs best on the error types seen during training
- Inputs are truncated to 256 tokens in the usage example, so longer snippets may not be repaired correctly
- Expects the error type to be specified in the prompt (for example, `Fix WVAV: ...`); results may degrade without it

## Citation

```bibtex
@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}
```

## License

MIT License

## Contact

For questions or issues, please open an issue on the model repository.