---

language: 
- code

license: mit
tags:
- code-generation
- bug-fixing
- code-repair
- codet5
- debugging
datasets:
- custom
metrics:
- accuracy
- exact-match
library_name: transformers
pipeline_tag: text2text-generation
---


# brainbug

## Model Description

This is a fine-tuned **CodeT5** model for automatic bug detection and code repair. It has been trained to identify and fix several categories of programming errors in Python code.

## Supported Error Types

- **WVAV**: Wrong Variable Used in Variable Assignment
- **MLAC**: Missing Line After Call
- **WPFV**: Wrong Parameter in Function/Method Call
- And more... (see the example below)
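
For illustration, a hypothetical WVAV bug (not taken from the training data) assigns the wrong variable:

```python
def invoice_total(price: float, quantity: int, rate: float) -> float:
    subtotal = price * quantity
    discounted = subtotal * (1 - rate)
    total = subtotal  # WVAV bug: wrong variable assigned; should be `discounted`
    return total
```

The repaired version replaces the right-hand side: `total = discounted`.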

## Model Details

- **Base Model**: `Salesforce/codet5-base`
- **Fine-tuned on**: Custom bug-fix dataset
- **Task**: Code-to-Code generation (bug fixing)
- **Language**: Python
- **Model Size**: 220M parameters

## Usage

```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

# Example: Fix buggy code
faulty_code = """
def check_for_file(self, file_path):
    files = self.connection.glob(file_path)
    return len(files) == 1
"""

# Prepare input: prefix the snippet with the error type
input_text = f"Fix WVAV: {faulty_code}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

# Generate fix with beam search
outputs = model.generate(**inputs, max_length=256, num_beams=5)
fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(fixed_code)
```
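
Alternatively, the high-level `pipeline` API can be used. This is a minimal sketch; the `"Fix <ERROR_TYPE>: <code>"` prompt format is carried over from the example above:

```python
from transformers import pipeline

# text2text-generation matches the pipeline_tag declared in this card
fixer = pipeline("text2text-generation", model="Sagar123x/brainbug")

buggy = "def area(w, h):\n    result = w * w\n    return result"  # hypothetical WVAV snippet
result = fixer(f"Fix WVAV: {buggy}", max_length=256, num_beams=5)
print(result[0]["generated_text"])
```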

## Training Details

- **Training Epochs**: 10
- **Batch Size**: 1 (with gradient accumulation)
- **Learning Rate**: 3e-5
- **Optimizer**: AdamW
- **Hardware**: NVIDIA RTX 4050 (6GB)
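
Expressed as Hugging Face `Seq2SeqTrainingArguments`, the setup would look roughly like the sketch below; `output_dir`, `gradient_accumulation_steps`, and `fp16` are assumptions, since the card only states the values listed above:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="brainbug-checkpoints",  # hypothetical path
    num_train_epochs=10,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,      # assumed; card only says "with gradient accumulation"
    learning_rate=3e-5,
    optim="adamw_torch",                # AdamW, as listed above
    predict_with_generate=True,
    fp16=True,                          # assumed; common on a 6 GB GPU
)
```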

## Performance Metrics

- **Exact Match Accuracy**: 2.60%
- **Token-Level Accuracy**: 28.52%
- **Average Similarity**: 76.75%
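
The card does not state how these metrics were computed; the following are plausible minimal definitions (assumptions, not the actual evaluation code behind the numbers above):

```python
import difflib

def exact_match(pred: str, ref: str) -> bool:
    # Correct only if the prediction reproduces the reference verbatim
    return pred.strip() == ref.strip()

def token_accuracy(pred: str, ref: str) -> float:
    # Fraction of reference tokens matched position by position
    p, r = pred.split(), ref.split()
    return sum(a == b for a, b in zip(p, r)) / max(len(r), 1)

def similarity(pred: str, ref: str) -> float:
    # difflib's ratio is one common string-similarity choice (assumed)
    return difflib.SequenceMatcher(None, pred, ref).ratio()
```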


## Limitations

- Trained primarily on Python code
- Best performance on error types seen during training
- Inputs longer than 256 tokens are truncated, so very long snippets may lose context (a length check is sketched below)
- Requires error type specification for optimal results
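
As a practical guard against the 256-token limit, check the tokenized length before generating (reusing the tokenizer from the Usage section):

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

input_text = "Fix WVAV: ..."  # your prompt here
n_tokens = len(tokenizer(input_text)["input_ids"])
if n_tokens > 256:
    print(f"Warning: {n_tokens} tokens; input will be truncated to 256.")
```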

## Citation

```bibtex
@misc{brainbug-codet5,
  author = {Your Name},
  title = {BrainBug: CodeT5 for Automatic Bug Repair},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
}
```

## License

MIT License

## Contact

For questions or issues, please open an issue on the model repository.