	---
	language:
	- code

	license: mit
	tags:
	- code-generation
	- bug-fixing
	- code-repair
	- codet5
	- debugging
	datasets:
	- custom
	metrics:
	- accuracy
	- exact-match
	library_name: transformers
	pipeline_tag: text2text-generation
	---

	# brainbug

	## Model Description

	BrainBug is a CodeT5 model fine-tuned for automatic bug detection and code repair in Python. Given a faulty snippet and an error-type label, it generates a corrected version of the code.

	## Supported Error Types

	- WVAV: Wrong Variable Used in Variable Assignment
	- MLAC: Missing Line After Call
	- WPFV: Wrong Parameter in Function/Method Call
	- Additional error types not enumerated in this card

	## Model Details

	- Base Model: `Salesforce/codet5-base`
	- Fine-tuned on: Custom bug-fix dataset
	- Task: Code-to-Code generation (bug fixing)
	- Language: Python
	- Model Size: 220M parameters

	## Usage

	```python
	from transformers import T5ForConditionalGeneration, RobertaTokenizer

	# Load model and tokenizer
	model = T5ForConditionalGeneration.from_pretrained("Sagar123x/brainbug")
	tokenizer = RobertaTokenizer.from_pretrained("Sagar123x/brainbug")

	# Example: Fix buggy code
	faulty_code = """
	def check_for_file(self, file_path):
	    files = self.connection.glob(file_path)
	    return len(files) == 1
	"""

	# Prepare input
	input_text = f"Fix WVAV: {faulty_code}"
	inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)

	# Generate fix
	outputs = model.generate(**inputs, max_length=256, num_beams=5)
	fixed_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(fixed_code)
	```
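
The prompt in the example above follows a `Fix <ERROR_TYPE>: <code>` pattern. A small helper can keep that format consistent across calls. This is a minimal sketch: `build_prompt` is a hypothetical name, not part of the model or of `transformers`, and the check on the label is an assumption based on the three codes this card lists (all uppercase letters).

```python
def build_prompt(error_type: str, code: str) -> str:
    """Return a prompt in the 'Fix <TYPE>: <code>' format used in the usage example.

    Hypothetical helper: the label check below assumes error codes are
    purely alphabetic, as WVAV, MLAC and WPFV are.
    """
    label = error_type.strip().upper()
    if not label.isalpha():
        raise ValueError(f"Error type must be an alphabetic code, got {error_type!r}")
    # The code is passed through unchanged so the prompt matches the example's format.
    return f"Fix {label}: {code}"
```

With the objects from the usage example, `tokenizer(build_prompt("WVAV", faulty_code), return_tensors="pt", max_length=256, truncation=True)` would produce the same inputs as the hand-written f-string.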

	## Training Details

	- Training Epochs: 10
	- Batch Size: 1 (with gradient accumulation)
	- Learning Rate: 3e-5
	- Optimizer: AdamW
	- Hardware: NVIDIA RTX 4050 (6 GB)
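
A batch size of 1 with gradient accumulation means that gradients from several single-example steps are summed before one optimizer update, which approximates a larger batch within limited GPU memory. The card does not state the number of accumulation steps, so `accum_steps=4` below is an assumption. The one-parameter model is a toy stand-in for CodeT5, and plain gradient descent stands in for AdamW; the sketch only illustrates the accumulation pattern.

```python
def grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def train_with_accumulation(data, w=0.0, lr=0.01, accum_steps=4):
    """One pass over `data` using micro-batches of size 1.

    Each gradient is scaled by 1/accum_steps and accumulated; the weight is
    updated once every `accum_steps` examples. accum_steps=4 is an assumed
    value, not taken from the model card.
    """
    accumulated = 0.0
    for step, (x, y) in enumerate(data, start=1):
        accumulated += grad(w, x, y) / accum_steps
        if step % accum_steps == 0:
            w -= lr * accumulated  # one optimizer step per accumulation window
            accumulated = 0.0      # reset, analogous to optimizer.zero_grad()
    return w


def train_full_batch(data, w=0.0, lr=0.01, batch_size=4):
    """Reference: the same updates computed from whole batches at once."""
    for start in range(0, len(data) - batch_size + 1, batch_size):
        batch = data[start:start + batch_size]
        mean_grad = sum(grad(w, x, y) for x, y in batch) / batch_size
        w -= lr * mean_grad
    return w


# Toy data following y = 2x; both routines should reach the same weight.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)] * 2
w_accum = train_with_accumulation(data)
w_batch = train_full_batch(data)
```

Up to floating-point rounding, `w_accum` and `w_batch` agree, which is why accumulation lets a memory-constrained GPU imitate a larger batch.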

	## Performance Metrics

	- Exact Match Accuracy: 2.60%
	- Token-Level Accuracy: 28.52%
	- Average Similarity: 76.75%
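
The card does not define how these three metrics were computed. The sketch below shows one plausible reading, for orientation only: exact match compares whitespace-normalized strings, token-level accuracy compares whitespace-split tokens position by position, and similarity uses `difflib.SequenceMatcher` from the standard library. All three definitions and all function names are assumptions, so the numbers they produce need not match the figures reported above.

```python
from difflib import SequenceMatcher


def exact_match(pred: str, ref: str) -> bool:
    """True when prediction and reference agree after whitespace normalization."""
    return pred.split() == ref.split()


def token_accuracy(pred: str, ref: str) -> float:
    """Fraction of positions (over the longer sequence) with identical tokens.

    Tokens are whitespace-split words here, not CodeT5 subword tokens
    (an assumption made to keep the sketch dependency-free).
    """
    p, r = pred.split(), ref.split()
    if not p and not r:
        return 1.0
    hits = sum(a == b for a, b in zip(p, r))
    return hits / max(len(p), len(r))


def similarity(pred: str, ref: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, pred, ref).ratio()


def evaluate(pairs):
    """Average the three metrics over (prediction, reference) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "token_accuracy": sum(token_accuracy(p, r) for p, r in pairs) / n,
        "similarity": sum(similarity(p, r) for p, r in pairs) / n,
    }
```

Under definitions like these, a prediction that differs from the reference by a single token counts as a miss for exact match while still scoring high on similarity, which is one way a low exact-match rate and a high average similarity can coexist.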


	## Limitations

	- Trained primarily on Python code
	- Best performance on error types seen during training
	- Inputs longer than 256 tokens are truncated in the usage example, so long code snippets may not be repaired correctly
	- Works best when the error type is given in the prompt (for example, `Fix WVAV: ...`)
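
Because the usage example truncates inputs at 256 tokens, it can help to check a prompt's length before requesting a fix. A minimal sketch, assuming a tokenizer that behaves like a Hugging Face tokenizer (calling it on a string returns a mapping with an `input_ids` list); `fits_in_context` is a hypothetical helper, not part of the model or of `transformers`.

```python
def fits_in_context(prompt: str, tokenizer, max_tokens: int = 256) -> bool:
    """Return True if `prompt` tokenizes to at most `max_tokens` tokens.

    `tokenizer` is assumed to be callable on a string and to return a
    mapping with an "input_ids" list, as Hugging Face tokenizers do.
    The default of 256 mirrors max_length in the usage example.
    """
    n_tokens = len(tokenizer(prompt)["input_ids"])
    return n_tokens <= max_tokens
```

With the tokenizer from the usage example, `fits_in_context(f"Fix WVAV: {faulty_code}", tokenizer)` indicates whether the full snippet reaches the model or is cut off by truncation.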

	## Citation

	```bibtex
	@misc{brainbug-codet5,
	  author = {Sagar123x},
	  title = {BrainBug: CodeT5 for Automatic Bug Repair},
	  year = {2025},
	  publisher = {Hugging Face},
	  howpublished = {\url{https://huggingface.co/Sagar123x/brainbug}}
	}
	```

	## License

	MIT License

	## Contact

	For questions or issues, please open an issue on the model repository.