---
datasets:
- nvidia/OpenCodeReasoning
- future-technologies/Universal-Transformers-Dataset
metrics:
- bleu
---
# AI FixCode: A Code Repair Model 🛠️
AI FixCode is a Transformer-based model designed to automatically identify and correct errors in source code. Built upon the powerful CodeT5 architecture, it is trained on a diverse dataset of real-world buggy and fixed code pairs to address both syntactic and semantic issues.
## 🚀 Key Features

- **Base Model:** Salesforce/codet5p-220m
- **Architecture:** Encoder-Decoder (Seq2Seq)
- **Target Languages:** Primarily Python, with future plans to expand to other languages like JavaScript and Go.
- **Task:** Code repair and error correction.
## 🔧 How to Use

Simply provide a faulty code snippet, and the model will return a corrected version. It's intended for use in code editors, IDEs, or automated pipelines to assist developers in debugging.
Example:

```python
# Input:
def add(x, y)
    return x + y

# Output:
def add(x, y):
    return x + y
```
## 🧠 Under the Hood

The model operates as a sequence-to-sequence system. During training, it learns to map a sequence of buggy code tokens to a sequence of correct code tokens. This approach allows it to "reason" about the necessary changes at a granular level, effectively predicting the patches needed to fix the code.
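For illustration, here is a minimal sketch of how one buggy/fixed pair could be tokenized for seq2seq training with the CodeT5 tokenizer. This is not the project's actual training pipeline; the maximum sequence length is an assumption:

```python
from transformers import AutoTokenizer

# Minimal sketch (not the project's actual training code): tokenize one
# buggy/fixed pair for seq2seq training. max_length=512 is an assumption.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5p-220m")

buggy = "def add(x, y)\n    return x + y"
fixed = "def add(x, y):\n    return x + y"

inputs = tokenizer(buggy, truncation=True, max_length=512, return_tensors="pt")
labels = tokenizer(fixed, truncation=True, max_length=512, return_tensors="pt").input_ids

# During training, the decoder is supervised with `labels`: the model
# minimizes cross-entropy between its predictions and the fixed tokens,
# learning the buggy -> fixed mapping token by token.
```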
## 🚀 Getting Started with Inference

You can easily use the model with the Hugging Face `transformers` library.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "khulnasoft/aifixcode-model"  # Replace with your model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A buggy snippet: missing closing parenthesis
input_code = "def foo(x):\n    print(x"
inputs = tokenizer(input_code, return_tensors="pt")

# Generate the corrected code
outputs = model.generate(**inputs, max_length=512)
corrected_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrected_code)
```
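Since the model card lists BLEU under `metrics`, one way to sanity-check a generated fix against a reference is the `evaluate` library. This is a hedged sketch continuing from the snippet above; the reference string is made up for illustration:

```python
import evaluate

# Sketch only: score the generated fix against a hand-written reference.
# The reference below is a hypothetical ground truth for the input above.
bleu = evaluate.load("bleu")
reference = "def foo(x):\n    print(x)"
result = bleu.compute(predictions=[corrected_code], references=[[reference]])
print(result["bleu"])
```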
## 📊 Dataset and Format

The model was trained on a custom dataset following a simple format:
```json
[
  {
    "input": "def add(x, y)\n    return x + y",
    "output": "def add(x, y):\n    return x + y"
  },
  {
    "input": "for i in range(10):\n    if i == 5\n        print(i)",
    "output": "for i in range(10):\n    if i == 5:\n        print(i)"
  }
]
```
This format allows the model to learn the direct mapping between erroneous and corrected code.
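If you keep pairs in a JSON file with this schema, they can be loaded with the Hugging Face `datasets` library. A minimal sketch, where the file name `pairs.json` is an assumption:

```python
from datasets import load_dataset

# Sketch, assuming the pairs are saved as "pairs.json" in the format above.
ds = load_dataset("json", data_files="pairs.json", split="train")
print(ds[0]["input"])   # buggy snippet
print(ds[0]["output"])  # fixed snippet
```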
## 🛡️ License and Acknowledgements

- **License:** MIT License
- **Acknowledgements:** This project was made possible by the Hugging Face team's Transformers library and Salesforce's CodeT5 model.