---
datasets:
  - nvidia/OpenCodeReasoning
  - future-technologies/Universal-Transformers-Dataset
metrics:
  - bleu
---
# AI FixCode: A Code Repair Model
AI FixCode is a Transformer-based model designed to automatically identify and correct errors in source code. Built on the CodeT5 architecture, it is trained on a diverse dataset of real-world buggy and fixed code pairs to address both syntactic and semantic issues.
## Key Features
- **Base Model:** Salesforce/codet5p-220m
- **Architecture:** Encoder-Decoder (Seq2Seq)
- **Target Languages:** Primarily Python, with plans to expand to other languages such as JavaScript and Go.
- **Task:** Code repair and error correction.
## How to Use
Simply provide a faulty code snippet, and the model will return a corrected version. It is intended for use in code editors, IDEs, or automated pipelines to assist developers with debugging.
Example:

```python
# Input:
def add(x, y)
    return x + y

# Output:
def add(x, y):
    return x + y
```
## Under the Hood
The model operates as a sequence-to-sequence system. During training, it learns to map a sequence of buggy code tokens to a sequence of correct code tokens. This approach allows it to "reason" about the necessary changes at a granular level, effectively predicting the patches needed to fix the code.
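To make the patch idea concrete, the change between a buggy/fixed pair can be viewed as a small textual diff. The following is a minimal illustration using Python's standard `difflib`; it is not part of the model itself, only a way to visualize the edit the model learns to predict:

```python
import difflib

# A buggy/fixed pair like those the model is trained on
buggy = "def add(x, y)\n    return x + y"
fixed = "def add(x, y):\n    return x + y"

# The "patch" the model effectively learns to produce:
# one line removed, one line added, the rest unchanged
diff = list(difflib.unified_diff(
    buggy.splitlines(), fixed.splitlines(), lineterm=""
))
for line in diff:
    print(line)
```

The diff contains a single removed line (`-def add(x, y)`) and a single added line (`+def add(x, y):`), showing how a one-character fix corresponds to a small, localized edit in token space.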
## Getting Started with Inference
You can easily use the model with the Hugging Face transformers library.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "khulnasoft/aifixcode-model"  # Replace with your model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A snippet with a missing closing parenthesis
input_code = "def foo(x):\n    print(x"
inputs = tokenizer(input_code, return_tensors="pt")

# Generate the corrected code
outputs = model.generate(**inputs, max_length=512)
corrected_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrected_code)
```
## Dataset and Format
The model was trained on a custom dataset following a simple format:
```json
[
  {
    "input": "def add(x, y)\n  return x + y",
    "output": "def add(x, y):\n  return x + y"
  },
  {
    "input": "for i in range(10): \n if i == 5 \n print(i)",
    "output": "for i in range(10): \n if i == 5: \n print(i)"
  }
]
```
This format allows the model to learn the direct mapping between erroneous and corrected code.
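A minimal sketch of how such records map onto seq2seq training examples (the inline JSON here is illustrative sample data, not the released dataset, and this is not the project's actual training code):

```python
import json

# Illustrative example data in the format described above
raw = """[
  {"input": "def add(x, y)\\n    return x + y",
   "output": "def add(x, y):\\n    return x + y"}
]"""

pairs = json.loads(raw)

# Each record maps directly to one training example:
# the buggy snippet is the encoder source, the fix is the decoder target.
sources = [p["input"] for p in pairs]
targets = [p["output"] for p in pairs]
print(sources[0])
print(targets[0])
```

In an actual training loop, `sources` would be tokenized as encoder inputs and `targets` as decoder labels before being fed to the seq2seq model.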
## License and Acknowledgements
- **License:** MIT License
- **Acknowledgements:** This project was made possible by the Hugging Face team's Transformers library and Salesforce's CodeT5 model.