---
datasets:
  - nvidia/OpenCodeReasoning
  - future-technologies/Universal-Transformers-Dataset
metrics:
  - bleu
---
# AI FixCode: A Code Repair Model
AI FixCode is a Transformer-based model designed to automatically identify and correct errors in source code. Built on the CodeT5 architecture, it is trained on a diverse dataset of real-world buggy and fixed code pairs to address both syntactic and semantic issues.
## Key Features
- **Base Model:** Salesforce/codet5p-220m
- **Architecture:** Encoder-Decoder (Seq2Seq)
- **Target Languages:** Primarily Python, with plans to expand to other languages such as JavaScript and Go.
- **Task:** Code repair and error correction.
## How to Use
Simply provide a faulty code snippet, and the model will return a corrected version. It is intended for use in code editors, IDEs, or automated pipelines to assist developers with debugging.
Example:

```python
# Input:
def add(x, y)
    return x + y

# Output:
def add(x, y):
    return x + y
```
## Under the Hood
The model operates as a sequence-to-sequence system. During training, it learns to map a sequence of buggy code tokens to a sequence of correct code tokens. This approach allows it to "reason" about the necessary changes at a granular level, effectively predicting the patches needed to fix the code.
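To make the patch idea concrete, the change between a buggy/fixed pair can be viewed as a small textual diff. The following is a minimal illustration using Python's standard `difflib`; it is not part of the model itself, only a way to visualize the edit the model learns to predict:

```python
import difflib

# A buggy/fixed pair like those the model is trained on
buggy = "def add(x, y)\n    return x + y"
fixed = "def add(x, y):\n    return x + y"

# The "patch" the model effectively learns to produce:
# one line removed, one line added, the rest unchanged
diff = list(difflib.unified_diff(
    buggy.splitlines(), fixed.splitlines(), lineterm=""
))
for line in diff:
    print(line)
```

The diff contains a single removed line (`-def add(x, y)`) and a single added line (`+def add(x, y):`), showing how a one-character fix corresponds to a small, localized edit in token space.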
## Getting Started with Inference
You can easily use the model with the Hugging Face transformers library.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "khulnasoft/aifixcode-model"  # Replace with your model name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A snippet with a missing closing parenthesis
input_code = "def foo(x):\n    print(x"
inputs = tokenizer(input_code, return_tensors="pt")

# Generate the corrected code
outputs = model.generate(**inputs, max_length=512)
corrected_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(corrected_code)
```
## Dataset and Format
The model was trained on a custom dataset following a simple format:
```json
[
  {
    "input": "def add(x, y)\n  return x + y",
    "output": "def add(x, y):\n  return x + y"
  },
  {
    "input": "for i in range(10): \n if i == 5 \n print(i)",
    "output": "for i in range(10): \n if i == 5: \n print(i)"
  }
]
```
This format allows the model to learn the direct mapping between erroneous and corrected code.
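A minimal sketch of how such records map onto seq2seq training examples (the inline JSON here is illustrative sample data, not the released dataset, and this is not the project's actual training code):

```python
import json

# Illustrative example data in the format described above
raw = """[
  {"input": "def add(x, y)\\n    return x + y",
   "output": "def add(x, y):\\n    return x + y"}
]"""

pairs = json.loads(raw)

# Each record maps directly to one training example:
# the buggy snippet is the encoder source, the fix is the decoder target.
sources = [p["input"] for p in pairs]
targets = [p["output"] for p in pairs]
print(sources[0])
print(targets[0])
```

In an actual training loop, `sources` would be tokenized as encoder inputs and `targets` as decoder labels before being fed to the seq2seq model.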
## License and Acknowledgements
- **License:** MIT License
- **Acknowledgements:** This project was made possible by the Hugging Face team's Transformers library and Salesforce's CodeT5 model.