
---
datasets:
  - nvidia/OpenCodeReasoning
  - future-technologies/Universal-Transformers-Dataset
metrics:
  - bleu
---

# AI FixCode: A Code Repair Model 🛠️

AI FixCode is a Transformer-based model designed to automatically identify and correct errors in source code. Built upon the powerful CodeT5 architecture, it's trained on a diverse dataset of real-world buggy and fixed code pairs to address both syntactic and semantic issues.

## 📌 Key Features

- **Base Model:** Salesforce/codet5p-220m
- **Architecture:** Encoder-Decoder (Seq2Seq)
- **Target Languages:** Primarily Python, with plans to expand to other languages such as JavaScript and Go.
- **Task:** Code repair and error correction.

## 🔧 How to Use

Simply provide a faulty code snippet, and the model will return a corrected version. It's intended for use in code editors, IDEs, or automated pipelines to assist developers in debugging.

Example:

```python
# Input:
def add(x, y)
 return x + y

# Output:
def add(x, y):
    return x + y
```

## 🧠 Under the Hood

The model operates as a sequence-to-sequence system. During training, it learns to map a sequence of buggy code tokens to a sequence of correct code tokens. This approach allows it to "reason" about the necessary changes at a granular level, effectively predicting the patches needed to fix the code.
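To make the idea of "predicting a patch" concrete, the token-level changes between a buggy/fixed pair can be visualized with Python's standard `difflib`. This is only an illustration of the mapping the model learns, not how the model itself computes anything:

```python
import difflib

# A buggy/fixed pair like those the model is trained on
buggy = "def add(x, y)\n return x + y"
fixed = "def add(x, y):\n    return x + y"

# The unified diff is, in effect, the "patch" a repair model learns to produce.
diff = list(difflib.unified_diff(
    buggy.splitlines(), fixed.splitlines(),
    fromfile="buggy", tofile="fixed", lineterm=""
))
for line in diff:
    print(line)
```

The diff shows the two granular edits needed: adding the missing colon and normalizing the indentation.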


## 🚀 Getting Started with Inference

You can easily use the model with the Hugging Face transformers library.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "khulnasoft/aifixcode-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A buggy snippet: missing closing parenthesis
input_code = "def foo(x):\n print(x"
inputs = tokenizer(input_code, return_tensors="pt")

# Generate the corrected code
outputs = model.generate(**inputs, max_length=512)
corrected_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(corrected_code)
```
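In an automated pipeline, it is prudent to validate a candidate fix before applying it. A minimal sanity check (an assumption about pipeline design, not part of the model itself) is to confirm the output at least parses as Python:

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if the snippet parses as Python source."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def add(x, y)\n return x + y"))      # False (buggy input)
print(is_valid_python("def add(x, y):\n    return x + y"))  # True (corrected output)
```

Parsing only catches syntactic errors; semantic fixes would need tests or other checks.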

## 📂 Dataset and Format

The model was trained on a custom dataset following a simple format:

```json
[
  {
    "input": "def add(x, y)\n return x + y",
    "output": "def add(x, y):\n    return x + y"
  },
  {
    "input": "for i in range(10): \n  if i == 5 \n    print(i)",
    "output": "for i in range(10): \n  if i == 5: \n    print(i)"
  }
]
```

This format allows the model to learn the direct mapping between erroneous and corrected code.
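Reading pairs in this format is straightforward with the standard `json` module; a minimal loader sketch (field names taken from the example above):

```python
import json

# A single training pair in the dataset's input/output format
raw = """[
  {"input": "def add(x, y)\\n return x + y",
   "output": "def add(x, y):\\n    return x + y"}
]"""

pairs = json.loads(raw)
for pair in pairs:
    print("buggy:", repr(pair["input"]))
    print("fixed:", repr(pair["output"]))
```

In training, the `input` string becomes the encoder's source sequence and the `output` string becomes the decoder's target sequence.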


πŸ›‘οΈ License and Acknowledgements

- **License:** MIT License
- **Acknowledgements:** Thanks to the Hugging Face team for the Transformers library and to Salesforce for the CodeT5 model.