---
{}
---

# CodeGPT Fine-tuned for Code Generation

## Model Description
This model is a fine-tuned version of [microsoft/CodeGPT-small-py](https://huggingface.co/microsoft/CodeGPT-small-py), trained on coding problems and their solutions for code generation tasks.

## Training Details
- **Base Model:** microsoft/CodeGPT-small-py (124M parameters)
- **Dataset:** Rabinovich/Code-Generation-LLM-LoRA (500 examples)
- **Epochs:** 2
- **Learning Rate:** 5e-5
- **Batch Size:** 4
- **Hardware:** CPU

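For reference, the configuration above can be reproduced with the Hugging Face `Trainer`. The sketch below is illustrative rather than the exact training script: the dataset column names (`problem`, `solution`) and the prompt format are assumptions and may need adjusting to the dataset's actual schema.

```python
# Minimal sketch of the fine-tuning configuration above (illustrative only).
# The column names "problem" and "solution" are assumptions about the dataset schema.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "microsoft/CodeGPT-small-py"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# 500-example subset, as used in this training run
dataset = load_dataset("Rabinovich/Code-Generation-LLM-LoRA", split="train[:500]")

def tokenize(example):
    text = f"Generate code: {example['problem']}\n{example['solution']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

training_args = TrainingArguments(
    output_dir="codegpt-finetuned-code-generation",
    num_train_epochs=2,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    logging_steps=25,
)

# This run used CPU; Trainer falls back to CPU automatically when no GPU is available.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
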
## Training Results

| Step | Training Loss |
|------|---------------|
| 25   | 4.4322        |
| 50   | 3.4648        |
| 100  | 3.1430        |
| 150  | 2.7050        |
| 200  | 2.7491        |
| 250  | 2.7126        |

**Loss improved from 4.43 → 2.71 (39% reduction)**

## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Pradnya27/codegpt-finetuned-code-generation")
tokenizer = AutoTokenizer.from_pretrained("Pradnya27/codegpt-finetuned-code-generation")

prompt = "Generate code: Write a function to check if a number is prime"
inputs = tokenizer(prompt, return_tensors="pt")

# Pass input_ids and attention_mask together and generate up to 100 new tokens
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

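The snippet above uses greedy decoding. Continuing from it, sampling parameters can be passed to `generate` for more varied completions; the values below are illustrative, not tuned settings:

```python
# Optional: sample instead of greedy decoding (illustrative settings)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
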
## Limitations
- Trained on a small subset (500 examples); training on more data would likely improve results
- Works best on competitive-programming-style problems
- Output quality improves with more specific prompts (see the example below)

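As a hypothetical illustration of the last point, a prompt that spells out the function name, input, and expected behavior generally yields a more complete result than a terse one:

```python
# Hypothetical prompts; the more specific wording tends to produce better output
vague_prompt = "Generate code: prime check"
specific_prompt = (
    "Generate code: Write a Python function is_prime(n) that returns True "
    "if the integer n is prime and False otherwise"
)
```
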
## Future Work
- Train on the full dataset (34,727 examples)
- Experiment with LoRA fine-tuning (a possible setup is sketched below)
- Evaluate on the HumanEval benchmark

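For the LoRA experiment, a possible starting point with the PEFT library is sketched below. It is illustrative only: the target module name (`c_attn`) assumes the GPT-2-style attention layers that CodeGPT-small-py is built on, and the rank and alpha values are arbitrary starting points.

```python
# Hypothetical LoRA setup using the PEFT library; not part of the current model.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("microsoft/CodeGPT-small-py")

lora_config = LoraConfig(
    r=8,                        # adapter rank (illustrative value)
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2-style fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```

The wrapped model can then be trained with the same `Trainer` setup shown under Training Details.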