---
language: en
license: mit
tags:
- code
- git
- commit-message
- qwen2
- lora
datasets:
- bigcode/commitpackft
---
# Git Commit Message Generator
Fine-tuned Qwen-0.5B model for generating professional Git commit messages from code diffs.
## Model Description
This model was fine-tuned using LoRA (Low-Rank Adaptation) on the CommitPackFT dataset to generate concise, professional commit messages from git diffs.
- **Base Model**: Qwen-0.5B
- **Fine-tuning Method**: LoRA (r=16, alpha=32)
- **Training Data**: 55K filtered commits from CommitPackFT
- **Languages**: Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more
## Intended Use
Generate commit messages for staged changes in a Git repository.
### Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "rajtiwariee/auto-commit"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Prepare your diff in the "File / Language / Old content / New content" layout
diff = """
Diff:
File: src/auth.py
Language: Python
Old content:
def login(username, password):
    user = get_user(username)
    if user.password == password:
        return True
    return False
New content:
def login(username, password):
    user = get_user(username)
    if user and user.password == password:
        return True
    return False
"""

# Generate commit message
prompt = f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,  # Greedy decoding, so the output is deterministic
        pad_token_id=tokenizer.eos_token_id,
    )

# The decoded text includes the prompt, so keep only what follows the marker
message = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(message.split("Commit message:")[-1].strip())
# Example output: "Check for user existence before accessing password"
```
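
The prompt layout used in the Quick Start can be wrapped in two small helpers. This is a minimal sketch that only mirrors the string format shown above; `build_prompt` and `extract_message` are hypothetical names and are not part of the model or the companion CLI.

```python
def build_prompt(path: str, language: str, old: str, new: str) -> str:
    """Format one file change in the layout used by the Quick Start example."""
    diff = (
        "\nDiff:\n"
        f"File: {path}\n"
        f"Language: {language}\n"
        f"Old content:\n{old}\n"
        f"New content:\n{new}\n"
    )
    return f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"


def extract_message(decoded: str) -> str:
    """Keep only the text after the last 'Commit message:' marker."""
    return decoded.split("Commit message:")[-1].strip()


# Hypothetical one-line change, used only to show the resulting prompt
prompt = build_prompt("src/auth.py", "Python", "a = 1", "a = 2")
print(prompt)
```

The string returned by `build_prompt` can be passed to the tokenizer in place of the hand-written `prompt`, and `extract_message` can be applied to the decoded output.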
### CLI Tool
For easier usage, install the companion CLI tool from the [GitHub repository](https://github.com/rajtiwariee/GitCommitGenerator):
```bash
git clone https://github.com/rajtiwariee/GitCommitGenerator
cd GitCommitGenerator
pip install -e .
commit-gen generate --commit
```
## Training Details
### Training Data
- **Dataset**: CommitPackFT (filtered subset)
- **Training samples**: 55,730
- **Validation samples**: 6,966
- **Test samples**: 6,967
### Training Procedure
- **Epochs**: 3
- **Batch Size**: 4 (effective batch size: 32 with gradient accumulation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **LoRA Config**:
  - r: 16
  - alpha: 32
  - dropout: 0.05
  - target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
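
The figures above imply a few derived quantities. The following sketch works them out under two assumptions: training ran on a single device, and the last partial batch of each epoch was kept. The actual training script may count steps differently. The helper `lora_added_params` and the 1024-wide layer in the usage line are illustrative only and do not describe the base model's real dimensions.

```python
import math

train_samples = 55_730   # from "Training Data"
per_device_batch = 4     # from "Training Procedure"
effective_batch = 32
epochs = 3

# Gradient accumulation steps implied by the two batch sizes
accumulation_steps = effective_batch // per_device_batch

# Optimizer updates per epoch, assuming the last partial batch is kept
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

# LoRA scales the low-rank update by alpha / r
lora_r, lora_alpha = 16, 32
lora_scaling = lora_alpha / lora_r


def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


print(accumulation_steps, steps_per_epoch, total_steps, lora_scaling)
# Hypothetical square projection of width 1024, for illustration only
print(lora_added_params(1024, 1024, lora_r))
```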
### Hardware
- **GPU**: NVIDIA Tesla T4 (16GB)
- **Precision**: Mixed Precision (FP32 weights + FP16 compute)
- **Training Time**: ~7.5 hours
## Evaluation Results
- **BLEU Score**: 0.0244
- **ROUGE-1**: 0.1968
- **ROUGE-2**: 0.0420
- **ROUGE-L**: 0.1816
- **Exact Match Rate**: 0.00%
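
To make the gap between the ROUGE scores and the exact match rate easier to read, here is a minimal sketch of a unigram-overlap F1 (in the spirit of ROUGE-1) alongside exact match. It is not the evaluation code used for the numbers above, which this card does not specify; it uses lowercased whitespace tokens and no stemming, so its values would differ from those of a standard ROUGE package.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased, whitespace-split tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> bool:
    """True only if the two messages are identical after trimming whitespace."""
    return prediction.strip() == reference.strip()


# A paraphrase shares words with the reference but is not an exact match
print(rouge1_f1("fix login", "fix login bug"), exact_match("fix login", "fix login bug"))
```

A prediction that paraphrases the reference can have non-zero unigram overlap while never matching it character for character, which is one way a 0.00% exact match rate can coexist with non-zero ROUGE scores.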
## Limitations
- The model is trained primarily on English commit messages
- Best suited for code changes in common programming languages
- May not handle large diffs (more than 384 tokens) well
- Generated messages should be reviewed before committing
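
One way to act on the diff-size limitation is to count tokens before generating. The sketch below assumes the 384-token figure applies to the diff text and accepts any callable that maps a string to a list of tokens; `check_diff_length` is a hypothetical helper. Whitespace splitting stands in for the real tokenizer so the example runs without downloading the model.

```python
from typing import Callable, List, Tuple

MAX_DIFF_TOKENS = 384  # threshold quoted in the limitations above


def check_diff_length(
    diff: str,
    tokenize: Callable[[str], List],
    max_tokens: int = MAX_DIFF_TOKENS,
) -> Tuple[int, bool]:
    """Return (token_count, within_budget) for a diff."""
    count = len(tokenize(diff))
    return count, count <= max_tokens


# str.split is a stand-in; with the loaded model, pass tokenizer.tokenize instead
count, ok = check_diff_length("a = 1\nb = 2", str.split)
print(count, ok)
```

When the check fails, splitting the change into smaller commits or summarizing per file is likely to give better results than sending the full diff.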
## Ethical Considerations
This model is intended to assist developers in writing commit messages, not replace human judgment. Users should:
- Review generated messages for accuracy
- Ensure messages accurately describe the changes
- Follow their team's commit message conventions
## Citation
```bibtex
@misc{git-commit-generator,
author = {Raj Tiwari},
title = {Git Commit Message Generator},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/rajtiwariee/auto-commit}},
}
```
## License
MIT License