---
language: en
license: mit
tags:
- code
- git
- commit-message
- qwen2
- lora
datasets:
- bigcode/commitpackft
---

# Git Commit Message Generator

Fine-tuned Qwen-0.5B model for generating professional Git commit messages from code diffs.

## Model Description

This model was fine-tuned using LoRA (Low-Rank Adaptation) on the CommitPackFT dataset to generate concise, professional commit messages from git diffs.

**Base Model**: Qwen-0.5B  
**Fine-tuning Method**: LoRA (r=16, alpha=32)  
**Training Data**: 55K filtered commits from CommitPackFT  
**Languages**: Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more

## Intended Use

Generate commit messages for staged changes in a Git repository.

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "rajtiwariee/auto-commit"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Prepare your diff
diff = """
Diff:
File: src/auth.py
Language: Python

Old content:
def login(username, password):
    user = get_user(username)
    if user.password == password:
        return True
    return False

New content:
def login(username, password):
    user = get_user(username)
    if user and user.password == password:
        return True
    return False
"""

# Generate commit message
prompt = f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,  # Deterministic
        pad_token_id=tokenizer.eos_token_id,
    )

message = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(message.split("Commit message:")[-1].strip())
# Output: "Check for user existence before accessing password"
```
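The Quick Start prompt follows a fixed layout (file path, language, old content, new content), so it can be assembled by a small helper. The sketch below is illustrative only: `build_prompt` and its parameter names are hypothetical and not part of the released model or CLI. It simply reproduces the string layout used in the example above.

```python
def build_prompt(path: str, language: str, old: str, new: str) -> str:
    """Assemble a prompt in the same layout as the Quick Start example.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not an API shipped with the model.
    """
    diff = (
        "\nDiff:\n"
        f"File: {path}\n"
        f"Language: {language}\n\n"
        # Strip only surrounding newlines so code indentation is preserved.
        f"Old content:\n{old.strip(chr(10))}\n\n"
        f"New content:\n{new.strip(chr(10))}\n"
    )
    return f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"


prompt = build_prompt(
    path="src/auth.py",
    language="Python",
    old="if user.password == password:\n    return True\n",
    new="if user and user.password == password:\n    return True\n",
)
print(prompt)
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly as in the Quick Start.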

### CLI Tool

For easier usage, install the companion CLI tool from the [GitHub repository](https://github.com/rajtiwariee/GitCommitGenerator):

```bash
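# An editable install needs a local checkout of the repository
git clone https://github.com/rajtiwariee/GitCommitGenerator.git
cd GitCommitGenerator
pip install -e .
commit-gen generate --commit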
```

## Training Details

### Training Data

- **Dataset**: CommitPackFT (filtered subset)
- **Training samples**: 55,730
- **Validation samples**: 6,966
- **Test samples**: 6,967

### Training Procedure

- **Epochs**: 3
- **Batch Size**: 4 (effective batch size: 32 with gradient accumulation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **LoRA Config**:
  - r: 16
  - alpha: 32
  - dropout: 0.05
  - target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
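The hyperparameters listed above imply two derived quantities: the number of gradient accumulation steps and the LoRA scaling factor. The sketch below works them out and collects the LoRA settings as keyword arguments in the shape accepted by `peft.LoraConfig`. It assumes a single GPU, and the `task_type` value is an assumption, since the original training script is not shown here.

```python
# Values copied from the Training Procedure list.
per_device_batch_size = 4
effective_batch_size = 32
lora_r = 16
lora_alpha = 32

# Accumulation steps implied by the two batch sizes (assumes one GPU).
grad_accum_steps = effective_batch_size // per_device_batch_size

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Keyword arguments for peft.LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": lora_r,
    "lora_alpha": lora_alpha,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in this card
}

print(grad_accum_steps, lora_scaling)  # prints: 8 2.0
```

With `peft` installed, `LoraConfig(**lora_kwargs)` would build the adapter configuration. The dictionary form keeps this sketch runnable without extra dependencies.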

### Hardware

- **GPU**: NVIDIA Tesla T4 (16GB)
- **Precision**: Mixed Precision (FP32 weights + FP16 compute)
- **Training Time**: ~7.5 hours

## Evaluation Results

- **BLEU Score**: 0.0244
- **ROUGE-1**: 0.1968
- **ROUGE-2**: 0.0420
- **ROUGE-L**: 0.1816
- **Exact Match Rate**: 0.00%
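To make the metrics above concrete, here is a simplified sketch of ROUGE-1 F1 (unigram overlap) and exact match. It uses lowercase whitespace tokenization and is not the evaluation script behind the reported numbers. Standard ROUGE implementations tokenize and optionally stem differently, so their scores will not match this sketch exactly.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a generated message."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(reference: str, candidate: str) -> bool:
    """True only if the two messages are identical after trimming."""
    return reference.strip() == candidate.strip()


reference = "Fix login bug"
candidate = "Fix login crash"
print(round(rouge1_f1(reference, candidate), 4))  # prints: 0.6667
print(exact_match(reference, candidate))          # prints: False
```

The example shows how a message can share most of its words with the reference yet still fail exact match, which is consistent with a nonzero ROUGE-1 alongside a 0.00% exact match rate.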


## Limitations

- The model is trained primarily on English commit messages
- Best suited for code changes in common programming languages
- May handle large diffs poorly (more than 384 tokens)
- Generated messages should be reviewed before committing
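Given the 384-token figure in the list above, it can help to check a diff's length before generating. The sketch below is a hypothetical pre-check: `exceeds_budget` is not part of the model or CLI, and whether the figure applies to the diff alone or to the whole prompt is an assumption left to the caller. The default `encode` is a whitespace split so the snippet runs without downloading a tokenizer.

```python
from typing import Callable, Sequence

MAX_DIFF_TOKENS = 384  # figure quoted in the Limitations list


def exceeds_budget(
    text: str,
    encode: Callable[[str], Sequence] = str.split,
    max_tokens: int = MAX_DIFF_TOKENS,
) -> bool:
    """Return True if `text` encodes to more than `max_tokens` tokens.

    `encode` is any callable mapping a string to a token sequence.
    The whitespace default only approximates a real tokenizer count.
    """
    return len(encode(text)) > max_tokens


short_diff = "Old content:\nx = 1\n\nNew content:\nx = 2\n"
long_diff = "token " * 500

print(exceeds_budget(short_diff))  # prints: False
print(exceeds_budget(long_diff))   # prints: True
```

For an accurate count, pass the model's own tokenizer, for example `exceeds_budget(diff, encode=tokenizer.encode)`, and split or summarize the diff when the check returns `True`.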

## Ethical Considerations

This model is intended to assist developers in writing commit messages, not replace human judgment. Users should:
- Review generated messages for accuracy
- Ensure messages accurately describe the changes
- Follow their team's commit message conventions

## Citation

```bibtex
@misc{git-commit-generator,
  author = {Raj Tiwari},
  title = {Git Commit Message Generator},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rajtiwariee/auto-commit}},
}
```

## License

MIT License