
base_model: unsloth/LFM2-350M-unsloth-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:adapter:unsloth/LFM2-350M-unsloth-bnb-4bit
  - lora
  - qlora
  - sft
  - transformers
  - trl
  - conventional-commits
  - code

lfm2_350m_commit_diff_summarizer (LoRA)

A lightweight helper model that turns Git diffs into Conventional Commit–style messages. It outputs strict JSON with a short title (≤ 65 chars) and up to 3 bullets, so your CLI/agents can parse it deterministically.

Model Details

Model Description

Purpose: Summarize git diff patches into concise, Conventional Commit–compliant titles with optional bullets.
I/O format:
- Input: prompt containing the diff (plain text).
- Output: JSON object: {"title": "...", "bullets": ["...", "..."]}.
Developed by: Ethan (HF: ethanke)
Shared by: Ethan (HF: ethanke)
Model type: LoRA adapter for causal LM (text generation)
Language(s): English (commit message conventions)
License: Inherits base model’s license; dataset has non-commercial terms (see Training Data). Review before production/commercial use.
Finetuned from: unsloth/LFM2-350M-unsloth-bnb-4bit (4-bit quantized base, trained with QLoRA)

Model Sources

Repository: This model card + adapter on the Hub under ethanke/lfm2_350m_commit_diff_summarizer

Uses

Direct Use

Convert patch diffs into Conventional Commit messages for PR titles, commits, and changelogs.
Provide human-readable summaries in agent UIs in a consistent JSON structure (validate the output before relying on it).

Downstream Use

Plug into CI to auto-suggest commit titles after tests pass.
Use as a helper in a larger agent system (routing and planning stay with a bigger model).

Out-of-Scope Use

General code generation or deep refactoring explanations.
Non-English commit conventions.
Knowledge-intensive narrative summaries.

Bias, Risks, and Limitations

Trained on public commits filtered to Conventional Commit titles; may prefer certain styles/projects.
Long diffs are truncated to max_length (2048 tokens in the v0.1 training run), so changes that fall outside the retained window may be missed.
Dataset license may restrict commercial usage; verify for your case.
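
One way to keep truncation predictable is to trim the diff yourself, on line boundaries, before building the prompt, so that the trailing ### OUTPUT JSON marker is never cut off. The sketch below is illustrative only: `truncate_diff`, `count_tokens`, and the budget are hypothetical names and values, and summing per-line counts only approximates what a real tokenizer would report for the whole text.

```python
from typing import Callable

def truncate_diff(diff: str, count_tokens: Callable[[str], int], budget: int) -> str:
    """Keep whole lines from the start of the diff until the token budget is reached."""
    kept: list[str] = []
    used = 0
    for line in diff.splitlines():
        cost = count_tokens(line + "\n")
        if used + cost > budget:
            break  # stop at the first line that would exceed the budget
        kept.append(line)
        used += cost
    return "\n".join(kept)

# Stand-in counter for illustration only: one "token" per whitespace-separated word.
# With the real tokenizer this could be something like: lambda s: len(tok(s)["input_ids"])
def word_count(s: str) -> int:
    return len(s.split())

demo = "diff --git a/x b/x\n+added line one\n+added line two\n"
print(truncate_diff(demo, word_count, budget=7))
```

Choose the budget so that the instruction text and the generated JSON still fit within max_length.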

Recommendations

Enforce JSON validation; if invalid, retry with a JSON-repair prompt.
Keep a regex gate for Conventional Commit titles in your pipeline.
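
Both recommendations can be combined into one check, sketched below. The function name `check_output` and the choice to treat a missing bullets field as an empty list are assumptions, not part of the model's contract; the limits (65 characters, 3 bullets) come from this card.

```python
import json
import re

# Conventional Commit title: type, optional (scope), optional "!", then ": description"
CC_TITLE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([^)]+\))?(!)?:\s.+$"
)

def check_output(raw: str, max_title: int = 65, max_bullets: int = 3) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]

    problems = []
    title = obj.get("title")
    if not isinstance(title, str):
        problems.append("missing or non-string 'title'")
    else:
        if len(title) > max_title:
            problems.append(f"title longer than {max_title} chars")
        if not CC_TITLE.match(title):
            problems.append("title is not Conventional Commit style")

    bullets = obj.get("bullets", [])  # assumption: absent bullets count as none
    if not (isinstance(bullets, list) and all(isinstance(b, str) for b in bullets)):
        problems.append("'bullets' is not a list of strings")
    elif len(bullets) > max_bullets:
        problems.append(f"more than {max_bullets} bullets")
    return problems

print(check_output('{"title": "feat(api): add retry", "bullets": ["wrap client calls"]}'))
print(check_output('{"title": "Added stuff", "bullets": []}'))
```

A non-empty result is the signal to retry; the list of problems can be fed into whatever repair prompt your pipeline uses.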

How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch, json

BASE = "unsloth/LFM2-350M-unsloth-bnb-4bit"
ADAPTER = "ethanke/lfm2_350m_commit_diff_summarizer"  # replace with your repo id

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
mdl = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
mdl = PeftModel.from_pretrained(mdl, ADAPTER)

diff = "...your git diff text..."
prompt = (
  "You are a commit message summarizer.\n"
  "Return a concise JSON object with fields 'title' (<=65 chars) and 'bullets' (0-3 items).\n"
  "Follow the Conventional Commit style for the title.\n\n"
  "### DIFF\n" + diff + "\n\n### OUTPUT JSON\n"
)

inputs = tok(prompt, return_tensors="pt").to(mdl.device)
with torch.no_grad():
    out = mdl.generate(**inputs, max_new_tokens=200, do_sample=False)
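# Decode only the newly generated tokens so braces inside the diff cannot confuse extraction
gen = out[0][inputs["input_ids"].shape[1]:]
text = tok.decode(gen, skip_special_tokens=True)

# naive JSON extraction: last "{" through last "}"
js = text[text.rfind("{"): text.rfind("}")+1]
obj = json.loads(js)
print(obj)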
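
The slice-based extraction above breaks if the generated text contains stray braces before or after the JSON object. A slightly sturdier sketch uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows. The helper name `first_json_object` is hypothetical.

```python
import json

def first_json_object(text: str):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None  # not valid JSON from this brace; try the next one
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None

sample = 'noise {oops} {"title": "fix: handle empty diff", "bullets": []} trailing }'
print(first_json_object(sample))
```

The function returns None rather than raising, so the caller can decide whether to retry generation.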

Training Details

Training Data

Dataset: Maxscha/commitbench (diff → commit message).
Filtering: kept only samples whose first non-empty line of the message matches Conventional Commits: ^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?(!)?:\s.+$
Note: The dataset card indicates non-commercial licensing. Confirm before commercial deployment.
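
The filtering step can be sketched as follows. The pattern writes the optional scope group with escaped parentheses, so it matches titles such as `feat(parser): ...`. The function names, the whitespace stripping, and the sample messages are illustrative and not taken from the training script.

```python
import re

CC_PATTERN = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([^)]+\))?(!)?:\s.+$"
)

def first_nonempty_line(message: str) -> str:
    """Return the first line that contains non-whitespace text, stripped."""
    for line in message.splitlines():
        if line.strip():
            return line.strip()
    return ""

def is_conventional(message: str) -> bool:
    """True if the commit title follows the Conventional Commit pattern."""
    return CC_PATTERN.match(first_nonempty_line(message)) is not None

messages = [
    "\nfeat(parser)!: drop legacy mode\n\nLonger body text.",
    "Update README",
    "fix:missing space after colon",
    "docs: explain filtering",
]
kept = [m for m in messages if is_conventional(m)]
print(len(kept), "of", len(messages), "messages kept")
```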

Training Procedure

Method: Supervised fine-tuning (SFT) with TRL SFTTrainer + QLoRA (PEFT).
Prompting: Instruction + ### DIFF + ### OUTPUT JSON target (title/bullets).
Precision: fp16 compute on 4-bit base.
Hyperparameters (v0.1):
- max_length=2048, per_device_train_batch_size=2, grad_accum=4
- lr=2e-4, scheduler=cosine, warmup_ratio=0.03
- epochs=1 over capped subset
- LoRA: r=16, alpha=32, dropout=0.05, targets: q/k/v/o + MLP proj
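
Two derived quantities follow from these settings: the effective batch size per optimizer step and the LoRA scaling factor (alpha / r under standard LoRA scaling). The snippet below is plain arithmetic over the listed values; the dictionary keys are chosen for readability and are not copied from the author's training script, and the batch figure assumes the single-GPU setup described in this card.

```python
# Values as listed in this card (v0.1)
hparams = {
    "max_length": 2048,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.03,
    "num_train_epochs": 1,
}
lora = {"r": 16, "alpha": 32, "dropout": 0.05}

# Sequences contributing to each optimizer step on one GPU
effective_batch = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)

# Standard LoRA scales the low-rank update by alpha / r
lora_scale = lora["alpha"] / lora["r"]

print(effective_batch)  # 8
print(lora_scale)       # 2.0
```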

Evaluation

Validation: filtered split from CommitBench.
Metrics (example run):
- eval_loss ≈ 1.18 → perplexity ≈ 3.26
- eval_mean_token_accuracy ≈ 0.77
- Suggested task metrics: JSON validity rate, CC-title compliance, title length ≤ 65 chars, bullets ≤ 3.
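
The suggested task metrics can be computed over a batch of raw model outputs along these lines. This is a sketch, not the evaluation used for the numbers above: the function name, the metric keys, and the convention that invalid JSON fails every metric are all assumptions.

```python
import json
import re

CC_TITLE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([^)]+\))?(!)?:\s.+$"
)

def task_metrics(outputs):
    """Return each suggested metric as a fraction of all outputs."""
    counts = {"json_valid": 0, "cc_title": 0, "title_le_65": 0, "bullets_le_3": 0}
    for raw in outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON fails every metric
        if not isinstance(obj, dict):
            continue
        counts["json_valid"] += 1
        title = obj.get("title")
        bullets = obj.get("bullets", [])
        if isinstance(title, str):
            counts["cc_title"] += bool(CC_TITLE.match(title))
            counts["title_le_65"] += len(title) <= 65
        if isinstance(bullets, list):
            counts["bullets_le_3"] += len(bullets) <= 3
    total = max(len(outputs), 1)  # avoid dividing by zero on an empty list
    return {name: n / total for name, n in counts.items()}

samples = [
    '{"title": "feat: add cache", "bullets": ["store parsed diffs"]}',
    "garbage",
    '{"title": "Added cache", "bullets": []}',
    '{"title": "fix: trim input", "bullets": ["a", "b", "c", "d"]}',
]
print(task_metrics(samples))
```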

Environmental Impact

Hardware: 1× NVIDIA RTX 3060 12 GB (local)
Hours used: ~1–2 h (prototype)

Technical Specifications

Architecture: LFM2-350M (decoder-only) + LoRA adapter
Libraries: transformers, trl, peft, bitsandbytes, datasets, unsloth

Citation

If you use this model, please cite the base model and dataset authors according to their cards.

Model Card Authors

Ethan (ethanke) and contributors

Contact

Open an issue on the Hub repo or message ethanke on Hugging Face.

Framework versions

PEFT 0.17.1
TRL (SFTTrainer)
Transformers (recent version)