GOT-OCR2_0: LoRA fine-tune on UniMER-1M

LoRA adapter (r=16) fine-tuned on m4xi/unimer-merged for math formula recognition (image → LaTeX).

Base model

stepfun-ai/GOT-OCR2_0: a 568M-parameter vision-language model (ViTDet encoder + Qwen-0.5B decoder).

Training

Dataset: m4xi/unimer-merged (~1.04M train samples, 98/2 train/val split)
LoRA rank / alpha: 16 / 32
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (vision encoder frozen)
Effective batch size: 16 (4 per device × 4 gradient-accumulation steps)
Learning rate: 2e-4, cosine decay, 3% warmup
Precision: bf16
Steps: 115,500 (~1.78 epochs, ~1.85M samples seen)
Hardware: NVIDIA L40S
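
The figures in the training summary can be cross-checked with a few lines of arithmetic. The sketch below recomputes the effective batch size, samples seen, and epoch count from the listed values, and shows one plausible shape for the learning-rate schedule. Two things are assumptions rather than facts from this card: a single device (which is what 4 × 4 = 16 implies), and a schedule made of linear warmup followed by cosine decay to zero. The card does not say which scheduler implementation was used, and `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the training summary above.
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
TOTAL_STEPS = 115_500
TRAIN_SAMPLES = 1_040_000  # the card says "~1.04M", so this is approximate
PEAK_LR = 2e-4
WARMUP_FRACTION = 0.03

# Assumption: one device, since 4 per device x 4 accumulation steps = 16.
NUM_DEVICES = 1

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
samples_seen = effective_batch * TOTAL_STEPS
epochs = samples_seen / TRAIN_SAMPLES
warmup_steps = round(WARMUP_FRACTION * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen:,}")
print(f"epochs: {epochs:.2f}")
print(f"warmup steps: {warmup_steps}")
for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"lr at step {step}: {lr_at(step):.2e}")
```

Under these assumptions the numbers agree with the summary: 16 × 115,500 = 1,848,000 samples, which is about 1.78 passes over roughly 1.04M training samples.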

Usage

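import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# The tokenizer comes from the base repository; the adapter only changes model weights.
tokenizer = AutoTokenizer.from_pretrained("stepfun-ai/GOT-OCR2_0", trust_remote_code=True)
# Load the base model in bf16 (older transformers releases spell this argument torch_dtype).
model = AutoModel.from_pretrained("stepfun-ai/GOT-OCR2_0", trust_remote_code=True, dtype=torch.bfloat16)
# Attach the LoRA adapter, then fold it into the base weights so PEFT is not needed at inference time.
model = PeftModel.from_pretrained(model, "maximuskiii/got-mer-lora-r16")
model = model.merge_and_unload()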
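
For readers unfamiliar with what merging an adapter means, here is a toy sketch of the idea behind `merge_and_unload()`: each adapted weight matrix W is replaced by W + (alpha / r) · B · A, so the merged model gives the same outputs as base plus adapter without the extra matrices. This is plain Python on tiny made-up matrices, not PEFT's actual implementation. The helper names (`matmul`, `matvec`, `merge_lora`) are hypothetical, and the toy uses rank 2 with alpha 4 only so that the scaling factor matches this adapter's 32 / 16 = 2.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy shapes: out_features=2, in_features=3, rank=2. All numbers are illustrative.
W = [[1, 0, 0], [0, 1, 0]]      # frozen base weight (out x in)
A = [[1, 0, 1], [0, 1, 0]]      # LoRA down-projection (rank x in)
B = [[0.5, 0], [0, 0.25]]       # LoRA up-projection (out x rank)
ALPHA, RANK = 4, 2              # same alpha / rank ratio as 32 / 16

merged = merge_lora(W, A, B, ALPHA, RANK)

# The merged weight reproduces "base output + scaled adapter output".
x = [1, 2, 3]
base_out = matvec(W, x)
adapter_out = [(ALPHA / RANK) * y for y in matvec(B, matvec(A, x))]
unmerged_out = [b + a for b, a in zip(base_out, adapter_out)]
merged_out = matvec(merged, x)

print(merged)
print(unmerged_out, merged_out)
```

Because the update is folded into W once, the merged model carries no extra inference cost compared with the base model.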