# manchu-ocr-pixtral-final
LoRA adapters for Manchu script OCR, fine-tuned from `unsloth/pixtral-12b-2409-unsloth-bnb-4bit` on the `mic7ch/manchu-2025-0033` dataset.
Part of a replication / ablation study on VLM-based OCR for Manchu.
The VLM is asked to output both the Manchu glyphs and a romanized transliteration
in a structured format (`Manchu: {text}\nRoman: {text}`).
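Since the format is fixed, the two fields can be recovered from a reply with a small parser. A minimal sketch; the `parse_ocr_output` helper is illustrative and not part of this repo:

```python
import re

def parse_ocr_output(reply: str) -> dict:
    """Split the model's structured reply into its two fields.

    Assumes the reply follows the trained format exactly:
    'Manchu: <glyphs>\nRoman: <transliteration>'.
    """
    match = re.search(r"Manchu:\s*(?P<manchu>.*?)\s*Roman:\s*(?P<roman>.*)",
                      reply, re.DOTALL)
    if match is None:
        raise ValueError(f"Reply does not match the expected format: {reply!r}")
    return {"manchu": match.group("manchu"), "roman": match.group("roman").strip()}
```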
## Contents
- `best_model/` — trainer's final-step save (selected by `load_best_model_at_end` on `manchu_cer`).
- `checkpoints/checkpoint-{step}/` — every intermediate save at `save_steps=500` (10 checkpoints total).
Paper selection uses the sweep winner on held-out `real_val` rather than the trainer's `best_model/`, because `best_model/` is selected on training-time `manchu_cer` and does not always coincide with the real-test peak. See the table below.
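If you only need one checkpoint, `huggingface_hub` can download just that subfolder instead of all 10. A sketch; the step number is illustrative:

```python
from huggingface_hub import snapshot_download

# Fetch only the real_val winner (checkpoint-4000) plus top-level config/readme files.
local_dir = snapshot_download(
    repo_id="mic7ch/manchu-ocr-pixtral-final",
    allow_patterns=["checkpoints/checkpoint-4000/*", "*.json", "*.md"],
)
print(local_dir)
```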
## Best checkpoint by split
| Split | Checkpoint path | `manchu_word_accuracy` |
|---|---|---|
| `real_val` | `checkpoints/checkpoint-4000/` | 0.9940 |
| `test` | `checkpoints/checkpoint-5000/` | 0.9602 |
| `validation` | `checkpoints/checkpoint-5000/` | 0.8980 |
## Training recipe
- Base model: `unsloth/pixtral-12b-2409-unsloth-bnb-4bit` (4-bit, bitsandbytes)
- Framework: Unsloth + TRL `SFTTrainer`
- LoRA: r=32, alpha=64, dropout=0.05; targets include all attention + MLP projections plus vision/language heads (see the sketch after this list)
- Optimizer: `paged_adamw_8bit`, lr=2e-4, `cosine_with_restarts` schedule, warmup=100
- Batch: `per_device_train_batch_size=4`, `gradient_accumulation_steps=2`, 4× GPU DDP → effective batch size 32
- Save cadence: `save_steps=500`, `save_total_limit=50`
- Primary training metric: `manchu_cer`
- Selection metric for paper: `manchu_word_accuracy` on held-out `real_val`
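Below is a minimal sketch of how these hyperparameters could be expressed with Unsloth's `FastVisionModel` and TRL's `SFTConfig`. The vision/language/attention/MLP flags approximate the stated target modules; dataset collation, eval wiring for `manchu_cer`, and anything not listed above are assumptions, not the repo's actual training script.

```python
from unsloth import FastVisionModel
from trl import SFTConfig

# Load the 4-bit base model (bitsandbytes quantization is baked into the checkpoint).
model, processor = FastVisionModel.from_pretrained(
    "unsloth/pixtral-12b-2409-unsloth-bnb-4bit",
    load_in_4bit=True,
)

# Attach LoRA adapters across both the vision and language towers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # vision heads, per the recipe
    finetune_language_layers=True,    # language heads
    finetune_attention_modules=True,  # all attention projections
    finetune_mlp_modules=True,        # all MLP projections
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
)

# Hyperparameters from the recipe above; remaining fields are illustrative defaults.
args = SFTConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,    # with 4x GPU DDP -> effective batch 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    save_steps=500,
    save_total_limit=50,
    load_best_model_at_end=True,
    metric_for_best_model="manchu_cer",
    greater_is_better=False,          # lower CER is better
    output_dir="outputs",
)
```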
## Usage (inference)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel

base = "unsloth/pixtral-12b-2409-unsloth-bnb-4bit"
adapter = "mic7ch/manchu-ocr-pixtral-final"

processor = AutoProcessor.from_pretrained(base)
model = AutoModelForVision2Seq.from_pretrained(base, device_map="auto", load_in_4bit=True)

# Load the trainer-selected adapter from best_model/ ...
model = PeftModel.from_pretrained(model, adapter, subfolder="best_model")
# ... or a specific step:
# model = PeftModel.from_pretrained(model, adapter, subfolder="checkpoints/checkpoint-{N}")
```
The model outputs:

```text
Manchu: <manchu glyphs>
Roman: <romanized transliteration>
```
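A minimal end-to-end sketch, continuing from the loading snippet above. The prompt wording and `page.png` are placeholders, and the message layout follows the common transformers vision-chat convention; some Pixtral chat templates expect slightly different content keys.

```python
from PIL import Image

image = Image.open("page.png")  # placeholder path to a Manchu page/line image

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Read the Manchu text in this image."},  # assumed prompt
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=[image], text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
reply = processor.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],  # drop the echoed prompt tokens
    skip_special_tokens=True,
)
print(reply)  # "Manchu: <glyphs>\nRoman: <transliteration>"
```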
## What the checkpoints do NOT include
We strip optimizer state (`optimizer.pt`), scheduler state (`scheduler.pt`), RNG state (`rng_state_*.pth`), and TensorBoard events from each checkpoint. The uploaded files are sufficient for inference and for continued LoRA fine-tuning from any step, but not for an exact bit-identical training resume.
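Because the adapter weights are kept for every step, continued fine-tuning can start from any checkpoint by loading it as trainable. A sketch, reusing `model` from the usage snippet; the step number is illustrative, and optimizer/scheduler state starts fresh rather than resuming:

```python
from peft import PeftModel

# Load a mid-run adapter as trainable and continue fine-tuning from there.
# Stripped optimizer/scheduler/RNG state means training restarts with fresh
# optimizer state, not a bit-identical resume.
model = PeftModel.from_pretrained(
    model,
    "mic7ch/manchu-ocr-pixtral-final",
    subfolder="checkpoints/checkpoint-3000",  # illustrative step
    is_trainable=True,
)
```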
## Citation
Paper forthcoming. In the meantime, please cite the repository:
```bibtex
@software{manchu_ocr_2026,
  author = {Chung, H.-M. and collaborators},
  title  = {Vision-language-model OCR for Manchu script},
  year   = {2026},
  url    = {https://huggingface.co/mic7ch/manchu-ocr-pixtral-final}
}
```