manchu-ocr-pixtral-final

LoRA adapters for Manchu script OCR, fine-tuned from unsloth/pixtral-12b-2409-unsloth-bnb-4bit on the mic7ch/manchu-2025-0033 dataset. Part of a replication / ablation study on VLM-based OCR for Manchu.

The VLM is asked to output both the Manchu glyphs and a romanized transliteration in a structured format (`Manchu: {text}\nRoman: {text}`).

Contents

  • best_model/ — the trainer's end-of-training save: the checkpoint reloaded via load_best_model_at_end, selected on manchu_cer.
  • checkpoints/checkpoint-{step}/ — every intermediate save at save_steps=500 (10 checkpoints total).

Paper selection uses the sweep winner on the held-out real_val split rather than the trainer's best_model/: best_model/ is chosen on the training-time manchu_cer metric, which does not always coincide with the peak on real data. See the table below.

Best checkpoint by split

| Split      | Checkpoint path              | manchu_word_accuracy |
|------------|------------------------------|----------------------|
| real_val   | checkpoints/checkpoint-4000/ | 0.9940               |
| test       | checkpoints/checkpoint-5000/ | 0.9602               |
| validation | checkpoints/checkpoint-5000/ | 0.8980               |
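
For reference, the two metrics can be sketched as below. The repository's exact implementation is not published with this card, so treat this as a standard-definition sketch whose function names simply mirror the metric names above:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def manchu_cer(refs: list[str], hyps: list[str]) -> float:
    """Character error rate: total edit distance / total reference length."""
    return sum(levenshtein(r, h) for r, h in zip(refs, hyps)) / sum(len(r) for r in refs)

def manchu_word_accuracy(refs: list[str], hyps: list[str]) -> float:
    """Fraction of predictions that match the reference word exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)
```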

Training recipe

  • Base model: unsloth/pixtral-12b-2409-unsloth-bnb-4bit (4-bit, bitsandbytes)
  • Framework: Unsloth + TRL SFTTrainer
  • LoRA: r=32, alpha=64, dropout=0.05; targets include all attention + MLP projections plus vision/language heads
  • Optimizer: paged_adamw_8bit, lr=2e-4, cosine_with_restarts, warmup=100
  • Batch: per_device_train_batch_size=4, gradient_accumulation_steps=2, 4× GPU DDP → effective 32
  • Save cadence: save_steps=500, save_total_limit=50
  • Primary training metric: manchu_cer
  • Selection metric for paper: manchu_word_accuracy on held-out real_val
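
Assembled into code, the recipe looks roughly like this. It is a sketch against the Unsloth vision API; the exact training script is not published, so anything not listed above (output_dir, eval cadence) is an assumption:

```python
from unsloth import FastVisionModel
from trl import SFTConfig

# Load the 4-bit base model (Unsloth returns model and tokenizer together).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/pixtral-12b-2409-unsloth-bnb-4bit", load_in_4bit=True
)

# LoRA on attention + MLP projections across vision and language layers.
model = FastVisionModel.get_peft_model(
    model,
    r=32, lora_alpha=64, lora_dropout=0.05,
    finetune_vision_layers=True, finetune_language_layers=True,
    finetune_attention_modules=True, finetune_mlp_modules=True,
)

args = SFTConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,      # 4 GPUs x 4 x 2 = effective batch 32
    learning_rate=2e-4,
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=100,
    optim="paged_adamw_8bit",
    save_steps=500,
    save_total_limit=50,
    eval_strategy="steps", eval_steps=500,   # assumed: eval on the save cadence
    load_best_model_at_end=True,
    metric_for_best_model="manchu_cer",
    greater_is_better=False,                 # lower CER is better
    output_dir="outputs",                    # assumed
)
```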

Usage (inference)

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel

base = "unsloth/pixtral-12b-2409-unsloth-bnb-4bit"
adapter = "mic7ch/manchu-ocr-pixtral-final"

processor = AutoProcessor.from_pretrained(base)
# The base checkpoint is already bitsandbytes 4-bit quantized, so no extra
# quantization arguments are needed.
model = AutoModelForVision2Seq.from_pretrained(base, device_map="auto")

# Adapters live in subfolders: best_model/ or checkpoints/checkpoint-{step}/.
model = PeftModel.from_pretrained(model, adapter, subfolder="best_model")

# For a specific step, point subfolder at that checkpoint:
# model = PeftModel.from_pretrained(model, adapter, subfolder="checkpoints/checkpoint-{N}")
```
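
Continuing from the snippet above, a minimal generation pass looks like this. The exact instruction used during fine-tuning is not documented here, so the prompt text (and sample.png) are placeholders:

```python
from PIL import Image

image = Image.open("sample.png")  # placeholder: a cropped Manchu word image

# ASSUMPTION: this instruction approximates, but may not match, the training prompt.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Read the Manchu word in the image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens (skip the prompt).
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```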

The model outputs:

```
Manchu: <manchu glyphs>
Roman: <romanized transliteration>
```
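
Since the response is line-structured, it can be parsed with a couple of regexes. A sketch; the helper name is illustrative:

```python
import re

def parse_ocr(response: str) -> dict:
    """Split a 'Manchu: ... / Roman: ...' response into its two fields."""
    manchu = re.search(r"Manchu:\s*(.+)", response)
    roman = re.search(r"Roman:\s*(.+)", response)
    return {
        "manchu": manchu.group(1).strip() if manchu else None,
        "roman": roman.group(1).strip() if roman else None,
    }
```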

What the checkpoints do NOT include

We strip optimizer state (optimizer.pt), scheduler state (scheduler.pt), RNG state (rng_state_*.pth), and TensorBoard events from each checkpoint. The uploaded files are sufficient for inference and for continued LoRA fine-tuning from any step, but not for an exact bit-identical training resume.
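
For reference, the cleanup amounts to something like the following (illustrative paths and patterns, not the actual release script):

```python
from pathlib import Path

# Illustrative: remove training-resume state from each checkpoint directory.
for ckpt in Path("checkpoints").glob("checkpoint-*"):
    for pattern in ("optimizer.pt", "scheduler.pt", "rng_state_*.pth",
                    "events.out.tfevents.*"):
        for f in ckpt.glob(pattern):
            f.unlink()
```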

Citation

Paper forthcoming. In the meantime, please cite the repository:

```bibtex
@software{manchu_ocr_2026,
  author = {Chung, H.-M. and collaborators},
  title  = {Vision-language-model OCR for Manchu script},
  year   = {2026},
  url    = {https://huggingface.co/mic7ch/manchu-ocr-pixtral-final}
}
```