---
tags:
- math-ocr
- latex
- unimer
- pytorch-lightning
---

# Texo — UniMER replication (step 1750 checkpoint)

Full fine-tune of [PP-FormulaNet-S](https://huggingface.co/alephpi/FormulaNet) on [m4xi/unimer-merged](https://huggingface.co/datasets/m4xi/unimer-merged) for math formula recognition (image → LaTeX).

## Base model

PP-FormulaNet-S (`formulanet.pt` from alephpi/FormulaNet): a 58M-parameter model with an HGNetV2 encoder and an MBart decoder. This is the distilled checkpoint from the original Texo paper, prior to any UniMER training.

## Training

| Setting | Value |
|---|---|
| Dataset | m4xi/unimer-merged (~1.04M train samples, 98/2 train/val split) |
| Fine-tune strategy | Full (no LoRA) |
| Effective batch size | 64 (16 per device × 4 grad accum) |
| Learning rate | 1e-4, cosine decay |
| Precision | bf16 |
| Steps | 1,750 (~0.11 epochs, ~112K samples seen) |
| Hardware | NVIDIA RTX 4090, ~42h total runtime (crashed) |

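The sample and epoch figures in the Steps row follow from the batch settings. The sketch below reproduces that arithmetic. The variable names are illustrative, a single GPU is assumed, and the train-set size is the rounded ~1.04M figure, so the epoch fraction is approximate.

```python
# Values taken from the training table; variable names are illustrative.
per_device_batch = 16
grad_accum_steps = 4
num_devices = 1            # assumption: a single RTX 4090
optimizer_steps = 1_750
train_samples = 1_040_000  # rounded "~1.04M" figure, so results are approximate

effective_batch = per_device_batch * grad_accum_steps * num_devices
samples_seen = optimizer_steps * effective_batch
epoch_fraction = samples_seen / train_samples

print(effective_batch)           # 64
print(samples_seen)              # 112000
print(round(epoch_fraction, 2))  # 0.11
```
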
## Checkpoint selection

The run crashed after ~70k steps. Validation BLEU peaked at step 1750 (BLEU=0.7247, edit_distance=0.0582) and degraded monotonically after that. `save_top_k=5` retained only the five best-BLEU checkpoints, all of which fell within the first ~15% of epoch 1, which is consistent with rapid convergence from a strong pretrained base.

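To illustrate why a top-k retention rule keeps only checkpoints near an early peak, here is a plain-Python sketch of "keep the k highest-scoring checkpoints". It is not Lightning's implementation. The helper `retain_top_k` is hypothetical, and the scores are made-up toy values, not measurements from this run.

```python
import heapq


def retain_top_k(scores_by_step, k):
    """Return the steps of the k highest-scoring checkpoints, in step order."""
    best = heapq.nlargest(k, scores_by_step.items(), key=lambda item: item[1])
    return sorted(step for step, _ in best)


# Toy validation scores (NOT from this run): an early peak, then a steady decline.
toy_bleu = {
    1000: 0.60, 2000: 0.70, 3000: 0.68, 4000: 0.66,
    5000: 0.64, 6000: 0.62, 7000: 0.58, 8000: 0.55,
}

kept = retain_top_k(toy_bleu, k=5)
print(kept)  # [2000, 3000, 4000, 5000, 6000]
```

Once the metric starts to decline, no later checkpoint can displace the retained ones, so everything saved after the peak region is eventually discarded.
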
## Usage

```python
import torch
from task import FormulaNetLit  # Texo src/task.py

model = FormulaNetLit.load_from_checkpoint("texo-cp-1750.ckpt", map_location="cpu")
model.eval()

# pixel_values is assumed to be a preprocessed batch of formula images
# (a tensor); the preprocessing step is not shown here.
with torch.inference_mode():
    # Greedy decoding: a single beam, no sampling.
    outputs = model.generate(pixel_values, num_beams=1, do_sample=False, max_new_tokens=512)
pred = model.tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

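Decoded predictions such as `pred` can be compared against reference LaTeX with a normalized edit distance. The sketch below is one common definition (character-level Levenshtein distance divided by the longer string's length). Whether the validation metric reported for this run uses exactly this normalization is an assumption, and both function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer string's length (0.0 = identical)."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


# Made-up example strings, not outputs of the model.
print(normalized_edit_distance(r"\frac{a}{b}", r"\frac{a}{c}"))
```
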