Finetune `nvidia/nemotron-ocr-v1` recognition model

by johnlockejrr - opened Dec 23, 2025

Dec 23, 2025

Is it possible to finetune the nvidia/nemotron-ocr-v1 recognition model on new data/languages? Is there any training code released? Thank you!

johnlockejrr

Dec 27, 2025

•

edited Dec 27, 2025

Ok. Wrote a finetuning script from scratch.

(nemotron-ocr-v1) incognito@DESKTOP-H1BS9PO:~/nemotron-ocr-v1$ python train_recognizer_detector_fixed.py   --train_file heb_synth_pangoline-xml_train.json   --val_file heb_synth_pangoline-xml_val.json   --image_dir heb_synth_pangoline-xml   --model_dir checkpoints   --output_dir checkpoints_hebrew_fixed   --epochs 50   --learning_rate 1e-4   --weight_decay 1e-4   --log_dir runs/hebrew_fixed   --patience 8
INFO:__main__:Loaded 63633 pages from heb_synth_pangoline-xml_train.json
INFO:__main__:Loaded 7071 pages from heb_synth_pangoline-xml_val.json
Epoch 1 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:00:23<00:00,  5.88it/s, loss=0.8506]
Epoch 1 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:10<00:00,  5.08it/s, cer=0.5669, wer=0.8741]
INFO:__main__:Epoch 1: train_loss=1.3973, val_loss=1.9529, val_cer=0.5858, val_wer=0.8783
INFO:__main__:New best CER 0.5858 — saving best model
Epoch 2 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:03:52<00:00,  5.77it/s, loss=0.8970]
Epoch 2 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:21<00:00,  5.05it/s, cer=0.5641, wer=0.8813]
INFO:__main__:Epoch 2: train_loss=0.7032, val_loss=2.0183, val_cer=0.5598, val_wer=0.8109
INFO:__main__:New best CER 0.5598 — saving best model
Epoch 3 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:04:01<00:00,  5.76it/s, loss=0.1227]
Epoch 3 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:20<00:00,  5.05it/s, cer=0.5555, wer=0.8669]
INFO:__main__:Epoch 3: train_loss=0.4435, val_loss=2.3166, val_cer=0.5485, val_wer=0.7858
INFO:__main__:New best CER 0.5485 — saving best model
Epoch 4 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:04:13<00:00,  5.76it/s, loss=0.3979]
Epoch 4 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:24<00:00,  5.03it/s, cer=0.5583, wer=0.8489]
INFO:__main__:Epoch 4: train_loss=0.3155, val_loss=2.4229, val_cer=0.5412, val_wer=0.7653
INFO:__main__:New best CER 0.5412 — saving best model
Epoch 5 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:07:39<00:00,  5.65it/s, loss=0.1290]
Epoch 5 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:54<00:00,  4.93it/s, cer=0.5512, wer=0.8525]
INFO:__main__:Epoch 5: train_loss=0.2401, val_loss=1.9812, val_cer=0.5375, val_wer=0.7490
INFO:__main__:New best CER 0.5375 — saving best model
...
Epoch 12 [train]: 100%|████████████████████████████████████████████| 63633/63633 [3:09:34<00:00,  5.59it/s, loss=0.0143]
Epoch 12 [val]: 100%|███████████████████████████████████████| 7071/7071 [24:22<00:00,  4.84it/s, cer=0.4681, wer=0.6547]
INFO:__main__:Epoch 12: train_loss=0.0805, val_loss=1.8426, val_cer=0.5065, val_wer=0.6866
INFO:__main__:New best CER 0.5065 — saving best model
Epoch 13 [train]:  29%|████████████▋                               | 18437/63633 [55:19<2:18:51,  5.42it/s, loss=0.1313]

Still training.

emelryan

NVIDIA org Jan 6

Awesome that you were able to get this working. We do not have any immediate plans to released training code, but we are working on a new multilingual model. Out of curiosity, which language did you fine-tune this for? Maybe we can add support in the next model.

johnlockejrr

Jan 7

Awesome that you were able to get this working. We do not have any immediate plans to released training code, but we are working on a new multilingual model. Out of curiosity, which language did you fine-tune this for? Maybe we can add support in the next model.

I trained it for Classical/Medieval Hebrew.

maximvlah

Jan 21

hey @johnlockejrr , would you mind sharing the training script?

johnlockejrr

Jan 21

@maximvlah I didn't have much luck, something I might have overlooked because val_loss was too fluctuating.

thedoctor87

Mar 1

I assume it only recognizes English text?

johnlockejrr

Mar 1

•

edited Mar 1

I assume it only recognizes English text?

You are right, is useless, they keep the finetuning sources closed so is a dead project. Better use LightOnOCR-2, Chandra, GLM-OCR or other OCR models that you can finetune or are already multiligual.

emelryan changed discussion status to closed Mar 21

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment