Finetune `nvidia/nemotron-ocr-v1` recognition model

#7
by johnlockejrr - opened

Is it possible to finetune the nvidia/nemotron-ocr-v1 recognition model on new data/languages? Is there any training code released? Thank you!

OK, I wrote a finetuning script from scratch.

(nemotron-ocr-v1) incognito@DESKTOP-H1BS9PO:~/nemotron-ocr-v1$ python train_recognizer_detector_fixed.py   --train_file heb_synth_pangoline-xml_train.json   --val_file heb_synth_pangoline-xml_val.json   --image_dir heb_synth_pangoline-xml   --model_dir checkpoints   --output_dir checkpoints_hebrew_fixed   --epochs 50   --learning_rate 1e-4   --weight_decay 1e-4   --log_dir runs/hebrew_fixed   --patience 8
INFO:__main__:Loaded 63633 pages from heb_synth_pangoline-xml_train.json
INFO:__main__:Loaded 7071 pages from heb_synth_pangoline-xml_val.json
Epoch 1 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:00:23<00:00,  5.88it/s, loss=0.8506]
Epoch 1 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [23:10<00:00,  5.08it/s, cer=0.5669, wer=0.8741]
INFO:__main__:Epoch 1: train_loss=1.3973, val_loss=1.9529, val_cer=0.5858, val_wer=0.8783
INFO:__main__:New best CER 0.5858 β€” saving best model
Epoch 2 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:03:52<00:00,  5.77it/s, loss=0.8970]
Epoch 2 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [23:21<00:00,  5.05it/s, cer=0.5641, wer=0.8813]
INFO:__main__:Epoch 2: train_loss=0.7032, val_loss=2.0183, val_cer=0.5598, val_wer=0.8109
INFO:__main__:New best CER 0.5598 β€” saving best model
Epoch 3 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:04:01<00:00,  5.76it/s, loss=0.1227]
Epoch 3 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [23:20<00:00,  5.05it/s, cer=0.5555, wer=0.8669]
INFO:__main__:Epoch 3: train_loss=0.4435, val_loss=2.3166, val_cer=0.5485, val_wer=0.7858
INFO:__main__:New best CER 0.5485 β€” saving best model
Epoch 4 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:04:13<00:00,  5.76it/s, loss=0.3979]
Epoch 4 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [23:24<00:00,  5.03it/s, cer=0.5583, wer=0.8489]
INFO:__main__:Epoch 4: train_loss=0.3155, val_loss=2.4229, val_cer=0.5412, val_wer=0.7653
INFO:__main__:New best CER 0.5412 β€” saving best model
Epoch 5 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:07:39<00:00,  5.65it/s, loss=0.1290]
Epoch 5 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [23:54<00:00,  4.93it/s, cer=0.5512, wer=0.8525]
INFO:__main__:Epoch 5: train_loss=0.2401, val_loss=1.9812, val_cer=0.5375, val_wer=0.7490
INFO:__main__:New best CER 0.5375 β€” saving best model
...
Epoch 12 [train]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63633/63633 [3:09:34<00:00,  5.59it/s, loss=0.0143]
Epoch 12 [val]: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7071/7071 [24:22<00:00,  4.84it/s, cer=0.4681, wer=0.6547]
INFO:__main__:Epoch 12: train_loss=0.0805, val_loss=1.8426, val_cer=0.5065, val_wer=0.6866
INFO:__main__:New best CER 0.5065 β€” saving best model
Epoch 13 [train]:  29%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                               | 18437/63633 [55:19<2:18:51,  5.42it/s, loss=0.1313]

Still training.
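For anyone reproducing this: the `val_cer` / `val_wer` numbers in the log above are the standard character and word error rates, i.e. edit distance between the model's hypothesis and the reference, normalized by reference length. A minimal sketch (function names here are illustrative, not from the actual script):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (chars or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word error rate: same idea over whitespace-split tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

The "New best CER ... saving best model" lines suggest checkpoint selection keys off validation CER, with `--patience 8` stopping training after 8 epochs without a new best.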

NVIDIA org

Awesome that you were able to get this working. We do not have any immediate plans to release training code, but we are working on a new multilingual model. Out of curiosity, which language did you fine-tune this for? Maybe we can add support in the next model.

I trained it for Classical/Medieval Hebrew.