Finetune `nvidia/nemotron-ocr-v1` recognition model
Is it possible to finetune the `nvidia/nemotron-ocr-v1` recognition model on new data/languages? Has any training code been released? Thank you!
OK, I wrote a finetuning script from scratch. Training log so far is below; a rough sketch of the core loop is at the end of this post.
(nemotron-ocr-v1) incognito@DESKTOP-H1BS9PO:~/nemotron-ocr-v1$ python train_recognizer_detector_fixed.py --train_file heb_synth_pangoline-xml_train.json --val_file heb_synth_pangoline-xml_val.json --image_dir heb_synth_pangoline-xml --model_dir checkpoints --output_dir checkpoints_hebrew_fixed --epochs 50 --learning_rate 1e-4 --weight_decay 1e-4 --log_dir runs/hebrew_fixed --patience 8
INFO:__main__:Loaded 63633 pages from heb_synth_pangoline-xml_train.json
INFO:__main__:Loaded 7071 pages from heb_synth_pangoline-xml_val.json
Epoch 1 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:00:23<00:00, 5.88it/s, loss=0.8506]
Epoch 1 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:10<00:00, 5.08it/s, cer=0.5669, wer=0.8741]
INFO:__main__:Epoch 1: train_loss=1.3973, val_loss=1.9529, val_cer=0.5858, val_wer=0.8783
INFO:__main__:New best CER 0.5858 → saving best model
Epoch 2 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:03:52<00:00, 5.77it/s, loss=0.8970]
Epoch 2 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:21<00:00, 5.05it/s, cer=0.5641, wer=0.8813]
INFO:__main__:Epoch 2: train_loss=0.7032, val_loss=2.0183, val_cer=0.5598, val_wer=0.8109
INFO:__main__:New best CER 0.5598 → saving best model
Epoch 3 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:04:01<00:00, 5.76it/s, loss=0.1227]
Epoch 3 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:20<00:00, 5.05it/s, cer=0.5555, wer=0.8669]
INFO:__main__:Epoch 3: train_loss=0.4435, val_loss=2.3166, val_cer=0.5485, val_wer=0.7858
INFO:__main__:New best CER 0.5485 → saving best model
Epoch 4 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:04:13<00:00, 5.76it/s, loss=0.3979]
Epoch 4 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:24<00:00, 5.03it/s, cer=0.5583, wer=0.8489]
INFO:__main__:Epoch 4: train_loss=0.3155, val_loss=2.4229, val_cer=0.5412, val_wer=0.7653
INFO:__main__:New best CER 0.5412 → saving best model
Epoch 5 [train]: 100%|█████████████████████████████████████████████| 63633/63633 [3:07:39<00:00, 5.65it/s, loss=0.1290]
Epoch 5 [val]: 100%|████████████████████████████████████████| 7071/7071 [23:54<00:00, 4.93it/s, cer=0.5512, wer=0.8525]
INFO:__main__:Epoch 5: train_loss=0.2401, val_loss=1.9812, val_cer=0.5375, val_wer=0.7490
INFO:__main__:New best CER 0.5375 → saving best model
...
Epoch 12 [train]: 100%|████████████████████████████████████████████| 63633/63633 [3:09:34<00:00, 5.59it/s, loss=0.0143]
Epoch 12 [val]: 100%|███████████████████████████████████████| 7071/7071 [24:22<00:00, 4.84it/s, cer=0.4681, wer=0.6547]
INFO:__main__:Epoch 12: train_loss=0.0805, val_loss=1.8426, val_cer=0.5065, val_wer=0.6866
INFO:__main__:New best CER 0.5065 → saving best model
Epoch 13 [train]: 29%|█████████████                                | 18437/63633 [55:19<2:18:51, 5.42it/s, loss=0.1313]
Still training.
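Since there is no official training code, here is roughly what my script does. This is a minimal sketch, not the actual file: the `model`/`processor` objects and their `.generate()`/`.batch_decode()`/loss interfaces are assumptions about however you wrap the checkpoint, and CER/WER come from the `jiwer` package. It mirrors the CLI flags above (`--epochs`, `--learning_rate`, `--weight_decay`, `--patience`).

```python
# Rough sketch of the recognition finetuning loop. NOT official NVIDIA code;
# how you load/wrap the nemotron-ocr-v1 checkpoint is up to you.
import torch
from jiwer import cer, wer  # pip install jiwer

def evaluate(model, processor, loader, device):
    """Greedy-decode the validation set and compute corpus-level CER/WER."""
    model.eval()
    refs, hyps = [], []
    with torch.no_grad():
        for pixel_values, texts in loader:
            ids = model.generate(pixel_values.to(device))  # assumed .generate() API
            hyps += processor.batch_decode(ids, skip_special_tokens=True)
            refs += list(texts)
    return cer(refs, hyps), wer(refs, hyps)

def train(model, processor, train_loader, val_loader, device,
          epochs=50, lr=1e-4, weight_decay=1e-4, patience=8):
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_cer, bad_epochs = float("inf"), 0
    for epoch in range(1, epochs + 1):
        model.train()
        for pixel_values, labels in train_loader:
            # assumes the wrapper returns a cross-entropy loss given labels
            loss = model(pixel_values=pixel_values.to(device),
                         labels=labels.to(device)).loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        val_cer, val_wer = evaluate(model, processor, val_loader, device)
        if val_cer < best_cer:            # checkpoint on best validation CER
            best_cer, bad_epochs = val_cer, 0
            torch.save(model.state_dict(), "checkpoints_hebrew_fixed/best.pt")
        else:
            bad_epochs += 1
            if bad_epochs >= patience:    # early stopping (--patience)
                break
```

Checkpointing on CER rather than validation loss is deliberate: as the log shows, val_loss drifts up after epoch 1 while CER/WER keep improving, so loss alone would have stopped training far too early.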
Awesome that you were able to get this working. We do not have any immediate plans to release training code, but we are working on a new multilingual model. Out of curiosity, which language did you fine-tune this for? Maybe we can add support in the next model.
I trained it for Classical/Medieval Hebrew.