Finetuning scripts

by johnlockejrr - opened 12 days ago
Do you intend to release the finetuning scripts or the training recipe for this model? Open-source models without finetuning scripts end up dead, even the good ones; look at Nemotron OCR v1.
johnlockejrr - 6 days ago (edited 6 days ago)
No problem.
$ nemo-ocr train --model_config configs/catmus.yaml
╭────────── Finetune ──────────╮
│ Extended charset: +111 chars │
╰──────────────────────────────╯
[WARN] Missing keys: ['stem.weight', 'stem.bias', 'classifier.weight', 'classifier.bias']
Using bfloat16 Automatic Mixed Precision (AMP)
Trainer already configured with model summary callbacks: [<class 'nemo_ocr.train.callbacks.DelayedRichModelSummary'>]. Skipping setting a default `ModelSummary` callback.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
💡 Tip: For seamless cloud logging and experiment tracking, try installing [litlogger](https://pypi.org/project/litlogger/) to enable LitLogger, which logs metrics and artifacts automatically to the Lightning Experiments platform.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
┏━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃    ┃ Name             ┃ Type               ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0  │ model            │ NemoRecognizer     │ 36.2 M │ train │     0 │
│ 1  │ model.stem       │ Conv2d             │    256 │ train │     0 │
│ 2  │ model.encoder    │ NemoCNNEncoder     │  9.8 M │ train │     0 │
│ 3  │ model.encoder.0  │ Sequential         │  147 K │ train │     0 │
│ 4  │ model.encoder.1  │ Sequential         │  295 K │ train │     0 │
│ 5  │ model.encoder.2  │ MaxPool2d          │      0 │ train │     0 │
│ 6  │ model.encoder.3  │ Sequential         │  590 K │ train │     0 │
│ 7  │ model.encoder.4  │ Sequential         │  1.2 M │ train │     0 │
│ 8  │ model.encoder.5  │ MaxPool2d          │      0 │ train │     0 │
│ 9  │ model.encoder.6  │ Sequential         │  2.4 M │ train │     0 │
│ 10 │ model.encoder.7  │ Sequential         │  4.7 M │ train │     0 │
│ 11 │ model.encoder.8  │ MaxPool2d          │      0 │ train │     0 │
│ 12 │ model.encoder.9  │ Sequential         │  525 K │ train │     0 │
│ 13 │ model.tx         │ TransformerEncoder │ 18.9 M │ train │     0 │
│ 14 │ model.tx.layers  │ ModuleList         │ 18.9 M │ train │     0 │
│ 15 │ model.tx.norm    │ LayerNorm          │  1.0 K │ train │     0 │
│ 16 │ model.classifier │ Linear             │  7.4 M │ train │     0 │
│ 17 │ criterion        │ CTCLoss            │      0 │ train │     0 │
└────┴──────────────────┴────────────────────┴────────┴───────┴───────┘
Trainable params: 26.3 M
Non-trainable params: 9.8 M
Total params: 36.2 M
Total estimated model params size (MB): 144
Modules in train mode: 92
Modules in eval mode: 0
Total FLOPs: 0
Epoch 0/149 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4280/4280 0:12:19 • 0:00:00 5.81it/s v_num: 0.000 tr_loss_step: 2.192 va_loss: 2.308 va_cer: 0.698 va_wer: 1.051 tr_loss_epoch: 2.698 early_stop: 0/10 0.69808
Epoch 1/149 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4280/4280 0:12:22 • 0:00:00 5.88it/s v_num: 0.000 tr_loss_step: 1.974 va_loss: 1.939 va_cer: 0.577 va_wer: 0.977 tr_loss_epoch: 2.066 early_stop: 0/10 0.57666
Epoch 2/149 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4280/4280 0:12:21 • 0:00:00 5.86it/s v_num: 0.000 tr_loss_step: 0.904 va_loss: 1.906 va_cer: 0.532 va_wer: 0.975 tr_loss_epoch: 1.766 early_stop: 0/10 0.53154
Epoch 3/149 ━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1202/4280 0:03:29 • 0:08:54 5.77it/s v_num: 0.000 tr_loss_step: 1.616 va_loss: 1.906 va_cer: 0.532 va_wer: 0.975 tr_loss_epoch: 1.766 early_stop: 0/10 0.53154
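For anyone reading the `va_cer` / `va_wer` columns above: the epoch 0 `va_wer` of 1.051 is above 1, which can happen because inserted words count as errors while the denominator is only the reference length. Below is a minimal sketch of how character and word error rates are commonly computed from edit distance. It is not taken from the training code in the log; the function names (`edit_distance`, `error_rate`, `cer`, `wer`) and the corpus-level aggregation (total errors over total reference length) are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Total edit errors divided by total reference length (assumed aggregation)."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0


def cer(refs, hyps):
    """Character error rate: tokens are single characters."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(refs, hyps, str.split)


# One substituted word plus two inserted words against a two-word reference.
print(wer(["the cat"], ["teh cat sat on"]))  # 1.5
print(cer(["kitten"], ["sitting"]))          # 0.5
```

The first call shows a WER above 1: three word-level errors against two reference words. A model that over-generates early in training can produce the same effect, which would be consistent with the first validation pass in the log.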