boatbomber committed on
Commit e4df893 · 1 Parent(s): 88a3806

Update readme

Files changed (2)
  1. README.md +7 -3
  2. assets/sft-grad-norm.png +0 -0
README.md CHANGED
@@ -62,15 +62,19 @@ The images are in color with dimensions between 100px and 2048px, inclusive.
 
 ### SFT
 
-TODO: details about fft & loss
+For SFT pre-training, the model was trained using full parameter fine-tuning for 2 epochs with a batch size of 2.
+
+![sft-loss](./assets/sft-loss.png)
 
 ### GRPO
 
-TODO: details about rslora & rewards
+For GRPO post-training, the model was trained using Rank Stabilized LoRA (r=256) for 1 epoch with 5 completions per prompt and a batch size of 30, then the adapter was merged back into the base at 16 bit precision.
+
+![grpo-reward](./assets/grpo-reward.png)
 
 ### Story
 
-For the more detailed story of how this model was trained, see [STORY.md](https://huggingface.co/boatbomber/NabuOCR/blob/main/STORY.md).
+For the more detailed story of how this model was trained, see [STORY.md](https://huggingface.co/boatbomber/NabuOCR/blob/main/STORY.md). To read the code used for training with the specific hyperparameters and reward functions, see [training/](https://huggingface.co/boatbomber/NabuOCR/blob/main/training).
 
 ## Performance
 
assets/sft-grad-norm.png DELETED
Binary file (67.2 kB)
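
The GRPO paragraph added to the README packs several numbers into one sentence (r=256, 5 completions per prompt, batch size 30). The sketch below is not the author's training code, which lives in training/. It only illustrates two general facts behind those numbers: Rank Stabilized LoRA scales the adapter update by alpha / sqrt(r) rather than alpha / r, and GRPO scores each completion relative to the other completions sampled for the same prompt. The value of `ALPHA`, the reading of "batch size of 30" as 30 completions (6 prompts times 5 completions), the use of population standard deviation, and all function names are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def lora_scale(alpha: float, r: int, rank_stabilized: bool) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    # Standard LoRA divides by r; rsLoRA divides by sqrt(r) so the update
    # does not shrink as quickly when the rank grows.
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r


def group_relative_advantages(rewards, group_size, eps=1e-4):
    """Normalize each reward against the mean and std of its prompt group.

    `rewards` is a flat list ordered so that every `group_size` consecutive
    entries belong to completions of the same prompt.
    """
    if len(rewards) % group_size != 0:
        raise ValueError("number of rewards must be a multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        sigma = pstdev(group)  # assumption: population std; libraries differ
        advantages.extend((x - mu) / (sigma + eps) for x in group)
    return advantages


R = 256                     # from the README
COMPLETIONS_PER_PROMPT = 5  # from the README
BATCH_SIZE = 30             # from the README; assumed to count completions
ALPHA = 256                 # hypothetical: the README does not state lora_alpha

standard_scale = lora_scale(ALPHA, R, rank_stabilized=False)
stabilized_scale = lora_scale(ALPHA, R, rank_stabilized=True)
prompts_per_batch = BATCH_SIZE // COMPLETIONS_PER_PROMPT

if __name__ == "__main__":
    print("standard LoRA scale:", standard_scale)
    print("rsLoRA scale:", stabilized_scale)
    print("prompts per batch (assumed):", prompts_per_batch)
    # One prompt group with made-up rewards, purely for illustration.
    print(group_relative_advantages([0.1, 0.4, 0.4, 0.9, 0.2], COMPLETIONS_PER_PROMPT))
```

Under the assumed `ALPHA`, the two scaling rules differ by a factor of sqrt(256) = 16 at this rank, which is why the choice of rsLoRA matters more at r=256 than at small ranks. For the actual hyperparameters and reward functions, the training/ directory linked in the README is the authoritative source.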