---
license: cc-by-nc-4.0
---

A Nemo base model pretrained on 2 billion of a planned 4 billion tokens. Intended for further conversational/instruct tuning.

Eval Loss: 1.95439 -> 1.92584
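For a quick intuition of what that loss change means, the eval loss can be converted to perplexity (a sketch, assuming the reported loss is mean cross-entropy in nats per token, which is the usual convention for this kind of eval):

```python
import math

def perplexity(nll: float) -> float:
    """Convert a mean cross-entropy eval loss (nats/token) to perplexity."""
    return math.exp(nll)

ppl_start = perplexity(1.95439)  # eval loss at the start of the run
ppl_end = perplexity(1.92584)    # eval loss after ~12k steps
print(f"perplexity: {ppl_start:.2f} -> {ppl_end:.2f}")  # ~7.06 -> ~6.86
```

So the run lowered perplexity by roughly 0.2, i.e. the model is slightly less "surprised" per token on the eval set.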

Stopped this run at 12k steps out of roughly 21k. The main issue found was with the DCLM dataset, which is too low quality to use for such a small training job. I'll revisit this with higher-quality data.