nihalaninihal
Update Colab notebook: 1.5B model, scaled rewards, tuned hyperparameters
ee8c2d4