sdar4b-random-mask-esft-summary / train_results.json
SDAR-4B random-mask SFT on ESFT-summary (final)
3ea2af5 verified
{
  "effective_tokens_per_sec": 700.7739741426446,
  "epoch": 3.0,
  "total_flos": 2.906371548002648e+18,
  "train_loss": 4.493374055164515,
  "train_runtime": 8201.4497,
  "train_samples_per_second": 7.165,
  "train_steps_per_second": 0.112
}
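
The rates above can be turned back into approximate totals with a little arithmetic. The sketch below is only an illustration and rests on assumptions the file does not state: that `train_runtime` is in seconds and that the `*_per_second` fields are averages over that runtime, as in the Hugging Face Trainer convention. `effective_tokens_per_sec` is a custom field with no definition given here, so the token total is an estimate under the same averaging assumption. The helper name `derive` and the output keys are hypothetical.

```python
import json

# Metrics copied verbatim from train_results.json above.
RESULTS_JSON = """
{
  "effective_tokens_per_sec": 700.7739741426446,
  "epoch": 3.0,
  "total_flos": 2.906371548002648e+18,
  "train_loss": 4.493374055164515,
  "train_runtime": 8201.4497,
  "train_samples_per_second": 7.165,
  "train_steps_per_second": 0.112
}
"""


def derive(results):
    """Back out approximate totals from the reported per-second rates.

    Assumption: train_runtime is in seconds and every rate is an
    average over the full runtime. The rates are rounded in the file,
    so all results are approximate.
    """
    runtime = results["train_runtime"]
    steps = results["train_steps_per_second"] * runtime
    samples = results["train_samples_per_second"] * runtime
    return {
        "approx_total_steps": steps,
        "approx_samples_seen": samples,
        "approx_samples_per_epoch": samples / results["epoch"],
        # Ratio of the two rates: samples consumed per optimizer step.
        "approx_samples_per_step": samples / steps,
        # Assumption: the custom token rate is also a runtime average.
        "approx_effective_tokens": results["effective_tokens_per_sec"] * runtime,
        "approx_runtime_hours": runtime / 3600,
    }


derived = derive(json.loads(RESULTS_JSON))
for name, value in derived.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the ratio of samples per second to steps per second comes out close to 64, which would be consistent with an effective batch size of 64, though the file itself does not confirm that setting.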