Was the 32B model trained on 25K or 200K Trajectories?
On Tuesday the SERA model card mentioned being trained on 200K trajectories, and now it says 25K trajectories. Which one is it? There was mention that the dataset was coming soon, so I'm confused because now it says it was trained on one of the datasets you've provided, which is under 40K rows.
Great work by the way! Really like what you have done!
It was 25K from that dataset (GLM 4.6 distilled). We generated a total of 200K across all configurations though. Currently working on cleaning up the cards, so this should be clearer soon!
@elchulito89 Made a bunch of improvements to the training dataset cards which should clarify things! Also added a bunch of metadata fields in case that's useful. The HF row counts are a little off, so I wrote the actual counts in the cards; the aggregate across the datasets is a little over 200K.
Awesome! Thank you for the clarification! This makes sense to me now! I really appreciate it! Great work again!