tm23hgf/wordle-qwen3-4b-sft
Awesome work! I'm going to start some research on reasoning SLMs (small language models) on Rust, and wanted to know: is the dataset publicly released?
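The Chinchilla paper (Hoffmann et al., 2022) actually shows that, for a fixed compute budget, parameters and training tokens should be scaled in roughly equal proportion. So a smaller model trained on more data beats a larger model trained on fewer tokens for the same compute.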
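To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes two common approximations rather than the paper's exact fitted scaling laws: training compute C ≈ 6·N·D (N = parameters, D = tokens) and the widely quoted "~20 tokens per parameter" compute-optimal heuristic. The helper names are just for illustration.

```python
import math

# Assumed approximations (not the paper's exact fitted laws):
#   training compute  C ~ 6 * N * D   (N = parameters, D = training tokens)
#   compute-optimal   D ~ 20 * N      ("~20 tokens per parameter" heuristic)
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal(budget_flops: float) -> tuple[float, float]:
    """Return (params, tokens) that spend the budget at ~20 tokens/param."""
    # Substituting D = 20N into C = 6ND gives C = 120 * N^2, so N = sqrt(C / 120).
    n_params = math.sqrt(budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params


def tokens_for_budget(budget_flops: float, n_params: float) -> float:
    """Tokens a model of n_params can be trained on within the same budget."""
    return budget_flops / (FLOPS_PER_PARAM_TOKEN * n_params)


if __name__ == "__main__":
    # Chinchilla's reported configuration: 70B parameters on 1.4T tokens.
    budget = training_flops(70e9, 1.4e12)
    n, d = compute_optimal(budget)
    print(f"budget ~ {budget:.2e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
    # Same budget spent on a 4x larger model leaves far fewer tokens per parameter.
    print(f"280B model gets ~{tokens_for_budget(budget, 280e9) / 1e9:.0f}B tokens")
```

With Chinchilla's own numbers (70B params, 1.4T tokens), the heuristic recovers the same split, while a 280B model on that budget would see only about 350B tokens (roughly 1.25 tokens per parameter), which is the undertrained regime the paper argues against.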