Finally, someone made GPT look good. Jackpot.

#1 by Trilogix1

I believe this is a breakthrough for small models. You've broken through the bottleneck of small, fast, but ineffective LLMs.
I'm wondering if you can already apply it to the 0.3B-0.6B LFM2 models; Qwen3 would also be an excellent candidate (given its originally high training-token count).

Again, great job.
