Finally, someone made GPT look good. Jackpot.
#1 by Trilogix1
I believe this is a breakthrough for small models. You've broken through the bottleneck of small, fast, but ineffective LLMs.
I'm wondering whether you can already apply this to 0.3B-0.6B models; LFM2 or Qwen3 would be excellent candidates (given the high number of tokens they were originally trained on).
Again, great job.