Text Generation • 2B • Updated • 15 • 1
Text Generation • 0.2B • Updated • 14 • 2
Text Generation • 0.5B • Updated • 37 • 1
Text Generation • 1B • Updated • 1.79k • 1
Text Generation • 7B • Updated • 60 • 3
fla-hub/SmolLM-1.7b-predecay • 2B • Updated • 2
Text Generation • 0.2B • Updated • 255 • 5
Text Generation • 0.5B • Updated • 1.62k
Text Generation • 2B • Updated • 265
Text Generation • 3B • Updated • 289 • 2
Text Generation • 7B • Updated • 41 • 2
fla-hub/Qwen2.5-3B-Instruct • 3B • Updated • 28
8B • Updated • 2
fla-hub/Qwen2.5-7B-Instruct • 8B • Updated • 3
Text Generation • 3B • Updated • 124 • 3
Text Generation • 2B • Updated • 620 • 9
Text Generation • 0.2B • Updated • 65 • 1
Text Generation • 0.5B • Updated • 284 • 1
Text Generation • 1B • Updated • 13
Text Generation • 0.4B • Updated • 10
Text Generation • 0.2B • Updated • 15 • 4
fla-hub/transformer-340M-4K-0.5B-20480-lr3e-4-decay0.1-sqrt • 0.4B • Updated • 3
fla-hub/transformer-340M-4K-0.5B-20480-lr3e-4-cosine • 0.4B • Updated • 4 • 1
fla-hub/transformer-3B-qwen2.5 • 3B • Updated • 6
fla-hub/transformer-3B-qwen2.5-instruct • 3B • Updated • 12
fla-hub/transformer-1.5B-qwen2.5-instruct • 2B • Updated • 2
fla-hub/transformer-1.5B-qwen2.5 • 2B • Updated • 5 • 1
fla-hub/transformer-340M-10B • Text Generation • 0.3B • Updated • 5
fla-hub/delta_net-1.3B-100B • Text Generation • 1B • Updated • 879
Text Generation • 3B • Updated • 6