tomg-group-umd's Collections: Retrofitting Recurrence
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
Paper • 2511.07384 • Published • 19
smcleish/Recurrent-Llama-3.2-train-recurrence-32 • Text Generation • 1B • Updated • 5.82k • 1
smcleish/Recurrent-Llama-3.2-train-recurrence-16 • Text Generation • 1B • Updated • 494
smcleish/Recurrent-Llama-3.2-train-recurrence-8 • Text Generation • 1B • Updated • 211
smcleish/Recurrent-Llama-3.2-train-recurrence-4 • Text Generation • 1B • Updated • 125
smcleish/Recurrent-TinyLlama-3T-train-recurrence-32 • Text Generation • 0.8B • Updated • 738 • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-16 • Text Generation • 0.8B • Updated • 39 • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-8 • Text Generation • 0.8B • Updated • 83
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4 • Text Generation • 0.8B • Updated • 65
smcleish/Recurrent-OLMo-2-0425-train-recurrence-32 • Text Generation • 1B • Updated • 395 • 2
smcleish/Recurrent-OLMo-2-0425-train-recurrence-16 • Text Generation • 1B • Updated • 1
smcleish/Recurrent-OLMo-2-0425-train-recurrence-8 • Text Generation • 1B • Updated • 7
smcleish/Recurrent-OLMo-2-0425-train-recurrence-4 • Text Generation • 1B • Updated • 1
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-single-phase • Text Generation • 0.8B • Updated • 2
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-two-phase • Text Generation • 0.8B • Updated • 3
smcleish/Recurrent-Llama-3.2-untrained • Text Generation • 1B • Updated • 49
smcleish/Recurrent-TinyLlama-3T-untrained • Text Generation • 0.8B • Updated • 6
smcleish/Recurrent-OLMo-2-0425-untrained • Text Generation • 1B • Updated • 7
smcleish/Recurrent-Llama-3.2-2-4-2-untrained • Text Generation • 1B • Updated • 2 • 1
smcleish/retrofitting-llama-fineweb-edu-tokenized • Viewer • Updated • 332M • 854