enochlev/llm-grpo-toddler-small-11
Text Generation • 0.4B
These are the models used for the paper: https://aclanthology.org/2025.sigdial-1.30/
Note: Fine-tuned Smol small model.
Note: Pre-trained on CHILDES; the best reported model in the paper.
Note: Used for training; reward model for determining the age of outputs generated by the toddler model.
Note: Used for training; reward model for determining the coherence of outputs generated by the toddler model.
Note: Model used to filter data.