---
datasets:
- EleutherAI/pile
language:
- en
---
This is a Based model that normalizes the linear-attention output with LayerNorm instead of dividing it by `QK.sum(-1)`, for better hardware efficiency.
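The difference between the two normalizations can be illustrated with a minimal, pure-Python sketch of causal linear attention in its quadratic form. This is not the model's actual implementation. It assumes the Based feature map has already been applied to `q` and `k`, so that all scores are positive. It also assumes LayerNorm is applied per token over the value dimension, without learnable gain or bias. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_scores(q: Matrix, k: Matrix) -> Matrix:
    """A[i][j] = q_i . k_j for j <= i, else 0 (causal mask)."""
    n = len(q)
    return [
        [sum(a * b for a, b in zip(q[i], k[j])) if j <= i else 0.0 for j in range(n)]
        for i in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][t] * b[t][c] for t in range(len(b))) for c in range(len(b[0]))]
        for i in range(len(a))
    ]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm over one row, without learnable gain/bias (assumption)."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def attn_sum_norm(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Standard normalization: divide each output row by QK.sum(-1)."""
    a = causal_scores(q, k)
    out = matmul(a, v)
    denom = [sum(row) for row in a]  # the QK.sum(-1) normalizer, one per query
    return [[x / d for x in row] for row, d in zip(out, denom)]


def attn_layer_norm(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Variant: skip the division and apply LayerNorm to each output row."""
    a = causal_scores(q, k)
    out = matmul(a, v)
    return [layer_norm(row) for row in out]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    k = [[1.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(attn_sum_norm(q, k, v))    # [[1.0, 2.0], [1.0, 2.0], [2.5, 3.5]]
    print(attn_layer_norm(q, k, v))  # each row is approximately [-1.0, 1.0]
```

Both versions are insensitive to a uniform rescaling of the scores. Division by `QK.sum(-1)` cancels the scale exactly, and LayerNorm cancels it up to the small `eps` term. The LayerNorm variant, however, never needs the per-query normalizer, so it has no divide-by-a-small-denominator step.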