Ethahtz
Random-init hybrid GPT-2-small / Gated DeltaNet (full_linear); layer_types=['linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention', 'linear_attention']
ffe311a verified
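
For illustration, the `layer_types` value in the commit message above can be generated rather than typed out. This is a minimal sketch, not the repository's own code: the helper `make_layer_types`, its `full_attention_every` parameter, and the `"full_attention"` label are hypothetical names introduced here. Only `"linear_attention"` and the count of 12 entries (matching GPT-2-small's 12 blocks) come from the commit message.

```python
from collections import Counter

# The commit message lists 12 entries, one per GPT-2-small block.
NUM_LAYERS = 12


def make_layer_types(num_layers, full_attention_every=None):
    """Hypothetical helper: return one mixer type per layer.

    With full_attention_every=None, every layer is "linear_attention",
    which reproduces the all-linear ("full_linear") layout.
    With an integer k, every k-th layer is labeled "full_attention"
    instead (an assumed label, not taken from the commit).
    """
    types = []
    for i in range(num_layers):
        if full_attention_every and (i + 1) % full_attention_every == 0:
            types.append("full_attention")
        else:
            types.append("linear_attention")
    return types


layer_types = make_layer_types(NUM_LAYERS)
print(Counter(layer_types))
```

Calling `make_layer_types(NUM_LAYERS)` with no second argument yields the same list as the one in the commit message; passing an integer shows what a mixed layout might look like under the assumptions stated above.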