v2: Training with real datasets, Mamba SSM, push-to-hub 962ddaa verified krystv committed 22 days ago
v1.4: replace SSM scan with parallel causal linear attn, batch wavelet subbands, use F.scaled_dot_product_attention, fix AMP on CPU 7e78255 verified krystv committed 22 days ago
Add training script with modern APIs (no deprecations) 7584952 verified krystv committed 23 days ago