nics-efc/R2R_Router_Training_Qwen3-0.6B_Qwen3-30B-A3B
9.3M • 275
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models