interesting architecture
- FAN: Fourier Analysis Networks (arXiv:2410.02675, 29 upvotes)
- Tensor Product Attention Is All You Need (arXiv:2501.06425, 90 upvotes)
- Scalable-Softmax Is Superior for Attention (arXiv:2501.19399, 25 upvotes)
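A minimal sketch of the idea named in the title: Scalable-Softmax (SSMax) multiplies the logits by s·log(n), where n is the input length and s is a learnable scalar, so attention does not flatten toward uniform as context grows. The function names and the default value of `s` here are illustrative, not from the paper.

```python
import math

def softmax(z):
    # standard softmax; max-subtraction is for numerical stability
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ssmax(z, s=1.0):
    # Scalable-Softmax: scale logits by s * log(n) before the softmax,
    # where n = input length. In the paper s is learned per head; the
    # default here is only a placeholder.
    n = len(z)
    return softmax([s * math.log(n) * v for v in z])
```

For a fixed logit gap, the top probability of standard softmax decays toward uniform as n grows, while SSMax keeps the distribution sharp.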
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling (arXiv:2502.09509, 9 upvotes)
- YOLOv12: Attention-Centric Real-Time Object Detectors (arXiv:2502.12524, 12 upvotes)
- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (arXiv:2502.14786, 164 upvotes)
- Large Language Diffusion Models (arXiv:2502.09992, 127 upvotes)
- ObjectMover: Generative Object Movement with Video Prior (arXiv:2503.08037, 5 upvotes)
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (arXiv:2503.09573, 77 upvotes)
- Transformers without Normalization (arXiv:2503.10622, 172 upvotes)
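The "Transformers without Normalization" entry proposes Dynamic Tanh (DyT), which replaces LayerNorm with an elementwise gamma * tanh(alpha * x) + beta, computing no mean or variance statistics. A minimal sketch of both for comparison; the vector-of-floats representation is only for illustration:

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # standard LayerNorm over the feature dimension, for comparison
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [g * (v - mu) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

def dyt(x, alpha, gamma, beta):
    # Dynamic Tanh (DyT): gamma * tanh(alpha * x) + beta.
    # alpha is a learnable scalar; gamma/beta are per-channel, as in
    # LayerNorm. No statistics over the feature dimension are computed.
    return [g * math.tanh(alpha * v) + b
            for v, g, b in zip(x, gamma, beta)]
```

The tanh squashes outliers much like normalization bounds activations, which is why it can stand in for LayerNorm despite being purely pointwise.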
- RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456, 154 upvotes)
- Scaling Vision Pre-Training to 4K Resolution (arXiv:2503.19903, 41 upvotes)
- arXiv:2504.00927 (56 upvotes)
- TransMamba: Flexibly Switching between Transformer and Mamba (arXiv:2503.24067, 21 upvotes)
- Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966, 31 upvotes)
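The Softpick entry replaces softmax with a rectified variant whose outputs can sum to less than one, so a head can attend to "nothing" instead of dumping probability mass on a sink token. A minimal sketch, assuming the rectified form relu(exp(x) - 1) over a sum of |exp(x) - 1|; the numerically stable (max-subtracted) formulation from the paper is omitted here:

```python
import math

def softpick(x, eps=1e-8):
    # Softpick: rectified softmax replacement.
    # numerator: relu(exp(x_i) - 1) -> exactly zero for x_i <= 0
    # denominator: sum_j |exp(x_j) - 1| + eps, so weights need not
    # sum to 1 and an all-nonpositive row yields all-zero attention.
    num = [max(math.exp(v) - 1.0, 0.0) for v in x]
    den = sum(abs(math.exp(v) - 1.0) for v in x) + eps
    return [v / den for v in num]
```

Because negative logits contribute to the denominator but not the numerator, the output mass shrinks when nothing scores positively, which is the mechanism behind "no attention sink."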
- MMaDA: Multimodal Large Diffusion Language Models (arXiv:2505.15809, 98 upvotes)
- MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900, 96 upvotes)
- Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation (arXiv:2506.19852, 42 upvotes)
- Representing Speech Through Autoregressive Prediction of Cochlear Tokens (arXiv:2508.11598, 17 upvotes)
- arXiv:2508.10104 (303 upvotes)
- 2D Gaussian Splatting with Semantic Alignment for Image Inpainting (arXiv:2509.01964, 7 upvotes)
- Sequential Diffusion Language Models (arXiv:2509.24007, 47 upvotes)
- arXiv:2510.13998 (59 upvotes)
- AnyUp: Universal Feature Upsampling (arXiv:2510.12764, 12 upvotes)
- Latent Diffusion Model without Variational Autoencoder (arXiv:2510.15301, 50 upvotes)
- Stronger Normalization-Free Transformers (arXiv:2512.10938, 22 upvotes)
- Bolmo: Byteifying the Next Generation of Language Models (arXiv:2512.15586, 17 upvotes)
- ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation (arXiv:2601.03955, 3 upvotes)
- AnyDepth: Depth Estimation Made Easy (arXiv:2601.02760, 10 upvotes)
- ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers (arXiv:2601.05741, 2 upvotes)
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings (arXiv:2512.12167, 5 upvotes)
- Implicit Neural Representation Facilitates Unified Universal Vision Encoding (arXiv:2601.14256, 7 upvotes)
- Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding (arXiv:2506.16035, 89 upvotes)
- Scaling Embeddings Outperforms Scaling Experts in Language Models (arXiv:2601.21204, 102 upvotes)
- Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection (arXiv:2602.03216, 13 upvotes)
- dLLM: Simple Diffusion Language Modeling (arXiv:2602.22661, 152 upvotes)
- Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model (arXiv:2603.21986, 121 upvotes)