Interesting architecture: a paper reading list
FAN: Fourier Analysis Networks (arXiv:2410.02675)
Tensor Product Attention Is All You Need (arXiv:2501.06425)
Scalable-Softmax Is Superior for Attention (arXiv:2501.19399)
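The core idea of Scalable-Softmax (SSMax), as stated in the paper's abstract, is to scale attention logits by s * log(n) before the softmax, where n is the input length, so the distribution does not flatten toward uniform as context grows. A minimal sketch (the parameter name `s` follows the paper; the helper functions are illustrative):

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ssmax(z, s=1.0):
    # Scalable-Softmax: multiply logits by s * log(n) before the softmax,
    # so a clearly-largest logit keeps most of the mass even for large n.
    n = len(z)
    return softmax([s * math.log(n) * v for v in z])

# With plain softmax, the top score's weight shrinks as n grows;
# with SSMax it stays close to 1 for the same logits.
logits = [5.0] + [0.0] * 999
print(max(softmax(logits)))   # noticeably diluted at n = 1000
print(max(ssmax(logits)))     # still concentrated on the top entry
```

The scaling is length-dependent but parameter-free apart from `s`, which makes it a drop-in change inside an attention kernel.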
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling (arXiv:2502.09509)
YOLOv12: Attention-Centric Real-Time Object Detectors (arXiv:2502.12524)
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (arXiv:2502.14786)
Large Language Diffusion Models (arXiv:2502.09992)
ObjectMover: Generative Object Movement with Video Prior (arXiv:2503.08037)
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models (arXiv:2503.09573)
Transformers without Normalization (arXiv:2503.10622)
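"Transformers without Normalization" replaces LayerNorm with Dynamic Tanh (DyT): y = gamma * tanh(alpha * x) + beta, where alpha is a learnable scalar and gamma/beta are the usual per-channel affine parameters, with no mean/variance statistics at all. A minimal sketch (class and initial values are illustrative; in practice alpha, gamma, and beta are trained):

```python
import math

class DynamicTanh:
    """DyT layer: elementwise tanh(alpha * x) with a per-channel affine,
    used as a drop-in replacement for LayerNorm in a Transformer block."""
    def __init__(self, dim, alpha0=0.5):
        self.alpha = alpha0          # learnable scalar in the paper
        self.gamma = [1.0] * dim     # learnable per-channel scale
        self.beta = [0.0] * dim      # learnable per-channel shift

    def __call__(self, x):
        # No statistics over the token or batch: purely elementwise.
        return [g * math.tanh(self.alpha * v) + b
                for v, g, b in zip(x, self.gamma, self.beta)]

dyt = DynamicTanh(dim=4)
print(dyt([-10.0, -1.0, 1.0, 10.0]))  # outliers squashed into (-1, 1)
```

The appeal is that the tanh reproduces the S-shaped, outlier-squashing mapping that trained LayerNorms exhibit, without any cross-token reduction.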
RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456)
Scaling Vision Pre-Training to 4K Resolution (arXiv:2503.19903)
(untitled) arXiv:2504.00927
TransMamba: Flexibly Switching between Transformer and Mamba (arXiv:2503.24067)
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966)
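Softpick's rectified softmax, as I understand it from the paper, rectifies the numerator so that a zero logit maps to exactly zero weight (removing the attention sink) while normalizing by absolute values; the weights can then sum to less than one. A rough sketch under that reading (the `eps` term and values are illustrative, and a real kernel would subtract the max logit for numerical stability):

```python
import math

def softpick(x, eps=1e-8):
    # Rectified softmax sketch: relu(exp(x) - 1) in the numerator means
    # zero and negative logits contribute exactly zero attention weight;
    # the denominator sums |exp(x) - 1| so it stays positive.
    num = [max(math.exp(v) - 1.0, 0.0) for v in x]
    den = eps + sum(abs(math.exp(v) - 1.0) for v in x)
    return [v / den for v in num]

w = softpick([2.0, 0.0, -1.0])
print(w)  # the zero logit gets weight exactly 0; weights sum to < 1
```

Because unattended positions get exactly zero rather than a small residual weight, there is no need for a dedicated sink token absorbing leftover probability mass.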
MMaDA: Multimodal Large Diffusion Language Models (arXiv:2505.15809)
MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900)
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation (arXiv:2506.19852)
Representing Speech Through Autoregressive Prediction of Cochlear Tokens (arXiv:2508.11598)
(untitled) arXiv:2508.10104
2D Gaussian Splatting with Semantic Alignment for Image Inpainting (arXiv:2509.01964)
Sequential Diffusion Language Models (arXiv:2509.24007)
(untitled) arXiv:2510.13998
AnyUp: Universal Feature Upsampling (arXiv:2510.12764)
Latent Diffusion Model without Variational Autoencoder (arXiv:2510.15301)
Stronger Normalization-Free Transformers (arXiv:2512.10938)
Bolmo: Byteifying the Next Generation of Language Models (arXiv:2512.15586)
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation (arXiv:2601.03955)
AnyDepth: Depth Estimation Made Easy (arXiv:2601.02760)
ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers (arXiv:2601.05741)
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings (arXiv:2512.12167)
Implicit Neural Representation Facilitates Unified Universal Vision Encoding (arXiv:2601.14256)
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding (arXiv:2506.16035)
Scaling Embeddings Outperforms Scaling Experts in Language Models (arXiv:2601.21204)
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection (arXiv:2602.03216)