interesting architecture - a hbkang Collection

interesting architecture

updated about 7 hours ago

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 29
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 91
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9
YOLOv12: Attention-Centric Real-Time Object Detectors

Paper • 2502.12524 • Published Feb 18, 2025 • 12
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 165
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 128
ObjectMover: Generative Object Movement with Video Prior

Paper • 2503.08037 • Published Mar 11, 2025 • 5
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 153
Scaling Vision Pre-Training to 4K Resolution

Paper • 2503.19903 • Published Mar 25, 2025 • 42
Multi-Token Attention

Paper • 2504.00927 • Published Apr 1, 2025 • 56
TransMamba: Flexibly Switching between Transformer and Mamba

Paper • 2503.24067 • Published Mar 31, 2025 • 21
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax

Paper • 2504.20966 • Published Apr 29, 2025 • 31
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 98
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 97
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation

Paper • 2506.19852 • Published Jun 24, 2025 • 43
Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Paper • 2508.11598 • Published Aug 15, 2025 • 18
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 308
2D Gaussian Splatting with Semantic Alignment for Image Inpainting

Paper • 2509.01964 • Published Sep 2, 2025 • 7
Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28, 2025 • 47
BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 61
AnyUp: Universal Feature Upsampling

Paper • 2510.12764 • Published Oct 14, 2025 • 12
Latent Diffusion Model without Variational Autoencoder

Paper • 2510.15301 • Published Oct 17, 2025 • 50
Stronger Normalization-Free Transformers

Paper • 2512.10938 • Published Dec 11, 2025 • 23
Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 18
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Paper • 2601.03955 • Published Jan 7 • 3
AnyDepth: Depth Estimation Made Easy

Paper • 2601.02760 • Published Jan 6 • 11
ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers

Paper • 2601.05741 • Published Jan 9 • 2
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Paper • 2512.12167 • Published Dec 13, 2025 • 5
Implicit Neural Representation Facilitates Unified Universal Vision Encoding

Paper • 2601.14256 • Published Jan 20 • 7
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89
Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published Jan 29 • 104
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection

Paper • 2602.03216 • Published Feb 3 • 13
dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 153
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125
Continuous Latent Diffusion Language Model

Paper • 2605.06548 • Published 21 days ago • 79
Qwen-Image-VAE-2.0 Technical Report

Paper • 2605.13565 • Published 15 days ago • 59
Asymmetric Flow Models

Paper • 2605.12964 • Published 15 days ago • 21
Channel-wise Vector Quantization

Paper • 2605.26089 • Published 3 days ago • 12
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published 1 day ago • 88