Gabriel Mongaras

gmongaras

https://gmongaras.me/

AI & ML interests

None yet

Recent Activity

updated a collection 1 day ago

Papers I'm going to read

updated a collection 8 days ago

Papers I'm going to read

upvoted a paper 8 days ago

Rethinking the Role of Efficient Attention in Hybrid Architectures

Paper • 2606.15378 • Published 13 days ago • 17

upvoted a paper 29 days ago

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

Paper • 2605.26494 • Published about 1 month ago • 41

upvoted 2 papers about 1 month ago

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published May 21 • 33

Stable Audio 3

Paper • 2605.17991 • Published May 18 • 20

upvoted a paper 2 months ago

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82

upvoted a paper 4 months ago

Memory Caching: RNNs with Growing Memory

Paper • 2602.24281 • Published Feb 27 • 13

upvoted 3 papers 7 months ago

InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Paper • 2512.08829 • Published Dec 9, 2025 • 23

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 269

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published Nov 12, 2025 • 218

upvoted a paper 8 months ago

Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer

Paper • 2510.25976 • Published Oct 29, 2025 • 16

upvoted an article 8 months ago

Why Did MiniMax M2 End Up as a Full Attention Model?

Article • MiniMax-AI • Oct 30, 2025 • 80

upvoted 2 papers 9 months ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 517

Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30, 2025 • 59

upvoted an article 9 months ago

There is no such thing as a tokenizer-free lunch

Article • catherinearnett • Sep 25, 2025 • 100

upvoted 2 papers 12 months ago

A Systematic Analysis of Hybrid Linear Attention

Paper • 2507.06457 • Published Jul 8, 2025 • 26

Fast and Simplex: 2-Simplicial Attention in Triton

Paper • 2507.02754 • Published Jul 3, 2025 • 25

upvoted 4 papers over 1 year ago

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Paper • 2503.14456 • Published Mar 18, 2025 • 153

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 170

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 305