khtsly

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 10 hours ago

Looped World Models

Paper • 2606.18208 • Published 6 days ago • 348

upvoted 2 papers 6 days ago

Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

Paper • 2606.06036 • Published 18 days ago • 71

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Paper • 2606.06574 • Published 18 days ago • 23

upvoted a paper 9 days ago

MiniMax Sparse Attention

Paper • 2606.13392 • Published 11 days ago • 141

upvoted a paper 10 days ago

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Paper • 2606.12397 • Published 12 days ago • 87

upvoted a paper 11 days ago

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Paper • 2606.11052 • Published 13 days ago • 16

upvoted a paper 12 days ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Paper • 2606.09079 • Published 14 days ago • 62

New activity in sapientinc/HRM-Text-1B 16 days ago

Hrm can't calculate 2+2

#8 opened 17 days ago by Xhub1880

commented on a paper 17 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 517

upvoted a paper 17 days ago

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 517

upvoted a paper 18 days ago

Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding

Paper • 2605.29707 • Published 25 days ago • 147

upvoted 2 papers 19 days ago

dMoE: dLLMs with Learnable Block Experts

Paper • 2605.30876 • Published 24 days ago • 38

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published 29 days ago • 35

upvoted a paper 28 days ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published about 1 month ago • 13

upvoted 2 papers 30 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published May 12 • 196

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published May 20 • 318

upvoted 2 papers about 1 month ago

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published May 21 • 33

Generative Recursive Reasoning

Paper • 2605.19376 • Published May 20 • 30

liked a model about 2 months ago

khtsly/luau-coder-preview-28B-A3B-noft

Text Generation • 28B • Updated Apr 26 • 81 • 2

published a model about 2 months ago