Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.00417

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24, 2025 • 82
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 34
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 128

NN Arch Components

Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Paper • 2512.14531 • Published Dec 16, 2025 • 13
Stronger Normalization-Free Transformers

Paper • 2512.10938 • Published Dec 11, 2025 • 19

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259
Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published 18 days ago • 113

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6, 2025 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30

Self-Correcting Delta Transformer - Adaptive LLMs

Self-Correcting Delta Transformer - DDL provides the Hardware mechanism (The Erazor), NL solves the software problem.

Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published 18 days ago • 35
Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1, 2025 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31, 2025 • 1

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 52
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24, 2025 • 82
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 34
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 128

Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30

NN Arch Components

Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Paper • 2512.14531 • Published Dec 16, 2025 • 13
Stronger Normalization-Free Transformers

Paper • 2512.10938 • Published Dec 11, 2025 • 19

Self-Correcting Delta Transformer - Adaptive LLMs

Self-Correcting Delta Transformer - DDL provides the Hardware mechanism (The Erazor), NL solves the software problem.

Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published 18 days ago • 35
Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 18 days ago • 259
Deep Delta Learning

Paper • 2601.00417 • Published 17 days ago • 30
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Paper • 2512.24615 • Published 18 days ago • 113

Representation & Optimization

Understanding about representation sheds light on optimization

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1, 2025 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31, 2025 • 1

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6, 2025 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 52
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 60
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 24
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 25

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs