lzhbrian's Collections
NN Arch Components

updated about 4 hours ago

  • Deep Delta Learning

    Paper • 2601.00417 • Published 11 days ago • 29

  • mHC: Manifold-Constrained Hyper-Connections

    Paper • 2512.24880 • Published 12 days ago • 240

  • VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

    Paper • 2512.14531 • Published 27 days ago • 12

  • Stronger Normalization-Free Transformers

    Paper • 2512.10938 • Published Dec 11, 2025 • 19

  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

    Paper • 2505.06708 • Published May 10, 2025 • 10

  • Transformers without Normalization

    Paper • 2503.10622 • Published Mar 13, 2025 • 170

  • Forgetting Transformer: Softmax Attention with a Forget Gate

    Paper • 2503.02130 • Published Mar 3, 2025 • 32

  • Hyper-Connections

    Paper • 2409.19606 • Published Sep 29, 2024 • 25

  • Virtual Width Networks

    Paper • 2511.11238 • Published Nov 14, 2025 • 37