Collections

Discover the best community collections!

Collections including paper arxiv:2512.24601

Published papers on LLM architecture and updates.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 435
Recursive Language Models

Paper • 2512.24601 • Published 16 days ago • 63
Geospatial Mechanistic Interpretability of Large Language Models

Paper • 2505.03368 • Published May 6, 2025 • 12
GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI

Paper • 2511.15658 • Published Nov 19, 2025 • 1

Recursive Language Models

Paper • 2512.24601 • Published 16 days ago • 63

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published 16 days ago • 56
Recursive Language Models

Paper • 2512.24601 • Published 16 days ago • 63
Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published 16 days ago • 35
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 251

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7, 2025 • 106
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 504
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 30

Foundational Deep Learning - Architecture

Forgetting Transformer: Softmax Attention with a Forget Gate

Paper • 2503.02130 • Published Mar 3, 2025 • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling

Paper • 2503.04725 • Published Mar 6, 2025 • 21
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 170
I-Con: A Unifying Framework for Representation Learning

Paper • 2504.16929 • Published Apr 23, 2025 • 30

NitroGen: An Open Foundation Model for Generalist Gaming Agents

Paper • 2601.02427 • Published 12 days ago • 41
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 16 days ago • 254
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published 17 days ago • 48
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

Paper • 2601.02151 • Published 11 days ago • 98

Inference improvements

Recursive Language Models

Paper • 2512.24601 • Published 16 days ago • 63

Toolkit - AI Papers

Neural Machine Translation by Jointly Learning to Align and Translate

Paper • 1409.0473 • Published Sep 1, 2014 • 7
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 110
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 25
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 46

Good research papers

Good research papers collection

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29, 2025 • 72
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 203
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10, 2025 • 105
Small Language Models are the Future of Agentic AI

Paper • 2506.02153 • Published Jun 2, 2025 • 23

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 116
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 79
Larimar: Large Language Models with Episodic Memory Control

Paper • 2403.11901 • Published Mar 18, 2024 • 33
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19, 2024 • 58
