Zeyuan Allen-Zhu
zhuzeyuan
AI & ML interests
None yet
"Physics of Language Models" series
- Physics of Language Models: Part 1, Context-Free Grammar
  Paper • 2305.13673 • Published • 7
- Physics of Language Models: Part 3.2, Knowledge Manipulation
  Paper • 2309.14402 • Published • 7
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
  Paper • 2404.05405 • Published • 10
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
  Paper • 2309.14316 • Published • 9
Physics of Language Models: Part 4.2
- Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
  Paper • 2512.17351 • Published • 28
- facebook/PhysicsLM4.2__LlamaCanon-8B-Nemo-1T-lr0.003
  Updated • 4
- facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.002
  Updated • 2
- facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.003
  Updated • 1
models 0
None public yet
datasets 0
None public yet