Theorical - a tmarechaux Collection

tmarechaux's Collections

Theorical

updated Jan 5

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 85
Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 22
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

Paper • 2309.15129 • Published Sep 25, 2023 • 7
Vision Transformers Need Registers

Paper • 2309.16588 • Published Sep 28, 2023 • 86
The Consensus Game: Language Model Generation via Equilibrium Search

Paper • 2310.09139 • Published Oct 13, 2023 • 14
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2
Levels of AGI for Operationalizing Progress on the Path to AGI

Paper • 2311.02462 • Published Nov 4, 2023 • 36
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 629
Scaling Instructable Agents Across Many Simulated Worlds

Paper • 2404.10179 • Published Mar 13, 2024 • 28
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19, 2024 • 157
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Paper • 2407.01392 • Published Jul 1, 2024 • 45
softmax is not enough (for sharp out-of-distribution)

Paper • 2410.01104 • Published Oct 1, 2024 • 1
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 183
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Paper • 2410.02707 • Published Oct 3, 2024 • 47
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 330