🏗️ Building on HF

Zixi "Oz" Li PRO

OzTianlu


https://github.com/lizixi-0x2F

lizixi-0x2F

AI & ML interests

My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.

Recent Activity

updated a model 13 days ago

OzTianlu/Qwen3.5-2B-OBLITERATED

published a model 13 days ago

OzTianlu/Qwen3.5-2B-OBLITERATED

reacted to their post with 🔥 21 days ago

ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics: The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step.

Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.

The book traces one idea across 337 years, from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.
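The post's reading of a residual layer as one explicit Euler step can be illustrated with a toy scalar example. This is a minimal sketch, not code from the monograph: the linear decay field `vector_field`, the helper `residual_stack`, and the step sizes are all hypothetical choices made for illustration. Each "layer" applies the update x <- x + h * f(x), which is the explicit Euler rule for dx/dt = f(x).

```python
import math

def vector_field(x):
    # Hypothetical residual branch: linear decay, dx/dt = -x.
    return -x

def residual_stack(x0, num_layers, step):
    """Apply num_layers residual updates x <- x + step * f(x)."""
    x = x0
    for _ in range(num_layers):
        x = x + step * vector_field(x)  # one explicit Euler step per layer
    return x

# 100 small steps integrate the ODE from t = 0 to t = 1.
approx = residual_stack(1.0, num_layers=100, step=0.01)
exact = math.exp(-1.0)  # closed-form solution x(1) = e^{-1}

# A step size above 2 makes explicit Euler diverge for this field.
unstable = residual_stack(1.0, num_layers=20, step=2.5)

print(f"stacked residual updates: {approx:.4f}, exact: {exact:.4f}")
print(f"large-step magnitude after 20 layers: {abs(unstable):.1f}")
```

With small steps the stacked updates track the exact solution closely, while the large-step run grows in magnitude. That is the kind of explicit Euler blow-up the post contrasts with implicit schemes.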


upvoted an article 21 days ago

Article

Why ResNet is Explicit Euler, and What That Tells Us About Deep Learning

NoesisLab • 21 days ago • 1

upvoted a paper 24 days ago

On the Geometry of On-Policy Distillation

Paper • 2606.07082 • Published 29 days ago • 75

upvoted a paper 26 days ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Paper • 2606.07207 • Published 29 days ago • 4

upvoted a paper about 1 month ago

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Paper • 2605.06741 • Published May 7 • 1

upvoted a collection 2 months ago

DeepSeek-V4

Collection

6 items • Updated 7 days ago • 715

upvoted 2 articles 4 months ago

Article

Arcade-3B: SLM Optimization via Orthogonal Decoupling of Latent State Spaces

NoesisLab • Mar 15 • 1

Article

Arcade-3B: SLM Optimization Based on Orthogonal Decoupling of the Hidden-Layer State Space (Chinese-language version)

NoesisLab • Mar 15 • 1

upvoted 3 papers 4 months ago

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Paper • 2602.17684 • Published Feb 4 • 22

Efficient RLVR Training via Weighted Mutual Information Data Selection

Paper • 2603.01907 • Published Mar 2 • 14

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10, 2025 • 6

upvoted a collection 4 months ago

Seed Flagship Model Released

Collection

contributed • 8 items • Updated Apr 13 • 3

upvoted a paper 4 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25 • 55

upvoted 2 articles 4 months ago

Article

Exploring New Frontiers of LLMs: Adaptive Dual-Search Distillation (ADS) and the 30B Model Open Beta

NoesisLab • Mar 1 • 2

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 507

upvoted a collection 4 months ago

Kai Models Series

Collection

Kai Models Distilled via Adaptive Dual Search Distillation • 3 items • Updated Mar 2 • 2

upvoted a paper 4 months ago

Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding

Paper • 2602.19626 • Published Feb 23 • 3

upvoted an article 4 months ago

Article

Shattering the Memory Wall: O(1) Inference and Causal Monoid State Compression in Spartacus-1B

OzTianlu • Feb 25 • 2

upvoted a collection 5 months ago

Spartacus Monoid Reasoning Models

Collection

O(1) Reasoning Models • 1 item • Updated Feb 25 • 2

upvoted an article 5 months ago

Article

The Optimal Architecture for Small Language Models

codelion • Dec 26, 2025 • 121

upvoted a collection 5 months ago

Geilim Smol Language Models

Collection

Geilim Smol Language Models • 2 items • Updated Mar 3 • 1
