Reasoning Kingdom

community

Activity Feed

Recent Activity

OzTianlu authored a paper about 1 month ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

OzTianlu updated a Space about 1 month ago

ReasoningKingdom/README

OzTianlu submitted a paper about 1 month ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Papers

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

OzTianlu

posted an update about 1 month ago

Post

6317

ResNet is Explicit Euler. GPT is Implicit Euler. What Else is Hiding in Plain Sight?

Read online: https://datawhalechina.github.io/learning-terrain/

I wrote an open-source monograph on learning dynamics — The Terrain of Learning. Bilingual (Chinese/English), 4 volumes, 12 chapters, 30+ print-grade figures. Completely free (CC BY-NC-SA 4.0).

The core argument: gradient descent is not optimization. It's terrain motion. The loss function is a landscape. The gradient is the direction of slope. The optimizer is how you choose each step. Once you see it this way, everything clicks:

ResNet = explicit Euler integration on a vector field. The residual branch is the vector field. Each layer takes one Euler step.
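A minimal sketch of the ResNet claim, not code from the monograph: with a hypothetical residual branch f(x) = tanh(Wx) shared across layers, a stack of residual layers computes the same thing as explicit Euler applied to dx/dt = f(x) with step size 1. Real ResNets use different weights in each layer, which would correspond to a time-dependent vector field. All names below are illustrative assumptions.

```python
import math

def residual_branch(x, W):
    """Hypothetical residual branch f(x) = tanh(W x), standing in for a learned block."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def resnet_forward(x, W, num_layers):
    """Residual stack: x_{k+1} = x_k + f(x_k)."""
    for _ in range(num_layers):
        fx = residual_branch(x, W)
        x = [xi + fi for xi, fi in zip(x, fx)]
    return x

def explicit_euler(x, W, t_end, num_steps):
    """Explicit Euler for dx/dt = f(x): x_{k+1} = x_k + h * f(x_k)."""
    h = t_end / num_steps
    for _ in range(num_steps):
        fx = residual_branch(x, W)
        x = [xi + h * fi for xi, fi in zip(x, fx)]
    return x

W = [[0.2, -0.5], [0.4, 0.1]]   # arbitrary illustrative weights
x0 = [1.0, -1.0]

resnet_out = resnet_forward(x0, W, num_layers=8)
euler_out = explicit_euler(x0, W, t_end=8.0, num_steps=8)  # h = 1.0
```

With eight layers and eight Euler steps of size 1, the two functions perform the same arithmetic, so `resnet_out` and `euler_out` agree.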

GPT autoregression = implicit-state Euler iteration. Stable where explicit Euler explodes. That's why transformers handle long-range dependencies.

DEQ = the Banach fixed-point theorem in production. The forward pass is root-finding. There are no layers to backprop through.
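A small sketch of the fixed-point view, under stated assumptions: the layer is a hypothetical map z -> tanh(Wz + x), and W is chosen so that every row has absolute sum at most 0.5. Because tanh is 1-Lipschitz, this makes the map a contraction in the max norm, so the Banach fixed-point theorem guarantees that plain iteration converges to a unique equilibrium. Practical DEQ implementations generally use faster root-finders than the naive loop shown here.

```python
import math

def layer(z, x, W):
    """Hypothetical DEQ layer: z -> tanh(W z + x)."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z)) + xi)
            for row, xi in zip(W, x)]

def solve_fixed_point(x, W, tol=1e-12, max_iter=500):
    """Iterate the layer until successive iterates differ by less than tol."""
    z = [0.0] * len(x)
    for step in range(1, max_iter + 1):
        z_next = layer(z, x, W)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, step
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

W = [[0.3, -0.2], [0.1, 0.4]]   # each row's absolute sum is 0.5, so the map contracts
x = [0.5, -1.0]                 # the input is injected at every iteration

z_star, steps = solve_fixed_point(x, W)
residual = max(abs(a - b) for a, b in zip(layer(z_star, x, W), z_star))
```

The returned `z_star` satisfies z = layer(z) up to the tolerance, which is the sense in which the forward pass is root-finding rather than a fixed stack of layers.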

KL divergence = a Bregman divergence on the entropy landscape. Your belief space is curved, not flat.
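The KL claim can be checked numerically. The Bregman divergence generated by a convex function phi is D(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>. Taking phi to be negative entropy, phi(p) = sum_i p_i log p_i, gives exactly KL(p || q) when p and q are probability vectors. The sketch below uses two arbitrary example distributions.

```python
import math

def neg_entropy(p):
    """phi(p) = sum_i p_i log p_i, the convex generator."""
    return sum(pi * math.log(pi) for pi in p)

def bregman_neg_entropy(p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, with grad phi(q) = log q + 1."""
    linear = sum((math.log(qi) + 1.0) * (pi - qi) for pi, qi in zip(p, q))
    return neg_entropy(p) - neg_entropy(q) - linear

def kl(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

d_bregman = bregman_neg_entropy(p, q)
d_kl = kl(p, q)
d_kl_reversed = kl(q, p)   # differs from d_kl: the divergence is not symmetric
```

The asymmetry between `d_kl` and `d_kl_reversed` is one concrete sign that this geometry is not the flat Euclidean one.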

Chain-of-thought reasoning = hidden states flowing along a reasoning field toward an attractor basin. Correct answers have wide basins. The number of reasoning steps is determined by the terrain, not by the problem.

Diffusion models = systems flowing downhill along a score vector field, from noise to structure, from high energy to low energy.

The book traces one idea across 337 years — from F=ma (Newton, 1687) to H=T+V (Hamilton, 1833) to loss landscape + gradient field (2020s). Hamilton replaced a catalog of forces with one geometric object. This book does the same for deep learning.

GitHub: https://github.com/datawhalechina/learning-terrain
Discussion: https://github.com/datawhalechina/learning-terrain/discussions/2

Convergence is not hope. Convergence is geometry. You see.

1 reply

OzTianlu

authored a paper about 1 month ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Paper • 2606.07207 • Published Jun 5 • 4

OzTianlu

updated a Space about 1 month ago

README

🦀

OzTianlu

submitted a paper to Daily Papers about 1 month ago

Entropy as a Structural Prior: How a Log-Barrier on DiT Belief Space Drives Musical Diversity and Development

Paper • 2606.07207 • Published Jun 5 • 4

OzTianlu

published a Space about 2 months ago

README

🦀

OzTianlu

authored a paper 2 months ago

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Paper • 2605.06741 • Published May 7 • 1

OzTianlu

submitted a paper to Daily Papers 2 months ago

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Paper • 2605.06741 • Published May 7 • 1

OzTianlu

posted an update 4 months ago

Post

1431

https://github.com/lizixi-0x2F/March
I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.
When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately — duplicating the exact same data over and over again. Pure waste.
March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.
- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries — returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool, with lookups that cost O(L) for a prefix of L tokens
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
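To illustrate the idea of Trie-based prefix deduplication, here is a toy sketch. It is not March's API: the class names, the `slots` list standing in for a page pool, and the string placeholders for KV entries are all hypothetical. Each trie node corresponds to one token at the end of a specific prefix, which matters because a token's KV entry depends on everything before it.

```python
class TrieNode:
    __slots__ = ("children", "kv_slot")

    def __init__(self, kv_slot):
        self.children = {}      # token id -> TrieNode
        self.kv_slot = kv_slot  # index into the shared pool

class PrefixKVCache:
    """Toy prefix-sharing cache: one stored KV entry per unique token prefix."""

    def __init__(self):
        self.root = TrieNode(kv_slot=None)
        self.slots = []  # stand-in for a page pool

    def insert(self, tokens, kv_entries):
        """Store a sequence, reusing slots for any prefix already present."""
        node, slot_ids = self.root, []
        for tok, kv in zip(tokens, kv_entries):
            child = node.children.get(tok)
            if child is None:
                self.slots.append(kv)
                child = TrieNode(kv_slot=len(self.slots) - 1)
                node.children[tok] = child
            slot_ids.append(child.kv_slot)
            node = child
        return slot_ids

    def longest_prefix(self, tokens):
        """Return slot ids for the longest cached prefix; cost is O(L) in its length."""
        node, slot_ids = self.root, []
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            slot_ids.append(node.kv_slot)
        return slot_ids

cache = PrefixKVCache()
system = [1, 2, 3, 4]          # shared system-prompt tokens
req_a = system + [10, 11]
req_b = system + [20]
ids_a = cache.insert(req_a, [f"kv_a{i}" for i in range(len(req_a))])
ids_b = cache.insert(req_b, [f"kv_b{i}" for i in range(len(req_b))])
hit = cache.longest_prefix(system + [99])
```

Storing the two requests separately would need 11 entries. Here the four shared prefix tokens are stored once, so the pool holds 7.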

1 reply

OzTianlu

posted an update 4 months ago

Post

5417

Arcade-3B — SmolReasoner
NoesisLab/Arcade-3B
Arcade-3B is a 3B instruction-following and reasoning model built on SmolLM3-3B. It is the public release from the ARCADE project at NoesisLab, which investigates the State–Constraint Orthogonality Hypothesis: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.

5 replies

OzTianlu

posted an update 4 months ago

Post

1977

We deleted the Embedding Layer: Introducing Collins-Embedding-3M
NoesisLab/Collins-Embedding-3M
Most "small" models are just giant vocab tables in a trench coat. Collins-3M changes that. By using 2-Universal Hashing and Chernoff-bound noise suppression, we’ve collapsed the embedding space into a fixed O(1) hash-map.
* STSB: 0.7114 (Beating many 100M+ models)
* Size: 3M (Edge-ready, IoT-ready)
* Tech: Randomized Sign-Hashing + RoPE positional injection.
Built by NoesisLab
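For readers unfamiliar with the hashing trick, a rough sketch follows. It is not the Collins implementation, whose details are not given in the post. It assumes the standard Carter-Wegman 2-universal family h(x) = ((a*x + b) mod p) mod m for bucket indices and a second hash for signs, and builds a token vector from a few hashed, signed buckets instead of a vocabulary-sized table.

```python
import random

P = 2_147_483_647  # Mersenne prime 2**31 - 1; token ids are assumed to be below it

def make_hashes(num_hashes, seed):
    """Draw (a, b) pairs for the family h(x) = ((a*x + b) mod P) mod m."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(num_hashes)]

def hashed_embedding(token_id, dim, index_hashes, sign_hashes):
    """Map a token id to a dim-sized vector with no lookup table.

    Each hash pair picks one bucket and one sign; storage is independent
    of vocabulary size (only the hash parameters are kept).
    """
    vec = [0.0] * dim
    for (a, b), (c, d) in zip(index_hashes, sign_hashes):
        idx = ((a * token_id + b) % P) % dim
        sign = 1.0 if ((c * token_id + d) % P) % 2 == 0 else -1.0
        vec[idx] += sign
    return vec

DIM, NUM_HASHES = 64, 8            # illustrative sizes, not the model's
index_hashes = make_hashes(NUM_HASHES, seed=0)
sign_hashes = make_hashes(NUM_HASHES, seed=1)

v1 = hashed_embedding(1234, DIM, index_hashes, sign_hashes)
v1_again = hashed_embedding(1234, DIM, index_hashes, sign_hashes)
```

Random signs make colliding contributions cancel in expectation, which is the usual motivation for sign hashing. How Collins combines this with RoPE is not covered by this sketch.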

OzTianlu

posted an update 5 months ago

Post

4802

🔥 UPGRADE in Kai: 30B Scaling! 🔥
NoesisLab/Kai-30B-Instruct
We are incredibly excited to announce that the Kai-30B-Instruct model and its official Space are now LIVE! 🚀
If you've been following the journey from Kai-0.35B to Kai-3B, you know we're rethinking how models reason. Tired of verbose, slow Chain-of-Thought (CoT) outputs that flood your screen with self-talk? So are we.
Kai-30B-Instruct scales up our Adaptive Dual-Search Distillation (ADS) framework. By bridging classical A* heuristic search with continuous gradient descent, we use an information-theoretic log-barrier to prune high-entropy reasoning paths during training.
The result? Pure implicit reasoning. The model executes structured logic, arithmetic carries, and branch selections as a reflex in a single forward pass—no external scaffolding required.
At 3B, we observed a phase transition where the model achieved "logical crystallization". Now, at 30B, we are giving the ADS regularizer the massive representational capacity it needs to tackle higher-order symbolic abstractions and complex reasoning tasks.
🧪 Test Kai yourself in our new Space:
NoesisLab/Kai-30B-Instruct
📦 Model Weights:
NoesisLab/Kai-30B-Instruct
Bring your hardest math, logic, and coding benchmarks. We invite the community to stress-test the limits of the penalty wall! 🧱💥

1 reply

OzTianlu

posted an update 5 months ago

Post

1750

Scaling UP in Kai! 🌊
NoesisLab/Kai-3B-Instruct

Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space?
Meet Kai-3B, our latest industrial-grade reasoning model fine-tuned using the Adaptive Dual Search (ADS) algorithm.
GSM8K (0-shot, Direct Answer): 39.27% 🤯 (Llama-2-7B is ~14.6%)
HumanEval (Pass@1): 39.02% 💻 (Overtakes Gemma-2-2B's 30%)
MMLU (5-shot): 53.62% 📚 (Crushing the 50% barrier)
ARC-Challenge: 51.88%🎯
PIQA: 77.53%
HellaSwag: 69.53%
Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded Agent action-engine—ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.

2 replies

OzTianlu

posted an update 5 months ago

Post

1549

🛡️ Meet Spartacus-1B: Shattering the Memory Wall with True O(1) Inference! 🚀
NoesisLab/Spartacus-1B-Instruct
NoesisLab/ChatSpartacus
At NoesisLab, we've entirely ripped out Softmax Attention and replaced it with Causal Monoid State Compression.
Say hello to Spartacus-1B-Instruct (1.3B) 🗡️.
Instead of maintaining a massive, ever-growing list of past tokens, Spartacus compresses its entire causal history into a fixed-size state matrix per head. The result?
⚡ True O(1) Inference: Memory footprint and generation time per token remain absolutely constant, whether you are on token 10 or token 100,000.
🧠 Explicit Causality: We threw away RoPE and attention masks. The model learns when to forget using dynamic, content-aware vector decay.
🔥 Blazing Fast Training: Full hardware utilization via our custom Triton-accelerated JIT parallel prefix scan.
📊 Zero-Shot Benchmarks that Hit Hard:
O(1) architectures usually sacrifice zero-shot accuracy. Not Spartacus. It is punching way above its weight class, beating established sub-quadratic models (like Mamba-1.4B and RWKV-6-1.6B):
🏆 ARC-Challenge: 0.3063 (vs Mamba 0.284)
🏆 ARC-Easy: 0.5518
🏆 PIQA: 0.6915

OzTianlu

posted an update 5 months ago

Post

3467

O(1) inference is the foundational design of Spartacus-1B-Instruct 🛡️ !

NoesisLab/Spartacus-1B-Instruct

We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, the entire prefix is lossily compressed into a fixed-size state matrix per head.

The technical core of this architecture relies on the associativity of the monoid operator:

Training: parallel prefix scan using Triton-accelerated JIT kernels to compute all prefix states simultaneously.
Inference: True sequential updates. Memory and time complexity per token are decoupled from sequence length.
Explicit Causality: We discard RoPE and attention masks. Causality is a first-class citizen, explicitly modeled through learned, content-dependent decay gates.
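The post's exact recurrence did not survive in the text above, so the following is only a generic illustration of why associativity matters. It assumes a decay-gated linear update S_t = a_t * S_{t-1} + k_t v_t^T with a per-row decay vector a_t; Spartacus's actual operator may differ. Each step is an element (a, B) representing the map S -> a*S + B, and composing two such maps is associative, so the same final state can be reached sequentially (inference) or by a tree-shaped reduction (the pattern a parallel prefix scan exploits in training).

```python
import random

def outer(k, v):
    """Outer product k v^T as a list of rows."""
    return [[ki * vj for vj in v] for ki in k]

def combine(left, right):
    """Compose two updates S -> a*S + B; `left` is applied first, then `right`."""
    a_l, B_l = left
    a_r, B_r = right
    a = [ar * al for ar, al in zip(a_r, a_l)]
    B = [[ar * bl + br for bl, br in zip(row_l, row_r)]
         for ar, row_l, row_r in zip(a_r, B_l, B_r)]
    return a, B

def sequential_final_state(elems, d_k, d_v):
    """Inference-style loop: one fixed-size state, updated token by token."""
    S = [[0.0] * d_v for _ in range(d_k)]
    for a, B in elems:
        S = [[ai * s + b for s, b in zip(row_s, row_b)]
             for ai, row_s, row_b in zip(a, S, B)]
    return S

def tree_reduce(elems):
    """Balanced reduction; valid only because `combine` is associative."""
    if len(elems) == 1:
        return elems[0]
    mid = len(elems) // 2
    return combine(tree_reduce(elems[:mid]), tree_reduce(elems[mid:]))

rng = random.Random(0)
T, D_K, D_V = 8, 3, 2   # illustrative sizes
elems = []
for _ in range(T):
    decay = [rng.uniform(0.5, 1.0) for _ in range(D_K)]   # hypothetical decay gate
    key = [rng.gauss(0.0, 1.0) for _ in range(D_K)]
    value = [rng.gauss(0.0, 1.0) for _ in range(D_V)]
    elems.append((decay, outer(key, value)))

seq_state = sequential_final_state(elems, D_K, D_V)
_, tree_state = tree_reduce(elems)
max_diff = max(abs(s - t) for rs, rt in zip(seq_state, tree_state)
               for s, t in zip(rs, rt))
```

The state stays D_K by D_V regardless of T, which is the sense in which per-token memory is decoupled from sequence length. The identity element (a vector of ones, a zero matrix) completes the monoid.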

Current zero-shot benchmarks demonstrate that Spartacus-1B-Instruct (1.3B) is already outperforming established sub-quadratic models like Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%.

The "Spartacus" era is about scaling intelligence, not the memory wall ♾️.

OzTianlu

posted an update 5 months ago

Post

873

🚀 NanoHammer-1.5B-Instruct:
https://huggingface.co/NoesisLab/NanoHammer-1.5B-Instruct
We are excited to introduce NanoHammer, a novel architecture by NoesisLab designed for Causal State Compression and true Linear Inference Complexity.
🧠 The Core: Holographic State Space
Forget the growing KV Cache. NanoHammer leverages Holographic Rotary Embeddings to compress sequence history into a dynamic integral state.
Polynomial Compression: Instead of storing raw history, we "integrate" context into a complex number space, treating memory as a container of evolving polynomial coefficients.
Dynamic Evolution: The architecture features a custom StateUpdateCell that uses Euler method fixed-point iteration, allowing the model to perform implicit reasoning via differential state updates.
⚡ Why It Matters: Efficiency Meets Reasoning
O(1) Inference Memory: State size remains constant regardless of sequence length.
Causal Modeling: Explicitly models the causal flow of logic through time, perfect for "implicit reasoning" tasks without the verbosity of Chain-of-Thought.
1.5B Lightweight Design: High performance, low resource footprint.
🛠 Model Card Highlights
Type: nanohammer (Hybrid Causal-State Architecture)
License: Apache 2.0
Capabilities: Instruction following, Long-context handling
🔗 Try it on Hugging Face: https://huggingface.co/NoesisLab/NanoHammer-1.5B-Instruct

1 reply

OzTianlu

posted an update 5 months ago

Post

2799

Geilim-1B-SR-Instruct — Serbian Intelligence for Deep Reasoning 🧠🇷🇸
NoesisLab/Geilim-1B-SR-Instruct
Geilim-1B-SR-Instruct is a lightweight Large Language Model (LLM) designed to bring advanced reasoning capabilities to low-resource languages. It focuses on Serbian understanding and generation while maintaining robust English reasoning. Built on the LLaMA-3 architecture with a proprietary hybrid reasoning mechanism, it delivers deep logic while keeping outputs concise and natural. 🚀

Core Innovations 💡

Implicit Deep Reasoning: Combines standard attention mechanisms with graph-structured reasoning components for rigorous logic and causal inference. 🕸️

ASPP & π-flow Hybrid Design: High-efficiency structured propagation + internal probability space optimization for high-quality reasoning without long-winded intermediate steps. ⚡
Bilingual Adaptation: Primarily focused on Serbian while preserving English logic, making it perfect for multilingual chats and cross-lingual tasks. 🌍
Lightweight & Efficient: At ~1.3B parameters, it runs smoothly on consumer-grade GPUs, ideal for edge devices and research. 💻

Use Cases 🛠️

Serbian Chatbots: Intelligent assistants with local linguistic nuance. 🗣️
Educational Tools: Multi-turn interactive tasks and learning support. 📚

Key Advantages ✨

Clean Output: Avoids messy "thinking" tags; reasoning happens internally, delivering clear and direct results. ✅
Open Access: Licensed under Apache-2.0, making it easy for research and engineering integration. 🔓
AI Democratization: Empowering low-resource language ecosystems with cutting-edge intelligence. 🤝

1 reply

OzTianlu

posted an update 6 months ago

Post

2571

🚀 Geilim-1B-Instruct — Implicit Deep Reasoning, Zero Verbosity
NoesisLab/Geilim-1B-Instruct
https://huggingface.co/collections/NoesisLab/geilim-large-language-models
No <think> tags. No long CoT.
Reasoning happens inside the hidden states, not in the output.
What’s different
🧠 Implicit reasoning: deep causal reasoning without exposing chains
🕸️ ASPP (Adjacency-Structured Parallel Propagation): parent-only causal graph, O(n) message passing
🌊 π-flow: internal probability-space refinement instead of token-level deliberation
⚖️ Hybrid gating: learns when to use structure vs attention
Why it matters
Lower latency & token cost
Cleaner, production-ready outputs
CoT-level reasoning depth without verbosity tax
Built on Llama-3.2-1B-Instruct, trained for math, logic, and commonsense.
Designed for small-model reasoning at the edge.
#ImplicitReasoning #SmallLLM #EfficientAI #ReasoningModels #ASPP #PiFlow

2 replies

OzTianlu

posted an update 6 months ago

Post

1196

🚀 Introducing Asterisk — Hybrid ASPP-Attention Architecture! 🌟

https://huggingface.co/NoesisLab/Asterisk

We’re excited to launch Asterisk, a cutting-edge language model by NoesisLab on Hugging Face! 🎉 Built on top of SmolLM2-135M-Instruct, Asterisk integrates Adjacency-Structured Parallel Propagation (ASPP) with standard attention to bring structured reasoning power into language modeling.

✨ Key Highlights:

🔹 Hybrid Architecture – Fuses graph-centric ASPP local reasoning with global attention for richer representations.
🔹 Enhanced Reasoning – ASPP enables iterative local state evolution that complements traditional transformer layers.
🔹 Efficient Design – ~171M parameters with smart supervised fine-tuning (Capybara dataset).
🔹 Flexible & Open – Apache-2.0 licensed and ready to integrate via Hugging Face 🤗 Transformers.

📈 Asterisk showcases how hybrid operators — inspired by theoretical frameworks like the Asterisk Operator — can bring structured reasoning into modern LMs in a scalable way.

👉 Try it out, explore the code, and start building: huggingface.co/NoesisLab/Asterisk

1 reply

OzTianlu

authored a paper 6 months ago

Reasoning: From Reflection to Solution

Paper • 2511.11712 • Published Nov 12, 2025 • 2

Team members 1
