TCF-1 — Temporal Compression Field

Architecture: Hybrid Transformer + Mamba SSM
Parameters: 494.9M
Built by: Tree-ForgedMind
Hardware: Apple Mac Studio (128GB unified memory)
Framework: MLX

What TCF-1 Is

TCF-1 (Temporal Compression Field) is a from-scratch hybrid language model architecture. No pre-trained weights. No borrowed base model. Custom tokenizer, custom architecture, trained from zero.

The core idea: pair causal self-attention with a Mamba State Space Model in every layer. Two simultaneous memory mechanisms — attention's contextual lookup and SSM's compressed temporal signal — running together through 12 hybrid layers.

Architecture

Component	Value
Parameters	494.9M
Vocabulary	32,000 (SentencePiece BPE)
Embedding dim	1,024
Attention heads	16
Hybrid pairs	12
Block size	2,048 (trained at 512)
SSM state dim	16
SSM expansion	2

Each of the 12 HybridPair layers contains:

AttentionBlock — causal self-attention
MambaBlock — selective state space model with persistent hidden state
FeedForward — GELU-activated projection

The Mamba hidden state has shape (B, d_inner, d_state), where B is the batch size, and carries temporal information across the sequence independently of attention. The model therefore has two paths through every layer: one that looks across the sequence (attention) and one that compresses time (SSM).
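To make the "compresses time" path concrete, here is a minimal sketch of a diagonal linear state-space recurrence in plain Python. It is not the code in leeknet_500m.py. The function name `ssm_scan`, the fixed decay values, and the toy input are illustrative assumptions. A real Mamba block also makes its step parameters depend on the input (the "selective" part), which this sketch leaves out.

```python
def ssm_scan(xs, a, b, c, h=None):
    """Run a diagonal linear state-space recurrence over one input channel.

    xs: list of floats, one value per time step
    a, b, c: lists of length d_state (decay, input weights, readout weights)
    h: optional starting state of length d_state (defaults to zeros)
    Returns (ys, h), where h is the state after the last step.
    """
    d_state = len(a)
    if h is None:
        h = [0.0] * d_state
    ys = []
    for x in xs:
        # State update: h_t = a * h_{t-1} + b * x_t, elementwise.
        h = [a[i] * h[i] + b[i] * x for i in range(d_state)]
        # Readout: y_t = sum_i c_i * h_t[i].
        ys.append(sum(c[i] * h[i] for i in range(d_state)))
    return ys, h


# Illustrative values only; these are not TCF-1 parameters.
D_STATE = 16  # matches the "SSM state dim" row in the table above
a = [0.9] * D_STATE
b = [1.0] * D_STATE
c = [1.0 / D_STATE] * D_STATE

xs = [0.5, -1.0, 2.0, 0.0, 1.5, 3.0]

# Process the whole sequence at once...
ys_full, h_full = ssm_scan(xs, a, b, c)

# ...or in two chunks, carrying the hidden state across the boundary.
ys_first, h_mid = ssm_scan(xs[:3], a, b, c)
ys_second, h_end = ssm_scan(xs[3:], a, b, c, h=h_mid)

assert ys_first + ys_second == ys_full  # the carried state preserves the result
assert len(h_end) == D_STATE            # state size does not grow with sequence length
```

The second half of the sequence never sees the first three inputs directly. Everything it knows about them arrives through the fixed-size state `h_mid`.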

Training — Three Stages

Stage A — Knowledge Substrate

Build world knowledge from zero. Pure language modeling — no instruction format, no masking. Text completion only.


Metric	Value
Tokens	5.79B
Steps	50,000
Best val loss	4.1684
Final val loss	4.2214
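The token and step counts above imply an effective batch size. The sketch below works it out, assuming every step consumed the same number of tokens and every sequence was the full 512-token training length. The 5.79B figure is rounded, so treat the result as approximate. The helper names are illustrative.

```python
def tokens_per_step(total_tokens, steps):
    """Average number of tokens consumed per optimizer step."""
    return total_tokens / steps


def sequences_per_step(total_tokens, steps, seq_len):
    """Average number of fixed-length sequences per optimizer step."""
    return tokens_per_step(total_tokens, steps) / seq_len


stage_a_tokens = 5_790_000_000  # "5.79B" from the table (rounded)
stage_a_steps = 50_000
train_seq_len = 512             # "trained at 512" from the architecture table

per_step = tokens_per_step(stage_a_tokens, stage_a_steps)
per_step_seqs = sequences_per_step(stage_a_tokens, stage_a_steps, train_seq_len)

print(f"{per_step:,.0f} tokens per step, "
      f"about {per_step_seqs:.0f} sequences of {train_seq_len} tokens")
```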

Data:

Wikipedia
FineWeb-Edu
Cosmopedia
Project Gutenberg
ArXiv abstracts
Orca Math
CodeParrot Python
Stack v1 Python

Stage B — Instruction Fine-Tuning

Teach the model to respond rather than continue. Loss is masked to assistant tokens only — the model learns to answer, not to repeat the question.


Metric	Value
Tokens	300M
Steps	20,000
LR	5e-5
Best val loss	3.6757
Loss	Masked (assistant tokens only)
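Masking the loss to assistant tokens can be sketched in a few lines of plain Python. This is an illustration of the technique, not the training code. The function name `masked_cross_entropy`, the 0/1 mask convention, and the three-token toy vocabulary are assumptions.

```python
import math


def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy over the positions where mask is 1.

    logits: list of per-position score lists (one score per vocabulary entry)
    targets: list of target token ids
    mask: list of 0/1 flags; 1 marks an assistant token that counts toward the loss
    """
    total, count = 0.0, 0
    for scores, target, keep in zip(logits, targets, mask):
        if not keep:
            continue  # user/prompt positions contribute nothing
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]  # negative log-probability of the target
        count += 1
    return total / max(count, 1)


# Toy example: two user positions followed by two assistant positions.
logits = [
    [2.0, 0.0, 0.0],  # user token
    [0.0, 3.0, 0.0],  # user token
    [0.0, 0.0, 0.0],  # assistant token
    [0.0, 0.0, 0.0],  # assistant token
]
targets = [0, 1, 2, 0]
mask = [0, 0, 1, 1]

loss = masked_cross_entropy(logits, targets, mask)

# Changing the scores at user positions leaves the loss untouched.
altered = [[9.0, -9.0, 0.0], [-5.0, 0.0, 5.0]] + logits[2:]
assert masked_cross_entropy(altered, targets, mask) == loss
```

Because the user positions never enter the sum, no gradient rewards the model for reproducing the question.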

Data:

OpenAssistant (oasst2)
SlimOrca
Hermes3
Tulu3
GSM8K
MetaMath

Stage A Output Examples

Prompt: "The nature of consciousness is"
Checkpoint: step 8,250 — Temperature: 0.8

Three runs, same prompt, same weights:

Run 1:

instantiated. A distinct form of the same domain of intelligence in the context of a systematic approach and is not related to the two of the world's most prevalent and prominent. However, the problems on the links of the computer. The scientific will have the potential for myths. The formation of the modern art is that the scientific and contemporary art of art museum was their focus on the use of human architecture in the 1960s. Östei Thumn, with the Tromszkala, in the advisory process from its inception, was the earliest examples of the text, and in the Demodu Papers of the reduction inatic language...

Run 2:

the most prevalent figure in the world. This is the most commonly used organization that is responsible for the formation of the different species... In the Peerage of the Palas (Ming) it is also known as the Apostles of the Baractic Empire, which has a number of reasons, who decide; the hymns and experiences of the first Baron in the early 20th century...

Run 3:

a symbol of the subject of Greek genocide and the denomination in the world... The holy civilization is being considered an unacceptable marker... a large denomination is a state of Islam. The Prophet is the "Call of the revelation of God" sometimes referred to as "The right of the Divine Church" as the King's father, the god, as you prodigised and the world's gait...

Three runs. Three universes. The same weights, walked differently each time. This is what 5.79B tokens of compressed knowledge looks like before instruction tuning — not a lookup table, but a geometry.
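The run-to-run variation comes from sampling: at temperature 0.8 each next token is drawn at random from the model's output distribution rather than chosen greedily. A minimal sketch of temperature sampling follows. The function name and the four-token toy logits are assumptions, and this is not the generation code used for the runs above.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Draw one token id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for a 4-token vocabulary.
toy_logits = [2.0, 1.5, 1.0, 0.5]

# Same "weights", different random streams: each run samples its own sequence.
for seed in (1, 2, 3):
    rng = random.Random(seed)
    run = [sample_with_temperature(toy_logits, 0.8, rng) for _ in range(10)]
    print(f"seed {seed}: {run}")
```

Lower temperatures concentrate probability on the highest-scoring token. Higher temperatures flatten the distribution and increase variety.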

Stage B Output Examples

Settings: temp=0.75, top-p=0.95, block-size=512
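For readers unfamiliar with the top-p setting, the sketch below shows nucleus filtering: keep the smallest set of most-likely tokens whose cumulative probability reaches top-p, then renormalise. The function name and toy probabilities are assumptions. The actual logic in chat_stage_b.py may differ.

```python
def top_p_filter(probs, top_p):
    """Return {token_id: renormalised probability} for the nucleus.

    The nucleus is the smallest set of highest-probability tokens whose
    cumulative probability is at least top_p.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in order:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Hypothetical next-token distribution over a 4-token vocabulary.
toy_probs = [0.7, 0.2, 0.06, 0.04]

nucleus = top_p_filter(toy_probs, 0.95)
print(sorted(nucleus))  # the least likely token (id 3) is dropped
```

Sampling then proceeds over the filtered distribution, so very unlikely tokens can never be drawn.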

Stage B knows it's in a conversation. It holds the Assistant role, responds in first person, stays in format. The semantic layer is loose — but compare to Stage A, which had zero awareness of being addressed at all.

Human: how are you feeling?
Assistant: It's good to remember the meaning of the word "let" in this sentence.
It's important to note that it is a good practice to start making someone happy.
It's important to understand the meaning of the situation. It helps you feel
good and focused.
Human: good
Assistant: I'm afraid I can't believe it. It's important to remember that
you're doing well. I'm thinking about being kind in good terms and getting
to know. It's okay to take care of yourself.

Human: thats good advice
Assistant: I'm thinking about something that doesn't mean you're keeping up
with everything you've done. It's not just good, but it's important to help
others to keep your mind safe.

Structure without substance. Stage A had substance without structure. Stage C is where both converge.

Why Hybrid

Standard Transformers hold context through attention: cost grows quadratically with sequence length, and the window is finite. SSMs compress the sequence into a fixed-size hidden state: cost grows linearly, and memory reaches further back, but retrieval is less precise.

TCF-1 runs both in parallel at every layer. The attention handles precise contextual lookup. The SSM handles long-range temporal compression. Neither replaces the other.
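The difference in scaling can be made concrete with a little arithmetic. The sketch below counts attention score entries per layer against SSM state entries per sequence, using the head count and state dimension from the architecture table. It assumes the usual Mamba convention that d_inner equals expansion times embedding dim (2 × 1,024). The helper names are illustrative.

```python
def attention_score_entries(seq_len, n_heads):
    """Entries in one layer's attention score matrices (before causal masking)."""
    return n_heads * seq_len * seq_len


def ssm_state_entries(d_inner, d_state):
    """Entries in one layer's SSM hidden state for a single sequence."""
    return d_inner * d_state


N_HEADS = 16
D_STATE = 16
D_INNER = 2 * 1024  # assumption: expansion * embedding dim

for seq_len in (512, 2048):
    attn = attention_score_entries(seq_len, N_HEADS)
    ssm = ssm_state_entries(D_INNER, D_STATE)
    print(f"seq_len={seq_len}: attention scores={attn:,}  ssm state={ssm:,}")
```

Moving from the 512-token training length to the full 2,048-token block size multiplies the attention score count by 16, while the SSM state stays the same size.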

At 494.9M parameters, trained from scratch on a single Apple Silicon machine, TCF-1 is evidence that a hybrid architecture of this kind can be built and trained outside large-scale infrastructure.

Files

File	Description
stage_b_step15250_best.npz	Weights (1.98 GB)
leeknet_500m.py	Architecture
chat_stage_b.py	Chat interface
tokenizer/leek_bpe_32k.model	SentencePiece model
tokenizer/leek_bpe_32k.vocab	Vocabulary
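A quick consistency check on the checkpoint size: 494.9M parameters at 4 bytes each comes to about 1.98 GB, which suggests the weights are stored as 32-bit floats. That is an inference from the listed numbers, not something stated above. It ignores archive overhead and takes 1 GB as 10^9 bytes.

```python
def checkpoint_size_gb(n_params, bytes_per_param):
    """Raw size of the parameters in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 494_900_000  # "494.9M" from the model card

for dtype_name, width in (("float32", 4), ("float16/bfloat16", 2)):
    print(f"{dtype_name}: {checkpoint_size_gb(N_PARAMS, width):.2f} GB")
```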

Built from scratch. No borrowed weights.
Tree — 2026
