TCF-1: Temporal Compression Field
Architecture: Hybrid Transformer + Mamba SSM
Parameters: 494.9M
Built by: Tree-ForgedMind
Hardware: Apple Mac Studio (128GB unified memory)
Framework: MLX
What TCF-1 Is
TCF-1 (Temporal Compression Field) is a from-scratch hybrid language model architecture. No pre-trained weights. No borrowed base model. Custom tokenizer, custom architecture, trained from zero.
The core idea: pair causal self-attention with a Mamba State Space Model in every layer. Two simultaneous memory mechanisms, attention's contextual lookup and the SSM's compressed temporal signal, run together through 12 hybrid layers.
Architecture
| Component | Value |
|---|---|
| Parameters | 494.9M |
| Vocabulary | 32,000 (SentencePiece BPE) |
| Embedding dim | 1,024 |
| Attention heads | 16 |
| Hybrid pairs | 12 |
| Block size | 2,048 (trained at 512) |
| SSM state dim | 16 |
| SSM expansion | 2 |
Each of the 12 HybridPair layers contains:
- AttentionBlock: causal self-attention
- MambaBlock: selective state space model with persistent hidden state
- FeedForward: GELU-activated projection
The Mamba hidden state, of shape (B, d_inner, d_state), carries temporal information across the sequence independently of attention. The model has two paths through every layer: one that looks across the sequence (attention) and one that compresses time (SSM).
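To make the (B, d_inner, d_state) shape concrete, here is a minimal sketch that derives it from the table values. It assumes d_inner = expansion × embedding dim, as in the reference Mamba design; the source does not state this relation, and the helper names are hypothetical.

```python
# Values taken from the architecture table above.
d_model = 1024   # embedding dim
expand = 2       # SSM expansion
d_state = 16     # SSM state dim
n_layers = 12    # hybrid pairs

# Assumption (not stated in the source): d_inner = expand * d_model.
d_inner = expand * d_model


def ssm_state_shape(batch_size):
    """Shape (B, d_inner, d_state) of one layer's Mamba hidden state."""
    return (batch_size, d_inner, d_state)


def ssm_state_floats(batch_size):
    """Number of values held in SSM hidden state across all hybrid layers."""
    b, di, ds = ssm_state_shape(batch_size)
    return n_layers * b * di * ds


print(ssm_state_shape(1))
print(ssm_state_floats(1))
```

Under that assumption the state size depends only on the batch size and the model dimensions, not on how many tokens have been processed.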
Training: Three Stages
Stage A: Knowledge Substrate
Build world knowledge from zero. Pure language modeling: no instruction format, no masking. Text completion only.
| Metric | Value |
|---|---|
| Tokens | 5.79B |
| Steps | 50,000 |
| Best val loss | 4.1684 |
| Final val loss | 4.2214 |
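The Stage A numbers can be put in more familiar terms with a little arithmetic. The sketch below assumes the validation loss is mean cross-entropy per token in nats (the usual convention, but not stated in the source) and converts it to perplexity. It also computes the ratio of training tokens to parameters.

```python
import math


def perplexity(loss_nats):
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


tokens = 5.79e9          # Stage A tokens, from the table
params = 494.9e6         # parameter count, from the architecture table
best_val_loss = 4.1684
final_val_loss = 4.2214

print(f"best perplexity:      {perplexity(best_val_loss):.1f}")
print(f"final perplexity:     {perplexity(final_val_loss):.1f}")
print(f"tokens per parameter: {tokens / params:.1f}")
```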
Data:
- Wikipedia
- FineWeb-Edu
- Cosmopedia
- Project Gutenberg
- ArXiv abstracts
- Orca Math
- CodeParrot Python
- Stack v1 Python
Stage B: Instruction Fine-Tuning
Teach the model to respond rather than continue. Loss is masked to assistant tokens only: the model learns to answer, not to repeat the question.
| Metric | Value |
|---|---|
| Tokens | 300M |
| Steps | 20,000 |
| LR | 5e-5 |
| Best val loss | 3.6757 |
| Loss | Masked (assistant tokens only) |
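Masking the loss to assistant tokens can be illustrated with a small, framework-free sketch. This is not the author's training code: the function names, the toy logits, and the 0/1 mask layout are all hypothetical. It shows only the general technique of averaging cross-entropy over the positions marked as assistant output.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_cross_entropy(logits, targets, mask):
    """Mean negative log-likelihood over positions where mask == 1.

    logits:  one list of vocabulary scores per position
    targets: the correct token id at each position
    mask:    1 for assistant tokens, 0 for prompt tokens
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, mask):
        if keep:
            total += -log_softmax(row)[target]
            count += 1
    return total / max(count, 1)


# Toy example: vocabulary of 3 tokens, 4 positions.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
targets = [1, 1, 2, 0]
mask = [0, 0, 1, 1]  # first two positions are the prompt, last two the reply
loss = masked_cross_entropy(logits, targets, mask)
print(f"masked loss: {loss:.4f}")
```

The first position is predicted wrongly in this toy example, but it sits in the prompt, so it contributes nothing to the loss.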
Data:
- OpenAssistant (oasst2)
- SlimOrca
- Hermes3
- Tulu3
- GSM8K
- MetaMath
Stage A Output Examples
Prompt: "The nature of consciousness is"
Checkpoint: step 8,250. Temperature: 0.8
Three runs, same prompt, same weights:
Run 1:
instantiated. A distinct form of the same domain of intelligence in the context of a systematic approach and is not related to the two of the world's most prevalent and prominent. However, the problems on the links of the computer. The scientific will have the potential for myths. The formation of the modern art is that the scientific and contemporary art of art museum was their focus on the use of human architecture in the 1960s. Γstei Thumn, with the Tromszkala, in the advisory process from its inception, was the earliest examples of the text, and in the Demodu Papers of the reduction inatic language...
Run 2:
the most prevalent figure in the world. This is the most commonly used organization that is responsible for the formation of the different species... In the Peerage of the Palas (Ming) it is also known as the Apostles of the Baractic Empire, which has a number of reasons, who decide; the hymns and experiences of the first Baron in the early 20th century...
Run 3:
a symbol of the subject of Greek genocide and the denomination in the world... The holy civilization is being considered an unacceptable marker... a large denomination is a state of Islam. The Prophet is the "Call of the revelation of God" sometimes referred to as "The right of the Divine Church" as the King's father, the god, as you prodigised and the world's gait...
Three runs. Three universes. The same weights, walked differently each time. This is what 5.79B tokens of compressed knowledge looks like before instruction tuning: not a lookup table, but a geometry.
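The runs differ because decoding samples from the model's output distribution instead of always taking the most likely token. The sketch below shows temperature sampling with an optional top-p (nucleus) filter in plain Python. The function name and the toy logits are hypothetical, and the repository's own sampler may work differently.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_p=1.0, rng=random):
    """Sample one token id from raw logits.

    temperature rescales the logits before the softmax; top_p < 1.0 keeps
    only the smallest set of most-likely tokens whose mass reaches top_p.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filter: walk tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical logits for a 4-token vocabulary; three seeds, three "runs".
logits = [2.0, 1.5, 0.5, -1.0]
for seed in (0, 1, 2):
    rng = random.Random(seed)
    print([sample_token(logits, temperature=0.8, rng=rng) for _ in range(8)])
```

Lower temperatures concentrate probability on the top tokens, and a small top_p removes the unlikely tail entirely.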
Stage B Output Examples
Settings: temp=0.75, top-p=0.95, block-size=512
Stage B knows it's in a conversation. It holds the Assistant role, responds in first person, stays in format. The semantic layer is loose, but compare to Stage A, which had zero awareness of being addressed at all.
Human: how are you feeling?
Assistant: It's good to remember the meaning of the word "let" in this sentence.
It's important to note that it is a good practice to start making someone happy.
It's important to understand the meaning of the situation. It helps you feel
good and focused.
Human: good
Assistant: I'm afraid I can't believe it. It's important to remember that
you're doing well. I'm thinking about being kind in good terms and getting
to know. It's okay to take care of yourself.
Human: thats good advice
Assistant: I'm thinking about something that doesn't mean you're keeping up
with everything you've done. It's not just good, but it's important to help
others to keep your mind safe.
Structure without substance. Stage A had substance without structure. Stage C is where both converge.
Why Hybrid
Standard Transformers hold context through attention: quadratic cost, finite window. SSMs compress sequences into a fixed hidden state: linear cost, longer memory but less precise retrieval.
TCF-1 runs both in parallel at every layer. The attention handles precise contextual lookup. The SSM handles long-range temporal compression. Neither replaces the other.
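A rough count makes the trade-off visible. The sketch below compares, for one layer, the number of attention score entries and cached key/value entries against the size of the SSM hidden state, at the trained length (512) and the full block size (2,048). It assumes d_inner = expansion × embedding dim and standard multi-head attention. These are illustrative counts, not measurements of TCF-1.

```python
# Values from the architecture table.
d_model = 1024
n_heads = 16
expand = 2
d_state = 16
d_inner = expand * d_model  # assumption, not stated in the source


def attention_score_entries(seq_len):
    """Entries in one layer's attention score matrices (all heads, before masking)."""
    return n_heads * seq_len * seq_len


def kv_cache_entries(seq_len):
    """Key and value entries one layer must keep for seq_len past tokens."""
    return 2 * seq_len * d_model


def ssm_state_entries():
    """Entries in one layer's SSM hidden state, independent of seq_len."""
    return d_inner * d_state


for seq_len in (512, 2048):
    print(seq_len,
          attention_score_entries(seq_len),
          kv_cache_entries(seq_len),
          ssm_state_entries())
```

Quadrupling the sequence length multiplies the score entries by 16 and the key/value cache by 4, while the SSM state stays the same size.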
At 494.9M parameters, trained from scratch on a single Apple Silicon machine, TCF-1 shows that a hybrid architecture can be built and trained outside of large-scale infrastructure.
Files
| File | Description |
|---|---|
| stage_b_step15250_best.npz | 1.98 GB weights |
| leeknet_500m.py | architecture |
| chat_stage_b.py | chat interface |
| tokenizer/leek_bpe_32k.model | SentencePiece model |
| tokenizer/leek_bpe_32k.vocab | vocabulary |
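As a sanity check on the weights file, the listed size can be compared with the parameter count. The source does not state the storage dtype, so the sketch below tries two common widths and takes 1 GB as 10^9 bytes. The helper name is hypothetical.

```python
params = 494.9e6  # parameter count from the architecture table


def weight_size_gb(n_params, bytes_per_param):
    """Raw tensor storage in GB (10^9 bytes), ignoring file overhead."""
    return n_params * bytes_per_param / 1e9


for name, width in (("float32", 4), ("float16 / bfloat16", 2)):
    print(f"{name}: {weight_size_gb(params, width):.2f} GB")
```

The 4-byte figure matches the listed 1.98 GB, which is consistent with 32-bit weights, though that remains an inference.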
Built from scratch. No borrowed weights.
Tree, 2026