# 🧠 LaminarNet-50m
**LaminarNet v0.6.2** is a novel O(N) causal language model featuring Geometric Drift Fields, Rotary Position Embeddings (RoPE), and multi-strata hierarchical processing. It is designed as a faster, more efficient alternative to traditional Transformer architectures.
> ⚡ **Key Innovation:** LaminarNet replaces standard attention with a *Geometric Drift Field*, an O(N) selective state-space mechanism using a vectorized parallel scan, achieving linear rather than quadratic complexity in sequence length.
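For intuition, here is a minimal sketch of the kind of gated linear recurrence such a mechanism computes, `h_t = a_t * h_{t-1} + b_t * x_t`, evaluated chunk by chunk so work grows linearly with sequence length. The function name and recurrence details are illustrative, not the package's actual drift-field kernel:

```python
import torch

def chunked_selective_scan(a, b, x, chunk_size=256):
    """Illustrative O(N) selective scan: h_t = a_t * h_{t-1} + b_t * x_t.

    a, b, x: (batch, seq_len, dim); `a` holds per-token decay gates in (0, 1).
    Each chunk is evaluated with vectorized cumprod/cumsum, and only the
    carry state crosses chunk boundaries, so total work is linear in seq_len.
    (Real kernels use numerically stabler formulations than the division here.)
    """
    B, T, D = x.shape
    h = torch.zeros(B, D, dtype=x.dtype, device=x.device)
    outs = []
    for start in range(0, T, chunk_size):
        a_c = a[:, start:start + chunk_size]          # (B, L, D) chunk gates
        bx_c = (b * x)[:, start:start + chunk_size]   # gated inputs
        A = torch.cumprod(a_c, dim=1)                 # running decay in chunk
        # Prefix-sum form of the recurrence:
        # h_t = A_t * (h_in + sum_{s<=t} bx_s / A_s)
        h_chunk = A * (h.unsqueeze(1) + torch.cumsum(bx_c / A, dim=1))
        outs.append(h_chunk)
        h = h_chunk[:, -1]                            # carry into next chunk
    return torch.cat(outs, dim=1)
```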
## 📊 Model Details
| Property | Value |
|---|---|
| Parameters | 49.4M |
| Architecture | LaminarNet v0.6.2 |
| `d_model` | 320 |
| `n_heads` | 5 |
| `n_layers` | 10 |
| `d_ff` | 1200 |
| `n_strata` | 2 |
| `strata_ratios` | (1, 2, 4) |
| Sequence Length | 1024 |
| Vocabulary | GPT-2 BPE (50,257 tokens) |
| Best Val Loss | 3.7881 |
| Training Steps | 115,500 |
| Training Tokens | ~1B (FineWeb) |
## 🚀 Quick Start
### 1. Installation
```bash
pip install laminarnet safetensors transformers torch
```
Or install from source:
```bash
git clone https://huggingface.co/Uunan/LaminarNet-50m
cd LaminarNet-50m
pip install -e ./laminarnet
```
### 2. Download & Load Model
```python
import torch
import torch.nn.functional as F
from collections import Counter
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download model files
config_path = hf_hub_download("Uunan/LaminarNet-50m", "config.json")
model_path = hf_hub_download("Uunan/LaminarNet-50m", "model.safetensors")

# Build model
from laminarnet import LaminarNet, LaminarNetConfig

config = LaminarNetConfig(
    vocab_size=50257, d_model=320, n_heads=5, n_layers=10,
    d_ff=1200, n_strata=2, strata_ratios=(1, 2, 4),
    seq_len=1024, dropout=0.1, conv_kernel=4, rope_base=10000.0,
)
model = LaminarNet(config)

# Load weights (head.weight is tied to tok_emb.weight, so strict=False)
from safetensors.torch import load_file

state = load_file(model_path)
model.load_state_dict(state, strict=False)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print(f"✅ LaminarNet-50m loaded! ({sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params)")
```
### 3. Generate Text
```python
tokenizer = AutoTokenizer.from_pretrained("gpt2")

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200,
             temperature=0.8, top_k=50, top_p=0.9, device="cpu",
             repetition_penalty=1.2, no_repeat_ngram=3, frequency_penalty=0.3):
    """
    Autoregressive text generation with full sampling controls.

    Args:
        prompt: Input text to continue from.
        max_new_tokens: Maximum number of tokens to generate.
        temperature: Sampling temperature (0.1 = conservative, 1.5 = creative).
        top_k: Top-k sampling (0 = disabled).
        top_p: Nucleus sampling threshold (0-1).
        repetition_penalty: Penalize already-seen tokens (1.0 = off, 1.2+ recommended).
        no_repeat_ngram: Block repeated n-grams (3 = no trigram repeats, 0 = off).
        frequency_penalty: Penalize tokens by their generation frequency (0.0 = off).
    """
    SEQ_LEN = 1024
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    if input_ids.shape[1] > SEQ_LEN:
        input_ids = input_ids[:, -SEQ_LEN:]
    prompt_len = input_ids.shape[1]  # measured after truncation
    eos_id = tokenizer.eos_token_id

    for _ in range(max_new_tokens):
        ctx = input_ids[:, -SEQ_LEN:]
        logits = model(ctx)[:, -1, :]

        # Repetition penalty: dampen every token already in the context
        if repetition_penalty != 1.0:
            for tok_id in set(input_ids[0].tolist()):
                if logits[0, tok_id] > 0:
                    logits[0, tok_id] /= repetition_penalty
                else:
                    logits[0, tok_id] *= repetition_penalty

        # Frequency penalty: subtract a cost proportional to generation count
        if frequency_penalty > 0:
            token_counts = Counter(input_ids[0].tolist()[prompt_len:])
            for tok_id, count in token_counts.items():
                logits[0, tok_id] -= frequency_penalty * count

        # N-gram blocking: forbid any token that would complete a repeated n-gram
        if no_repeat_ngram > 0 and input_ids.shape[1] >= no_repeat_ngram:
            generated = input_ids[0].tolist()
            prefix = tuple(generated[-(no_repeat_ngram - 1):])
            for i in range(len(generated) - no_repeat_ngram + 1):
                if tuple(generated[i:i + no_repeat_ngram - 1]) == prefix:
                    logits[0, generated[i + no_repeat_ngram - 1]] = float("-inf")

        logits = logits / temperature

        # Top-k: keep only the k highest-scoring tokens
        if top_k > 0:
            vals, _ = torch.topk(logits, top_k)
            logits[logits < vals[:, -1:]] = float("-inf")

        # Top-p (nucleus): drop the tail once cumulative probability reaches p
        if 0 < top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True)
            probs_sorted = F.softmax(sorted_logits, dim=-1)
            cum_probs = torch.cumsum(probs_sorted, dim=-1)
            remove_mask = cum_probs - probs_sorted >= top_p  # always keeps the top token
            sorted_logits[remove_mask] = float("-inf")
            logits = sorted_logits.scatter(1, sorted_idx, sorted_logits)  # unsort

        probs = F.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_tok], dim=1)
        if next_tok.item() == eos_id:
            break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Generate!
text = generate(model, tokenizer, "The meaning of life is", device=device)
print(text)
```
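As a sanity check after loading, you can score some text and compare the order of magnitude against the reported validation loss (3.7881, i.e. perplexity ≈ 44). A sketch, assuming `model(ids)` returns raw logits of shape `(batch, seq, vocab)` as in `generate()` above; arbitrary text will not reproduce the FineWeb validation number exactly:

```python
@torch.no_grad()
def score(model, tokenizer, text, device="cpu"):
    """Mean next-token cross-entropy (and perplexity) of `text` under the model."""
    ids = tokenizer.encode(text, return_tensors="pt").to(device)[:, :1024]
    logits = model(ids)                                  # (1, T, vocab)
    loss = F.cross_entropy(logits[0, :-1], ids[0, 1:])   # predict token t+1 from t
    return loss.item(), loss.exp().item()

loss, ppl = score(model, tokenizer, "The quick brown fox jumps over the lazy dog.",
                  device=device)
print(f"loss={loss:.3f}  ppl={ppl:.1f}")
```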
## 🎛️ Generation Parameters Guide
| Parameter | Default | Range | Description |
|---|---|---|---|
| `temperature` | 0.8 | 0.1–2.0 | Lower = more focused, higher = more creative |
| `top_k` | 50 | 0–vocab_size | Keep only the top-k most likely tokens |
| `top_p` | 0.9 | 0.0–1.0 | Nucleus sampling: keep tokens until cumulative prob ≥ p |
| `repetition_penalty` | 1.2 | 1.0–2.0 | Penalize previously generated tokens (1.0 = off) |
| `no_repeat_ngram` | 3 | 0–5 | Block repeated n-grams (0 = off) |
| `frequency_penalty` | 0.3 | 0.0–1.0 | Penalize tokens proportional to usage count (0 = off) |
| `max_new_tokens` | 200 | 1–1024 | Maximum tokens to generate |
### Recommended Presets
```python
# 📝 Factual / Focused
text = generate(model, tokenizer, prompt, temperature=0.5, top_k=30, top_p=0.85, device=device)

# 🎨 Creative / Story Writing
text = generate(model, tokenizer, prompt, temperature=1.0, top_k=80, top_p=0.95,
                repetition_penalty=1.3, frequency_penalty=0.4, device=device)

# ⚡ Greedy (deterministic: top_k=1 always picks the argmax token)
text = generate(model, tokenizer, prompt, temperature=0.1, top_k=1, top_p=1.0,
                repetition_penalty=1.0, device=device)
```
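Sampling draws from PyTorch's global RNG via `torch.multinomial`, so seeding makes runs repeatable:

```python
# Fix the RNG so repeated calls produce identical samples.
torch.manual_seed(42)
text = generate(model, tokenizer, "Once upon a time", device=device)
```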
## 🏗️ Architecture Deep Dive
LaminarNet introduces a unique multi-strata architecture that processes information at multiple temporal resolutions simultaneously:
```
Input Tokens
     │
     ▼
┌─────────────┐
│  Embedding  │  Token → d_model (320)
└──────┬──────┘
       │
       ├──────── Stratum 0 (Fine,   ratio=1) ── Full resolution
       │
       └──────── Stratum 1 (Coarse, ratio=2) ── Half resolution
       │
  ┌────┴────┐
  │   x10   │  LaminarBlock layers
  │  Layers │
  └────┬────┘
       │
Each LaminarBlock:
┌─────────────────────────────────────────┐
│ 1. GeometricDriftField (per stratum)    │ ← O(N) parallel scan + RoPE
│ 2. CrossStratumRouting (between strata) │ ← Dual-gated information exchange
│ 3. SwiGLU FFN (per stratum)             │ ← Gated feed-forward
└─────────────────────────────────────────┘
       │
       ▼
┌─────────────┐
│   LM Head   │  d_model → vocab (weight-tied with embedding)
└─────────────┘
```
### Core Components
| Component | Description |
|---|---|
| Geometric Drift Field | Replaces self-attention with an O(N) selective state-space mechanism. Uses chunked parallel scan (chunk_size=256) with vectorized inter-chunk carry propagation. |
| RoPE | Rotary Position Embeddings applied to value vectors within the drift field for relative positional encoding. |
| Cross-Stratum Routing | Bidirectional gated information flow between fine and coarse strata using causal downsampling (left-padded AvgPool) and nearest-neighbor upsampling. |
| SwiGLU FFN | Gated feed-forward: SiLU(W1·x) ⊙ W3·x projected back via W2. |
| RMSNorm | Pre-norm with Root Mean Square normalization (float32 stable). |
| Causal Conv1d | Depthwise causal convolution (kernel=4) before drift field projections. |
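Two of these pieces are simple enough to sketch. The following are illustrative implementations (not the package's actual modules) of the left-padded causal downsampling used by cross-stratum routing, its nearest-neighbor upsampling counterpart, and the SwiGLU FFN from the formula above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_downsample(x, ratio=2):
    """Average-pool along time with left padding, so coarse step j only sees
    fine positions <= j * ratio (no information flows backward in time)."""
    x = x.transpose(1, 2)              # (B, T, D) -> (B, D, T) for pooling
    x = F.pad(x, (ratio - 1, 0))       # pad on the left to keep causality
    x = F.avg_pool1d(x, kernel_size=ratio, stride=ratio)
    return x.transpose(1, 2)           # (B, ceil(T / ratio), D)

def nearest_upsample(x, ratio=2):
    """Repeat each coarse step `ratio` times to restore the fine time axis."""
    return x.repeat_interleave(ratio, dim=1)

class SwiGLU(nn.Module):
    """Gated feed-forward from the table: W2 · (SiLU(W1·x) ⊙ W3·x)."""
    def __init__(self, d_model=320, d_ff=1200):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)   # gate branch
        self.w3 = nn.Linear(d_model, d_ff, bias=False)   # value branch
        self.w2 = nn.Linear(d_ff, d_model, bias=False)   # down-projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```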
## 📈 Training Details
| Property | Value |
|---|---|
| Dataset | FineWeb (~1B tokens, packed, no padding) |
| Tokenizer | GPT-2 BPE (50,257 tokens) |
| Optimizer | AdamW (lr=3e-4, weight_decay=0.01, betas=(0.9, 0.999)) |
| LR Schedule | Linear warmup (4000 steps) + cosine decay (min ratio 0.1) |
| Batch Size | 8 × 1024 = 8,192 tokens/step |
| Gradient Clipping | max_norm=1.0 |
| Mixed Precision | AMP (FP16) with GradScaler |
| Epochs | 1 |
| Total Steps | ~115,500 |
| Training Hardware | Google Colab (single GPU) |
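The learning-rate schedule in the table is easy to reproduce in closed form. A sketch (illustrative, not the actual training script): linear warmup over 4,000 steps, then cosine decay to 10% of the peak rate:

```python
import math

def lr_at(step, peak_lr=3e-4, warmup=4000, total=115_500, min_ratio=0.1):
    """Linear warmup, then cosine decay down to min_ratio * peak_lr."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)

# lr_at(0) == 0.0, lr_at(4_000) == 3e-4, lr_at(115_500) == 3e-5
```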
## 📋 Sample Outputs
**Prompt:** "The meaning of life is"

> The meaning of life is that it takes a good deal of time to understand the nature of our lives. When we look at it, you're not only one who can live in an illusion or experience other than just feeling like someone else and then having lost control over everything as well as knowing that there are many things that come down into your psyche but also for us all this stuff needs to be done right.

**Prompt:** "The meaning of life is"

> The meaning of life is so intense that the person who makes it to Him will be able to accept a different image, one that he has been living in. His love for God and his grace was born out in her mind when she'd just finished living through what she knew – but I can imagine this: "My goodness knows that you are not alone!"
## ⚠️ Limitations
- 50M parameters — This is a small research model, not suitable for production use.
- English only — Trained exclusively on English text from FineWeb.
- No instruction tuning — Base model only, not aligned or fine-tuned for chat/instructions.
- May generate incorrect facts, biased content, or repetitive text.
- 1024 token context — Maximum sequence length is 1024 tokens.
## 📜 Citation

```bibtex
@misc{laminarnet2025,
  title={LaminarNet: O(N) Language Model with Geometric Drift Fields},
  author={Uunan},
  year={2025},
  url={https://huggingface.co/Uunan/LaminarNet-50m}
}
```
## 📄 License
Apache 2.0 — free for research and commercial use.