🧠 LaminarNet-50m

LaminarNet v0.6.2 — A novel O(N) causal language model featuring Geometric Drift Fields, Rotary Position Embeddings (RoPE), and multi-strata hierarchical processing. Designed as a faster, more efficient alternative to traditional Transformer architectures.

⚡ Key Innovation: LaminarNet replaces standard attention with a Geometric Drift Field — an O(N) selective state-space mechanism using vectorized parallel scan, achieving linear complexity instead of quadratic.
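The scan structure behind the O(N) claim can be sketched in a few lines. The following is a toy gated linear recurrence, not the actual GeometricDriftField (whose equations are not reproduced in this card); it only illustrates how per-chunk scans plus an inter-chunk carry recover the same result as a fully sequential recurrence:

```python
import torch

def sequential_scan(a, b):
    # Reference O(N) recurrence, computed one step at a time:
    # h_t = a_t * h_{t-1} + b_t
    h = torch.zeros_like(b[0])
    out = []
    for t in range(b.shape[0]):
        h = a[t] * h + b[t]
        out.append(h)
    return torch.stack(out)

def chunked_scan(a, b, chunk=4):
    # Same recurrence, but the sequence is split into fixed-size chunks;
    # each chunk's scan is independent given a carry state, which is the
    # structure that lets the within-chunk work be vectorized.
    out = torch.empty_like(b)
    carry = torch.zeros_like(b[0])
    for s in range(0, b.shape[0], chunk):
        h = carry
        for t in range(s, min(s + chunk, b.shape[0])):
            h = a[t] * h + b[t]
            out[t] = h
        carry = h  # inter-chunk carry propagation
    return out

torch.manual_seed(0)
a = torch.rand(16, 8)   # decay gates in (0, 1)
b = torch.randn(16, 8)  # inputs
assert torch.allclose(sequential_scan(a, b), chunked_scan(a, b))
```

LaminarNet's card quotes `chunk_size=256`; the toy `chunk=4` above is only to keep the example small.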


📊 Model Details

| Property | Value |
|---|---|
| Parameters | 49.4M |
| Architecture | LaminarNet v0.6.2 |
| d_model | 320 |
| n_heads | 5 |
| n_layers | 10 |
| d_ff | 1200 |
| n_strata | 2 |
| strata_ratios | (1, 2, 4) |
| Sequence Length | 1024 |
| Vocabulary | GPT-2 BPE (50,257 tokens) |
| Best Val Loss | 3.7881 |
| Training Steps | 115,500 |
| Training Tokens | ~1B (FineWeb) |
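As a quick sanity check on the parameter count above: the weight-tied token embedding alone accounts for roughly a third of the budget. The per-block split below is a back-of-envelope estimate, not a figure taken from the repository:

```python
# Sanity check on the table: the token embedding dominates the budget.
# The LM head is weight-tied to the embedding, so it adds no extra weights.
vocab, d_model, n_layers = 50_257, 320, 10

emb = vocab * d_model        # 16,082,240 ≈ 16.1M parameters in the embedding
rest = 49.4e6 - emb          # ≈ 33.3M left for the 10 LaminarBlock layers
per_block = rest / n_layers  # ≈ 3.33M per block (rough estimate)

print(f"embedding: {emb / 1e6:.1f}M, per-block budget: {per_block / 1e6:.2f}M")
```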

🚀 Quick Start

1. Installation

pip install laminarnet safetensors transformers torch

Or install from source:

git clone https://huggingface.co/Uunan/LaminarNet-50m
cd LaminarNet-50m
pip install -e ./laminarnet

2. Download & Load Model

import torch
from collections import Counter
import torch.nn.functional as F
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download model files
config_path = hf_hub_download("Uunan/LaminarNet-50m", "config.json")
model_path  = hf_hub_download("Uunan/LaminarNet-50m", "model.safetensors")

# Build model
from laminarnet import LaminarNet, LaminarNetConfig

config = LaminarNetConfig(
    vocab_size=50257, d_model=320, n_heads=5, n_layers=10,
    d_ff=1200, n_strata=2, strata_ratios=(1, 2, 4),
    seq_len=1024, dropout=0.1, conv_kernel=4, rope_base=10000.0,
)
model = LaminarNet(config)

# Load weights (head.weight is tied to tok_emb.weight, so strict=False)
from safetensors.torch import load_file
state = load_file(model_path)
model.load_state_dict(state, strict=False)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print(f"✅ LaminarNet-50m loaded! ({sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params)")

3. Generate Text

tokenizer = AutoTokenizer.from_pretrained("gpt2")

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200,
             temperature=0.8, top_k=50, top_p=0.9, device="cpu",
             repetition_penalty=1.2, no_repeat_ngram=3, frequency_penalty=0.3):
    """
    Autoregressive text generation with full sampling controls.

    Args:
        prompt:             Input text to continue from.
        max_new_tokens:     Maximum number of tokens to generate.
        temperature:        Sampling temperature (0.1 = conservative, 1.5 = creative).
        top_k:              Top-k sampling (0 = disabled).
        top_p:              Nucleus sampling threshold (0-1).
        repetition_penalty: Penalize already-seen tokens (1.0 = off, 1.2+ = recommended).
        no_repeat_ngram:    Block repeated n-grams (3 = no trigram repeats, 0 = off).
        frequency_penalty:  Penalize tokens by their generation frequency (0.0 = off).
    """
    SEQ_LEN = 1024
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    if input_ids.shape[1] > SEQ_LEN:
        input_ids = input_ids[:, -SEQ_LEN:]
    # Measure the prompt length after truncation so the frequency penalty
    # slices the generated tokens at the right offset.
    prompt_len = input_ids.shape[1]

    eos_id = tokenizer.eos_token_id

    for _ in range(max_new_tokens):
        ctx = input_ids[:, -SEQ_LEN:]
        logits = model(ctx)[:, -1, :]

        # Repetition Penalty
        if repetition_penalty != 1.0:
            for tok_id in set(input_ids[0].tolist()):
                if logits[0, tok_id] > 0:
                    logits[0, tok_id] /= repetition_penalty
                else:
                    logits[0, tok_id] *= repetition_penalty

        # Frequency Penalty
        if frequency_penalty > 0:
            token_counts = Counter(input_ids[0].tolist()[prompt_len:])
            for tok_id, count in token_counts.items():
                logits[0, tok_id] -= frequency_penalty * count

        # N-gram Blocking
        if no_repeat_ngram > 0 and input_ids.shape[1] >= no_repeat_ngram:
            generated = input_ids[0].tolist()
            prefix = tuple(generated[-(no_repeat_ngram - 1):])
            for i in range(len(generated) - no_repeat_ngram + 1):
                if tuple(generated[i:i + no_repeat_ngram - 1]) == prefix:
                    logits[0, generated[i + no_repeat_ngram - 1]] = float("-inf")

        logits = logits / temperature

        # Top-k
        if top_k > 0:
            k = min(top_k, logits.size(-1))  # guard against top_k > vocab size
            vals, _ = torch.topk(logits, k)
            logits[logits < vals[:, -1:]] = float("-inf")

        # Top-p (nucleus)
        if 0 < top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True)
            sorted_probs = F.softmax(sorted_logits, dim=-1)
            cum_probs = torch.cumsum(sorted_probs, dim=-1)
            # Drop a token once the cumulative mass *before* it reaches top_p
            remove_mask = cum_probs - sorted_probs >= top_p
            sorted_logits[remove_mask] = float("-inf")
            # sorted_idx is a full permutation, so every slot is rewritten
            logits = logits.scatter(1, sorted_idx, sorted_logits)

        probs = F.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_tok], dim=1)

        if next_tok.item() == eos_id:
            break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)


# Generate!
text = generate(model, tokenizer, "The meaning of life is", device=device)
print(text)

🎛️ Generation Parameters Guide

| Parameter | Default | Range | Description |
|---|---|---|---|
| temperature | 0.8 | 0.1 — 2.0 | Lower = more focused, higher = more creative |
| top_k | 50 | 0 — vocab_size | Keep only the top-k most likely tokens |
| top_p | 0.9 | 0.0 — 1.0 | Nucleus sampling: keep tokens until cumulative prob ≥ p |
| repetition_penalty | 1.2 | 1.0 — 2.0 | Penalize previously generated tokens (1.0 = off) |
| no_repeat_ngram | 3 | 0 — 5 | Block repeated n-grams (0 = off) |
| frequency_penalty | 0.3 | 0.0 — 1.0 | Penalize tokens proportional to usage count (0 = off) |
| max_new_tokens | 200 | 1 — 1024 | Maximum tokens to generate |
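A tiny illustration of what the temperature knob does before sampling, using toy logits (independent of LaminarNet):

```python
import torch
import torch.nn.functional as F

# Toy logits over four tokens, just to show the temperature knob's effect.
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

cold = F.softmax(logits / 0.5, dim=-1)  # low temperature: sharper distribution
hot  = F.softmax(logits / 1.5, dim=-1)  # high temperature: flatter distribution

# The ranking never changes; only how much mass the top token keeps does.
assert cold.argmax() == hot.argmax()
assert cold[0] > hot[0]
```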

Recommended Presets

# 📝 Factual / Focused
text = generate(model, tokenizer, prompt, temperature=0.5, top_k=30, top_p=0.85, device=device)

# 🎨 Creative / Story Writing
text = generate(model, tokenizer, prompt, temperature=1.0, top_k=80, top_p=0.95,
                repetition_penalty=1.3, frequency_penalty=0.4, device=device)

# ⚡ Greedy (deterministic)
text = generate(model, tokenizer, prompt, temperature=0.1, top_k=1, top_p=1.0,
                repetition_penalty=1.0, device=device)

🏗️ Architecture Deep Dive

LaminarNet introduces a unique multi-strata architecture that processes information at multiple temporal resolutions simultaneously:

Input Tokens
    │
    ▼
┌──────────────┐
│  Embedding   │  Token → d_model (320)
└──────┬───────┘
       │
       ├──────── Stratum 0 (Fine, ratio=1)   ── Full resolution
       │
       ├──────── Stratum 1 (Coarse, ratio=2) ── Half resolution
       │
  ┌────┴────┐
  │  x10    │  LaminarBlock layers
  │ Layers  │
  └────┬────┘
       │
  Each LaminarBlock:
  ┌─────────────────────────────────────────┐
  │ 1. GeometricDriftField (per stratum)    │ ← O(N) parallel scan + RoPE
  │ 2. CrossStratumRouting (between strata) │ ← Dual-gated information exchange
  │ 3. SwiGLU FFN (per stratum)             │ ← Gated feed-forward
  └─────────────────────────────────────────┘
       │
       ▼
┌─────────────┐
│  LM Head    │  d_model → vocab (weight-tied with embedding)
└─────────────┘
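The "parallel scan + RoPE" step in the diagram applies a standard rotary embedding. How LaminarNet wires it into the drift field is not shown in this card, but the rotation itself can be sketched as:

```python
import torch

def rope(x, base=10000.0):
    # Standard rotary position embedding: rotate consecutive channel pairs
    # by a position-dependent angle. x: (seq_len, d) with d even.
    seq_len, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)      # (d/2,)
    theta = torch.arange(seq_len).float()[:, None] * inv_freq    # (seq, d/2)
    cos, sin = theta.cos(), theta.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(8, 16)
y = rope(x)
assert torch.allclose(y[0], x[0])                                # position 0: no rotation
assert torch.allclose(x.norm(dim=-1), y.norm(dim=-1), atol=1e-5) # rotations preserve norms
```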

Core Components

| Component | Description |
|---|---|
| Geometric Drift Field | Replaces self-attention with an O(N) selective state-space mechanism. Uses chunked parallel scan (chunk_size=256) with vectorized inter-chunk carry propagation. |
| RoPE | Rotary Position Embeddings applied to value vectors within the drift field for relative positional encoding. |
| Cross-Stratum Routing | Bidirectional gated information flow between fine and coarse strata using causal downsampling (left-padded AvgPool) and nearest-neighbor upsampling. |
| SwiGLU FFN | Gated feed-forward: SiLU(W1·x) ⊙ W3·x projected back via W2. |
| RMSNorm | Pre-norm with Root Mean Square normalization (float32 stable). |
| Causal Conv1d | Depthwise causal convolution (kernel=4) before drift field projections. |
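The down/up-sampling primitives used by the cross-stratum routing can be sketched directly with PyTorch pooling ops. This is a hypothetical sketch of those primitives only; the real CrossStratumRouting also includes the dual gates, which are omitted here:

```python
import torch
import torch.nn.functional as F

def causal_avgpool(x, ratio=2):
    # Left-padded average pooling: coarse step t averages only the current
    # and past fine steps, so no information leaks from the future.
    # x: (batch, channels, fine_seq_len)
    x = F.pad(x, (ratio - 1, 0))  # pad on the left only
    return F.avg_pool1d(x, kernel_size=ratio, stride=ratio)

def nn_upsample(x, ratio=2):
    # Nearest-neighbor upsampling back to the fine resolution.
    return x.repeat_interleave(ratio, dim=-1)

x = torch.randn(1, 320, 8)    # (batch, d_model, fine_seq_len)
coarse = causal_avgpool(x)    # (1, 320, 4): half resolution
fine   = nn_upsample(coarse)  # (1, 320, 8): broadcast back up
assert coarse.shape[-1] == 4 and fine.shape[-1] == 8
```

Causality here means that perturbing a fine-resolution step can only change coarse steps at the same position or later, never earlier ones.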

📈 Training Details

| Property | Value |
|---|---|
| Dataset | FineWeb (~1B tokens, packed, no padding) |
| Tokenizer | GPT-2 BPE (50,257 tokens) |
| Optimizer | AdamW (lr=3e-4, weight_decay=0.01, betas=(0.9, 0.999)) |
| LR Schedule | Linear warmup (4000 steps) + cosine decay (min ratio 0.1) |
| Batch Size | 8 × 1024 = 8,192 tokens/step |
| Gradient Clipping | max_norm=1.0 |
| Mixed Precision | AMP (FP16) with GradScaler |
| Epochs | 1 |
| Total Steps | ~115,500 |
| Training Hardware | Google Colab (single GPU) |
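The listed schedule (4,000 warmup steps, cosine decay to 10% of the peak rate) can be reproduced as a small function; `lr_at` is an illustrative name, not taken from the training code:

```python
import math

def lr_at(step, base_lr=3e-4, warmup=4000, total=115_500, min_ratio=0.1):
    # Linear warmup to base_lr, then cosine decay down to min_ratio * base_lr.
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at(2000))     # mid-warmup: ≈ 1.5e-4
print(lr_at(4000))     # peak: 3e-4
print(lr_at(115_500))  # end of training: ≈ 3e-5
```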

📋 Sample Outputs

Prompt: "The meaning of life is"

The meaning of life is that it takes a good deal of time to understand the nature of our lives. When we look at it, you're not only one who can live in an illusion or experience other than just feeling like someone else and then having lost control over everything as well as knowing that there are many things that come down into your psyche but also for us all this stuff needs to be done right.

Prompt: "The meaning of life is"

The meaning of life is so intense that the person who makes it to Him will be able to accept a different image, one that he has been living in. His love for God and his grace was born out in her mind when she'd just finished living through what she knew – but I can imagine this: "My goodness knows that you are not alone!"


⚠️ Limitations

  • 50M parameters — This is a small research model, not suitable for production use.
  • English only — Trained exclusively on English text from FineWeb.
  • No instruction tuning — Base model only, not aligned or fine-tuned for chat/instructions.
  • May generate incorrect facts, biased content, or repetitive text.
  • 1024 token context — Maximum sequence length is 1024 tokens.

📜 Citation

@misc{laminarnet2025,
  title={LaminarNet: O(N) Language Model with Geometric Drift Fields},
  author={Uunan},
  year={2025},
  url={https://huggingface.co/Uunan/LaminarNet-50m}
}

📄 License

Apache 2.0 — free for research and commercial use.
