🧠 LaminarNet-50m

LaminarNet v0.6.2 — A novel O(N) causal language model featuring Geometric Drift Fields, Rotary Position Embeddings (RoPE), and multi-strata hierarchical processing. Designed as a faster, more efficient alternative to traditional Transformer architectures.

⚡ Key Innovation: LaminarNet replaces standard attention with a Geometric Drift Field — an O(N) selective state-space mechanism using vectorized parallel scan, achieving linear complexity instead of quadratic.
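The scan structure behind the O(N) claim can be sketched in a few lines. The following is a toy gated linear recurrence, not the actual GeometricDriftField (whose equations are not reproduced in this card); it only illustrates how per-chunk scans plus an inter-chunk carry recover the same result as a fully sequential recurrence:

```python
import torch

def sequential_scan(a, b):
    # Reference O(N) recurrence, computed one step at a time:
    # h_t = a_t * h_{t-1} + b_t
    h = torch.zeros_like(b[0])
    out = []
    for t in range(b.shape[0]):
        h = a[t] * h + b[t]
        out.append(h)
    return torch.stack(out)

def chunked_scan(a, b, chunk=4):
    # Same recurrence, but the sequence is split into fixed-size chunks;
    # each chunk's scan is independent given a carry state, which is the
    # structure that lets the within-chunk work be vectorized.
    out = torch.empty_like(b)
    carry = torch.zeros_like(b[0])
    for s in range(0, b.shape[0], chunk):
        h = carry
        for t in range(s, min(s + chunk, b.shape[0])):
            h = a[t] * h + b[t]
            out[t] = h
        carry = h  # inter-chunk carry propagation
    return out

torch.manual_seed(0)
a = torch.rand(16, 8)   # decay gates in (0, 1)
b = torch.randn(16, 8)  # inputs
assert torch.allclose(sequential_scan(a, b), chunked_scan(a, b))
```

LaminarNet's card quotes `chunk_size=256`; the toy `chunk=4` above is only to keep the example small.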


📊 Model Details

| Property | Value |
|---|---|
| Parameters | 49.4M |
| Architecture | LaminarNet v0.6.2 |
| d_model | 320 |
| n_heads | 5 |
| n_layers | 10 |
| d_ff | 1200 |
| n_strata | 2 |
| strata_ratios | (1, 2, 4) |
| Sequence Length | 1024 |
| Vocabulary | GPT-2 BPE (50,257 tokens) |
| Best Val Loss | 3.7881 |
| Training Steps | 115,500 |
| Training Tokens | ~1B (FineWeb) |
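As a quick sanity check on the parameter count above: the weight-tied token embedding alone accounts for roughly a third of the budget. The per-block split below is a back-of-envelope estimate, not a figure taken from the repository:

```python
# Sanity check on the table: the token embedding dominates the budget.
# The LM head is weight-tied to the embedding, so it adds no extra weights.
vocab, d_model, n_layers = 50_257, 320, 10

emb = vocab * d_model        # 16,082,240 ≈ 16.1M parameters in the embedding
rest = 49.4e6 - emb          # ≈ 33.3M left for the 10 LaminarBlock layers
per_block = rest / n_layers  # ≈ 3.33M per block (rough estimate)

print(f"embedding: {emb / 1e6:.1f}M, per-block budget: {per_block / 1e6:.2f}M")
```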

🚀 Quick Start

1. Installation

pip install laminarnet safetensors transformers torch

Or install from source:

git clone https://huggingface.co/Uunan/LaminarNet-50m
cd LaminarNet-50m
pip install -e ./laminarnet

2. Download & Load Model

import torch
from collections import Counter
import torch.nn.functional as F
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download model files
config_path = hf_hub_download("Uunan/LaminarNet-50m", "config.json")
model_path  = hf_hub_download("Uunan/LaminarNet-50m", "model.safetensors")

# Build model
from laminarnet import LaminarNet, LaminarNetConfig

config = LaminarNetConfig(
    vocab_size=50257, d_model=320, n_heads=5, n_layers=10,
    d_ff=1200, n_strata=2, strata_ratios=(1, 2, 4),
    seq_len=1024, dropout=0.1, conv_kernel=4, rope_base=10000.0,
)
model = LaminarNet(config)

# Load weights (head.weight is tied to tok_emb.weight, so strict=False)
from safetensors.torch import load_file
state = load_file(model_path)
model.load_state_dict(state, strict=False)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print(f"✅ LaminarNet-50m loaded! ({sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params)")

3. Generate Text

tokenizer = AutoTokenizer.from_pretrained("gpt2")

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200,
             temperature=0.8, top_k=50, top_p=0.9, device="cpu",
             repetition_penalty=1.2, no_repeat_ngram=3, frequency_penalty=0.3):
    """
    Autoregressive text generation with full sampling controls.

    Args:
        prompt:             Input text to continue from.
        max_new_tokens:     Maximum number of tokens to generate.
        temperature:        Sampling temperature (0.1 = conservative, 1.5 = creative).
        top_k:              Top-k sampling (0 = disabled).
        top_p:              Nucleus sampling threshold (0-1).
        repetition_penalty: Penalize already-seen tokens (1.0 = off, 1.2+ = recommended).
        no_repeat_ngram:    Block repeated n-grams (3 = no trigram repeats, 0 = off).
        frequency_penalty:  Penalize tokens by their generation frequency (0.0 = off).
    """
    SEQ_LEN = 1024
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    if input_ids.shape[1] > SEQ_LEN:
        input_ids = input_ids[:, -SEQ_LEN:]
    # Measure the prompt length after truncation so the frequency penalty
    # slices the generated tokens at the right offset.
    prompt_len = input_ids.shape[1]

    eos_id = tokenizer.eos_token_id

    for _ in range(max_new_tokens):
        ctx = input_ids[:, -SEQ_LEN:]
        logits = model(ctx)[:, -1, :]

        # Repetition Penalty
        if repetition_penalty != 1.0:
            for tok_id in set(input_ids[0].tolist()):
                if logits[0, tok_id] > 0:
                    logits[0, tok_id] /= repetition_penalty
                else:
                    logits[0, tok_id] *= repetition_penalty

        # Frequency Penalty
        if frequency_penalty > 0:
            token_counts = Counter(input_ids[0].tolist()[prompt_len:])
            for tok_id, count in token_counts.items():
                logits[0, tok_id] -= frequency_penalty * count

        # N-gram Blocking
        if no_repeat_ngram > 0 and input_ids.shape[1] >= no_repeat_ngram:
            generated = input_ids[0].tolist()
            prefix = tuple(generated[-(no_repeat_ngram - 1):])
            for i in range(len(generated) - no_repeat_ngram + 1):
                if tuple(generated[i:i + no_repeat_ngram - 1]) == prefix:
                    logits[0, generated[i + no_repeat_ngram - 1]] = float("-inf")

        logits = logits / temperature

        # Top-k
        if top_k > 0:
            k = min(top_k, logits.size(-1))  # guard against top_k > vocab size
            vals, _ = torch.topk(logits, k)
            logits[logits < vals[:, -1:]] = float("-inf")

        # Top-p (nucleus)
        if 0 < top_p < 1.0:
            sorted_logits, sorted_idx = torch.sort(logits, descending=True)
            sorted_probs = F.softmax(sorted_logits, dim=-1)
            cum_probs = torch.cumsum(sorted_probs, dim=-1)
            # Drop a token once the cumulative mass *before* it reaches top_p
            remove_mask = cum_probs - sorted_probs >= top_p
            sorted_logits[remove_mask] = float("-inf")
            # sorted_idx is a full permutation, so every slot is rewritten
            logits = logits.scatter(1, sorted_idx, sorted_logits)

        probs = F.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_tok], dim=1)

        if next_tok.item() == eos_id:
            break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)


# Generate!
text = generate(model, tokenizer, "The meaning of life is", device=device)
print(text)

🎛️ Generation Parameters Guide

| Parameter | Default | Range | Description |
|---|---|---|---|
| temperature | 0.8 | 0.1 — 2.0 | Lower = more focused, higher = more creative |
| top_k | 50 | 0 — vocab_size | Keep only the top-k most likely tokens |
| top_p | 0.9 | 0.0 — 1.0 | Nucleus sampling: keep tokens until cumulative prob ≥ p |
| repetition_penalty | 1.2 | 1.0 — 2.0 | Penalize previously generated tokens (1.0 = off) |
| no_repeat_ngram | 3 | 0 — 5 | Block repeated n-grams (0 = off) |
| frequency_penalty | 0.3 | 0.0 — 1.0 | Penalize tokens proportional to usage count (0 = off) |
| max_new_tokens | 200 | 1 — 1024 | Maximum tokens to generate |
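A tiny illustration of what the temperature knob does before sampling, using toy logits (independent of LaminarNet):

```python
import torch
import torch.nn.functional as F

# Toy logits over four tokens, just to show the temperature knob's effect.
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

cold = F.softmax(logits / 0.5, dim=-1)  # low temperature: sharper distribution
hot  = F.softmax(logits / 1.5, dim=-1)  # high temperature: flatter distribution

# The ranking never changes; only how much mass the top token keeps does.
assert cold.argmax() == hot.argmax()
assert cold[0] > hot[0]
```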

Recommended Presets

# 📝 Factual / Focused
text = generate(model, tokenizer, prompt, temperature=0.5, top_k=30, top_p=0.85, device=device)

# 🎨 Creative / Story Writing
text = generate(model, tokenizer, prompt, temperature=1.0, top_k=80, top_p=0.95,
                repetition_penalty=1.3, frequency_penalty=0.4, device=device)

# ⚡ Greedy (deterministic)
text = generate(model, tokenizer, prompt, temperature=0.1, top_k=1, top_p=1.0,
                repetition_penalty=1.0, device=device)

🏗️ Architecture Deep Dive

LaminarNet introduces a unique multi-strata architecture that processes information at multiple temporal resolutions simultaneously:

Input Tokens
    │
    ▼
┌──────────────┐
│  Embedding   │  Token → d_model (320)
└──────┬───────┘
       │
       ├──────── Stratum 0 (Fine, ratio=1)   ── Full resolution
       │
       ├──────── Stratum 1 (Coarse, ratio=2) ── Half resolution
       │
  ┌────┴────┐
  │  x10    │  LaminarBlock layers
  │ Layers  │
  └────┬────┘
       │
  Each LaminarBlock:
  ┌─────────────────────────────────────────┐
  │ 1. GeometricDriftField (per stratum)    │ ← O(N) parallel scan + RoPE
  │ 2. CrossStratumRouting (between strata) │ ← Dual-gated information exchange
  │ 3. SwiGLU FFN (per stratum)             │ ← Gated feed-forward
  └─────────────────────────────────────────┘
       │
       ▼
┌─────────────┐
│  LM Head    │  d_model → vocab (weight-tied with embedding)
└─────────────┘
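The "parallel scan + RoPE" step in the diagram applies a standard rotary embedding. How LaminarNet wires it into the drift field is not shown in this card, but the rotation itself can be sketched as:

```python
import torch

def rope(x, base=10000.0):
    # Standard rotary position embedding: rotate consecutive channel pairs
    # by a position-dependent angle. x: (seq_len, d) with d even.
    seq_len, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2).float() / d)      # (d/2,)
    theta = torch.arange(seq_len).float()[:, None] * inv_freq    # (seq, d/2)
    cos, sin = theta.cos(), theta.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = torch.randn(8, 16)
y = rope(x)
assert torch.allclose(y[0], x[0])                                # position 0: no rotation
assert torch.allclose(x.norm(dim=-1), y.norm(dim=-1), atol=1e-5) # rotations preserve norms
```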

Core Components

| Component | Description |
|---|---|
| Geometric Drift Field | Replaces self-attention with an O(N) selective state-space mechanism. Uses chunked parallel scan (chunk_size=256) with vectorized inter-chunk carry propagation. |
| RoPE | Rotary Position Embeddings applied to value vectors within the drift field for relative positional encoding. |
| Cross-Stratum Routing | Bidirectional gated information flow between fine and coarse strata using causal downsampling (left-padded AvgPool) and nearest-neighbor upsampling. |
| SwiGLU FFN | Gated feed-forward: SiLU(W1·x) ⊙ W3·x projected back via W2. |
| RMSNorm | Pre-norm with Root Mean Square normalization (float32 stable). |
| Causal Conv1d | Depthwise causal convolution (kernel=4) before drift field projections. |
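The down/up-sampling primitives used by the cross-stratum routing can be sketched directly with PyTorch pooling ops. This is a hypothetical sketch of those primitives only; the real CrossStratumRouting also includes the dual gates, which are omitted here:

```python
import torch
import torch.nn.functional as F

def causal_avgpool(x, ratio=2):
    # Left-padded average pooling: coarse step t averages only the current
    # and past fine steps, so no information leaks from the future.
    # x: (batch, channels, fine_seq_len)
    x = F.pad(x, (ratio - 1, 0))  # pad on the left only
    return F.avg_pool1d(x, kernel_size=ratio, stride=ratio)

def nn_upsample(x, ratio=2):
    # Nearest-neighbor upsampling back to the fine resolution.
    return x.repeat_interleave(ratio, dim=-1)

x = torch.randn(1, 320, 8)    # (batch, d_model, fine_seq_len)
coarse = causal_avgpool(x)    # (1, 320, 4): half resolution
fine   = nn_upsample(coarse)  # (1, 320, 8): broadcast back up
assert coarse.shape[-1] == 4 and fine.shape[-1] == 8
```

Causality here means that perturbing a fine-resolution step can only change coarse steps at the same position or later, never earlier ones.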

📈 Training Details

| Property | Value |
|---|---|
| Dataset | FineWeb (~1B tokens, packed, no padding) |
| Tokenizer | GPT-2 BPE (50,257 tokens) |
| Optimizer | AdamW (lr=3e-4, weight_decay=0.01, betas=(0.9, 0.999)) |
| LR Schedule | Linear warmup (4000 steps) + cosine decay (min ratio 0.1) |
| Batch Size | 8 × 1024 = 8,192 tokens/step |
| Gradient Clipping | max_norm=1.0 |
| Mixed Precision | AMP (FP16) with GradScaler |
| Epochs | 1 |
| Total Steps | ~115,500 |
| Training Hardware | Google Colab (single GPU) |
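The listed schedule (4,000 warmup steps, cosine decay to 10% of the peak rate) can be reproduced as a small function; `lr_at` is an illustrative name, not taken from the training code:

```python
import math

def lr_at(step, base_lr=3e-4, warmup=4000, total=115_500, min_ratio=0.1):
    # Linear warmup to base_lr, then cosine decay down to min_ratio * base_lr.
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at(2000))     # mid-warmup: ≈ 1.5e-4
print(lr_at(4000))     # peak: 3e-4
print(lr_at(115_500))  # end of training: ≈ 3e-5
```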

📋 Sample Outputs

Prompt: "The meaning of life is"

The meaning of life is that it takes a good deal of time to understand the nature of our lives. When we look at it, you're not only one who can live in an illusion or experience other than just feeling like someone else and then having lost control over everything as well as knowing that there are many things that come down into your psyche but also for us all this stuff needs to be done right.

Prompt: "The meaning of life is"

The meaning of life is so intense that the person who makes it to Him will be able to accept a different image, one that he has been living in. His love for God and his grace was born out in her mind when she'd just finished living through what she knew – but I can imagine this: "My goodness knows that you are not alone!"


⚠️ Limitations

  • 50M parameters — This is a small research model, not suitable for production use.
  • English only — Trained exclusively on English text from FineWeb.
  • No instruction tuning — Base model only, not aligned or fine-tuned for chat/instructions.
  • May generate incorrect facts, biased content, or repetitive text.
  • 1024 token context — Maximum sequence length is 1024 tokens.

📜 Citation

@misc{laminarnet2025,
  title={LaminarNet: O(N) Language Model with Geometric Drift Fields},
  author={Uunan},
  year={2025},
  url={https://huggingface.co/Uunan/LaminarNet-50m}
}

📄 License

Apache 2.0 — free for research and commercial use.
