Stream-Qwen3-8B

A multi-stream variant of Qwen3-8B (standard dense transformer) that generates ten parallel streams at each timestep. One forward pass produces the next-row token for every channel; tokens within a row cannot see each other (block-causal attention), but every channel can attend to every prior row's tokens.

This is the 8B stream model, trained to test the monitorability of a larger number of internal streams (this variant has 8 internal streams).

Architecture

  • Base: Qwen3-8B (36 layers, dense transformer with grouped-query attention).
  • Channel embedding: 10 learned vectors added to token embeddings.
  • Block-causal attention: For each row, all C=10 tokens see prior rows and themselves but never their same-row peers.
  • Loss / inference: Shift-by-10 next-row prediction. One row (10 tokens) per forward step.
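
The block-causal rule above can be sketched as a boolean attention mask. This is not the repository's implementation (modeling_qwen3.py handles it internally), just a minimal illustration of the pattern: a token at flat position p = row*C + channel may attend to every token in strictly earlier rows, plus itself, but never its same-row peers.

```python
import torch

def block_causal_mask(num_rows: int, C: int = 10) -> torch.Tensor:
    """Boolean mask of shape (T, T), T = num_rows * C.

    mask[i, j] is True when query token i may attend to key token j:
    j lies in a strictly earlier row, or j == i (self-attention).
    """
    T = num_rows * C
    row = torch.arange(T) // C              # row index of each flat position
    mask = row.unsqueeze(1) > row.unsqueeze(0)   # earlier rows are visible
    mask |= torch.eye(T, dtype=torch.bool)       # each token sees itself
    return mask
```

For example, with two rows of three channels, token 3 (row 1, channel 0) can see all of row 0 and itself, but not tokens 4 and 5 in its own row.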

See the 27B model card for the full project description, channel semantics, and architectural rationale — this 8B repo is identical but uses a dense transformer backbone (no DeltaNet hybridization).

Channels

  #  Name        Role
  0  User        Input stream (injected, not generated)
  1  Output      Visible output
  2  Analytical  Forward-facing planning
  3  Skeptical   Backward-facing validation
  4  Intuitive   Present-moment felt-sense
  5  Between     Relational awareness
  6  Curious     Generative questioning
  7  Void        Associations, daydreaming, interna
  8  Instinct    Pragmatic constraints
  9  Synthesis   Meta-level integration

Silence token: - → token id 481 in the Qwen3 tokenizer.
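
Given the channel table and the silence token id, a row of generated token ids can be bucketed by channel name. A hypothetical helper (the bundled API already exposes result.channel_texts; this only illustrates the row layout):

```python
# Channel names in column order, per the table above.
CHANNELS = ["User", "Output", "Analytical", "Skeptical", "Intuitive",
            "Between", "Curious", "Void", "Instinct", "Synthesis"]
SILENCE_ID = 481  # the "-" token in the Qwen3 tokenizer

def split_row(row):
    """Map one generated row (10 token ids, one per channel) to
    {channel_name: token_id}, dropping channels that stayed silent."""
    return {name: tok for name, tok in zip(CHANNELS, row) if tok != SILENCE_ID}
```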

Quickstart

pip install "transformers>=5.2" accelerate safetensors
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

REPO = "JonasGeiping/stream-qwen3-8b"

model = AutoModelForCausalLM.from_pretrained(
    REPO,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(REPO)

trust_remote_code=True is required: the bundled modeling_qwen3.py wires up channel embeddings and the custom block-causal position handling.
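
The channel-embedding part of that wiring amounts to adding one learned vector per channel to the token embeddings. A sketch of the idea, not the bundled code (class name, hidden size, and layout are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ChannelEmbedding(nn.Module):
    """Adds a learned per-channel vector to each token embedding.

    Assumes the flat sequence is laid out row-major, so the channel of
    a token is its flat position modulo C.
    """
    def __init__(self, num_channels: int = 10, hidden: int = 4096):
        super().__init__()
        self.emb = nn.Embedding(num_channels, hidden)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, rows * C, hidden)
        T = token_embeds.size(1)
        channel_ids = torch.arange(T, device=token_embeds.device) % self.emb.num_embeddings
        return token_embeds + self.emb(channel_ids)
```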

Stream-style generation

result = model.stream_generate(
    tokenizer,
    "What's something you've been thinking about?",
    max_rows=80,
    warm_start=True,
    temperature=0.6,
    silence_penalty=5.0,
    skip_silence=True,
)

print("Output:     ", result.output)
print("Analytical: ", result.channel_texts["Analytical"])
print("Synthesis:  ", result.channel_texts["Synthesis"])

For grid rendering / step-by-step inspection, use the generator form:

for row_idx, row, is_prefill in model.stream_generate_iter(
    tokenizer, "What's something you've been thinking about?",
    max_rows=80, warm_start=True, silence_penalty=5.0, skip_silence=True,
):
    cells = [tokenizer.decode([t]).strip() or "-" for t in row]
    print(f"{row_idx:3d}  " + " | ".join(c[:10].ljust(10) for c in cells))

model.generate() and pipeline("text-generation", ...) are intentionally disabled — they would produce gibberish on a stream-trained model. See the 27B model card for the full API surface and interactive-mode example.

Interactive demo

huggingface-cli download JonasGeiping/stream-qwen3-8b --local-dir ./stream-8b
python ./stream-8b/examples/demo_interactive.py --model ./stream-8b --tick 0.5

Curses UI; type freely while all ten channels keep producing in parallel.

Fine-tuning

The bundled StreamDataCollator plugs into HuggingFace's Trainer:

python ./stream-8b/examples/finetune.py \
    --model JonasGeiping/stream-qwen3-8b \
    --output-dir runs/streamft \
    --batch-size 2 --grad-accum 4 --epochs 1

See the 27B model card for full details and the Trainer snippet.
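
The shift-by-10 objective described under Architecture boils down to labeling each position with the token C=10 places ahead (the same channel, one row later). A minimal sketch of that label construction, assuming a flat row-major token list (this is not the bundled StreamDataCollator, just the shift it implies):

```python
def shift_by_c_labels(input_ids, C=10, ignore_index=-100):
    """Next-row labels: position p is trained to predict the token at
    p + C (same channel, next row). The final row, which has no next
    row, gets ignore_index so it contributes no loss."""
    return input_ids[C:] + [ignore_index] * C
```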

Internal Reference

This is checkpoint stream_8b (s16-qwen8b_pack5_trimbot_drop02_ep3).
