# Stream-Qwen3-8B
A multi-stream variant of Qwen3-8B (standard transformer) that generates ten parallel streams per timestep. One forward pass produces the next-row token for each channel; tokens within a row cannot see each other (block-causal attention), but every channel can attend to all tokens from prior rows.

This is the 8B stream model, trained to test the monitorability of a larger number of internal streams (eight of the ten channels are internal).
## Architecture
- Base: Qwen3-8B (36 layers, dense transformer with grouped-query attention).
- Channel embedding: 10 learned vectors added to token embeddings.
- Block-causal attention: For each row, all C=10 tokens see prior rows and themselves but never their same-row peers.
- Loss / inference: Shift-by-10 next-row prediction. One row (10 tokens) per forward step.
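The block-causal rule above can be sketched in plain Python. This is an illustrative mask builder under the assumed row-major layout (position `p` belongs to row `p // C`), not the repo's actual implementation:

```python
# Illustrative sketch (not the repo's code): a block-causal attention mask
# for num_rows rows, each holding C channel tokens laid out row-major.
def block_causal_mask(num_rows: int, C: int = 10):
    """mask[i][j] is True iff token i may attend to token j:
    j lies in a strictly earlier row, or j is i itself."""
    n = num_rows * C
    return [[(j // C) < (i // C) or i == j for j in range(n)]
            for i in range(n)]

# 2 rows x 3 channels: token 4 (row 1, channel 1) sees all of row 0
# and itself, but never its same-row peers.
m = block_causal_mask(2, C=3)
assert [m[4][j] for j in range(6)] == [True, True, True, False, True, False]
```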
See the 27B model card for the full project description, channel semantics, and architectural rationale; this 8B repo is identical but uses a dense transformer backbone (no DeltaNet hybridization).
## Channels
| # | Name | Role |
|---|---|---|
| 0 | User | Input stream (injected, not generated) |
| 1 | Output | Visible output |
| 2 | Analytical | Forward-facing planning |
| 3 | Skeptical | Backward-facing validation |
| 4 | Intuitive | Present-moment felt-sense |
| 5 | Between | Relational awareness |
| 6 | Curious | Generative questioning |
| 7 | Void | Associations, daydreaming |
| 8 | Instinct | Pragmatic constraints |
| 9 | Synthesis | Meta-level integration |
Silence token: `-` → token id 481 in the Qwen3 tokenizer.
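With the silence token id above, rows in which every channel stays silent carry no content and can be dropped when rendering. A minimal sketch of that check (a hypothetical helper, not the model's `skip_silence` implementation):

```python
SILENCE_ID = 481  # token id of "-" in the Qwen3 tokenizer, per this card

def row_is_all_silent(row, silence_id=SILENCE_ID):
    """True if every channel in this row emitted the silence token."""
    return all(t == silence_id for t in row)

assert row_is_all_silent([481] * 10)
assert not row_is_all_silent([481] * 9 + [1234])
```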
## Quickstart

```bash
pip install "transformers>=5.2" accelerate safetensors
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

REPO = "JonasGeiping/stream-qwen3-8b"

model = AutoModelForCausalLM.from_pretrained(
    REPO,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(REPO)
```
`trust_remote_code=True` is required: the bundled `modeling_qwen3.py` wires up the channel embeddings and the custom block-causal position handling.
## Stream-style generation
```python
result = model.stream_generate(
    tokenizer,
    "What's something you've been thinking about?",
    max_rows=80,
    warm_start=True,
    temperature=0.6,
    silence_penalty=5.0,
    skip_silence=True,
)
print("Output:     ", result.output)
print("Analytical: ", result.channel_texts["Analytical"])
print("Synthesis:  ", result.channel_texts["Synthesis"])
```
For grid rendering / step-by-step inspection, use the generator form:
```python
for row_idx, row, is_prefill in model.stream_generate_iter(
    tokenizer, "What's something you've been thinking about?",
    max_rows=80, warm_start=True, silence_penalty=5.0, skip_silence=True,
):
    cells = [tokenizer.decode([t]).strip() or "-" for t in row]
    print(f"{row_idx:3d} " + " | ".join(c[:10].ljust(10) for c in cells))
```
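Beyond grid rendering, the rows yielded by the generator can be folded back into per-channel transcripts. A hedged sketch, assuming the channel order from the table above and a `decode` callable standing in for `tokenizer.decode` (the helper name and logic are illustrative, not part of the repo's API):

```python
CHANNELS = ["User", "Output", "Analytical", "Skeptical", "Intuitive",
            "Between", "Curious", "Void", "Instinct", "Synthesis"]

def collect_channels(rows, decode, silence="-"):
    """Concatenate each channel's column across rows, dropping silence.
    `decode` maps one token id to its string, e.g.
    lambda t: tokenizer.decode([t])."""
    texts = {name: [] for name in CHANNELS}
    for row in rows:
        for name, tok in zip(CHANNELS, row):
            piece = decode(tok)
            if piece.strip() and piece.strip() != silence:
                texts[name].append(piece)
    return {name: "".join(parts) for name, parts in texts.items()}

# Toy vocabulary: only channel 0 ("User") speaks across two rows.
rows = [[1] + [2] * 9, [3] + [2] * 9]
decode = lambda t: {1: "Hi", 3: " there"}.get(t, "-")
assert collect_channels(rows, decode)["User"] == "Hi there"
```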
`model.generate()` and `pipeline("text-generation", ...)` are intentionally disabled; they would produce gibberish on a stream-trained model. See the 27B model card for the full API surface and an interactive-mode example.
## Interactive demo

```bash
huggingface-cli download JonasGeiping/stream-qwen3-8b --local-dir ./stream-8b
python ./stream-8b/examples/demo_interactive.py --model ./stream-8b --tick 0.5
```
A curses UI: type freely while all ten channels keep producing in parallel.
## Fine-tuning

The bundled `StreamDataCollator` plugs into Hugging Face's `Trainer`:
```bash
python ./stream-8b/examples/finetune.py \
    --model JonasGeiping/stream-qwen3-8b \
    --output-dir runs/streamft \
    --batch-size 2 --grad-accum 4 --epochs 1
```
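The shift-by-10 objective from the Architecture section determines how a collator labels packed sequences: the target for position `p` is the token at `p + C` (same channel, next row). A minimal sketch of that labeling under the usual `-100` ignore-index convention (illustrative only; `StreamDataCollator`'s actual implementation may differ):

```python
def next_row_labels(input_ids, C=10, ignore_index=-100):
    """Shift-by-C labels for next-row prediction: position p is trained
    to predict the token at p + C. The final row, having no next row,
    is masked out with ignore_index."""
    return input_ids[C:] + [ignore_index] * C

ids = list(range(30))        # 3 rows of 10 tokens
labels = next_row_labels(ids)
assert labels[:10] == list(range(10, 20))   # row 0 predicts row 1
assert labels[-10:] == [-100] * 10          # last row has no target
```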
See the 27B model card for full details and the `Trainer` snippet.
## Internal Reference

This is checkpoint `stream_8b` (`s16-qwen8b_pack5_trimbot_drop02_ep3`).