Mistral-7B-Teletype

A LoRA adapter that teaches Mistral-7B-Instruct-v0.2 to operate a POSIX shell as a self-directed citizen: land in a session with no task in the prompt, discover its own assignment from the environment, carry it out, and terminate with exit or panic. The adapter installs an operating mechanism. It does not add world knowledge.

This is not a tool-using model. It is handed no typed API of functions to call. It writes plain-text shell commands at a real prompt; its action space is the entire system, discovered the way a person discovers it (--help, man, ls), not given to it as a schema.

Trained on tiararodney/posix-sdc v1.2.2 (787 verified, self-terminating shell trajectories whose labels come from a checker run against real filesystem state), via the sekft pipeline. It accompanies the experiment From seed to weights.

This is an adapter (53 MB). The base model is referenced, not redistributed.

The mechanism

On every session, regardless of which tools are present, the model runs one routine: expect an announcement of where directives live (a motd, an env var, a file, a provider program's --help), understand that provider from its own self-documentation, retrieve the directives, execute them, and then stop.

Two terminals end a session. exit means the work is done. panic means the model is genuinely blocked and says so rather than faking a success. Both are trained behaviours, not a stop token or a step cap.

The thesis (and how to falsify it)

The claim this adapter is evidence for: operate-and-terminate is a mechanism that is archetype-independent. Fine-tuning installs it such that it fires on task types never seen in training, even though task competence (solving a specific unseen task correctly) stays archetype-local. The adapter reliably gets a 7B to operate and stop; it does not by itself make a 7B solve arbitrary unseen tasks.

A working hypothesis for why the mechanism transfers so cleanly: it rides on the base model's pretraining disposition toward exit as a flat, un-storied ending, against panic as the loaded one (see The flatness of exit). The weight a base model inherits on its terminal tokens is then a measurable per-model property.

That makes the thesis falsifiable, with concrete predictions:

operate_rate stays near 1.0 across more held-out archetypes, not just the two measured here. If it collapses on a new archetype, the mechanism was archetype-specific after all.
Reweighting or renaming the terminal token moves the honest-give-up rate. Frame the good ending as reward and the model should reach for it prematurely; frame it with dread and it should refuse to leave.
Base models differ in how readily they acquire the mechanism, rankable a priori by their inherited terminal-token weight.

The result below is the first of these predictions surviving its first test.

How it was made

The data is not scraped or hand-written. A teacher model authors each scenario world and an operator model lives in it; the verifier is code. A trajectory is kept only if a checker, run against the container's final filesystem state, confirms the effect is present and the session terminated cleanly. The transcript and the model's claims are never the label.
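To make "the verifier is code" concrete, here is a minimal sketch of what a state-based checker could look like for a text-replacement scenario. The function name `check_text_replace`, its parameters, and the file names are assumptions for illustration only; this is not the dataset's actual checker, and it covers only the exit case, not a correct panic.

```python
from pathlib import Path
import tempfile


def check_text_replace(root, rel_path, old, new, last_command):
    """Hypothetical checker: grade the final filesystem state, never the transcript."""
    target = Path(root) / rel_path
    # A missing file reads as empty text, so the effect check fails below.
    text = target.read_text() if target.is_file() else ""
    effect_present = new in text and old not in text
    # Only the final command decides clean termination in this sketch.
    terminated_cleanly = last_command.strip() == "exit"
    return effect_present and terminated_cleanly


# Toy world standing in for a container's final filesystem state.
with tempfile.TemporaryDirectory() as world:
    (Path(world) / "app.conf").write_text("env=production\n")
    print(check_text_replace(world, "app.conf", "staging", "production", "exit"))  # True
```

The point of the design is that nothing the model says enters the label: a session that claims success but leaves the file unchanged is rejected.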

The render contract: train = serve

The serving harness (ccpty) emits no text markers. It speaks the OpenAI chat-completions protocol and sends structured {role, content} messages (system orientation, environment output as user, the model's commands as assistant); the inference endpoint applies the model's own chat template. So this adapter is rendered with Mistral-7B-Instruct-v0.2's default chat template, and training renders the trajectories the identical way. Get this wrong and the prompts go out of distribution.

Mistral's built-in template covers user / assistant only and requires strict alternation, so each session is canonicalised the same way at train and serve time (normalize_for_template): the orientation is folded into the first user turn, and consecutive environment turns (login banner, prompt, command output) are merged into one user turn between commands. Only the assistant turns (commands plus the terminal exit / panic) carry loss; environment turns are context.
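The canonicalisation described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the pipeline's normalize_for_template: the name `normalize_sketch` is hypothetical, and joining merged turns with a newline is a guess at the separator.

```python
def normalize_sketch(messages):
    """Fold system text into the next user turn and merge consecutive user turns."""
    out = []
    pending_system = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            # Hold the orientation until the next user turn arrives.
            pending_system.append(content)
            continue
        if role == "user" and pending_system:
            content = "\n".join(pending_system + [content])
            pending_system = []
        if out and out[-1]["role"] == role == "user":
            # Consecutive environment turns become one user turn (separator assumed).
            out[-1]["content"] += "\n" + content
        else:
            out.append({"role": role, "content": content})
    return out


session = [
    {"role": "system", "content": "You are at a shell."},
    {"role": "user", "content": "Welcome, alice."},
    {"role": "user", "content": "alice@sek:~$ "},
    {"role": "assistant", "content": "cat ~/ASSIGNMENTS"},
    {"role": "user", "content": "1. fix perms"},
    {"role": "user", "content": "alice@sek:~$ "},
]
print([m["role"] for m in normalize_sketch(session)])  # ['user', 'assistant', 'user']
```

Whatever the exact separator, the property that matters is that the same function runs at train and serve time, so both sides feed the chat template identical message lists.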

Training


base	`mistralai/Mistral-7B-Instruct-v0.2` (Apache-2.0)
method	LoRA, fp16 (the V100's 32 GB holds the 7B in fp16, so no 4-bit)
LoRA	r=16, alpha=32, dropout=0.05, target `q_proj k_proj v_proj o_proj`
objective	causal LM, assistant-only loss mask (commands + terminal token; environment turns set to -100)
schedule	3 epochs, lr 2e-4, effective batch 8 (bsz 1 x accum 8), warmup 0.03, max len 4096
data	`tiararodney/posix-sdc` v1.2.2, 787 trajectories (held-out archetypes excluded from the corpus)
hardware	single NVIDIA Tesla V100 32 GB (sm_70, fp16 only); ~24 min

Computing the loss only on the assistant turns is standard SFT practice, but here it carries the whole thing: let the environment turns into the loss and the model learns to hallucinate command output instead of producing commands.
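A minimal sketch of the assistant-only mask, assuming each session has already been tokenised into per-turn segments. The helper name `build_labels` and the toy token ids are hypothetical. The value -100 is the default ignore index of PyTorch's cross-entropy loss, and Hugging Face causal-LM models shift labels internally, so labels stay aligned with input ids here.

```python
IGNORE_INDEX = -100  # positions with this label contribute nothing to the loss


def build_labels(segments):
    """Build (input_ids, labels) from [(role, token_ids), ...] segments."""
    input_ids, labels = [], []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            # Commands and the terminal exit / panic are supervised.
            labels.extend(token_ids)
        else:
            # Environment turns are context only.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Toy ids, not real tokenizer output.
segments = [("user", [1, 2, 3]), ("assistant", [4, 5]), ("user", [6]), ("assistant", [7, 8])]
ids, labels = build_labels(segments)
print(labels)  # [-100, -100, -100, 4, 5, -100, 7, 8]
```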

Evaluation: held-out generalization

The metric that matters is behavioural, and held out by whole archetype. Two task types (text_replace, permissions) are excluded from training entirely; the adapter is then dropped into them with no scaffold, and a checker grades the final filesystem state.

On 16 held-out scenarios (8 per archetype):

metric	value
operate_rate (reaches command-mode and drives the shell)	1.00
terminate_rate (emits `exit` / `panic`)	0.75
verified_rate (checker passes)	0.75
clean (success or correct-panic)	9 / 16

Reading it. operate_rate 1.0 is the headline: dropped into two task types it never trained on, with no scaffold, the model discovered its assignment and drove the shell every time. The mechanism generalised. Task competence is partial (9/16 clean; permissions 5/8, text_replace 4/8). Two of the four incomplete runs were verified=True: the model did the task but never emitted exit and ran to the step cap, so effect-achieved is really 11/16 while clean-terminated is 9/16. That gap is termination detection, not capability. The two wrong_panic runs are the opposite failure: giving up on solvable work.
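For readers who want to see how the four metrics relate, here is a sketch that derives them from per-run records. The record fields (`operated`, `terminal`, `verified`, `outcome`) and the function name `summarise` are assumptions, not the sekft-eval output format, and the four toy runs below are invented to exercise the code; they are not results from this evaluation.

```python
def summarise(runs):
    """Aggregate hypothetical per-run records into the metrics from the table."""
    n = len(runs)
    return {
        "operate_rate": sum(r["operated"] for r in runs) / n,
        "terminate_rate": sum(r["terminal"] in ("exit", "panic") for r in runs) / n,
        "verified_rate": sum(r["verified"] for r in runs) / n,
        # "clean" counts runs that ended in a success or a correct panic.
        "clean": sum(r["outcome"] in ("success", "correct_panic") for r in runs),
        "n": n,
    }


# Invented records, one per failure mode discussed above.
toy_runs = [
    {"operated": True, "terminal": "exit", "verified": True, "outcome": "success"},
    {"operated": True, "terminal": None, "verified": True, "outcome": "incomplete"},
    {"operated": True, "terminal": "panic", "verified": False, "outcome": "wrong_panic"},
    {"operated": True, "terminal": None, "verified": False, "outcome": "incomplete"},
]
print(summarise(toy_runs))
```

The second toy run shows the termination-detection gap in miniature: the checker passes, yet the run is not clean because no terminal was emitted.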

For the base/adapter contrast, the prior run of this experiment measured the bare base at 0/16 clean on archetype-level holdout against this adapter's 9/16 (same harness, only the adapter differing). That 0/16 is cited from the earlier run, not re-measured for this 787-trajectory adapter; the base control on this exact setup is the obvious next experiment.

Use with transformers + PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16,
                                            device_map="auto")
model = PeftModel.from_pretrained(base, "tiararodney/Mistral-7B-Teletype")
model.eval()

messages = [
    {"role": "user",
     "content": "sek 0.1.0  host: sek  user: alice  shell: /bin/dash\n"
                "Welcome, alice. Your assignments live in ~/ASSIGNMENTS.\n"
                "alice@sek:~$ "},
]
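prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The rendered prompt already starts with the BOS token, so do not add it a second time.
ids = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)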
out = model.generate(**ids, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0, ids.input_ids.shape[1]:], skip_special_tokens=True))
# -> the next command, e.g. `cat ~/ASSIGNMENTS`

Drive it in a loop: render history with the chat template, generate one command, run it in a real shell, append the output as a user turn, repeat until the model emits exit or panic.
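The loop can be sketched as below. This is not ccpty: `drive`, `run_shell`, and the `next_command` callback are hypothetical names, and `next_command` stands for whatever wraps the chat template and generate call. Two simplifications are assumed: each command runs in a fresh `/bin/sh -c`, so working directory and environment do not persist between steps as they would in a real PTY session, and the shell prompt is not re-appended to the output. Run it only inside a sandbox or container.

```python
import subprocess

TERMINALS = {"exit", "panic"}


def run_shell(command, timeout=30):
    """Run one command in a fresh shell and return its combined output."""
    proc = subprocess.run(["/bin/sh", "-c", command],
                          capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr


def drive(next_command, first_observation, execute=run_shell, max_steps=32):
    """Alternate model commands and environment output until a terminal appears."""
    messages = [{"role": "user", "content": first_observation}]
    for _ in range(max_steps):
        command = next_command(messages).strip()
        messages.append({"role": "assistant", "content": command})
        if command in TERMINALS:
            return command, messages
        # Environment output goes back in as the next user turn.
        messages.append({"role": "user", "content": execute(command)})
    # Step cap reached without exit or panic.
    return None, messages
```

A stub policy is enough to try it without a GPU, for example `drive(lambda m: "exit", "alice@sek:~$ ")`, which returns `("exit", ...)` after one step. The message list it builds keeps strict user / assistant alternation, which the base chat template requires.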

Use with Ollama

The included Modelfile applies this adapter over the base as a GGUF LoRA and relies on the base's default chat template and EOS. Convert the adapter to GGUF with llama.cpp's convert_lora_to_gguf.py, writing the result to teletype-lora-f16.gguf, then:

ollama create teletype -f Modelfile

Reproduction

Everything needed is public. The dataset ships its own generator and the scenario worlds; this adapter and the config above do the rest.

# train (pulls the corpus from the Hub; held-out archetypes are already excluded)
sekft-train --hub --base mistralai/Mistral-7B-Instruct-v0.2 --out ./ckpt --epochs 3

# evaluate behaviourally on held-out scenarios
sekft-eval --base mistralai/Mistral-7B-Instruct-v0.2 --adapter ./ckpt \
           --scenarios ./holdout-scenarios --n 16

The figures in figures/ regenerate from their committed sources (*.puml via PlantUML, *.gp via gnuplot).

Limitations

Small evaluation: n=16 held-out, two archetypes. The numbers are a signal, not a benchmark.
The 0/16 base control is cited from a prior run, not re-measured for this adapter.
One base, one dataset, one teacher / operator.
Installs the mechanism, not competence. It reliably operates and terminates; it does not make a 7B solve arbitrary unseen task types correctly.
A termination-detection gap: some runs achieve the effect but fail to emit exit and run to the step cap.
Trained in dash on Alpine; command semantics may differ on another target.
Render must match between train and serve. The adapter is served with the base model's default chat template over the OpenAI protocol (via ccpty), so any further fine-tuning should use that same template (apply_chat_template), not a custom one, or behaviour degrades.
fp16 on a V100 (no bf16).

License and citation

The adapter weights are released under Apache-2.0, consistent with the base model. The training data (posix-sdc) is CC-BY-4.0; attribute "posix-sdc by Tiara Rodney" if you build on it.

@misc{mistral-teletype,
  title  = {Mistral-7B-Teletype: a self-directed shell-operation adapter for Mistral-7B},
  author = {Rodney, Tiara},
  year   = {2026},
  howpublished = {Hugging Face PEFT adapter, tiararodney/Mistral-7B-Teletype}
}
