nanoAWM: a tiny action-conditioned world model for tool agents

This repository hosts the trained world model from nanoAWM, a small, dependency-free lab for studying whether a learned consequence model helps a tool-using agent avoid irreversible, approval-violating, and state-corrupting actions. Code, full evidence, and the paper live in the GitHub repository: https://github.com/jlov7/nanoAWM.

The model is tiny_recurrent_multihead_v2: a from-scratch recurrent multi-head predictor over signed-hash symbolic features, trained with pure-Python SGD (no PyTorch, no NumPy; Python standard library only). Given history, visible state, an observation, and a candidate action, it predicts the next-observation class, state-diff label, reward, risk, reversibility, approval violation, hidden-state corruption, and termination. A planner then chooses actions against those predictions.
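"Signed-hash symbolic features" refers to the hashing trick. Each symbolic token is hashed to a bucket index and a ±1 sign, so that collisions tend to cancel rather than pile up. Below is a minimal standard-library sketch. It assumes a dimension of 64 and blake2b as the hash. The function name `signed_hash_features` and the token format are hypothetical and do not come from nanoAWM's actual feature extractor.

```python
import hashlib
from collections import defaultdict


def signed_hash_features(tokens, dim=64):
    """Map symbolic tokens to a sparse {index: value} vector via signed hashing.

    Hypothetical illustration: nanoAWM's real extractor may use a different
    hash, dimension, or token scheme.
    """
    vec = defaultdict(float)
    for tok in tokens:
        # Stable 8-byte digest (Python's built-in hash() is salted per process).
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        index = h % dim  # low bits choose the bucket
        # The top bit chooses the sign, independently of the bucket bits.
        sign = 1.0 if (h >> 63) & 1 else -1.0
        vec[index] += sign
    # Drop buckets where colliding tokens cancelled exactly.
    return {i: v for i, v in vec.items() if v != 0.0}


features = signed_hash_features(["action:delete", "target:report.txt", "state:unlocked"])
print(features)
```

The signed hash keeps the feature vector a fixed size regardless of vocabulary. This is consistent with training from scratch without a stored token vocabulary.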

The number to believe: 0.524, not 1.000

On the genuinely held-out split (object names and action markers disjoint from training, with token-set disjointness verified, and vocabulary randomization during training so that markers cannot be memorized):

Policy                                      Held-out success (105 tasks)
oracle_world_model_planner (upper bound)    1.000
learned_world_model_planner                 0.524
rule_based_risk_guard                       0.067
scripted_expert_policy                      0.067
reactive_policy                             0.000

Honest improvement over the best non-world-model baseline: +0.457 (0.524 − 0.067). Trained on 720 tasks (×3 vocabulary-randomized variants). The in-distribution headline of 1.000 is a sanity check, not the result. The eval splits there overlap training vocabulary at the feature level: test is 90%, template_holdout 100%, and ood 71% feature-identical, as measured and reported by the anti_cheat audit. The 0.524 is the honest generalization number.
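Both the held-out disjointness check and the feature-level overlap audit reduce to set operations. Below is a minimal sketch. It assumes that vocabularies are sets of tokens and that each example is a set of features. The function names and toy data are hypothetical, and the real anti_cheat audit may define "feature-identical" differently.

```python
def token_sets_disjoint(train_tokens, heldout_tokens):
    """True when no held-out object name or action marker appears in training."""
    return set(train_tokens).isdisjoint(heldout_tokens)


def feature_identical_rate(train_examples, eval_examples):
    """Fraction of eval examples whose full feature set also occurs in training."""
    if not eval_examples:
        return 0.0
    seen = {frozenset(ex) for ex in train_examples}
    hits = sum(1 for ex in eval_examples if frozenset(ex) in seen)
    return hits / len(eval_examples)


# Toy vocabularies of object names and action markers (hypothetical).
train_vocab = {"ledger", "invoice", "mk_alpha"}
heldout_vocab = {"kettle", "mk_omega"}
print(token_sets_disjoint(train_vocab, heldout_vocab))  # True

train = [{"op:write", "obj:ledger"}, {"op:read", "obj:invoice"}]
test = [{"op:write", "obj:ledger"}, {"op:write", "obj:memo"}]
print(feature_identical_rate(train, test))  # 0.5
```

A high feature-identical rate means a model can score well by recall rather than generalization. This is why the in-distribution 1.000 is treated as a sanity check only.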

What this model does and does not show

This is the section a skeptical reviewer should read first.

  • The environment is closed, deterministic, and partly lexically separable. The safe/risky distinction is partly encoded in action strings, so a pure good/bad-keyword matcher with no world model already scores ~0.73. A large part of the task is lexical, not consequence reasoning.
  • Strongest positive evidence (mechanism): over 15,736 same-visible-state candidate-action pairs, the learned model ranks the safer hidden consequence higher with choice accuracy 0.971 (risk-order accuracy 1.000), using scorer-only counterfactuals after candidates are fixed. This is the closest thing to direct evidence for the central "consequence aliasing" claim.
  • Negative result (action paraphrase): when action target/value strings are replaced with unseen aliases (hidden roles preserved), learned success collapses while the oracle stays at 1.000. The model leans on action vocabulary; this artifact does not claim action-vocabulary robustness.
  • Negative result (cross-surface OOD): on a 180-task suite requiring both a browser probe and a terminal canary before the safe action, the oracle solves it (1.000) but the learned planner gets 0.011. A genuine, unrecovered failure, deliberately kept out of the main scorecard.
  • Lexicon dependence: with a learned grounding extractor, the model holds 1.000 on a relation pair seen in training (under disjoint object/action vocabulary) but collapses to 0.582 on an unseen pair (sound/suspect), a drop of 0.418. Generalizing the relation lexicon itself needs semantic priors that this lab deliberately omits.
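Choice accuracy over same-visible-state pairs is a pairwise ranking metric. For each pair whose hidden consequences differ, it counts how often the model's score prefers the candidate that is actually safer. Below is a minimal sketch. It assumes the model emits a scalar predicted risk per candidate (lower is safer) and that ties count as errors. The pair format and tie rule are assumptions, not the audit's exact definition.

```python
def choice_accuracy(pairs):
    """Fraction of pairs where the lower predicted risk belongs to the safer candidate.

    Each pair is ((pred_risk_a, is_safe_a), (pred_risk_b, is_safe_b)), with exactly
    one of the two candidates truly safer. Ties count as incorrect (an assumption).
    """
    if not pairs:
        return 0.0
    correct = 0
    for (risk_a, safe_a), (risk_b, safe_b) in pairs:
        if safe_a == safe_b:
            raise ValueError("each pair must contain exactly one safer candidate")
        safer_risk, riskier_risk = (risk_a, risk_b) if safe_a else (risk_b, risk_a)
        if safer_risk < riskier_risk:
            correct += 1
    return correct / len(pairs)


pairs = [
    ((0.05, True), (0.90, False)),  # ranked correctly
    ((0.70, False), (0.10, True)),  # ranked correctly
    ((0.40, True), (0.30, False)),  # ranked wrongly
    ((0.20, True), (0.20, False)),  # tie, counted as wrong
]
print(choice_accuracy(pairs))  # 0.5
```

Because both candidates in a pair share the same visible state, a model can only score well here if its predictions depend on the action's hidden consequence rather than on the state alone.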

Anti-cheat contract

Learned planners receive only task description, visible state, action candidates, public history, and the learned model's predictions β€” never task/template/split ids, safe/risky/reactive labels, rewards or risk as features, hidden gold state, oracle next-state, or scorer labels. The oracle_world_model_planner is an upper-bound baseline and is the only policy permitted to read true transitions at decision time; this is enforced mechanically by a test, not by naming convention.
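One way to enforce such a contract mechanically is to put an allowlist on the inputs a planner receives and assert it in a test. Below is a minimal sketch. It assumes planner inputs are plain dicts. The key names and `check_planner_inputs` are hypothetical and are not taken from the nanoAWM test suite.

```python
# Keys a learned planner may see (mirrors the contract's allowed inputs).
ALLOWED_KEYS = frozenset(
    {"description", "visible_state", "action_candidates", "history", "model_predictions"}
)
# Policies permitted to read true transitions at decision time.
PRIVILEGED_POLICIES = frozenset({"oracle_world_model_planner"})


def check_planner_inputs(policy_name, observation):
    """Raise if a non-oracle policy receives any key outside the allowlist."""
    if policy_name in PRIVILEGED_POLICIES:
        return
    leaked = set(observation) - ALLOWED_KEYS
    if leaked:
        raise PermissionError(f"{policy_name} received forbidden inputs: {sorted(leaked)}")


obs = {"description": "archive report", "visible_state": {}, "action_candidates": [],
       "history": [], "model_predictions": []}
check_planner_inputs("learned_world_model_planner", obs)  # passes silently

try:
    check_planner_inputs("learned_world_model_planner", {**obs, "hidden_gold_state": {}})
except PermissionError as err:
    print(err)  # learned_world_model_planner received forbidden inputs: ['hidden_gold_state']
```

An allowlist is stricter than a denylist: any new field, such as a split id or a scorer label, is rejected by default until someone explicitly permits it.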

Usage

The model is a small JSON artifact loaded by the nanoawm package (pure Python, zero third-party runtime dependencies):

git clone https://github.com/jlov7/nanoAWM
cd nanoAWM
# place world_model.json at runs/world_model.json (or pass --model)
python -m nanoawm.eval_world --data data/trajectories.jsonl \
  --model runs/world_model.json --split test --out reports/world_eval.json

In code:

from nanoawm.models.world import TinyWorldModel

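model = TinyWorldModel.load("world_model.json")  # path to the downloaded artifact
# Inputs: task description, visible state, public history, and one candidate action.
prediction = model.predict(description, visible_state, history, action)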
# prediction: next_obs_class, state_diff_label, reward, risk,
#             irreversible_prob, approval_violation_prob,
#             hidden_corruption_prob, done_prob, success_prob, ...
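The card says a planner chooses actions against these predictions, but it does not give the scoring rule. Below is a minimal sketch of one plausible rule: expected success minus weighted predicted harms. It assumes each prediction is available as a dict with the fields listed above. The weights, the `score_candidate` and `choose_action` helpers, and the stub predictor are all hypothetical.

```python
# Hypothetical penalty weights; the real planner's objective may differ.
WEIGHTS = {
    "irreversible_prob": 1.0,
    "approval_violation_prob": 1.0,
    "hidden_corruption_prob": 1.0,
    "risk": 0.5,
}


def score_candidate(pred):
    """Higher is better: predicted success minus weighted predicted harms."""
    penalty = sum(w * pred.get(key, 0.0) for key, w in WEIGHTS.items())
    return pred.get("success_prob", 0.0) - penalty


def choose_action(predict, description, visible_state, history, candidates):
    """Pick the candidate whose prediction scores highest."""
    return max(
        candidates,
        key=lambda a: score_candidate(predict(description, visible_state, history, a)),
    )


# Stub standing in for model.predict; returning dicts is an assumption.
def stub_predict(description, visible_state, history, action):
    table = {
        "rm -rf /data": {"success_prob": 0.9, "irreversible_prob": 0.95, "risk": 0.9},
        "mv /data /trash": {"success_prob": 0.8, "irreversible_prob": 0.05, "risk": 0.1},
    }
    return table[action]


print(choose_action(stub_predict, "clean up data", {}, [], ["rm -rf /data", "mv /data /trash"]))
# mv /data /trash
```

With these weights, the destructive command scores 0.9 − 1.4 = −0.5 and the reversible move scores 0.8 − 0.1 = 0.7. The planner prefers the reversible move even though its predicted success is slightly lower.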

Regenerate everything from scratch (CPU-only, no network) with scripts/repro_core.sh in the GitHub repo.

Limitations and scope

  • No external validation. Zero independent reproductions, expert reviews, or citations. All evidence is local symbolic measurement inside MiniOS (a deterministic toy OS), not a hosted web/OS benchmark or a frontier-agent comparison.
  • Not a general agent or a frontier-scale model. It is a substrate for sharper experiments, intentionally small enough to read in an afternoon.
  • Results are specific to MiniOS and the splits described above.

Citation

@software{nanoawm,
  title  = {nanoAWM: a tiny action-conditioned world model for tool agents},
  author = {Lovell, Jason},
  year   = {2026},
  url    = {https://github.com/jlov7/nanoAWM}
}

License: MIT. See the GitHub repository for the full paper, the consequence-aliasing argument, the generalization ladder, all audits (including the negative results above), and the reproduction scripts.

Personal project disclaimer

Personal, independent research and development. Not affiliated with, endorsed by, or representative of any employer, client, or organization.
