eschaton-familiar-brain-v1
A DQN warm-start checkpoint for the in-browser Neural Familiar agent at neural-familiar.html.
Architecture
- Input: 20-dim feature vector (each feature clamped to [-1, 1])
- Trunk: Dense(64, relu) → Dense(64, relu) → Dense(32, relu)
- Output: Dense(7, linear) — Q-values over actions: wander, seek_food, avoid_hazard, approach_user, rest, speak, explore
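The layer stack above can be sketched as a dependency-free forward pass. This is a minimal illustration, not the code in brain.js: the `{ kernel, bias }` layer layout, the `[inDim][outDim]` kernel orientation, and the helper names `clampFeatures`, `dense`, `qValues`, and `greedyAction` are all assumptions made for the example.

```javascript
// Action order as listed for the Output layer above.
const ACTIONS = ['wander', 'seek_food', 'avoid_hazard', 'approach_user', 'rest', 'speak', 'explore'];

// Clamp each input feature to [-1, 1].
const clampFeatures = (x) => x.map((v) => Math.max(-1, Math.min(1, v)));

// One dense layer: y = act(x · kernel + bias).
// ASSUMPTION: kernel is indexed as kernel[inputIndex][outputIndex].
function dense(x, kernel, bias, relu) {
  return bias.map((b, j) => {
    let sum = b;
    for (let i = 0; i < x.length; i++) sum += x[i] * kernel[i][j];
    return relu ? Math.max(0, sum) : sum;
  });
}

// ASSUMPTION: layers is an array of { kernel, bias }, trunk first, output last.
// Every layer uses relu except the final linear output layer.
function qValues(features, layers) {
  let h = clampFeatures(features);
  layers.forEach((layer, idx) => {
    const isOutput = idx === layers.length - 1;
    h = dense(h, layer.kernel, layer.bias, !isOutput);
  });
  return h;
}

// Greedy policy: name of the action with the largest Q-value.
function greedyAction(q) {
  let best = 0;
  for (let i = 1; i < q.length; i++) if (q[i] > q[best]) best = i;
  return ACTIONS[best];
}
```

With four layers shaped 20→64, 64→64, 64→32, and 32→7, `qValues` returns seven numbers and `greedyAction` maps the largest one back to an action name.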
Training
- Trained with stable-baselines3 DQN against env_sim.py, a compressed port of the JS world (energy, food, hazards, novelty grid, edge penalty, action diversity, displacement bonus).
- 500000 env steps, γ=0.95, lr=0.0005
- Reward weights mirror app.js DEFAULT_REWARD_WEIGHTS, which is the single source of truth.
Loading in browser
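// scripts/neural-familiar/brain.js, inside LocalNeuralBrain.loadFromHF()
const url = 'https://huggingface.co/{handle}/eschaton-familiar-brain-v1/resolve/main/policy_v1.json';
const res = await fetch(url);
// fetch() resolves even on 404/500, so check the status before parsing.
if (!res.ok) throw new Error(`Failed to fetch warm-start weights: HTTP ${res.status}`);
const blob = await res.json();
this.applyWeights(blob); // setWeights on the existing tf.sequential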
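Before handing a downloaded blob to applyWeights, it may help to confirm that its tensors match the architecture. The sketch below is illustrative only: the schema of policy_v1.json is not documented here, so the `blob.layers` array of `{ kernel, bias }` objects is an assumed layout, and `validateWeights` is a hypothetical helper rather than part of brain.js.

```javascript
// Expected [inDim, outDim] per dense layer, taken from the Architecture section.
const EXPECTED_SHAPES = [[20, 64], [64, 64], [64, 32], [32, 7]];

// Returns a list of human-readable problems; an empty list means the shapes match.
// ASSUMPTION: blob.layers is an array of { kernel: number[][], bias: number[] }.
function validateWeights(blob) {
  const layers = blob && Array.isArray(blob.layers) ? blob.layers : null;
  if (!layers) return ['blob.layers is missing or not an array'];

  const problems = [];
  if (layers.length !== EXPECTED_SHAPES.length) {
    problems.push(`expected ${EXPECTED_SHAPES.length} layers, got ${layers.length}`);
  }
  EXPECTED_SHAPES.forEach(([inDim, outDim], i) => {
    const layer = layers[i];
    if (!layer) return; // already reported by the layer-count check
    const kernelOk = Array.isArray(layer.kernel) && layer.kernel.length === inDim &&
      layer.kernel.every((row) => Array.isArray(row) && row.length === outDim);
    const biasOk = Array.isArray(layer.bias) && layer.bias.length === outDim;
    if (!kernelOk) problems.push(`layer ${i}: kernel is not ${inDim}x${outDim}`);
    if (!biasOk) problems.push(`layer ${i}: bias is not length ${outDim}`);
  });
  return problems;
}
```

A caller could skip the warm start and keep the randomly initialised network whenever the returned list is non-empty, so a malformed download never reaches setWeights.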
Limitations
- The JS world has features (POIs, mood, day/night cycle, levels) that the Python sim only partially models. The warm start therefore gives early-action priors, not optimal play. Local DQN training on the user's session then corrects toward true Q-values.
- Reward shaping in env_sim.py must be kept in sync with app.js DEFAULT_REWARD_WEIGHTS. Drift will degrade warm-start usefulness.
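One way to catch the drift described above is to compare the two weight tables key by key. The following is a sketch under assumptions: it treats both tables as flat objects of numbers, and `rewardWeightDrift` is a hypothetical helper. How the Python-side weights would be exported for comparison (for example as JSON) is not specified in this document.

```javascript
// Compare two flat { name: number } weight maps and report every mismatch.
// ASSUMPTION: both maps are flat; nested weight groups would need flattening first.
function rewardWeightDrift(jsWeights, pyWeights, tol = 1e-9) {
  const keys = new Set([...Object.keys(jsWeights), ...Object.keys(pyWeights)]);
  const drift = [];
  for (const key of [...keys].sort()) {
    const a = jsWeights[key];
    const b = pyWeights[key];
    if (a === undefined || b === undefined) {
      // Present on one side only.
      drift.push({ key, js: a, py: b, reason: 'missing' });
    } else if (Math.abs(a - b) > tol) {
      // Present on both sides with different values.
      drift.push({ key, js: a, py: b, reason: 'value' });
    }
  }
  return drift;
}
```

An empty result means the two tables agree. Any entry names the weight that changed on one side only, which is the point at which the checkpoint would need retraining or the sim updating.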
Round-trip verification
The exporter writes both policy_v1.safetensors and policy_v1.json. A plain
numpy matmul over the JSON weights reproduces the predictions of sb3's PyTorch
forward pass to within 1e-3 maximum absolute difference (verified at export time).
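The same max-abs-diff criterion can be applied on the browser side, for instance to compare Q-values from the loaded model against reference values saved at export time. This is a minimal sketch: `maxAbsDiff` and `withinTolerance` are hypothetical helpers, and no reference vectors ship with this document.

```javascript
// Largest element-wise absolute difference between two equal-length vectors.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let worst = 0;
  for (let i = 0; i < a.length; i++) {
    worst = Math.max(worst, Math.abs(a[i] - b[i]));
  }
  return worst;
}

// True when the two vectors agree to within tol (default 1e-3, as in the export check).
// A NaN anywhere makes maxAbsDiff return NaN, so this returns false.
function withinTolerance(a, b, tol = 1e-3) {
  return maxAbsDiff(a, b) < tol;
}
```

Comparing the full Q-value vector, rather than only the chosen action, also catches small numeric differences that happen not to flip the argmax.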