eschaton-familiar-brain-v1

A DQN warm-start checkpoint for the in-browser Neural Familiar agent at neural-familiar.html.

Architecture

Input: 20-dim feature vector, each feature clamped to [-1, 1]
Trunk: Dense(64, relu) → Dense(64, relu) → Dense(32, relu)
Output: Dense(7, linear) — Q-values over actions: wander, seek_food, avoid_hazard, approach_user, rest, speak, explore
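To make the layer stack concrete, here is a minimal pure-JavaScript forward pass for the 20 → 64 → 64 → 32 → 7 network, followed by greedy action selection over the seven Q-values. It is a sketch only. The weight layout `{ kernel: number[][] /* [in][out] */, bias: number[] }` per layer is an assumption, not the documented policy_v1.json schema, and the helper names are hypothetical. The real agent runs the network through a TF.js `tf.sequential` model.

```javascript
// Action order as listed in the Output line above.
const ACTIONS = ['wander', 'seek_food', 'avoid_hazard', 'approach_user', 'rest', 'speak', 'explore'];

function clamp(x, lo = -1, hi = 1) {
  return Math.min(hi, Math.max(lo, x));
}

// One dense layer: out = x · kernel + bias, with kernel stored as [in][out] (assumed layout).
function dense(x, layer) {
  const out = layer.bias.slice();
  for (let i = 0; i < x.length; i++) {
    for (let j = 0; j < out.length; j++) {
      out[j] += x[i] * layer.kernel[i][j];
    }
  }
  return out;
}

const relu = (v) => v.map((z) => Math.max(0, z));

// Hidden layers use ReLU; the final (output) layer stays linear.
function qValues(features, layers) {
  let h = features.map((f) => clamp(f));
  for (let k = 0; k < layers.length; k++) {
    h = dense(h, layers[k]);
    if (k < layers.length - 1) h = relu(h);
  }
  return h;
}

// Greedy policy: the action with the largest Q-value (first index wins ties).
function greedyAction(features, layers) {
  const q = qValues(features, layers);
  let best = 0;
  for (let a = 1; a < q.length; a++) if (q[a] > q[best]) best = a;
  return ACTIONS[best];
}
```

During training a DQN agent would usually mix this greedy choice with ε-random exploration; only the deterministic forward pass is shown here.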

Training

Trained with stable-baselines3 DQN against env_sim.py, a compressed port of the JS world (energy, food, hazards, novelty grid, edge penalty, action diversity, displacement bonus).
500,000 environment steps, discount factor γ = 0.95, learning rate 0.0005.
The reward weights mirror DEFAULT_REWARD_WEIGHTS in app.js, which is the single source of truth.
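The discount factor γ = 0.95 enters through the one-step DQN target. With this γ, rewards roughly 1/(1 − γ) = 20 steps ahead still carry meaningful weight. The sketch below assumes the standard (non-double) DQN update that stable-baselines3 uses: the target network's max over next-state Q-values, with a smooth-L1 (Huber, δ = 1) loss. The function names are illustrative and do not come from env_sim.py or brain.js.

```javascript
const GAMMA = 0.95; // discount factor from the training config above

// One-step target: y = r for terminal transitions, else y = r + γ · max_a' Q_target(s', a').
function tdTarget(reward, nextQ, done, gamma = GAMMA) {
  if (done) return reward;
  return reward + gamma * Math.max(...nextQ);
}

// Huber loss with delta = 1, equivalent to PyTorch's smooth_l1_loss with beta = 1.
function huber(x, delta = 1) {
  const a = Math.abs(x);
  return a <= delta ? 0.5 * x * x : delta * (a - 0.5 * delta);
}

// Per-transition loss between the online network's Q(s, a) and the TD target.
function transitionLoss(qSA, reward, nextQ, done) {
  return huber(qSA - tdTarget(reward, nextQ, done));
}
```

A local in-browser fine-tuning loop would apply the same target to transitions collected from the real JS world. That is how the warm-start priors are corrected.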

Loading in browser

// scripts/neural-familiar/brain.js: method of LocalNeuralBrain
async loadFromHF() {
  const url = 'https://huggingface.co/{handle}/eschaton-familiar-brain-v1/resolve/main/policy_v1.json';
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to fetch policy: HTTP ${res.status}`);
  const blob = await res.json();
  this.applyWeights(blob);  // setWeights on the existing tf.sequential model
}
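Before calling `setWeights`, it helps to check that the downloaded blob matches the architecture. TF.js Dense kernels have shape [inputDim, units]. PyTorch `nn.Linear` stores its weight as [out, in], so the exporter or the loader has to transpose at some point, and this card does not say which one does. The sketch below assumes a hypothetical blob layout, `{ layers: [{ kernel: number[][] /* [in][out] */, bias: number[] }] }`, and a hypothetical helper name. It returns `{ values, shape }` pairs in the kernel, bias, kernel, bias order that a stack of Dense layers expects.

```javascript
// Dense layer sizes from the Architecture section: 20 -> 64 -> 64 -> 32 -> 7.
const LAYER_SIZES = [20, 64, 64, 32, 7];

// Validates the (assumed) blob layout and flattens it for setWeights.
function toWeightSpecs(blob, sizes = LAYER_SIZES) {
  if (!blob || !Array.isArray(blob.layers) || blob.layers.length !== sizes.length - 1) {
    throw new Error('policy blob: unexpected number of layers');
  }
  const specs = [];
  blob.layers.forEach((layer, k) => {
    const nIn = sizes[k];
    const nOut = sizes[k + 1];
    if (layer.kernel.length !== nIn || layer.kernel.some((row) => row.length !== nOut)) {
      throw new Error(`policy blob: layer ${k} kernel is not ${nIn}x${nOut}`);
    }
    if (layer.bias.length !== nOut) {
      throw new Error(`policy blob: layer ${k} bias is not length ${nOut}`);
    }
    specs.push({ values: layer.kernel.flat(), shape: [nIn, nOut] });
    specs.push({ values: layer.bias.slice(), shape: [nOut] });
  });
  return specs;
}
```

With TF.js loaded, the specs could then be applied as `model.setWeights(toWeightSpecs(blob).map(({ values, shape }) => tf.tensor(values, shape)))`.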

Limitations

The JS world has features (POIs, mood, day/night cycle, levels) that the Python sim only partially models. The warm start provides priors for early actions, not optimal play. Local DQN training during the user's session then refines the Q-estimates toward the true Q-values of the full JS world.
Reward shaping in env_sim.py must be kept in sync with DEFAULT_REWARD_WEIGHTS in app.js. Any drift between the two makes the warm-start priors less useful.
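A simple guard against drift is to diff the two weight sets whenever either file changes. The sketch below assumes both sides can be dumped as flat name → number objects with matching key names, for example by having env_sim.py export its weights to JSON. That export step is not described in this card, and the example keys are hypothetical.

```javascript
// Report every key that is missing on one side or whose values differ beyond tol.
function rewardWeightDrift(jsWeights, pyWeights, tol = 1e-9) {
  const issues = [];
  const keys = new Set([...Object.keys(jsWeights), ...Object.keys(pyWeights)]);
  for (const key of [...keys].sort()) {
    if (!(key in jsWeights)) issues.push(`${key}: missing in app.js`);
    else if (!(key in pyWeights)) issues.push(`${key}: missing in env_sim.py`);
    else if (Math.abs(jsWeights[key] - pyWeights[key]) > tol) {
      issues.push(`${key}: app.js=${jsWeights[key]} env_sim.py=${pyWeights[key]}`);
    }
  }
  return issues;
}

// Hypothetical keys, not the real DEFAULT_REWARD_WEIGHTS:
const drift = rewardWeightDrift({ food: 1.0, hazard: -1.0 }, { food: 1.0, hazard: -0.5, edge: -0.1 });
// drift flags the "edge" key missing from app.js and the "hazard" value mismatch
console.log(drift);
```

Failing a CI step or the export script whenever `drift` is non-empty would keep the two reward definitions from silently diverging.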

Round-trip verification

The exporter writes both policy_v1.safetensors and policy_v1.json. At export time, a plain NumPy matmul forward pass over the JSON weights was verified to reproduce the sb3 PyTorch forward pass with a maximum absolute difference below 1e-3.
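The same max-abs-diff criterion could be re-checked in the browser after loading. You would compare the TF.js Q-values for a fixed batch of inputs against reference Q-values saved at export time. Such a reference file is an assumption: the card does not say one is published. The helper names are hypothetical.

```javascript
// Largest element-wise |a - b| across two batches of Q-value rows.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) throw new Error('batch size mismatch');
  let worst = 0;
  a.forEach((row, i) => {
    if (row.length !== b[i].length) throw new Error(`row ${i}: length mismatch`);
    row.forEach((v, j) => {
      worst = Math.max(worst, Math.abs(v - b[i][j]));
    });
  });
  return worst;
}

const TOLERANCE = 1e-3; // threshold quoted for the export-time check

function predictionsMatch(reference, candidate, tol = TOLERANCE) {
  return maxAbsDiff(reference, candidate) < tol;
}
```

Using the maximum rather than the mean makes the check fail on a single badly transposed or mis-ordered weight, which is the typical round-trip bug.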
