---
language: en
license: apache-2.0
tags:
- world-model
- rssm
- tutoring
- predictive-model
- pytorch
- kat
library_name: pytorch
pipeline_tag: reinforcement-learning
model-index:
- name: kat-2-RSSM
  results:
  - task:
      type: world-modeling
      name: Tutoring State Prediction
    metrics:
    - name: Eval Loss (best)
      type: loss
      value: 0.3124
    - name: Reconstruction Loss
      type: loss
      value: 0.1389
    - name: KL Divergence
      type: loss
      value: 0.0104
    - name: Reward Loss
      type: loss
      value: 0.082
    - name: Done Loss
      type: loss
      value: 0.064
---

# KAT-2-RSSM

A **Recurrent State-Space Model** trained for tutoring state prediction, part of the **KAT** system by [Progga AI](https://progga.ai).

## Model Description

This is a complete world model for predicting tutoring session dynamics: student state transitions, reward signals, and session termination. It uses a DreamerV3-inspired RSSM architecture with VL-JEPA-style EMA target encoding.
|
### Architecture

```
TutoringRSSM (2,802,838 params)
├── ObservationEncoder: obs_dim(20) → encoder_hidden(256) → latent_dim(128)
├── ActionEmbedding: action_dim(8) → embed_dim(32)
├── DeterministicTransition: GRU(hidden_dim=512)
├── StochasticLatent: Diagonal Gaussian prior/posterior (latent_dim=128)
├── ObservationDecoder: feature_dim(640) → decoder_hidden(256) → obs_dim(20)
├── RewardPredictor: feature_dim(640) → 1
├── DonePredictor: feature_dim(640) → 1
└── EMATargetEncoder: momentum=0.996 (VL-JEPA heritage)
```

**Feature dimension**: `hidden_dim + latent_dim = 512 + 128 = 640`
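The EMA target encoder is updated by exponential moving average rather than by gradients. A minimal sketch of that update, assuming a standard momentum rule; the encoder modules below are illustrative stand-ins, and only the momentum value (0.996) comes from this card:

```python
import torch
import torch.nn as nn

MOMENTUM = 0.996  # from the card; everything else here is a placeholder

# Stand-in encoders (NOT the real ObservationEncoder).
online = nn.Linear(20, 128)
target = nn.Linear(20, 128)
target.load_state_dict(online.state_dict())  # start from identical weights
for p in target.parameters():
    p.requires_grad_(False)  # the target is never trained directly

@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, m: float = MOMENTUM) -> None:
    """target <- m * target + (1 - m) * online, parameter by parameter."""
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)

# After each gradient step on the online encoder, refresh the target:
ema_update(online, target)
```

With momentum 0.996 the target lags the online encoder by roughly 250 steps, which is what makes it a stable prediction target.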
|
### Observation Space (20-dim)

The 20-dimensional observation vector encodes tutoring session state:

| Dims | Signal |
|------|--------|
| 0-3 | Mastery estimates (per-topic confidence) |
| 4-7 | Engagement signals (attention, participation) |
| 8-11 | Response quality (accuracy, depth, speed) |
| 12-15 | Emotional state (frustration, confidence, curiosity) |
| 16-19 | Session context (time, hint level, attempt count) |
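The layout above can be sketched as named slices over the flat vector. This is a hypothetical packing helper for illustration; the slice boundaries follow the table, but the field names and `pack_observation` itself are assumptions, not part of the released API:

```python
import torch

# Slice boundaries follow the observation table; names are illustrative.
SLICES = {
    "mastery":    slice(0, 4),    # per-topic confidence
    "engagement": slice(4, 8),    # attention, participation
    "quality":    slice(8, 12),   # accuracy, depth, speed
    "emotion":    slice(12, 16),  # frustration, confidence, curiosity
    "context":    slice(16, 20),  # time, hint level, attempt count
}

def pack_observation(fields: dict) -> torch.Tensor:
    """Assemble the 20-dim observation from four-element signal groups."""
    obs = torch.zeros(20)
    for name, sl in SLICES.items():
        obs[sl] = fields[name]
    return obs

obs = pack_observation({name: torch.rand(4) for name in SLICES})
```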
|
### Action Space (8 discrete actions)

| Index | Strategy | Description |
|-------|----------|-------------|
| 0 | SOCRATIC | Guided questioning |
| 1 | SCAFFOLDED | Structured support |
| 2 | DIRECT | Direct instruction |
| 3 | EXPLORATORY | Open exploration |
| 4 | REMEDIAL | Error correction |
| 5 | ASSESSMENT | Knowledge check |
| 6 | MOTIVATIONAL | Encouragement |
| 7 | METACOGNITIVE | Reflection |
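The index-to-strategy mapping is natural to mirror as an `IntEnum` on the caller's side. The class name `TutoringAction` is an assumption for illustration; only the mapping itself comes from the table:

```python
from enum import IntEnum

class TutoringAction(IntEnum):
    """The 8 discrete tutoring strategies, indexed as in the action table."""
    SOCRATIC = 0       # guided questioning
    SCAFFOLDED = 1     # structured support
    DIRECT = 2         # direct instruction
    EXPLORATORY = 3    # open exploration
    REMEDIAL = 4       # error correction
    ASSESSMENT = 5     # knowledge check
    MOTIVATIONAL = 6   # encouragement
    METACOGNITIVE = 7  # reflection

# An IntEnum compares equal to its integer value, so members can be used
# directly wherever the model expects an action index.
action = TutoringAction.SOCRATIC
```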
|
## Training Details

- **Data**: 100,901 synthetic tutoring trajectories (95,856 train / 5,045 eval)
- **Epochs**: 100 (best at epoch 93)
- **Hardware**: NVIDIA A100-SXM4-40GB
- **Optimizer**: Adam (lr=3e-4)
- **Training time**: ~45 minutes
- **Framework**: PyTorch 2.x

### Training Metrics (Best Checkpoint, Epoch 93)

| Metric | Value |
|--------|-------|
| **Total Loss** | 0.3124 |
| Reconstruction Loss | 0.1389 |
| KL Divergence | 0.0104 |
| Reward Loss | 0.0820 |
| Done Loss | 0.0640 |
| Rollout Loss | 0.3294 |
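For orientation, Dreamer-style world models typically combine such components as a weighted sum. The sketch below uses the table's values with assumed unit weights; the actual training code may scale terms differently (e.g., KL balancing or rollout-loss weighting), so the reported total need not equal a plain sum:

```python
# Component losses from the table above; weights are assumptions.
components = {
    "reconstruction": 0.1389,
    "kl":             0.0104,
    "reward":         0.0820,
    "done":           0.0640,
}
weights = {k: 1.0 for k in components}  # assumed, not from the card

total = sum(weights[k] * v for k, v in components.items())
print(f"unweighted sum of components: {total:.4f}")
```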
|
### Training Curve

Training converged smoothly over 100 epochs with consistent eval-loss improvement. No catastrophic forgetting or training instability was observed.
|
## Files

| File | Description | Size |
|------|-------------|------|
| `tutoring_rssm_best.pt` | Best checkpoint (epoch 93, eval loss 0.3124) | 11 MB |
| `tutoring_rssm_final.pt` | Final checkpoint (epoch 100) | 11 MB |
| `tutoring_rssm_epoch{N}.pt` | Snapshots every 10 epochs | 11 MB each |
| `v1-backup/` | RSSM v1 checkpoints (smaller model) | ~800 KB each |
| `training_log.txt` | Full training log | ~8 KB |
| `config.json` | Model configuration | <1 KB |
| `architecture.py` | Standalone model definition | ~20 KB |
|
## Usage

```python
import torch
from architecture import TutoringRSSM, TutoringWorldModelConfig

# Load model
config = TutoringWorldModelConfig(
    obs_dim=20, action_dim=8,
    latent_dim=128, hidden_dim=512,
    encoder_hidden=256, decoder_hidden=256,
)
model = TutoringRSSM(config).cuda()

ckpt = torch.load("tutoring_rssm_best.pt", map_location="cuda")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()

# Initialize state
h, z = model.initial_state(batch_size=1)

# Observe a tutoring step
obs = torch.randn(1, 20).cuda()    # Student observation
action = torch.tensor([0]).cuda()  # SOCRATIC strategy
result = model.observe_step(h, z, action, obs)

h_new, z_new = result["h"], result["z"]
pred_obs = result["pred_obs"]        # Predicted next observation
pred_reward = result["pred_reward"]  # Predicted reward
pred_done = result["pred_done"]      # Predicted session end

# Imagination (planning without observation)
imagined = model.imagine_step(h_new, z_new, torch.tensor([3]).cuda())
# Returns predicted state without requiring a real observation
```
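Chaining `imagine_step` calls yields multi-step planning rollouts without real observations. The toy below sketches that pattern with stand-in modules (a `GRUCell` transition, a Gaussian prior head, and a reward head); it is NOT the real `TutoringRSSM`, and every module and dimension in it is a placeholder chosen only to make the loop runnable:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the RSSM pieces; dimensions are illustrative.
HIDDEN, LATENT, ACTIONS = 32, 8, 8
gru = nn.GRUCell(input_size=LATENT + ACTIONS, hidden_size=HIDDEN)
prior_head = nn.Linear(HIDDEN, 2 * LATENT)      # mean and log-std of z
reward_head = nn.Linear(HIDDEN + LATENT, 1)

def toy_imagine_step(h, z, action_idx):
    """One imagined transition: no real observation is consumed."""
    a = nn.functional.one_hot(action_idx, ACTIONS).float()
    h = gru(torch.cat([z, a], dim=-1), h)
    mean, log_std = prior_head(h).chunk(2, dim=-1)
    z = mean + log_std.exp() * torch.randn_like(mean)  # sample the prior
    reward = reward_head(torch.cat([h, z], dim=-1))
    return h, z, reward

# Roll a 5-step imagined trajectory under a fixed strategy (index 3).
h, z = torch.zeros(1, HIDDEN), torch.zeros(1, LATENT)
total_reward = 0.0
with torch.no_grad():
    for _ in range(5):
        h, z, r = toy_imagine_step(h, z, torch.tensor([3]))
        total_reward += r.item()
```

A planner (MCTS or beam search, as listed under Ecosystem) scores candidate action sequences by exactly this kind of imagined cumulative reward.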
|
## Evaluation Results (94/94 tests pass)

| Component | Tests | Status |
|-----------|-------|--------|
| Predictive Student Model | 44/44 | ALL PASS |
| Cognition World Model Eval | 2/2 | ALL ACCEPTANCE MET |
| Core PyTorch RSSM | 10/10 | ALL PASS |
| Physics/Causality Micro-Modules | 23/23 | ALL PASS |
| Trained Checkpoint Inference | 7/7 | ALL PASS |
| Advanced Planners (MCTS/Beam) | 8/8 | ALL PASS |
|
### Acceptance Criteria

- **Prediction accuracy**: 12.08% error at horizon (target <20%) ✓
- **Planning improvement**: +14.5% vs. reactive baseline (target >+10%) ✓
|
## Heritage

This model inherits from the **Abigail3 cognitive architecture**, specifically:

- RSSM design from `abigail/core/world_model.py`
- VL-JEPA EMA target encoding from Meta AI's Joint-Embedding Predictive Architecture
- DreamerV3-inspired training with KL balancing and rollout losses
- Governance-first design: generation separated from governance
|
## Ecosystem

This world model is part of the broader KAT system:

- **23 physics/causality micro-modules** (67M params total): intuitive physics simulation
- **MCTS Planner**: Monte Carlo Tree Search for action planning
- **Beam Search Planner**: anytime approximate planning
- **Causal World Model**: structural causal model with do-calculus
- **Predictive Student Model**: VL-JEPA/RSSM adapted for tutoring personalization
|
## License

Apache 2.0

## Author

**Preston Mills**, Progga AI

- Built for the KAT-2 framework
- Designed by Progga AI
- February 2026