--- library_name: mouse-core tags: - mouse-core - reinforcement-learning --- # micahr234/mouse-example-model This repository contains a MOUSE model checkpoint. ## Architecture - Backbone: `qwen3` - Hidden dimension: `1024` - Heads: `action_value_layerwise` - Action head: `action_value_layerwise` ### Encoder `StepEmbedder` reads flat step-record dicts and projects each declared modality into the shared `1024`-dimensional token space before the backbone. | Field | Type | Required | Tensor shape | Dtype | Notes | |---|---|---:|---|---|---| | `action` | `discrete` | yes | `[B, S]` | `torch.long` | integer ids in `[0, 3]` | | `observation` | `discrete` | yes | `[B, S]` | `torch.long` | integer ids in `[0, 63]` | | `reward` | `rff` | yes | `[B, S]` | `torch.float32` | scalar value | | `done` | `discrete` | yes | `[B, S]` | `torch.long` | integer ids in `[0, 4]` | ## Install MouseCore ```bash pip install mouse-core ``` ## Load The Model ```python import torch from mouse_core import load_model device = torch.device("cuda" if torch.cuda.is_available() else "cpu") model = load_model("micahr234/mouse-example-model", map_location="cpu").eval().to(device) ``` ## Run Inference The model accepts a `list[list[dict]]` batch of shape `[B][S]` — B sequences, each containing S step-record dicts with flat keys matching the encoder's declared modalities above. ```python # Batch shape: [B=1][S=1] — one sequence of one step. batch = [[ { "action": 0, "observation": 0, "reward": 0.0, "done": 0, } ]] predictions, objective_data, cache = model(batch) with torch.no_grad(): predictions, _, cache = model(batch) action = model.get_action(predictions, temperature=0.0) ``` `model()` returns `(predictions, objective_data, cache)`. `objective_data` is a `TensorDict[B, S]` of the modality tensors extracted by the encoder — pass it to objectives during training. For cached one-step rollout, keep `cache` and pass it back on the next call with `use_cache=True`.