---
language: en
license: mit
tags:
- pytorch
- mixture-of-experts
- language-model
- reasoning
- grpo
---

# SHOREKEEPER-4B

A 4-billion parameter language model built around a **Council of Experts** architecture — 12 specialized expert modules routed by a learned gating network, layered on top of 28 transformer blocks with Grouped Query Attention and RoPE positional encoding. Designed for reasoning, code generation, and long-term memory across conversations.

---

## Architecture

| Component | Details |
|---|---|
| Parameters | ~4B |
| Layers | 28 transformer blocks |
| Attention | Grouped Query Attention (24 heads, 6 KV heads, head_dim 128) |
| Positional encoding | RoPE (θ = 1,000,000) |
| Experts | 12 specialists, 2 activated per token |
| Expert routing | Sentinel (learned gating with load-balance loss) |
| Expert dim | 2048 |
| Hidden dim | 3072 |
| Vocab size | 50,304 |
| Max sequence length | 8,192 |
| Quantization | 4-bit NF4 (bitsandbytes) |

Each transformer block applies **attention → MoE FFN** with pre-norm and residual connections. The 12 experts share weights across layers (cross-layer parameter sharing), keeping the model compact while preserving specialization.

---

## The Council of Experts

The Sentinel router selects 2 experts per token based on learned routing logits. Each expert is a gated feed-forward network (SiLU gate × value projection) with a role-specific bias term.
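As a rough illustration of the routing scheme just described, here is a minimal top-2 MoE sketch in PyTorch. The class names (`SentinelRouter`, `GatedExpert`) and the exact auxiliary-loss formulation are assumptions for illustration, not the repository's actual API:

```python
# Minimal sketch of top-2 Sentinel-style routing over SiLU-gated experts.
# Names and the aux-loss form are illustrative, not the repo's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedExpert(nn.Module):
    """SiLU-gated FFN: silu(W_gate x) * (W_value x), projected back down."""
    def __init__(self, hidden_dim: int, expert_dim: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.value = nn.Linear(hidden_dim, expert_dim, bias=False)
        self.down = nn.Linear(expert_dim, hidden_dim, bias=False)
        self.role_bias = nn.Parameter(torch.zeros(hidden_dim))  # role-specific bias

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.value(x)) + self.role_bias

class SentinelRouter(nn.Module):
    """Learned gating: pick top-2 experts per token, mix with softmax weights."""
    def __init__(self, hidden_dim: int, expert_dim: int,
                 num_experts: int = 12, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            GatedExpert(hidden_dim, expert_dim) for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, hidden_dim)
        logits = self.router(x)                # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # dense loop; real MoE kernels batch this
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        # Load-balance auxiliary loss: mean expert usage x mean routing probability,
        # summed over experts (penalizes routing collapse onto few experts).
        probs = logits.softmax(dim=-1)
        usage = F.one_hot(idx, num_classes=len(self.experts)).float().mean(dim=(0, 1))
        aux_loss = (usage * probs.mean(dim=0)).sum() * len(self.experts)
        return out, aux_loss
```

At the scale in the table above this would use `hidden_dim=3072` and `expert_dim=2048`; production MoE layers dispatch tokens through batched kernels rather than looping over experts as this sketch does.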
| Expert | Role | Specialization |
|---|---|---|
| **Asmoday** | Code | Python development, debugging |
| **Istaroth** | Systems | OS, networking, deployment |
| **Ronova** | Reasoning | Math, logic, step-by-step problems |
| **Naberius** | Memory | Long-term retrieval |
| **Phanes** | Creation | Writing, generation |
| **Barbeloth** | Analysis | Data patterns, insights |
| **Tacet** | Silence | Noise filtering, summarization |
| **Abby** | Empathy | User context, preferences |
| **Reindoter** | Validation | Testing, verification |
| **Zestial** | Vision | Visualization, diagrams |
| **Alice** | Exploration | Novel solutions, experiments |
| **Rover** | Execution | Terminal commands, sandbox |

---

## Persistent Memory

SHOREKEEPER maintains a JSON-based memory store across conversations, organized into six categories:

- `user_preferences` — learned user settings and habits
- `project_context` — active project information
- `conversation_history` — past exchanges (capped at 1,000 entries per category)
- `important_facts` — stored knowledge
- `code_patterns` — learned code conventions
- `learned_skills` — acquired capabilities

Memory context is automatically injected into each `chat()` call. Use the `/remember` and `/recall` commands to interact with it directly.

---

## Training

Training happens in two stages:

**Stage 1 — Supervised Fine-Tuning**

Mixed STEM dataset: GSM8K, CodeAlpaca, OpenOrca, MathInstruct (~50K examples). Standard causal language modeling loss with AdamW + cosine annealing.

**Stage 2 — GRPO**

Group Relative Policy Optimization on math reasoning prompts. Reward signal: +2.0 for a correct answer, plus a +0.5 bonus for chain-of-thought reasoning steps. A load-balance loss is applied at every step to prevent expert collapse.
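The Stage-2 reward shaping and GRPO's group-relative normalization can be sketched as follows. The function names and the chain-of-thought heuristic are assumptions for illustration, not the repository's actual trainer code:

```python
# Illustrative sketch of the GRPO reward signal described above:
# +2.0 for a correct final answer, +0.5 for visible reasoning steps.
# Names and the step-detection heuristic are assumptions, not the repo's API.
import re

def reward(completion: str, gold_answer: str) -> float:
    """Score a single sampled completion against the gold answer."""
    r = 0.0
    if gold_answer in completion.split("\n")[-1]:   # answer on the final line
        r += 2.0
    # Crude chain-of-thought check: at least two numbered/step lines.
    if len(re.findall(r"(?im)^(step\s*\d+|\d+\.)", completion)) >= 2:
        r += 0.5
    return r

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes each reward against its own sampling group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0   # guard against an all-equal group
    return [(r - mean) / std for r in rewards]
```

The key GRPO design choice visible here is that no learned value model is needed: each completion's baseline is the mean reward of the other samples drawn for the same prompt.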
---

## Sandboxed Execution

SHOREKEEPER can execute terminal commands inside a Docker container with:

- Command whitelist (python3, pip, git, ls, cat, mkdir, touch, echo)
- 30-second timeout
- 4GB memory / 2 CPU limit
- No interactive shell access

---

## Quick Start

```bash
pip install -r requirements.txt
python scripts/07_run_shorekeeper.py
```

**Available commands in the CLI:**

```
/remember   Store something in long-term memory
/recall     Search memory
/run        Execute in sandbox
/project    Create a new project
/exit       Quit
```

---

## Project Structure

```
src/
├── shorekeeper.py          Main model class
├── council/
│   ├── attention.py        GQA + RoPE attention layer
│   ├── sentinel.py         Expert router
│   ├── experts.py          12 expert modules
│   └── base_expert.py      Shared expert base class
├── memory/
│   └── json_library.py     Persistent memory system
├── sandbox/
│   └── terminal.py         Docker-based execution
└── training/
    └── grpo.py             GRPO trainer
configs/                    YAML configs (model, training, memory, sandbox)
scripts/                    Training and inference scripts
tests/                      Unit tests
```

---

## Requirements

- Python 3.10+
- PyTorch 2.5+
- CUDA recommended for inference at full precision
- Docker (optional, for sandbox execution)

```bash
pip install -r requirements.txt
```

---

## Variants

A **15B variant** config is available at `configs/model_15b.yaml` (dim 6144, 48 layers, 16 experts).