# SHOREKEEPER-4B
A 4-billion parameter language model built around a Council of Experts architecture: 12 specialized expert modules routed by a learned gating network, layered on top of 28 transformer blocks with Grouped Query Attention and RoPE positional encoding. Designed for reasoning, code generation, and long-term memory across conversations.
## Architecture
| Component | Details |
|---|---|
| Parameters | ~4B |
| Layers | 28 transformer blocks |
| Attention | Grouped Query Attention (24 heads, 6 KV heads, head_dim 128) |
| Positional encoding | RoPE (θ = 1,000,000) |
| Experts | 12 specialists, 2 activated per token |
| Expert routing | Sentinel (learned gating with load-balance loss) |
| Expert dim | 2048 |
| Hidden dim | 3072 |
| Vocab size | 50,304 |
| Max sequence length | 8,192 |
| Quantization | 4-bit NF4 (bitsandbytes) |
Each transformer block applies attention → MoE FFN with pre-norm and residual connections. The 12 experts share weights across layers (cross-layer parameter sharing), keeping the model compact while preserving specialization.
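As a rough sketch of the block layout described above, with `nn.LayerNorm` and a dense FFN standing in for SHOREKEEPER's actual norm and MoE layers (these stand-ins are assumptions, not repo code):

```python
import torch
import torch.nn as nn

# Illustrative pre-norm block: x + attn(norm(x)), then x + ffn(norm(x)).
# Dimensions follow the table above (hidden 3072, 24 heads, expert dim 2048).
class Block(nn.Module):
    def __init__(self, dim=3072, heads=24):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # Dense stand-in for the MoE FFN.
        self.ffn = nn.Sequential(nn.Linear(dim, 2048), nn.SiLU(), nn.Linear(2048, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual around attention
        x = x + self.ffn(self.norm2(x))                    # residual around the FFN
        return x

y = Block()(torch.randn(1, 16, 3072))
print(y.shape)  # torch.Size([1, 16, 3072])
```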
## The Council of Experts
The Sentinel router selects 2 experts per token based on learned routing logits. Each expert is a gated feed-forward network (SiLU gate × value projection) with a role-specific bias term.
| Expert | Role | Specialization |
|---|---|---|
| Asmoday | Code | Python development, debugging |
| Istaroth | Systems | OS, networking, deployment |
| Ronova | Reasoning | Math, logic, step-by-step problems |
| Naberius | Memory | Long-term retrieval |
| Phanes | Creation | Writing, generation |
| Barbeloth | Analysis | Data patterns, insights |
| Tacet | Silence | Noise filtering, summarization |
| Abby | Empathy | User context, preferences |
| Reindoter | Validation | Testing, verification |
| Zestial | Vision | Visualization, diagrams |
| Alice | Exploration | Novel solutions, experiments |
| Rover | Execution | Terminal commands, sandbox |
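The top-2 routing and the gated expert described above can be sketched as follows; the function names, tensor shapes, and bias placement are assumptions for illustration, not SHOREKEEPER's actual code:

```python
import torch
import torch.nn.functional as F

def route(x, gate_weight, k=2):
    """x: (tokens, dim); gate_weight: (num_experts, dim)."""
    logits = x @ gate_weight.T                    # (tokens, num_experts) routing logits
    weights, ids = torch.topk(logits, k, dim=-1)  # keep the 2 best experts per token
    return ids, F.softmax(weights, dim=-1)        # renormalize over the selected pair

def expert_ffn(x, w_gate, w_val, w_out, role_bias):
    # Gated feed-forward: SiLU(x W_gate) * (x W_val) plus a role-specific bias,
    # projected back to the hidden dimension.
    return (F.silu(x @ w_gate) * (x @ w_val) + role_bias) @ w_out

x = torch.randn(4, 3072)
ids, w = route(x, torch.randn(12, 3072))
print(ids.shape, w.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```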
## Persistent Memory
SHOREKEEPER maintains a JSON-based memory store across conversations, organized into six categories:
- `user_preferences`: learned user settings and habits
- `project_context`: active project information
- `conversation_history`: past exchanges (capped at 1,000 entries per category)
- `important_facts`: stored knowledge
- `code_patterns`: learned code conventions
- `learned_skills`: acquired capabilities
Memory context is automatically injected into each `chat()` call. Use the `/remember` and `/recall` commands to interact with it directly.
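A minimal sketch of a category-based JSON store like the one described above; the class name, file layout, and method names are assumptions, not SHOREKEEPER's API:

```python
import json
import os
import tempfile

CATEGORIES = ["user_preferences", "project_context", "conversation_history",
              "important_facts", "code_patterns", "learned_skills"]
CAP = 1000  # per-category entry cap from the text above

class MemoryStore:
    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)
        else:
            self.data = {c: [] for c in CATEGORIES}

    def remember(self, category, fact):
        entries = self.data.setdefault(category, [])
        entries.append(fact)
        del entries[:-CAP]  # drop the oldest entries beyond the cap
        with open(self.path, "w") as f:
            json.dump(self.data, f)

    def recall(self, query):
        q = query.lower()
        return [fact for entries in self.data.values()
                for fact in entries if q in fact.lower()]

store = MemoryStore(os.path.join(tempfile.mkdtemp(), "memory.json"))
store.remember("important_facts", "User prefers tabs over spaces")
print(store.recall("tabs"))  # ['User prefers tabs over spaces']
```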
## Training
Training happens in two stages:
**Stage 1: Supervised Fine-Tuning.** Mixed STEM dataset: GSM8K, CodeAlpaca, OpenOrca, and MathInstruct (~50K examples). Standard causal language modeling loss with AdamW and cosine annealing.
**Stage 2: GRPO.** Group Relative Policy Optimization on math reasoning prompts. Reward signal: +2.0 for a correct answer, plus a +0.5 bonus for chain-of-thought reasoning steps. A load-balance loss is applied every step to prevent expert collapse.
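The Stage 2 reward can be sketched as below; how SHOREKEEPER actually extracts the final answer and detects reasoning steps is an assumption here, so treat the heuristics as placeholders:

```python
def grpo_reward(completion: str, gold_answer: str) -> float:
    reward = 0.0
    # +2.0 for a correct final answer (assumes the answer appears on the last line).
    if gold_answer in completion.splitlines()[-1]:
        reward += 2.0
    # +0.5 bonus for visible chain-of-thought steps (crude keyword heuristic).
    if "step" in completion.lower():
        reward += 0.5
    return reward

print(grpo_reward("Step 1: 2 + 2 = 4\nAnswer: 4", "4"))  # 2.5
```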
## Sandboxed Execution
SHOREKEEPER can execute terminal commands inside a Docker container with:
- Command whitelist (python3, pip, git, ls, cat, mkdir, touch, echo)
- 30-second timeout
- 4GB memory / 2 CPU limit
- No interactive shell access
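The whitelist and timeout policy above can be sketched as follows. In SHOREKEEPER this runs inside a Docker container with the memory/CPU limits listed; the sketch below only runs locally to illustrate the whitelist check and the 30-second timeout:

```python
import shlex
import subprocess

WHITELIST = {"python3", "pip", "git", "ls", "cat", "mkdir", "touch", "echo"}

def run_sandboxed(command: str) -> str:
    argv = shlex.split(command)
    # Reject anything whose executable is not on the whitelist.
    if not argv or argv[0] not in WHITELIST:
        raise PermissionError(f"command not whitelisted: {command!r}")
    # Kill the process if it exceeds the 30-second budget.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout

print(run_sandboxed("echo hello"))  # hello
```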
## Quick Start
```
pip install -r requirements.txt
python scripts/07_run_shorekeeper.py
```
Available commands in the CLI:
```
/remember <fact>    Store something in long-term memory
/recall <query>     Search memory
/run <command>      Execute in sandbox
/project <name>     Create a new project
/exit               Quit
```
## Project Structure
```
src/
├── shorekeeper.py          Main model class
├── council/
│   ├── attention.py        GQA + RoPE attention layer
│   ├── sentinel.py         Expert router
│   ├── experts.py          12 expert modules
│   └── base_expert.py      Shared expert base class
├── memory/
│   └── json_library.py     Persistent memory system
├── sandbox/
│   └── terminal.py         Docker-based execution
└── training/
    └── grpo.py             GRPO trainer
configs/                    YAML configs (model, training, memory, sandbox)
scripts/                    Training and inference scripts
tests/                      Unit tests
```
## Requirements
- Python 3.10+
- PyTorch 2.5+
- CUDA recommended for inference at full precision
- Docker (optional, for sandbox execution)
```
pip install -r requirements.txt
```
## Variants
A 15B variant config is available at `configs/model_15b.yaml` (dim 6144, 48 layers, 16 experts).