---
language: en
license: mit
tags:
- pytorch
- mixture-of-experts
- language-model
- reasoning
- grpo
---

# SHOREKEEPER-4B

A 4-billion-parameter language model built around a **Council of Experts** architecture: 12 specialized expert modules routed by a learned gating network, layered on top of 28 transformer blocks with Grouped Query Attention and RoPE positional encoding. Designed for reasoning, code generation, and long-term memory across conversations.

---

## Architecture

| Component | Details |
|---|---|
| Parameters | ~4B |
| Layers | 28 transformer blocks |
| Attention | Grouped Query Attention (24 heads, 6 KV heads, head_dim 128) |
| Positional encoding | RoPE (θ = 1,000,000) |
| Experts | 12 specialists, 2 activated per token |
| Expert routing | Sentinel (learned gating with load-balance loss) |
| Expert dim | 2048 |
| Hidden dim | 3072 |
| Vocab size | 50,304 |
| Max sequence length | 8,192 |
| Quantization | 4-bit NF4 (bitsandbytes) |

Each transformer block applies **attention → MoE FFN** with pre-norm and residual connections. The 12 experts share weights across layers (cross-layer parameter sharing), keeping the model compact while preserving specialization.
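The per-block dataflow described above can be sketched as follows. The attention and MoE sub-modules are stand-ins, and the use of `LayerNorm` for the pre-norm step (as well as the `CouncilBlock` name) is an assumption, not the model's actual implementation:

```python
import torch
import torch.nn as nn

class CouncilBlock(nn.Module):
    """Pre-norm residual block: x + Attn(norm(x)), then x + MoE(norm(x)).

    `attn` and `moe_ffn` are placeholders for the real GQA+RoPE attention
    and the Sentinel-routed expert FFN; LayerNorm here is an assumption.
    """

    def __init__(self, dim: int, attn: nn.Module, moe_ffn: nn.Module):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.ffn_norm = nn.LayerNorm(dim)
        self.attn = attn
        self.moe_ffn = moe_ffn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))     # residual around attention
        x = x + self.moe_ffn(self.ffn_norm(x))   # residual around the MoE FFN
        return x
```

Stacking 28 such blocks gives the layer count in the table above.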

---

## The Council of Experts

The Sentinel router selects 2 experts per token based on learned routing logits. Each expert is a gated feed-forward network (SiLU gate × value projection) with a role-specific bias term.
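A minimal sketch of top-2 routing and the gated expert shape, assuming hypothetical class names (`Sentinel`, `GatedExpert`), a simple loop-based dispatch, and a role bias applied to the expert output; the load-balance loss is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedExpert(nn.Module):
    """Gated FFN as described above: SiLU(gate(x)) * value(x).

    The role-specific bias is sketched as a learned vector added to the
    output; the real module layout may differ.
    """
    def __init__(self, dim: int, expert_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, expert_dim, bias=False)
        self.value = nn.Linear(dim, expert_dim, bias=False)
        self.out = nn.Linear(expert_dim, dim, bias=False)
        self.role_bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.out(F.silu(self.gate(x)) * self.value(x)) + self.role_bias

class Sentinel(nn.Module):
    """Top-2 router: pick the 2 best experts per token from learned logits."""
    def __init__(self, dim: int, n_experts: int = 12, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # renormalize over the chosen 2
        return weights, idx

def moe_forward(x, sentinel, experts):
    """Combine the selected experts' outputs, weighted by the router."""
    weights, idx = sentinel(x)
    out = torch.zeros_like(x)
    for k in range(idx.shape[-1]):
        for e, expert in enumerate(experts):
            mask = idx[:, k] == e              # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, k, None] * expert(x[mask])
    return out
```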

| Expert | Role | Specialization |
|---|---|---|
| **Asmoday** | Code | Python development, debugging |
| **Istaroth** | Systems | OS, networking, deployment |
| **Ronova** | Reasoning | Math, logic, step-by-step problems |
| **Naberius** | Memory | Long-term retrieval |
| **Phanes** | Creation | Writing, generation |
| **Barbeloth** | Analysis | Data patterns, insights |
| **Tacet** | Silence | Noise filtering, summarization |
| **Abby** | Empathy | User context, preferences |
| **Reindoter** | Validation | Testing, verification |
| **Zestial** | Vision | Visualization, diagrams |
| **Alice** | Exploration | Novel solutions, experiments |
| **Rover** | Execution | Terminal commands, sandbox |

---

## Persistent Memory

SHOREKEEPER maintains a JSON-based memory store across conversations, organized into six categories:

- `user_preferences`: learned user settings and habits
- `project_context`: active project information
- `conversation_history`: past exchanges (capped at 1,000 entries per category)
- `important_facts`: stored knowledge
- `code_patterns`: learned code conventions
- `learned_skills`: acquired capabilities

Memory context is automatically injected into each `chat()` call. Use the `/remember` and `/recall` commands to interact with it directly.
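A sketch of how such a store could work. The category names and the 1,000-entry cap come from the list above; the `MemoryStore` class name, the flat list-of-strings layout, and the substring-based recall are assumptions (the real `json_library.py` may differ):

```python
import json
from pathlib import Path

CATEGORIES = (
    "user_preferences", "project_context", "conversation_history",
    "important_facts", "code_patterns", "learned_skills",
)
MAX_ENTRIES = 1000  # per-category cap mentioned above

class MemoryStore:
    def __init__(self, path="memory.json"):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text())
        else:
            self.data = {c: [] for c in CATEGORIES}

    def remember(self, category: str, fact: str) -> None:
        entries = self.data[category]
        entries.append(fact)
        del entries[:-MAX_ENTRIES]          # drop the oldest past the cap
        self.path.write_text(json.dumps(self.data, indent=2))

    def recall(self, query: str):
        """Naive substring search across all categories."""
        q = query.lower()
        return [(c, f) for c in CATEGORIES
                for f in self.data[c] if q in f.lower()]
```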

---

## Training

Training happens in two stages:

**Stage 1: Supervised Fine-Tuning**
Mixed STEM dataset: GSM8K, CodeAlpaca, OpenOrca, MathInstruct (~50K examples). Standard causal language modeling loss with AdamW and cosine annealing.

**Stage 2: GRPO**
Group Relative Policy Optimization on math reasoning prompts. Reward signal: +2.0 for a correct answer, +0.5 bonus for chain-of-thought reasoning steps. Load-balance loss applied every step to prevent expert collapse.
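The reward shaping and GRPO's group-relative normalization can be sketched like this. Only the +2.0 / +0.5 constants come from the description above; the answer check and the chain-of-thought heuristic are illustrative assumptions:

```python
from statistics import mean, pstdev

def reward(completion: str, gold_answer: str) -> float:
    """Toy reward: +2.0 if the gold answer appears, +0.5 for visible
    step-by-step reasoning. Both checks are illustrative stand-ins."""
    r = 0.0
    if gold_answer in completion:
        r += 2.0                     # correct final answer
    if "Step 1" in completion:
        r += 0.5                     # chain-of-thought bonus
    return r

def group_advantages(rewards):
    """GRPO normalizes each reward against its sampling group:
    A_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]
```

Because advantages are computed within each group of samples for the same prompt, GRPO needs no separate value network.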

---

## Sandboxed Execution

SHOREKEEPER can execute terminal commands inside a Docker container with:

- Command whitelist (python3, pip, git, ls, cat, mkdir, touch, echo)
- 30-second timeout
- 4GB memory / 2 CPU limit
- No interactive shell access
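A sketch of the whitelist-and-timeout guard rails, without the Docker layer (memory and CPU limits are enforced by the container in the real system); the `run_sandboxed` name and its exact behavior are assumptions:

```python
import shlex
import subprocess

# Whitelist and timeout taken from the list above.
WHITELIST = {"python3", "pip", "git", "ls", "cat", "mkdir", "touch", "echo"}
TIMEOUT_S = 30

def run_sandboxed(command: str) -> str:
    """Run a whitelisted command with a hard timeout; no shell is spawned."""
    argv = shlex.split(command)
    if not argv or argv[0] not in WHITELIST:
        raise PermissionError(f"command not whitelisted: {command!r}")
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=TIMEOUT_S
    )
    return result.stdout
```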

---

## Quick Start

```bash
pip install -r requirements.txt
python scripts/07_run_shorekeeper.py
```

**Available commands in the CLI:**

```
/remember <fact>    Store something in long-term memory
/recall <query>     Search memory
/run <command>      Execute in sandbox
/project <name>     Create a new project
/exit               Quit
```

---

## Project Structure

```
src/
├── shorekeeper.py        Main model class
├── council/
│   ├── attention.py      GQA + RoPE attention layer
│   ├── sentinel.py       Expert router
│   ├── experts.py        12 expert modules
│   └── base_expert.py    Shared expert base class
├── memory/
│   └── json_library.py   Persistent memory system
├── sandbox/
│   └── terminal.py       Docker-based execution
└── training/
    └── grpo.py           GRPO trainer

configs/    YAML configs (model, training, memory, sandbox)
scripts/    Training and inference scripts
tests/      Unit tests
```

---

## Requirements

- Python 3.10+
- PyTorch 2.5+
- CUDA GPU recommended for full-precision inference
- Docker (optional, for sandboxed execution)

```bash
pip install -r requirements.txt
```

---

## Variants

A **15B variant** config is available at `configs/model_15b.yaml` (dim 6144, 48 layers, 16 experts).
|