---
language: en
license: mit
tags:
- pytorch
- mixture-of-experts
- language-model
- reasoning
- grpo
---

# SHOREKEEPER-4B

A 4-billion-parameter language model built around a **Council of Experts** architecture: 12 specialized expert modules routed by a learned gating network, layered on top of 28 transformer blocks with Grouped Query Attention and RoPE positional encoding. Designed for reasoning, code generation, and long-term memory across conversations.

---

## Architecture

| Component | Details |
|---|---|
| Parameters | ~4B |
| Layers | 28 transformer blocks |
| Attention | Grouped Query Attention (24 heads, 6 KV heads, head_dim 128) |
| Positional encoding | RoPE (θ = 1,000,000) |
| Experts | 12 specialists, 2 activated per token |
| Expert routing | Sentinel (learned gating with load-balance loss) |
| Expert dim | 2048 |
| Hidden dim | 3072 |
| Vocab size | 50,304 |
| Max sequence length | 8,192 |
| Quantization | 4-bit NF4 (bitsandbytes) |

Each transformer block applies **attention → MoE FFN** with pre-norm and residual connections. The 12 experts share weights across layers (cross-layer parameter sharing), keeping the model compact while preserving specialization.
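The per-block dataflow described above can be sketched as follows. The attention and MoE sub-modules are stand-ins, and the use of `LayerNorm` for the pre-norm step (as well as the `CouncilBlock` name) is an assumption, not the model's actual implementation:

```python
import torch
import torch.nn as nn

class CouncilBlock(nn.Module):
    """Pre-norm residual block: x + Attn(norm(x)), then x + MoE(norm(x)).

    `attn` and `moe_ffn` are placeholders for the real GQA+RoPE attention
    and the Sentinel-routed expert FFN; LayerNorm here is an assumption.
    """

    def __init__(self, dim: int, attn: nn.Module, moe_ffn: nn.Module):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.ffn_norm = nn.LayerNorm(dim)
        self.attn = attn
        self.moe_ffn = moe_ffn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.attn_norm(x))     # residual around attention
        x = x + self.moe_ffn(self.ffn_norm(x))   # residual around the MoE FFN
        return x
```

Stacking 28 such blocks gives the layer count in the table above.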

---

## The Council of Experts

The Sentinel router selects 2 experts per token based on learned routing logits. Each expert is a gated feed-forward network (SiLU gate × value projection) with a role-specific bias term.
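A minimal sketch of top-2 routing and the gated expert shape, assuming hypothetical class names (`Sentinel`, `GatedExpert`), a simple loop-based dispatch, and a role bias applied to the expert output; the load-balance loss is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedExpert(nn.Module):
    """Gated FFN as described above: SiLU(gate(x)) * value(x).

    The role-specific bias is sketched as a learned vector added to the
    output; the real module layout may differ.
    """
    def __init__(self, dim: int, expert_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, expert_dim, bias=False)
        self.value = nn.Linear(dim, expert_dim, bias=False)
        self.out = nn.Linear(expert_dim, dim, bias=False)
        self.role_bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.out(F.silu(self.gate(x)) * self.value(x)) + self.role_bias

class Sentinel(nn.Module):
    """Top-2 router: pick the 2 best experts per token from learned logits."""
    def __init__(self, dim: int, n_experts: int = 12, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # renormalize over the chosen 2
        return weights, idx

def moe_forward(x, sentinel, experts):
    """Combine the selected experts' outputs, weighted by the router."""
    weights, idx = sentinel(x)
    out = torch.zeros_like(x)
    for k in range(idx.shape[-1]):
        for e, expert in enumerate(experts):
            mask = idx[:, k] == e              # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, k, None] * expert(x[mask])
    return out
```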

| Expert | Role | Specialization |
|---|---|---|
| **Asmoday** | Code | Python development, debugging |
| **Istaroth** | Systems | OS, networking, deployment |
| **Ronova** | Reasoning | Math, logic, step-by-step problems |
| **Naberius** | Memory | Long-term retrieval |
| **Phanes** | Creation | Writing, generation |
| **Barbeloth** | Analysis | Data patterns, insights |
| **Tacet** | Silence | Noise filtering, summarization |
| **Abby** | Empathy | User context, preferences |
| **Reindoter** | Validation | Testing, verification |
| **Zestial** | Vision | Visualization, diagrams |
| **Alice** | Exploration | Novel solutions, experiments |
| **Rover** | Execution | Terminal commands, sandbox |

---

## Persistent Memory

SHOREKEEPER maintains a JSON-based memory store across conversations, organized into six categories:

- `user_preferences`: learned user settings and habits
- `project_context`: active project information
- `conversation_history`: past exchanges (capped at 1,000 entries per category)
- `important_facts`: stored knowledge
- `code_patterns`: learned code conventions
- `learned_skills`: acquired capabilities

Memory context is automatically injected into each `chat()` call. Use the `/remember` and `/recall` commands to interact with it directly.
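A sketch of how such a store could work. The category names and the 1,000-entry cap come from the list above; the `MemoryStore` class name, the flat list-of-strings layout, and the substring-based recall are assumptions (the real `json_library.py` may differ):

```python
import json
from pathlib import Path

CATEGORIES = (
    "user_preferences", "project_context", "conversation_history",
    "important_facts", "code_patterns", "learned_skills",
)
MAX_ENTRIES = 1000  # per-category cap mentioned above

class MemoryStore:
    def __init__(self, path="memory.json"):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text())
        else:
            self.data = {c: [] for c in CATEGORIES}

    def remember(self, category: str, fact: str) -> None:
        entries = self.data[category]
        entries.append(fact)
        del entries[:-MAX_ENTRIES]          # drop the oldest past the cap
        self.path.write_text(json.dumps(self.data, indent=2))

    def recall(self, query: str):
        """Naive substring search across all categories."""
        q = query.lower()
        return [(c, f) for c in CATEGORIES
                for f in self.data[c] if q in f.lower()]
```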

---

## Training

Training happens in two stages:

**Stage 1: Supervised Fine-Tuning**
Mixed STEM dataset: GSM8K, CodeAlpaca, OpenOrca, MathInstruct (~50K examples). Standard causal language modeling loss with AdamW and cosine annealing.

**Stage 2: GRPO**
Group Relative Policy Optimization on math reasoning prompts. Reward signal: +2.0 for a correct answer, +0.5 bonus for chain-of-thought reasoning steps. Load-balance loss applied every step to prevent expert collapse.
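The reward shaping and GRPO's group-relative normalization can be sketched like this. Only the +2.0 / +0.5 constants come from the description above; the answer check and the chain-of-thought heuristic are illustrative assumptions:

```python
from statistics import mean, pstdev

def reward(completion: str, gold_answer: str) -> float:
    """Toy reward: +2.0 if the gold answer appears, +0.5 for visible
    step-by-step reasoning. Both checks are illustrative stand-ins."""
    r = 0.0
    if gold_answer in completion:
        r += 2.0                     # correct final answer
    if "Step 1" in completion:
        r += 0.5                     # chain-of-thought bonus
    return r

def group_advantages(rewards):
    """GRPO normalizes each reward against its sampling group:
    A_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]
```

Because advantages are computed within each group of samples for the same prompt, GRPO needs no separate value network.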

---

## Sandboxed Execution

SHOREKEEPER can execute terminal commands inside a Docker container with:

- Command whitelist (python3, pip, git, ls, cat, mkdir, touch, echo)
- 30-second timeout
- 4GB memory / 2 CPU limit
- No interactive shell access
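A sketch of the whitelist-and-timeout guard rails, without the Docker layer (memory and CPU limits are enforced by the container in the real system); the `run_sandboxed` name and its exact behavior are assumptions:

```python
import shlex
import subprocess

# Whitelist and timeout taken from the list above.
WHITELIST = {"python3", "pip", "git", "ls", "cat", "mkdir", "touch", "echo"}
TIMEOUT_S = 30

def run_sandboxed(command: str) -> str:
    """Run a whitelisted command with a hard timeout; no shell is spawned."""
    argv = shlex.split(command)
    if not argv or argv[0] not in WHITELIST:
        raise PermissionError(f"command not whitelisted: {command!r}")
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=TIMEOUT_S
    )
    return result.stdout
```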

---

## Quick Start

```bash
pip install -r requirements.txt
python scripts/07_run_shorekeeper.py
```

**Available commands in the CLI:**

```
/remember <fact>    Store something in long-term memory
/recall <query>     Search memory
/run <command>      Execute in sandbox
/project <name>     Create a new project
/exit               Quit
```

---

## Project Structure

```
src/
├── shorekeeper.py        Main model class
├── council/
│   ├── attention.py      GQA + RoPE attention layer
│   ├── sentinel.py       Expert router
│   ├── experts.py        12 expert modules
│   └── base_expert.py    Shared expert base class
├── memory/
│   └── json_library.py   Persistent memory system
├── sandbox/
│   └── terminal.py       Docker-based execution
└── training/
    └── grpo.py           GRPO trainer

configs/    YAML configs (model, training, memory, sandbox)
scripts/    Training and inference scripts
tests/      Unit tests
```

---

## Requirements

- Python 3.10+
- PyTorch 2.5+
- CUDA GPU recommended for full-precision inference
- Docker (optional, for sandboxed execution)

```bash
pip install -r requirements.txt
```

---

## Variants

A **15B variant** config is available at `configs/model_15b.yaml` (dim 6144, 48 layers, 16 experts).
|