Phill Swarm-MoE 4B Qwen3.5 Hybrid Final
A Hugging Face compatible causal language model with an optional shared-weight swarm runtime for routing, skills, goals, scaffolding, tool planning, agentic sessions, and local app integrations.
If a llama_cpp snippet is shown for `phillswarm-4b-ollama-f16.gguf`, treat it as a raw GGUF preview only. For the full PhillSwarm system, run `launch_ollama_bridge.py` and use `phillswarm-4b:full`. The bridge preserves the custom HF model code, swarm controller, verified skills, goals, tools, and vision-sidecar path behind Ollama-compatible APIs.
Recommended Use Paths
| User Goal | Recommended Path | Why |
|---|---|---|
| Best local Python/HF quality | `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)` | Loads the native custom Swarm-MoE architecture. |
| Coherent Ollama-compatible use | `python launch_ollama_bridge.py --model . --port 11435`, then use `phillswarm-4b:full` | Keeps the full runtime while exposing Ollama-style `/api/chat` and `/api/generate`. |
| Quick raw GGUF experiment | llama_cpp / stock Ollama with `phillswarm-4b-ollama-f16.gguf` | Loadability preview only; not the full intelligence path. |
Quick Ollama-compatible full-runtime setup:
huggingface-cli download ayjays132/PhillSwarm-4b --local-dir PhillSwarm-4b
cd PhillSwarm-4b
sh setup_phillswarm_ollama.sh
If you prefer making it executable first:
chmod +x setup_phillswarm_ollama.sh
./setup_phillswarm_ollama.sh
Windows PowerShell:
huggingface-cli download ayjays132/PhillSwarm-4b --local-dir PhillSwarm-4b
cd PhillSwarm-4b
.\setup_phillswarm_ollama.ps1
Windows CMD:
huggingface-cli download ayjays132/PhillSwarm-4b --local-dir PhillSwarm-4b
cd PhillSwarm-4b
setup_phillswarm_ollama.cmd
The setup script:
- sets user `OLLAMA_HOST=http://127.0.0.1:11435`
- starts the PhillSwarm full-runtime bridge if it is not already running
- waits until the bridge is ready
- runs `ollama list` to confirm the direct Ollama CLI sees the full model
- on macOS/Linux, adds `OLLAMA_HOST` to `.zshrc` or `.bashrc` when it can detect the active shell
After setup, open a new terminal and use normal Ollama commands:
ollama list
ollama run phillswarm-4b:full
Model name:
phillswarm-4b:full
What This Is
Phill Swarm-MoE is a sparse Mixture-of-Experts causal language model packaged as a normal Hugging Face checkpoint. It can be loaded as a standard AutoModelForCausalLM model, or used through the optional PhillSwarmController runtime for agentic features.
Public model repo:
ayjays132/PhillSwarm-4b
https://huggingface.co/ayjays132/PhillSwarm-4b
The checkpoint in this folder is the grown 4B final package:
- Parameters: 4,144,993,832.
- Architecture: Swarm-MoE decoder-only causal LM.
- Layers: 40 unique routed layers.
- Hidden size: 1024.
- Experts: 16 routed experts with top-2 routing plus shared expert path.
- Attention: grouped-query attention with Q/K RMSNorm, optional V norm gate, RoPE, KV cache.
- Tokenizer: Qwen tokenizer copied into the final package.
- Precision target: bf16.
- Context configured: 4096 positions.
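To make "16 routed experts with top-2 routing plus shared expert path" concrete, here is a minimal scalar sketch of that routing pattern. It is an illustration only, not the packaged `modeling_swarm_moe.py` code: the function names are hypothetical, and renormalizing the two selected weights is an assumption about one common way to combine them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-probability experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: weights are renormalized over the selected experts only.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_output(x, router_logits, experts, shared_expert, k=2):
    """Add the always-on shared expert to the weighted top-k routed experts."""
    routed = sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))
    return shared_expert(x) + routed

# Toy setup: 16 scalar "experts" that scale the input by (index + 1).
experts = [lambda x, s=i + 1: s * x for i in range(16)]
shared = lambda x: 0.5 * x

logits = [0.0] * 16
logits[3], logits[7] = 2.0, 2.0  # experts 3 and 7 tie for the top score
selected = route_top_k(logits)
y = moe_output(1.0, logits, experts, shared)
print(selected, y)
```

Only two of the sixteen toy experts run for this input, which is the point of sparse routing: per-token compute stays close to that of a much smaller dense layer.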
What Makes It Different
Planner, solver, verifier, domain, tool, and editor roles can run over one loaded model instead of separate model copies.
Math, tool routing, web/search planning, browser mode, IDE/CLI integration, health/legal/finance/security domain policy, and runtime diagnostics can anchor answers.
Runtime-only goal state tracks objective, constraints, allowed tools, notes, artifacts, events, and completion status in portable JSON.
Scaffold routing learns from successful traces through `scaffold_blueprint.json` without mutating model weights during normal generation.
A safe post-processor can improve wording using only the user prompt and verified final answer. Bad polish is rejected.
Runtime metadata advertises JSON-schema tools, smolagents, MCP-compatible tools, and OpenAI-style tool calls.
Shared-Weight Swarm MoE
One loaded checkpoint can drive routed roles, verified skills, and final synthesis without spawning separate model copies.
Goal-Aware Runtime
Objectives, constraints, progress, tools, artifacts, and verification events stay in portable runtime JSON.
Tool-Native Local Agent
Tool routing, web search, browser observation, coding, workspace search, and safety gates are exposed through the optional app/runtime.
Quick Start: Load As A Normal HF Model
Install current transformers, then load with trust_remote_code=True because the package uses custom Swarm-MoE model code:
pip install -U transformers accelerate safetensors torch
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
model_id = "ayjays132/PhillSwarm-4b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
messages = [{"role": "user", "content": "Explain a black hole in simple terms."}]
# return_dict=True returns input_ids plus attention_mask, whichever transformers version is installed.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
For a local clone or downloaded snapshot, replace model_id with the folder path, for example "." from inside the model folder.
Quick Start: Use Swarm Runtime
The optional swarm runtime is packaged with the model files. For the cleanest setup, download the snapshot locally so Python can import the included swarm_moe_model runtime package:
pip install -U huggingface_hub transformers accelerate safetensors torch
huggingface-cli download ayjays132/PhillSwarm-4b --local-dir PhillSwarm-4b
cd PhillSwarm-4b
import sys
from pathlib import Path
sys.path.insert(0, str(Path(".").resolve()))
from swarm_moe_model.swarm_mode import PhillSwarmController
controller = PhillSwarmController.from_pretrained_or_config(".")
result = controller.ask(
    "Explain swarm mode compared with regular mode.",
    mode="swarm",
)
print(result.answer)
print(result.indicators["route_decision"]["final_selected"])
If you are already importing the packaged runtime from another location, you can point it at the public repo ID:
controller = PhillSwarmController.from_pretrained_or_config("ayjays132/PhillSwarm-4b")
Regular Mode Vs Swarm Mode
| Mode | What It Does | Best For |
|---|---|---|
| Regular HF model | Plain causal LM generation through `AutoModelForCausalLM` | Standard inference, benchmark compatibility, simple integration |
| Regular runtime mode | Single model pass with optional verified skill routing | Fast local chat, CLI usage, stable known tasks |
| Swarm mode | Runtime orchestration around the same loaded model | Tool planning, goals, research-style tasks, browser/app workflows, route traces |
| Forced profile debate | Runs dynamic workers and verification trace | Debugging orchestration, comparing candidates, showing solver/critic/final-editor behavior |
Default publish-facing behavior is conservative: verified skills can finish the answer without forcing noisy profile generation. Profile workers can still be activated with return_candidates=True, explicit debate prompts, or enable_profile_generation: true.
Runtime Architecture
flowchart LR
U["User Prompt"] --> R["Intent + Skill Router"]
R --> S["Verified Skills"]
R --> G["Goal State"]
R --> C["Learnable Scaffold Blueprint"]
S --> A["Answer Composer"]
G --> A
C --> W["Dynamic Shared-Weight Workers"]
W --> N["Bounded Note Pool"]
N --> A
A --> P["Safe Final Polish"]
P --> O["Final Answer"]
If Mermaid does not render in your viewer, the flow is: prompt -> router -> skills/goals/scaffold/workers -> compact evidence -> answer composer -> safe polish -> final answer.
Goals
Goals are runtime-only and stored as JSON. They do not alter the model API.
import sys
from pathlib import Path
sys.path.insert(0, str(Path(".").resolve()))
from swarm_moe_model.swarm_mode import PhillSwarmController, SwarmGoal
controller = PhillSwarmController.from_pretrained_or_config(".")
goal = SwarmGoal(
    objective="Draft a small CLI plan for using this model with tool calls.",
    constraints=["Keep it cross-platform", "Do not assume a hardcoded path"],
    success_criteria=["Shows install", "Shows run", "Mentions permissions"],
    allowed_tools=["filesystem_read", "web_search"],
)
run = controller.run_goal(goal)
print(run.final_answer)
run.state.to_json("goal_state.json")
Agentic Sessions
Agentic sessions keep multi-turn work coherent without appending every old token forever.
- Latest user prompt remains the authority.
- Recent turns stay in a rolling window.
- Older turns flush into compact summaries.
- Tool results, route decisions, and artifacts stay as metadata.
- KV cache is used for the active generation window, not falsely persisted across independent turns.
session = controller.create_session("workspace-task")
print(session.ask("Remember that we want a portable CLI setup.", mode="swarm").answer)
print(session.ask("Now give the final install checklist.", mode="swarm").answer)
session.state.to_json("session_state.json")
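The rolling-window behavior described above can be pictured with a small stand-alone sketch. `RollingSession` is a hypothetical class written for this illustration and is not the packaged session implementation; the window size and the truncation-based "summary" are assumptions standing in for whatever the runtime actually does.

```python
from collections import deque

class RollingSession:
    """Hypothetical illustration of a rolling window with flushed summaries."""

    def __init__(self, window=4):
        self.window = window      # assumption: number of recent turns kept verbatim
        self.recent = deque()     # recent turns, oldest first
        self.summaries = []       # compact stand-ins for older turns

    def add_turn(self, role, text):
        self.recent.append((role, text))
        while len(self.recent) > self.window:
            old_role, old_text = self.recent.popleft()
            # Stand-in summarizer: keep only the first 40 characters.
            self.summaries.append(f"{old_role}: {old_text[:40]}")

    def build_context(self, latest_prompt):
        """Summaries first, then recent turns, with the latest prompt last."""
        lines = ["[summary] " + s for s in self.summaries]
        lines += [f"{role}: {text}" for role, text in self.recent]
        lines.append("user: " + latest_prompt)
        return "\n".join(lines)

demo = RollingSession(window=2)
demo.add_turn("user", "Remember that we want a portable CLI setup.")
demo.add_turn("assistant", "Noted: portable CLI setup.")
demo.add_turn("user", "Also keep it cross-platform.")
print(demo.build_context("Now give the final install checklist."))
```

Placing the latest prompt last keeps it as the authority while the context size stays bounded by the window plus the summaries.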
Skills And Domain Routing
The runtime includes compact verified skills and route anchors. They are designed to reduce prompt overload: the model sees only the selected route and a few verified evidence anchors, not the entire tool registry.
Current route families include:
- general chat and identity
- math and arithmetic
- coding and repo workflow
- web search planning
- browser/vision/tool operation planning
- research and citation discipline
- dynamic routing diagnostics
- training dataset and model finalization guidance
- runtime debugging and context-overload repair
- science explanation anchors
- swarm-mode architecture explanation
- AGI/runtime architecture policy
- life-domain policy: health, legal, finance, education, creative, productivity, data, security
- IDE/CLI integration: Cursor, VS Code, terminals, smolagents, JSON-schema tools, MCP-compatible tools, OpenAI-style tool calls
Tool And App Integration
swarm_runtime_config.json advertises:
{
"tool_protocols": ["json_schema", "smolagents", "mcp_compatible", "openai_tool_calls"],
"external_host_compatible": true,
"compact_tool_manifest": true
}
The intended pattern is:
- Host app or IDE supplies a compact tool manifest.
- Runtime routes the prompt to the smallest relevant tool set.
- Tool call is proposed as structured JSON.
- Permission layer approves or blocks it.
- Observation is returned to the model as compact evidence.
- Final answer cites what was actually observed.
Normal model loading does not execute tools and does not start the app.
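The propose, approve, observe steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the runtime's real permission layer: the function names, the JSON call shape, and the 200-character observation cap are all hypothetical, and the tool names are borrowed from the goal example earlier in this README.

```python
import json

# Assumption: an example allow-list for this sketch.
SAFE_TOOLS = {"filesystem_read", "web_search"}

def propose_tool_call(name, arguments):
    """Represent a proposed call as structured JSON (hypothetical shape)."""
    return json.dumps({"tool": name, "arguments": arguments})

def permission_gate(call_json, allowed=SAFE_TOOLS):
    """Approve only calls whose tool name is on the allow-list."""
    return json.loads(call_json)["tool"] in allowed

def run_tool(call_json, registry, allowed=SAFE_TOOLS):
    """Execute an approved call and return a compact observation."""
    if not permission_gate(call_json, allowed):
        return {"status": "blocked", "observation": None}
    call = json.loads(call_json)
    result = registry[call["tool"]](**call["arguments"])
    # Keep the observation compact so it does not flood the prompt.
    return {"status": "ok", "observation": str(result)[:200]}

# Toy registry with a fake search tool.
registry = {"web_search": lambda query: f"results for {query}"}
allowed_result = run_tool(propose_tool_call("web_search", {"query": "moe"}), registry)
blocked_result = run_tool(propose_tool_call("shell_exec", {"cmd": "ls"}), registry)
print(allowed_result, blocked_result)
```

The gate runs before any tool code is touched, so a blocked call never reaches the registry.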
Local App And Streaming
The app is packaged but disabled by default in config.json:
{
"swarm_app_enabled": false,
"swarm_app_config": "swarm_app_config.example.json"
}
Launch it explicitly from the model folder:
cd PhillSwarm-4b
python launch_swarm_app.py --install
On Windows, if python is not on PATH:
cd PhillSwarm-4b
py -3 launch_swarm_app.py --install
That one command installs the optional app extras when missing:
- `ddgs` for web search
- `playwright` plus Chromium for browser observe/verify actions
- npm dependencies for packaged TypeScript tools when `npm` is available
From a source checkout, use:
python scripts/launch_swarm_app.py --install --config configs/phill_swarm_app.json
The browser tool is visible by default (browser_headless: false) so users can see what the agent is doing. To save memory or run on a server:
python launch_swarm_app.py --install --headless
Equivalent environment override:
PHILLNET_BROWSER_HEADLESS=1 python launch_swarm_app.py
On Windows PowerShell:
$env:PHILLNET_BROWSER_HEADLESS="1"; py -3 launch_swarm_app.py
Useful setup flags:
- `--no-browser-install` skips Playwright browser download.
- `--no-npm-install` skips npm tool dependency install.
- `--visible-browser` forces headed Playwright windows.
- `--capture-dir state/browser_captures` changes browser snapshot storage.
- `--permission-mode default` keeps safe read/search/observe actions only.
- `--permission-mode yolo` enables stronger browser/tool actions with tracing.
When launched explicitly through the wrapper/app server, it can expose:
- `/api/status`
- `/api/chat`
- `/api/chat/stream`
- `/api/tools/route`
- `/api/tool/call`
Streaming uses Server-Sent Events for route, profile, critic, tool, goal, token, final, and error events. Tool execution remains permission-gated.
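For clients consuming the stream, a small parser for standard Server-Sent Events framing may help. This sketch assumes the app uses the usual `event:` and `data:` fields; the README does not specify the payload layout, so the sample payloads below are invented for the demonstration and only the event names come from the list above.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        # Lenient choice: flush a trailing event even without a final blank line.
        yield event, "\n".join(data)

# Invented sample stream using event names from this README.
sample = [
    "event: route\n", 'data: {"skill": "math"}\n', "\n",
    "event: token\n", "data: 4\n", "\n",
    "event: final\n", "data: 2+2 = 4.\n", "\n",
]
events = list(parse_sse(sample))
print(events)
```

In practice the lines would come from an HTTP response opened against `/api/chat/stream`, decoded as UTF-8 before being passed to `parse_sse`.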
Ollama-Compatible Full Runtime
PhillSwarm includes an Ollama-compatible full-runtime bridge. This is the recommended Ollama path when you want coherent PhillSwarm behavior.
Use the model name `phillswarm-4b:full`. This keeps the full HF checkpoint, swarm controller, verified skills, tools, goals, and vision-sidecar runtime available behind Ollama-style APIs.
Why this exists: PhillSwarm is not only a plain GGUF transformer. It uses bundled HF remote code, shared-expert routing, gated V-norm behavior, a Python swarm controller, goals, app tools, and a vision sidecar. Stock Ollama/llama.cpp does not execute those Python runtime systems inside a .gguf file. The bridge keeps Ollama-style compatibility while preserving the full model system instead of flattening it into a weaker preview.
Run The Coherent Ollama Path
Download or clone the snapshot, then run the one-time setup from inside the model folder.
macOS/Linux:
sh setup_phillswarm_ollama.sh
Optional executable form:
chmod +x setup_phillswarm_ollama.sh
./setup_phillswarm_ollama.sh
Windows CMD:
setup_phillswarm_ollama.cmd
Windows PowerShell:
.\setup_phillswarm_ollama.ps1
The setup scripts are cross-OS and do the same job:
- set `OLLAMA_HOST=http://127.0.0.1:11435`
- start `launch_ollama_bridge.py --model . --port 11435` if the bridge is not already running
- wait for `/api/tags` to respond
- run `ollama list` so the user can confirm `phillswarm-4b:full` is visible
Manual fallback for any OS:
python launch_ollama_bridge.py --model . --port 11435
Then set the current terminal:
export OLLAMA_HOST=http://127.0.0.1:11435
Windows PowerShell manual fallback:
$env:OLLAMA_HOST="http://127.0.0.1:11435"
Then use this model name from Ollama-compatible clients:
phillswarm-4b:full
If using an Ollama-style HTTP client, call the bridge directly:
GET /api/tags
POST /api/show
POST /api/generate
POST /api/chat
Example:
curl http://127.0.0.1:11435/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"phillswarm-4b:full","stream":false,"messages":[{"role":"user","content":"What is 2+2? Answer in one sentence."}]}'
Observed smoke output:
{
"message": {"role": "assistant", "content": "2+2 = 4."},
"phill": {
"full_runtime": true,
"indicators": {"bridge": "ollama-compatible-fast-verified-skill"}
}
}
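The same request can be made from Python using only the standard library. This is a minimal sketch that mirrors the curl call and the smoke output shown above; the helper names are hypothetical, and the live call is left commented out because it needs a running bridge.

```python
import json
import urllib.request

BRIDGE_URL = "http://127.0.0.1:11435"  # bridge address used throughout this README

def build_chat_request(prompt, model="phillswarm-4b:full", base_url=BRIDGE_URL):
    """Build a non-streaming POST to the bridge's /api/chat route."""
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_answer(response_body):
    """Return (answer_text, full_runtime_flag) from a response body."""
    body = json.loads(response_body)
    return body["message"]["content"], body.get("phill", {}).get("full_runtime")

# With the bridge running, the call would look like this:
# with urllib.request.urlopen(build_chat_request("What is 2+2?"), timeout=600) as resp:
#     print(extract_answer(resp.read()))

# Offline check against the smoke output shown above.
smoke = '{"message": {"role": "assistant", "content": "2+2 = 4."}, "phill": {"full_runtime": true}}'
print(extract_answer(smoke))
```

Checking the `phill.full_runtime` flag is a quick way to confirm that a reply came from the bridge rather than from a stock GGUF load.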
Runtime Modes
| Path | Status | Best Use |
|---|---|---|
| `phillswarm-4b:full` bridge | Recommended and coherent | Ollama-compatible clients that should use the real PhillSwarm runtime. |
| HF `trust_remote_code=True` | Recommended and native | Python/Transformers users who want direct model and controller access. |
| GGUF preview | Experimental loadability preview | Testing tokenizer/shape compatibility in stock Ollama, not full-quality intelligence. |
What The Bridge Preserves
- HF `trust_remote_code=True` model loading.
- `PhillSwarmController` swarm/regular routing.
- App tool routing, permission checks, goals, indicators, and web/search/browser integration path.
- Vision sidecar availability from the HF snapshot.
- Ollama-compatible JSON and NDJSON streaming response shapes.
- Fast verified-skill routing before heavy model generation when a deterministic answer is already known.
- Lazy loading, so `/api/tags` and `/api/show` respond quickly while the BF16 model loads only for real generation.
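The lazy-loading point in the list above follows a familiar pattern: metadata routes answer from static information, and the expensive load happens on the first real generation. The sketch below is a generic illustration with a hypothetical `LazyModel` class and a toy loader; it does not reproduce the bridge's code.

```python
class LazyModel:
    """Hypothetical wrapper that defers the expensive load until first use."""

    def __init__(self, loader):
        self._loader = loader   # callable that builds the real model
        self._model = None
        self.load_count = 0

    def tags(self):
        # Metadata only: answering this never touches the weights.
        return {"models": [{"name": "phillswarm-4b:full"}]}

    def generate(self, prompt):
        if self._model is None:
            self._model = self._loader()  # the slow step runs once, here
            self.load_count += 1
        return self._model(prompt)

# Toy loader: the "model" simply upper-cases the prompt.
lazy = LazyModel(lambda: (lambda prompt: prompt.upper()))
listing = lazy.tags()
loads_after_tags = lazy.load_count
reply = lazy.generate("hello")
lazy.generate("again")
print(listing, loads_after_tags, reply, lazy.load_count)
```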
MCP / IDE Bridge
The app exposes a local MCP-style bridge for external agent hosts and IDEs:
JSON-RPC MCP endpoint: http://127.0.0.1:7860/mcp
Manifest: http://127.0.0.1:7860/api/mcp/manifest
Tools: http://127.0.0.1:7860/api/mcp/tools
Route prompt: http://127.0.0.1:7860/api/mcp/route
Call tool: http://127.0.0.1:7860/api/mcp/call
Chat through app: http://127.0.0.1:7860/api/mcp/chat
Well-known manifest: http://127.0.0.1:7860/.well-known/phill-swarm-mcp.json
Use it with Codex, Claude Code, Cursor, Antigravity-style hosts, or any local MCP/HTTP client that can call a JSON-RPC tool server. The endpoint supports:
- `initialize`
- `tools/list`
- `tools/call`
- `phill/route`
- `phill/chat`
The bridge is private by default because the app binds to 127.0.0.1. To expose it to another machine on your LAN, launch explicitly:
python launch_swarm_app.py --host 0.0.0.0
Then open /api/mcp/status to see the LAN URL. Only use LAN mode on a trusted network. For shared machines or public networks, set mcp_auth_token in swarm_app_config.example.json and send Authorization: Bearer <token> from the client.
For ChatGPT-style use, the intended pattern is different from Codex/Cursor: ChatGPT can remain the language model while calling Phill's app routes for scaffolding, routing, goals, tools, browser observation, and verification. That lets the app act as a local swarm/tool runtime without replacing the external model.
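A client talking to the JSON-RPC endpoint needs little more than a JSON-RPC 2.0 envelope and, when `mcp_auth_token` is set, a bearer header. The helper below is a hypothetical sketch using only the standard library. It builds a `tools/list` request; parameter shapes for `phill/route` and `phill/chat` are not documented here, so they are deliberately left out.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:7860/mcp"  # endpoint listed in this README

def mcp_request(method, params=None, request_id=1, token=None, url=MCP_URL):
    """Build a JSON-RPC 2.0 POST, adding a bearer token when one is supplied."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# "example-token" is a placeholder, not a real credential.
list_tools = mcp_request("tools/list", token="example-token")
print(list_tools.full_url, json.loads(list_tools.data))

# With the app running:
# with urllib.request.urlopen(list_tools, timeout=30) as resp:
#     print(json.loads(resp.read()))
```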
Vision Sidecar
This package includes runtime metadata for an optional Qwen3.5-style vision sidecar:
- `vision_sidecar_enabled: true`
- `vision_sidecar_path: "vision_sidecar"`
- `vision_snapshot_policy: "retain_latest_only"`
Vision is runtime sidecar behavior, not ordinary text-generation behavior. The text embedding table is not resized for vision marker tokens; pixel tensors and browser snapshots route through external processor metadata/sidecar paths.
Learnable Scaffolding
Learnable scaffolding uses scaffold_blueprint.json.
It stores:
- signal weights
- node weights
- compact learned tidbits
- examples seen
- update timestamp
This is zero-extra-model-weight runtime memory. It improves scaffold node selection and confidence without mutating model weights during normal generation.
{
"learnable_scaffolding": true,
"scaffold_blueprint_path": "scaffold_blueprint.json",
"inject_scaffold_into_prompt": false
}
Training Losses
The model includes optional auxiliary losses for train-time routing/scaffold behavior:
{
"router_aux_loss_coef": 0.01,
"router_z_loss_coef": 0.001,
"router_entropy_loss_coef": 0.0001,
"router_confidence_loss_coef": 0.0,
"thinking_consistency_loss_coef": 0.0001,
"scaffold_alignment_loss_coef": 0.0001
}
These are active during training when labels are provided. They are not extra inference-time model copies.
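This README does not spell out the loss formulas, so the sketch below uses the commonly published forms as an assumption: a Switch-style load-balancing term for `router_aux_loss_coef` and a squared log-sum-exp term for `router_z_loss_coef`. The function names are hypothetical and the model's own implementation may differ.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(logits):
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def load_balance_loss(batch_logits, k=2):
    """n_experts * sum_i f_i * P_i, with f_i the dispatch share and P_i the mean router probability."""
    n_experts, n_tokens = len(batch_logits[0]), len(batch_logits)
    counts = [0] * n_experts
    mean_prob = [0.0] * n_experts
    for logits in batch_logits:
        probs = softmax(logits)
        top = sorted(range(n_experts), key=lambda i: probs[i], reverse=True)[:k]
        for i in top:
            counts[i] += 1
        for i, p in enumerate(probs):
            mean_prob[i] += p / n_tokens
    frac = [c / (n_tokens * k) for c in counts]
    return n_experts * sum(f * p for f, p in zip(frac, mean_prob))

def z_loss(batch_logits):
    """Mean squared log-sum-exp of router logits; discourages very large logits."""
    return sum(logsumexp(l) ** 2 for l in batch_logits) / len(batch_logits)

def auxiliary_loss(batch_logits, aux_coef=0.01, z_coef=0.001):
    """Weighted sum using the two coefficients from the config above."""
    return aux_coef * load_balance_loss(batch_logits) + z_coef * z_loss(batch_logits)

uniform = [[0.0, 0.0, 0.0, 0.0]] * 2  # two tokens, four experts, no preference
print(load_balance_loss(uniform), z_loss(uniform), auxiliary_loss(uniform))
```

With small coefficients like these, the auxiliary terms nudge the router without dominating the language-modeling loss.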
Guarded Online Learning
Online learning support exists, but is disabled by default.
{
"online_learning_enabled": false,
"online_learning_lr": 1e-7,
"online_learning_max_grad_norm": 0.05,
"online_learning_train_top_layers": 2,
"online_learning_min_trust": 0.25,
"online_learning_max_updates": 32
}
When explicitly enabled and called through learn_from_correction(...), it:
- adapts only a tiny top-layer/head surface
- deduplicates shared tensors
- clips gradients
- tracks temporal trust
- probes counterfactual loss
- reverts failed updates
This is experimental and should be used only for approved corrections or controlled local adaptation.
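The clip, probe, and revert behavior listed above can be shown on a one-parameter toy problem. This is a sketch of the general pattern only: `guarded_update` is a hypothetical helper, the loss is a toy quadratic, and nothing here reflects how `learn_from_correction(...)` is implemented internally.

```python
def clip(grad, max_norm):
    """Clip a scalar gradient to the range [-max_norm, max_norm]."""
    return max(-max_norm, min(max_norm, grad))

def guarded_update(param, loss_fn, grad_fn, lr, max_grad_norm):
    """Apply one clipped step and keep it only if the probed loss does not get worse."""
    before = loss_fn(param)
    candidate = param - lr * clip(grad_fn(param), max_grad_norm)
    after = loss_fn(candidate)
    if after <= before:
        return candidate, True
    return param, False  # revert: the original parameter is returned unchanged

loss = lambda w: (w - 3.0) ** 2           # toy objective with its minimum at w = 3
good_grad = lambda w: 2.0 * (w - 3.0)     # correct gradient
bad_grad = lambda w: -2.0 * (w - 3.0)     # wrong sign, simulating a harmful correction

w_good, accepted_good = guarded_update(0.0, loss, good_grad, lr=0.1, max_grad_norm=0.05)
w_bad, accepted_bad = guarded_update(0.0, loss, bad_grad, lr=0.1, max_grad_norm=0.05)
print(w_good, accepted_good, w_bad, accepted_bad)
```

The clipped step is tiny by design, so even an accepted update moves the parameter only slightly, and a rejected one leaves it untouched.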
Final Polish Pass
Final polish is enabled in safe mode:
{
"enable_final_polish": true,
"final_polish_mode": "safe"
}
It receives only the latest user prompt and the verified final answer. It cannot see raw worker notes or rejected candidates. If the polish drifts, changes numbers, becomes too short/long, or loses overlap with the verified answer, the runtime keeps the original verified answer.
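The rejection rules in the previous paragraph (changed numbers, bad length, lost overlap) can be approximated with a short validator. The thresholds and helper names below are assumptions chosen for illustration; the runtime's actual checks are not documented in this README.

```python
import re

def numbers(text):
    """Sorted list of the numeric literals found in the text."""
    return sorted(re.findall(r"\d+(?:\.\d+)?", text))

def token_overlap(verified, polished):
    """Share of the verified answer's words that survive in the polished text."""
    wa, wb = set(verified.lower().split()), set(polished.lower().split())
    return len(wa & wb) / max(1, len(wa))

def accept_polish(verified, polished, min_ratio=0.5, max_ratio=2.0, min_overlap=0.6):
    """Accept only if numbers match, length stays in range, and overlap is high enough."""
    if numbers(verified) != numbers(polished):
        return False
    ratio = len(polished) / max(1, len(verified))
    if not (min_ratio <= ratio <= max_ratio):
        return False
    return token_overlap(verified, polished) >= min_overlap

def final_answer(verified, polished):
    """Fall back to the verified answer whenever the polish fails validation."""
    return polished if accept_polish(verified, polished) else verified

verified = "The total is 42 units."
kept = final_answer(verified, "The total is 42 units overall.")
rejected = final_answer(verified, "The total is 43 units.")
print(kept)
print(rejected)
```

Because the fallback is the verified answer itself, a failed polish can never make the output worse than the unpolished result.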
Validation Summary
From the included reports:
- HF config/tokenizer/model load passed.
- CUDA forward passed.
- Controller loaded with 40 routed layers.
- No-profile swarm smoke returned a verified black-hole answer through `science_explanation`.
- Dynamic profile test showed noisy raw workers are rejected and verified skills preserve the answer.
- Final polish test preserved verified answers when the polish attempt failed validation.
- Learnable scaffold test saved/reloaded blueprint state.
- Auxiliary-loss tiny-model test passed forward/loss/backward.
The package includes detailed reports:
- `QWEN35_4B_FINAL_REPORT.md`
- `QWEN35_4B_COHERENCE_REPORT.md`
- `DYNAMIC_SWARM_ORCHESTRATION_REPORT.md`
- `AGENTIC_SESSION_RUNTIME_REPORT.md`
- `AGI_SKILL_ROUTE_EXPANSION_REPORT.md`
- `LEARNABLE_SCAFFOLDING_REPORT.md`
- `LEARNABLE_LOSSES_AND_ONLINE_LEARNING_REPORT.md`
- `FINAL_POLISH_PASS_REPORT.md`
Known Limits
- Raw direct generation can still be weaker than verified runtime answers.
- Profile generation is not enabled by default because raw worker text can be noisy.
- Online learning is disabled by default and should not be treated as automatic safe self-training.
- Vision sidecar is runtime behavior; normal text generation does not become a full browser-vision agent by itself.
- This README describes implemented local runtime features, not independent benchmark superiority over frontier commercial systems.
File Map
| File | Purpose |
|---|---|
| `config.json` | HF model config and passive runtime metadata |
| `model.safetensors` | model weights |
| `configuration_swarm_moe.py` | HF config remote-code file |
| `modeling_swarm_moe.py` | HF model remote-code file |
| `tokenizer.json`, `tokenizer_config.json` | tokenizer assets |
| `swarm_runtime_config.json` | wrapper/runtime config |
| `scaffold_blueprint.json` | learnable scaffold runtime memory |
| `swarm_moe_model/` | optional local runtime package |
| `vision_sidecar.py`, `vision_sidecar/` | optional runtime vision sidecar |
| `swarm_app_config.example.json` | app config example |
Recommended Use Cases
- Local research into sparse MoE routing and shared-weight agent orchestration.
- Tool-aware chat wrappers where tool execution is explicit and permissioned.
- IDE/CLI assistants that need compact tool manifests and traceable routes.
- Agentic task runners that need JSON goal state, session memory, and recoverable progress.
- Experiments with scaffold learning and safe online correction workflows.
- Educational exploration of MoE, routing losses, and wrapper-based agent design.
Minimal Requirements
- Python environment with PyTorch and Transformers.
- `trust_remote_code=True`.
- bf16-capable CUDA is recommended for the 4B package.
- CPU loading may be possible but will be slow.
Attribution And Development Notes
Phill Swarm-MoE is a custom project by Phillip A. Holland / Ayjays132. This package contains a grown hybrid checkpoint and runtime code intended for local experimentation, HF-style loading, and publishable inspection. It is built to be transparent about what is model behavior, what is runtime orchestration, and what is experimental.