Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Vaultkeeper/Sovereign-Code" \
    --host 0.0.0.0 \
    --port 30000
```

Call the server using curl (OpenAI-compatible API):

```shell
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Vaultkeeper/Sovereign-Code",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
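The same request can be issued from Python with only the standard library; a minimal sketch that assumes the server launched above is reachable on `localhost:30000` (the endpoint and payload mirror the curl call).

```python
import json
from urllib import request

# OpenAI-compatible completion request, mirroring the curl example above.
payload = {
    "model": "Vaultkeeper/Sovereign-Code",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}

def query_server(base_url: str = "http://localhost:30000") -> dict:
    """POST the payload to /v1/completions and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running:
# print(query_server()["choices"][0]["text"])
```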
Sovereign-Code
by VaultAI
Deployment Status: UNRELEASED
[ PRE-ALPHA ] SOVEREIGN-CODE & CORPUS-CALLOSUM | ARCHITECTING...
✅ Execution, Absolute.
While most models are built to converse, Sovereign-Code is built to execute. It is a specialized, cold-logic engine designed for a single purpose: high-fidelity technical output.
Engineered by VaultAI, Sovereign-Code is a custom 32-Layer Hybrid model. It utilizes an aggressive architectural "passthrough" to bridge the deep structural coding intelligence of Qwen 2.5 Coder with the rigid, high-instruction-following cortex of Llama 3.1. It does not offer opinions; it delivers functional syntax.
🧠 Architecture & Identity: The Logic Terminal
Sovereign-Code is a "Frankenmerge" that ignores standard architectural safety to achieve peak performance. By stacking disparate layers, VaultAI has created a model that processes raw intent through a coding-heavy base before filtering it through an elite instruction-following top-layer.
Key Capabilities:
- Deterministic Syntax: Optimized for zero-fluff code generation across Python, C++, Rust, and Mojo.
- Tattooed Monologue: Hardcoded via a custom Jinja2 template to engage in a mandatory three-phase internal processing loop inside `<think>` tags before every output.
- Hardware Optimized: Designed for dual-GPU configurations (Polaris/gfx803) using `llama.cpp` and Vulkan backends.
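The actual Jinja2 template has not been published. As a hypothetical illustration only, a rendered three-phase `<think>` preamble might look like the sketch below (the phase names are invented for this example, not taken from the model):

```python
# Hypothetical rendering of the mandatory three-phase <think> preamble.
# The real chat template is unpublished; phase names below are invented.
PHASES = ["DECONSTRUCT", "PLAN", "VERIFY"]

def render_think_block(task: str) -> str:
    """Wrap a task in a <think> preamble listing each processing phase."""
    steps = "\n".join(f"[{i + 1}:{phase}] ..." for i, phase in enumerate(PHASES))
    return f"<think>\n{steps}\n</think>\n{task}"

prompt = render_think_block("Write a Rust function that reverses a linked list.")
```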
⚡ Performance & Benchmarks (Estimated)
Sovereign-Code is designed for maximum throughput on local consumer hardware (RX 570/580 8GB setups).
| Metric | Value | Memory | Notes |
|---|---|---|---|
| Quantization | Q4_K_M (GGUF) | ~9.2 GB | Full GPU Offload |
| Context Length | 32,768 Tokens | High Headroom | Optimized for Repo-level Debugging |
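The ~9.2 GB figure is consistent with a back-of-envelope estimate for a ~15B-parameter model at Q4_K_M, assuming roughly 4.85 bits per weight on average (the exact ratio varies by tensor and model; KV cache is excluded):

```python
# Back-of-envelope VRAM estimate for the quantized weights alone.
# 4.85 bits/weight is an approximate Q4_K_M average, not an exact figure.
params = 15e9            # ~15B parameters, per the model card
bits_per_weight = 4.85   # approximate Q4_K_M average

weight_bytes = params * bits_per_weight / 8
weight_gb = weight_bytes / 1e9  # decimal gigabytes

print(f"approx. weight footprint: {weight_gb:.1f} GB")
```

The remaining headroom in the ~9.2 GB footprint would go to the KV cache and runtime buffers, which grow with context length.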
Standardized Accuracy Benchmarks
Benchmarks are currently queued for evaluation.
| Benchmark | Focus Area | Score | Status |
|---|---|---|---|
| HumanEval | Coding & Logic | TBD | ⏳ Pending Eval |
| MBPP | Python Programming | TBD | ⏳ Pending Eval |
| GSM8k | Mathematical Reasoning | TBD | ⏳ Pending Eval |
Model Details
- Type: Causal Language Model (Hybrid Passthrough)
- Base Architecture: Qwen 2.5 (7B) + Llama 3.1 (8B)
- Total Parameters: ~15B (Effective density via Layer Stacking)
- Merge Method: Passthrough / Frankenmerge
- Weights Composition:
- Base (Layers 0-16): Qwen2.5-Coder-7B-Instruct
- Cortex (Layers 16-32): Meta-Llama-3.1-8B-Instruct
- License: Other (See Base Model Licenses)
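VaultAI has not published the merge recipe. Purely as a sketch, the slice layout above could be described in a mergekit-style passthrough configuration like the following (model IDs are assumed from the card; note that stacking Qwen and Llama layers also requires reconciling their differing tokenizers and hidden dimensions, which a passthrough merge alone does not handle):

```python
# Hypothetical mergekit-style passthrough layout for the layer split above.
# Model IDs are assumed from the card; the real merge config is unpublished.
merge_config = {
    "merge_method": "passthrough",
    "dtype": "bfloat16",
    "slices": [
        {"sources": [{"model": "Qwen/Qwen2.5-Coder-7B-Instruct",
                      "layer_range": [0, 16]}]},
        {"sources": [{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
                      "layer_range": [16, 32]}]},
    ],
}

# Total stacked depth: 16 + 16 = 32 layers, matching the "32-Layer Hybrid" claim.
total_layers = sum(
    s["sources"][0]["layer_range"][1] - s["sources"][0]["layer_range"][0]
    for s in merge_config["slices"]
)
```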
Why Sovereign-Code?
- The Execution Engine: No conversational "As an AI..." filler.
- Analytical Grounding: The built-in `<think>` protocol forces the model to debug its own code conceptually before writing a single line.
- Agentic Ready: Optimized for tool-calling and autonomous development workflows.
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Vaultkeeper/Sovereign-Code" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Vaultkeeper/Sovereign-Code",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```