Q-Office-Suite Runtime: local HTTP server hosting all 9 sovereign specialists
This repo ships the Q-Office-Suite runtime: a standalone Nuitka-compiled Windows binary that hosts the nine sovereign Qovaryx specialists behind a local HTTP API. CPU inference. No GPU required. No internet required after download.
All nine specialist weights are published openly under Apache 2.0 at their per-model cards. The runtime entrypoint and dispatch are Qovaryx proprietary technology, the same posture as the options decoder runtime: weights and audit are open; entrypoint and recipe stay private.
The nine specialists hosted
All nine are 53.5M-parameter full-finetunes from
tjarvis91/qovaryx-50m-scratch-base.
No SmolLM2. No Qwen. No Llama. No borrowed foundation weights.
| Specialist | Job | Score |
|---|---|---|
| Q-Triage | Support ticket routing | 100% (60/60) |
| Q-DocCite | Document citation w/ page anchor | 100% (60/60) |
| Q-Invoice | Invoice JSON extractor | 100% (60/60) |
| Q-ToolCall | Agent tool-call JSON | 100% (60/60) |
| Q-Meeting | Meeting note structurer | 100% (60/60) |
| Q-FinCite | 10-K/10-Q citation | 100% (60/60) |
| Q-CmdSafe | Shell command safety triage | 100% (60/60) |
| Q-SheetExtract | Spreadsheet field extractor | 100% (37/37) |
| Q-Coder | Python code one-liners + skeletons | 100% (53/53) |
What's in this repo
- q_office_suite.exe: single-file Windows binary, Nuitka onefile build, ~1.98 GB. Bundles Python 3.10 + PyTorch CPU + tokenizers + the cluster shell. No installer; just run.
- q_office_suite.exe.sha256: SHA256 hash for tamper-detection.
- README.md: this file.
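The published hash can be checked before the first run. The following is a minimal sketch, assuming the .sha256 sidecar file starts with the hex digest as its first whitespace-separated token (the layout `sha256sum` writes); the actual file format is not stated in this README, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the ~2 GB binary is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(binary: Path, sidecar: Path) -> bool:
    """Compare the binary's digest with the first token of the sidecar file.

    Assumption: the sidecar begins with the hex digest, optionally followed
    by a filename. "utf-8-sig" tolerates a leading byte-order mark.
    """
    tokens = sidecar.read_text(encoding="utf-8-sig").split()
    if not tokens:
        return False
    return sha256_of(binary) == tokens[0].lower()
```

A call such as `matches_sidecar(Path("q_office_suite.exe"), Path("q_office_suite.exe.sha256"))` would return `True` only when the two digests agree.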
What you need to provide
The runtime ships the dispatch layer; the 9 specialist weights are downloaded separately from their HF cards. Layout the runtime expects:
<your-dir>/
  q_office_suite.exe
  tokenizer.json                      # from any of the 9 specialist repos (they share)
  weights/
    q-triage-50m-v2/final.pt          # from tjarvis91/Q-Triage-50M-Sovereign
    q-doccite-50m-v2/final.pt         # from tjarvis91/Q-DocCite-50M-Sovereign
    q-docextract-50m-v1/final.pt      # from tjarvis91/Q-Invoice-50M-Sovereign
    q-toolcall-50m-v1/final.pt        # from tjarvis91/Q-ToolCall-50M-Sovereign
    q-meeting-50m-v1/final.pt         # from tjarvis91/Q-Meeting-50M-Sovereign
    q-fincite-50m-v1/final.pt         # from tjarvis91/Q-FinCite-50M-Sovereign
    q-devsafe-50m-v1/final.pt         # from tjarvis91/Q-CmdSafe-50M-Sovereign
    q-sheetextract-50m-v4/final.pt    # from tjarvis91/Q-SheetExtract-50M-Sovereign
    q-coder-50m-v2/final.pt           # from tjarvis91/Q-Coder-50M-Sovereign
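Before launching, it may help to confirm that every file in the layout above is in place. This is a small sketch, not part of the runtime; the folder names are copied from the layout, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# Folder names copied from the layout above; each must contain final.pt.
EXPECTED_WEIGHT_DIRS = [
    "q-triage-50m-v2",
    "q-doccite-50m-v2",
    "q-docextract-50m-v1",
    "q-toolcall-50m-v1",
    "q-meeting-50m-v1",
    "q-fincite-50m-v1",
    "q-devsafe-50m-v1",
    "q-sheetextract-50m-v4",
    "q-coder-50m-v2",
]


def missing_files(root: Path) -> list[str]:
    """Return the relative paths from the expected layout that are absent."""
    expected = ["q_office_suite.exe", "tokenizer.json"]
    expected += [f"weights/{name}/final.pt" for name in EXPECTED_WEIGHT_DIRS]
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list from `missing_files(Path(r"C:\path\to\your-dir"))` means the directory matches the layout; anything else names the files still to download.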
Total disk for all 9 weights: ~3 GB. Total RAM at idle: ~1 GB. RAM per active specialist: ~250 MB.
How to run
$env:Q_OFFICE_WEIGHTS_DIR = "C:\path\to\weights"
$env:Q_OFFICE_TOKENIZER = "C:\path\to\tokenizer.json"
$env:Q_OFFICE_HOST = "127.0.0.1"
$env:Q_OFFICE_PORT = "8788"
.\q_office_suite.exe
First launch performs Nuitka onefile self-extraction (~30 s for the 2 GB
payload to %TEMP%). Subsequent launches reuse the extracted cache.
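Because the first launch can take around 30 s, a calling application may want to wait for the server before sending work. The sketch below polls the GET /health endpoint listed under Endpoints. It assumes the `ok` field is truthy once the server is ready, which this README does not spell out; the function names and the 120 s default deadline are illustrative.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str, timeout_s: float = 2.0) -> Optional[dict]:
    """GET /health once; return the parsed JSON object, or None on failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return body if isinstance(body, dict) else None


def wait_until_healthy(
    base_url: str = "http://127.0.0.1:8788",
    deadline_s: float = 120.0,
    interval_s: float = 1.0,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
) -> bool:
    """Poll /health until it reports ok, or give up after deadline_s seconds."""
    stop_at = time.monotonic() + deadline_s
    while True:
        body = fetch(base_url)
        # Assumption: `ok` is truthy once the server is ready to accept calls.
        if body is not None and body.get("ok"):
            return True
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)
```

The `fetch` parameter exists only so the polling loop can be exercised without a running server.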
Once up, the runtime listens on http://127.0.0.1:8788 (the Q_OFFICE_HOST and Q_OFFICE_PORT values set above).
Endpoints
- GET /health: returns {ok, loaded}
- GET /specialists: list of specialist keys + descriptions
- POST /ask {text, [system], [max_new]}: route + run; returns the dispatch decision + output
- POST /run/<key> {text, [system], [max_new]}: force-route to a specific specialist
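For applications that embed the runtime, the two POST endpoints can be wrapped in a thin client. This is a sketch using only the Python standard library. It assumes `max_new` is an integer token budget and that optional fields may simply be left out of the body; neither is confirmed here. All function names are illustrative.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8788"  # default host/port from "How to run"


def build_request(path: str, text: str, system: Optional[str] = None,
                  max_new: Optional[int] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST for /ask or /run/<key>; unset optional fields are omitted."""
    payload = {"text": text}
    if system is not None:
        payload["system"] = system
    if max_new is not None:
        payload["max_new"] = max_new
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ask(text: str, **options) -> dict:
    """Let the runtime pick the specialist (POST /ask)."""
    return send(build_request("/ask", text, **options))


def run(key: str, text: str, **options) -> dict:
    """Force a specific specialist (POST /run/<key>)."""
    return send(build_request(f"/run/{key}", text, **options))
```

With the server running, `run("q-coder", "Define a function square that returns x squared.")` would issue the same call as the curl example for /run. Error handling (HTTP errors, timeouts) is left to the caller.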
Example: Q-Triage via /ask
curl -X POST http://127.0.0.1:8788/ask \
-H "Content-Type: application/json" \
-d '{"text":"Triage. Return JSON {category, priority}.\nSubject: 502 errors since 14:00 deploy"}'
Response:
{
"specialist": "q-triage",
"route_reason": "matched 2/2 cues",
"route_confidence": 1.0,
"output": "{\"category\": \"incident/sev2\", \"priority\": \"high\"}"
}
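Note that in the response above, `output` is a string that itself contains JSON, so a client needs a second decoding step to reach the `category` and `priority` fields. A minimal sketch of that step follows; `decode_output` is a hypothetical helper, and it returns `None` for outputs that are not JSON.

```python
import json
from typing import Any, Optional


def decode_output(envelope: dict) -> Optional[Any]:
    """Return `output` parsed as JSON, or None when it is not valid JSON."""
    output = envelope.get("output")
    if not isinstance(output, str):
        return None
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        # Free-text outputs (for example generated code) stay as plain strings.
        return None


# The /ask response shown above, as it would arrive on the wire.
SAMPLE = r'''{
  "specialist": "q-triage",
  "route_reason": "matched 2/2 cues",
  "route_confidence": 1.0,
  "output": "{\"category\": \"incident/sev2\", \"priority\": \"high\"}"
}'''

envelope = json.loads(SAMPLE)
fields = decode_output(envelope)
print(fields["priority"])  # prints: high
```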
Example: Q-Coder via /run
curl -X POST http://127.0.0.1:8788/run/q-coder \
-H "Content-Type: application/json" \
-d '{"text":"Define a function square that returns x squared."}'
Response:
{
"specialist": "q-coder",
"output": "def square(x):\n return x * x"
}
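Since Q-Coder returns source code as a plain string, a caller may want a cheap syntax check before accepting it. The sketch below uses the standard `ast` module. It is not part of the runtime, it only confirms that the text parses as Python, and it says nothing about whether the code is correct.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the text parses as Python source, without running it."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as embedded null bytes.
        return False
    return True


def defined_functions(source: str) -> list[str]:
    """List the names of top-level functions defined in the source."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
```

Applied to the `output` string in the response above, `is_valid_python` accepts it and `defined_functions` reports a single function named `square`.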
What this is NOT
- Not a chatbot frontend. This is an HTTP backend for embedding in other applications. Bring your own UI.
- Not a Linux/macOS binary. This release is Windows only. Source-tree Python invocation works cross-platform; see the per-specialist cards.
- Not a GPU runtime. CPU only by design. The full suite runs in ~1 GB RAM with sub-second latency per call on a modern laptop.
- Not a replacement for a verifier. This is the dispatch layer. Deciding whether to accept an output is the job of the calling application, upstream or downstream of the runtime.
License & posture
The weights (each specialist's pytorch_model.pt) are Apache 2.0 at their
per-model HF cards.
The Q-Office-Suite runtime entrypoint, the cluster shell routing policy, the crystal corpora, the eval gate constants, and the training pipeline are Qovaryx proprietary technology and are not included in this release.
This is the same posture as every previous Qovaryx public release: ship the weights and the audit, not the recipe.
Watermark
The binary is Nuitka-compiled; the dispatch layer is not source-readable
without reverse engineering. SHA256 fingerprint is in
q_office_suite.exe.sha256 for tamper-detection and attribution.
Community & support
- Research devlog: https://github.com/thron-j/qovaryx-ai-research
- Discord (Qovaryx community): https://discord.gg/PtuHZDv5ju
- Ko-fi (we cover GPU bills): https://ko-fi.com/tjarvis91
- Qovaryx options decoder runtime (sibling release): https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community
If you find a routing-decision failure mode the readme doesn't cover, open a discussion here or come to the Discord.