Q-Office-Suite Runtime β€” local HTTP server hosting all 9 sovereign specialists

This repo ships the Q-Office-Suite runtime: a standalone Nuitka-compiled Windows binary that hosts the nine sovereign Qovaryx specialists behind a local HTTP API. CPU inference. No GPU required. No internet required after download.

All nine specialist weights are published openly under Apache 2.0 at their per-model cards. The runtime entrypoint and dispatch are Qovaryx proprietary technology β€” same posture as the options decoder runtime: weights and audit are open; entrypoint and recipe stay private.

The nine specialists hosted

All nine are 53.5M-parameter full-finetunes from tjarvis91/qovaryx-50m-scratch-base. No SmolLM2. No Qwen. No Llama. No borrowed foundation weights.

Specialist Job Score
Q-Triage Support ticket routing 100% (60/60)
Q-DocCite Document citation w/ page anchor 100% (60/60)
Q-Invoice Invoice JSON extractor 100% (60/60)
Q-ToolCall Agent tool-call JSON 100% (60/60)
Q-Meeting Meeting note structurer 100% (60/60)
Q-FinCite 10-K/10-Q citation 100% (60/60)
Q-CmdSafe Shell command safety triage 100% (60/60)
Q-SheetExtract Spreadsheet field extractor 100% (37/37)
Q-Coder Python code one-liners + skeletons 100% (53/53)

What's in this repo

  • q_office_suite.exe β€” single-file Windows binary, Nuitka onefile build, ~1.98 GB. Bundles Python 3.10 + PyTorch CPU + tokenizers + the cluster shell. No installer; just run.
  • q_office_suite.exe.sha256 β€” SHA256 hash for tamper-detection.
  • README.md β€” this file.

What you need to provide

The runtime ships the dispatch layer; the 9 specialist weights are downloaded separately from their HF cards. Layout the runtime expects:

<your-dir>/
  q_office_suite.exe
  tokenizer.json                  # from any of the 9 specialist repos (they share)
  weights/
    q-triage-50m-v2/final.pt      # from tjarvis91/Q-Triage-50M-Sovereign
    q-doccite-50m-v2/final.pt     # from tjarvis91/Q-DocCite-50M-Sovereign
    q-docextract-50m-v1/final.pt  # from tjarvis91/Q-Invoice-50M-Sovereign
    q-toolcall-50m-v1/final.pt    # from tjarvis91/Q-ToolCall-50M-Sovereign
    q-meeting-50m-v1/final.pt     # from tjarvis91/Q-Meeting-50M-Sovereign
    q-fincite-50m-v1/final.pt     # from tjarvis91/Q-FinCite-50M-Sovereign
    q-devsafe-50m-v1/final.pt     # from tjarvis91/Q-CmdSafe-50M-Sovereign
    q-sheetextract-50m-v4/final.pt # from tjarvis91/Q-SheetExtract-50M-Sovereign
    q-coder-50m-v2/final.pt       # from tjarvis91/Q-Coder-50M-Sovereign

Total disk for all 9 weights: ~3 GB. Total RAM at idle: ~1 GB. RAM per active specialist: ~250 MB.

How to run

$env:Q_OFFICE_WEIGHTS_DIR = "C:\path\to\weights"
$env:Q_OFFICE_TOKENIZER   = "C:\path\to\tokenizer.json"
$env:Q_OFFICE_HOST        = "127.0.0.1"
$env:Q_OFFICE_PORT        = "8788"
.\q_office_suite.exe

First launch performs Nuitka onefile self-extraction (~30 s for the 2 GB payload to %TEMP%). Subsequent launches reuse the extracted cache.

Once up, the runtime listens on http://127.0.0.1:8788:

Endpoints

  • GET /health β€” {ok, loaded}
  • GET /specialists β€” list of specialist keys + descriptions
  • POST /ask {text, [system], [max_new]} β€” route + run; returns the dispatch decision + output
  • POST /run/<key> {text, [system], [max_new]} β€” force-route to a specific specialist

Example: Q-Triage via /ask

curl -X POST http://127.0.0.1:8788/ask \
  -H "Content-Type: application/json" \
  -d '{"text":"Triage. Return JSON {category, priority}.\nSubject: 502 errors since 14:00 deploy"}'

Response:

{
  "specialist": "q-triage",
  "route_reason": "matched 2/2 cues",
  "route_confidence": 1.0,
  "output": "{\"category\": \"incident/sev2\", \"priority\": \"high\"}"
}

Example: Q-Coder via /run

curl -X POST http://127.0.0.1:8788/run/q-coder \
  -H "Content-Type: application/json" \
  -d '{"text":"Define a function square that returns x squared."}'

Response:

{
  "specialist": "q-coder",
  "output": "def square(x):\n    return x * x"
}

What this is NOT

  • Not a chatbot frontend. This is an HTTP backend for embedding in other applications. Bring your own UI.
  • Not a Linux/macOS binary. This release is Windows only. Source-tree Python invocation works cross-platform β€” see the per-specialist cards.
  • Not a GPU runtime. CPU only by design. The full suite runs in ~1 GB RAM with sub-second latency per call on a modern laptop.
  • Not a replacement for a verifier. This is the dispatch layer. The decision-acceptance discipline lives upstream / downstream.

License & posture

The weights (each specialist's pytorch_model.pt) are Apache 2.0 at their per-model HF cards.

The Q-Office-Suite runtime entrypoint, the cluster shell routing policy, the crystal corpora, the eval gate constants, and the training pipeline are Qovaryx proprietary technology and are not included in this release.

This is the same posture as every previous Qovaryx public release: ship the weights and the audit, not the recipe.

Watermark

The binary is Nuitka-compiled; the dispatch layer is not source-readable without reverse engineering. SHA256 fingerprint is in q_office_suite.exe.sha256 for tamper-detection and attribution.

Community & support

If you find a routing-decision failure mode the readme doesn't cover, open a discussion here or come to the Discord.

Downloads last month
8
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support