tjarvis91/Q-Coder-50M-Sovereign

@@ -19,66 +19,68 @@ tags:
 # Q-Coder-50M-Sovereign — Python code one-liners + small function skeletons
-## Proprietary Qovaryx technology — built on our own scratch base
-This is a **53.5M-parameter sovereign specialist** in the Qovaryx Compact
-Specialist Suite. It is full-fine-tuned from
-[`tjarvis91/qovaryx-50m-scratch-base`](https://huggingface.co/tjarvis91/qovaryx-50m-scratch-base) —
-**our own scratch-trained base, not a borrowed foundation model**.
-- **Base:** Qovaryx 50M scratch base. Pretrained from random initialization on
-  491.5M tokens of curated text. **Not SmolLM2. Not Qwen. Not Llama. Not Mistral. Not Phi.**
-  No HuggingFace base. No closed-source weights. Every parameter in this checkpoint
-  traces back to a Qovaryx training run on Qovaryx hardware.
-- **Tokenizer:** Qovaryx english_v1 BPE (vocab 32000), built in-house against our
-  pretraining corpus. **Not the SmolLM2 tokenizer. Not the Llama tokenizer.**
-- **Architecture:** Qovaryx FinanceDecoder — 12 decoder blocks, GQA, RoPE,
-  SwiGLU FFN, RMSNorm, MTP heads, decision head. Designed in the Bleeding Edge
-  research line for compact local-sovereign cognition.
-- **Recipe:** Qovaryx crystallization corpus discipline — train the law before
-  replaying the noise. See the [public research devlog](https://github.com/thron-j/qovaryx-ai-research)
-  for the architectural framing.
-- **Runs on CPU.** No GPU required at inference.
-The entire stack — base, tokenizer, model class, training recipe, eval gate,
-crystal corpus — is Qovaryx proprietary technology. The decision to publish
-the **weights and the audit** under Apache 2.0 is deliberate; the build pipeline
-and the corpus stay private.
-## What this is
 Given a short natural-language Python task, returns the smallest correct Python expression or function that solves it. Trained on aggregate ops (sum/min/max/len/avg over named lists), string ops (reverse/upper/lower/title/palindrome), list comprehensions (even/odd/positive/squares/doubles), dict .get(default), small function definitions, try/except wrappers, class skeletons, and basic file I/O. Designed for fast structured code emission, not free-form programming.
-## What this is NOT
-- **Not a general-purpose chatbot.** This head does one job. Free-text generation outside
-  the trained task surface is not supported and will degrade.
-- **Not reproducible from scratch.** The crystal corpus, the eval gate
-  constants, and the training hyperparameters are intentionally not published.
-- **Not a replacement for a verifier.** This is one component in the
-  Qovaryx [cluster-shell](https://github.com/thron-j/qovaryx-ai-research)
-  architecture. The decision-acceptance discipline lives in the wrapper, not
-  in the head.
 ## Honest performance
 - **Task:** compact Python code generation
 - **Metric:** `exact_match` (string-equal after strip + lowercase)
-- **Holdout:** n=53 (date-disjoint, never seen in training)
 - **Score:** **100.0%** mean
 - **Bootstrap CI 95% lower bound:** 1.000
 - **Gate threshold:** 0.90
-- **Verdict:** PASS at both point estimate and CI lower bound
 ## Example
 ```
-USER: Define a function `square` that returns x squared.
-ASSISTANT: def square(x):
     return x * x
 ```
 ## Architecture (Qovaryx proprietary)
 - 53.5M parameters
@@ -87,12 +89,10 @@ ASSISTANT: def square(x):
 - Multi-token prediction (MTP) auxiliary heads
 - Decision head for routed-decision tasks
 - Tokenizer: Qovaryx `english_v1` BPE, vocab 32000 (in-house build)
-- Pretrained from `qovaryx-50m-scratch-base` step 60000 — 491.5M tokens, our scratch
-  lineage from random initialization
-- Full fine-tune (no LoRA, no QLoRA, no adapter): every parameter was updated
-  on the Qovaryx crystal corpus for this specialist
-## How to use
 ```python
 import torch
@@ -101,8 +101,7 @@ from bleeding_edge.model.decoder import FinanceDecoder, DecoderConfig
 tok = Tokenizer.from_file("tokenizer.json")
 ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
-cfg = DecoderConfig(**{k: v for k, v in ckpt["model_cfg"].items()
-                         if k in DecoderConfig.__dataclass_fields__})
 cfg.vocab_size = tok.get_vocab_size()
 model = FinanceDecoder(cfg).eval()
 state = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model_state"].items()}
@@ -112,35 +111,42 @@ prompt = "Define a function `square` that returns x squared."
 ids = tok.encode(prompt).ids
 cur = torch.tensor([ids], dtype=torch.long)
 with torch.no_grad():
-    for _ in range(80):
         nxt = int(torch.argmax(model(cur, return_decision=False).logits[:, -1, :], dim=-1))
         if nxt == 0: break
         cur = torch.cat([cur, torch.tensor([[nxt]])], dim=1)
 print(tok.decode(cur[0].tolist()[len(ids):]))
 ```
-The `bleeding_edge` package is open-source at
-[github.com/thron-j/qovaryx-ai-research](https://github.com/thron-j/qovaryx-ai-research)
-(architecture notes only; full source ships with the Qovaryx runtime).
 ## License & posture
 Apache 2.0 for the published weights, model card, and example code.
-The Qovaryx scratch base, the crystallization corpus, the eval gate constants,
-the cluster routing policy, and the training pipeline are **Qovaryx proprietary
-technology** and are not included in this release. This is the same posture as
-the rest of the Qovaryx public catalog: ship the weights and the audit, not
-the recipe.
-## Sibling specialists
-The other heads in the Qovaryx Compact Specialist Suite share the same base
-and audit discipline. See the
-[Qovaryx research devlog](https://github.com/thron-j/qovaryx-ai-research)
-for the full cluster framing.
 ## Watermark
-This release carries a SHA256 issue fingerprint inside `model_cfg._qovaryx_watermark`
-for tamper-detection and attribution. See `release.json` for the canonical record.

 # Q-Coder-50M-Sovereign — Python code one-liners + small function skeletons
+> **Python task in. Smallest correct expression out. No fences. No prose.**
+## What this model does, in one sentence
 Given a short natural-language Python task, it returns the smallest correct Python expression or function that solves it. Trained on aggregate ops (sum/min/max/len/avg over named lists), string ops (reverse/upper/lower/title/palindrome), list comprehensions (even/odd/positive/squares/doubles), `dict.get` lookups with a default, small function definitions, `try`/`except` wrappers, class skeletons, and basic file I/O. Designed for fast structured code emission, not free-form programming.
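To make that task surface concrete, here are a few hand-written prompt/snippet pairs in the style described above, each executed against sample data as a sanity check. They are illustrative assumptions, not model outputs and not rows from the unpublished training corpus. The `EXAMPLES` table, the `SAMPLE` bindings and the `run` helper are hypothetical.

```python
from typing import Any, Dict

# Hypothetical prompt -> snippet pairs in the style of the trained task surface.
# Hand-written for illustration: NOT model outputs, NOT corpus rows.
EXAMPLES: Dict[str, str] = {
    "sum of nums": "sum(nums)",
    "average of nums": "sum(nums) / len(nums)",
    "reverse the string s": "s[::-1]",
    "uppercase s": "s.upper()",
    "is s a palindrome": "s == s[::-1]",
    "keep the even numbers in nums": "[x for x in nums if x % 2 == 0]",
    "double every number in nums": "[x * 2 for x in nums]",
    "look up 'a' in d, default 0": "d.get('a', 0)",
}

# Sample bindings so each snippet can be executed as a sanity check.
SAMPLE: Dict[str, Any] = {"nums": [1, 2, 3, 4], "s": "abc", "d": {"b": 2}}


def run(snippet: str) -> Any:
    # Sample data goes into globals so comprehension bodies can see it too;
    # only sum and len are exposed as builtins.
    scope = {"__builtins__": {"sum": sum, "len": len}, **SAMPLE}
    return eval(snippet, scope)


for task, snippet in EXAMPLES.items():
    print(f"{task:30} -> {snippet:32} = {run(snippet)!r}")
```

Each snippet is a single expression with no fences or prose around it, which is the output shape the card describes.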
 ## Honest performance
 - **Task:** compact Python code generation
 - **Metric:** `exact_match` (string-equal after strip + lowercase)
+- **Holdout:** n=53 rows, never seen in training, scored row-by-row
 - **Score:** **100.0%** mean
 - **Bootstrap CI 95% lower bound:** 1.000
 - **Gate threshold:** 0.90
+- **Verdict:** PASS at point estimate AND at bootstrap CI lower bound
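The metric and gate above can be sketched as follows. This is a minimal illustration that assumes 0/1 per-row scores and a percentile bootstrap. The card does not publish its eval harness, so the function names, resample count and seed are hypothetical. The sketch makes one caveat concrete: when every holdout row passes, every bootstrap resample also scores 1.0, so a bootstrap lower bound of 1.000 follows by construction rather than being evidence of zero error. An exact binomial (Clopper-Pearson) two-sided 95% bound for 53/53 is a more informative floor, and it still clears the 0.90 gate.

```python
import random
from typing import List


def exact_match(prediction: str, reference: str) -> bool:
    """The card's stated metric: string equality after strip() and lower()."""
    return prediction.strip().lower() == reference.strip().lower()


def bootstrap_lower_bound(scores: List[int], n_resamples: int = 2000,
                          alpha: float = 0.05, seed: int = 0) -> float:
    """Percentile-bootstrap lower bound (two-sided 1 - alpha) on the mean of 0/1 scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choice(scores) for _ in range(n)) / n
                   for _ in range(n_resamples))
    return means[int(alpha / 2 * n_resamples)]


def exact_lower_bound_all_pass(n: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson two-sided (1 - alpha) lower bound when all n rows pass."""
    return (alpha / 2) ** (1 / n)


def passes_gate(point: float, lower: float, threshold: float = 0.90) -> bool:
    """Gate as described on the card: point estimate AND lower bound must clear it."""
    return point >= threshold and lower >= threshold


scores = [1] * 53  # 53/53 holdout rows matched, as reported above
print(bootstrap_lower_bound(scores))                     # 1.0 (every resample is all ones)
print(round(exact_lower_bound_all_pass(53), 3))          # 0.933
print(passes_gate(1.0, exact_lower_bound_all_pass(53)))  # True
```

Exact match is also strict in the other direction: `[n for n in nums if n % 2 == 0]` scores 0 against a reference written with `x`, even though both are correct.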
+## What it's used for — real workflows
+- **Inline boilerplate emitter for IDEs:** Wire Q-Coder into an editor extension; ask `reverse a string` and get back `s[::-1]`. The point isn't autocomplete; it's deterministic emission of the small idioms you would otherwise type by hand.
+- **Notebook one-liner generator:** Send quick prompts from a notebook, such as `sum of values` or `filter positives from nums`, and Q-Coder returns a one-liner ready to paste.
+- **Snippet expansion in chat ops:** A Slack/Discord bot maps `qcoder def add` to the canonical `add(a, b)`. Cheap, deterministic, on-prem.
+- **Test-skeleton generation:** Ask for a class skeleton with `__init__` and one method; get back a clean Python class body to fill in.
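For the chat-ops workflow, a minimal sketch of the bot side might look like the following. It assumes a `generate(task) -> str` callable that wraps the model (for example, around the greedy loop in the loading code). `make_handler`, the `qcoder ` trigger prefix and the cache size are hypothetical. Because greedy decoding returns the same output for the same prompt, repeated requests can be served from a cache instead of re-running the model.

```python
from functools import lru_cache
from typing import Callable, List, Optional

PREFIX = "qcoder "  # hypothetical trigger, matching the card's `qcoder def add` example


def make_handler(generate: Callable[[str], str],
                 cache_size: int = 1024) -> Callable[[str], Optional[str]]:
    """Turn a task -> snippet function into a chat-message handler.

    Greedy decoding is deterministic for a given prompt, so results are memoized.
    """
    cached_generate = lru_cache(maxsize=cache_size)(generate)

    def handle(message: str) -> Optional[str]:
        if not message.lower().startswith(PREFIX):
            return None  # message is not addressed to the bot
        task = message[len(PREFIX):].strip()
        return cached_generate(task) if task else None

    return handle


calls: List[str] = []


def fake_generate(task: str) -> str:  # stand-in for the real model call
    calls.append(task)
    return "def add(a, b):\n    return a + b"


handle = make_handler(fake_generate)
print(handle("qcoder def add"))
```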
+## What problem this actually solves
+General-purpose coding LMs tend to emit sprawling, fence-wrapped, explanation-heavy code. Q-Coder does the opposite: tight expressions for tight tasks, with no markdown, no chat, and no fences. Use it as the focused emission step inside a larger workflow.
+## Integration paths
+- **Editor snippet engine:** Wire it into a VS Code or JetBrains extension as a code-snippet generator.
+- **Q-Office-Suite runtime:** Send the natural-language task in a `POST /run/q-coder` request.
+- **Pair with Q-SheetExtract:** Q-SheetExtract gives you the fields; Q-Coder gives you the expression that computes the aggregate you actually wanted over them.
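A minimal sketch of that pairing follows, assuming Q-SheetExtract yields a dict mapping field names to lists (its output format is not described on this card). Before evaluating a Q-Coder expression over those fields, the sketch parses it with Python's `ast` module and rejects any syntax or name outside a small whitelist. `SAFE_FUNCS`, `ALLOWED_NODES` and `evaluate_expression` are hypothetical. This is not a complete sandbox, and it does not bound time or memory.

```python
import ast
from typing import Any, Dict

# Hypothetical whitelist of callables a wrapper might expose to emitted expressions.
SAFE_FUNCS = {"sum": sum, "min": min, "max": max, "len": len, "round": round, "sorted": sorted}

# Syntax allowed in an expression: names, calls, constants, arithmetic,
# comparisons and list comprehensions. Attribute access is deliberately absent.
ALLOWED_NODES = (
    ast.Expression, ast.Name, ast.Load, ast.Store, ast.Call, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.USub, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.FloorDiv, ast.Mod, ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE,
    ast.Gt, ast.GtE, ast.ListComp, ast.comprehension,
)


def evaluate_expression(expr: str, fields: Dict[str, Any]) -> Any:
    """Evaluate an emitted expression over extracted fields, or raise ValueError."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not a Python expression: {expr!r}") from exc
    nodes = list(ast.walk(tree))
    # Names bound by comprehensions, e.g. the x in [x for x in amounts]
    bound = {n.id for n in nodes if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    for node in nodes:
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in fields and node.id not in SAFE_FUNCS and node.id not in bound:
                raise ValueError(f"unknown name: {node.id}")
    scope = {"__builtins__": {}, **SAFE_FUNCS, **fields}
    return eval(compile(tree, "<q-coder>", "eval"), scope)


fields = {"amounts": [120.0, 80.5, -10.0], "qty": [1, 2, 3]}  # assumed extractor output
print(evaluate_expression("sum(amounts)", fields))                   # 190.5
print(evaluate_expression("[x for x in amounts if x > 0]", fields))  # [120.0, 80.5]
```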
 ## Example
+Input:
+```
+Define a function `square` that returns x squared.
 ```
+Output:
+```
+def square(x):
     return x * x
 ```
+## What this is NOT
+- **Not a general-purpose chatbot.** This head does one job and does it consistently. Free-text generation outside the trained task surface will degrade.
+- **Not a replacement for a verifier.** This is one component in the Qovaryx cluster-shell architecture. The decision-acceptance discipline lives in the wrapper, not in the head.
+- **Not reproducible from this card.** Weights and audit are public; the crystal corpus, eval gate constants, and training hyperparameters are not.
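The card places acceptance in a wrapper rather than in the head. The Qovaryx wrapper itself is not published. What follows is a minimal, hypothetical stand-in: it accepts an emitted function only if the code parses, defines the requested name, and passes caller-supplied test cases. `accept_function` and the `(args, expected)` case format are assumptions. The snippet runs through `exec` with full builtins, so this belongs inside an isolated worker, not in a trusted process.

```python
import ast
from typing import Any, Callable, Iterable, Tuple

Case = Tuple[tuple, Any]  # (positional args, expected return value)


def accept_function(source: str, name: str, cases: Iterable[Case]) -> bool:
    """Accept emitted code only if it parses, defines `name`, and passes every case.

    Not a sandbox: exec runs the snippet with full builtins.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    defines = any(isinstance(n, ast.FunctionDef) and n.name == name for n in tree.body)
    if not defines:
        return False
    namespace: dict = {}
    try:
        exec(compile(tree, "<q-coder>", "exec"), namespace)
        fn: Callable = namespace[name]
        return all(fn(*args) == expected for args, expected in cases)
    except Exception:
        return False  # any runtime error counts as a rejection


cases = [((2,), 4), ((3,), 9), ((-1,), 1)]
print(accept_function("def square(x):\n    return x * x", "square", cases))  # True
```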
+## Proprietary Qovaryx technology — built on our own scratch base
+This is a **53.5M-parameter sovereign specialist** in the Qovaryx Compact Specialist Suite. It is fully fine-tuned from [`tjarvis91/qovaryx-50m-scratch-base`](https://huggingface.co/tjarvis91/qovaryx-50m-scratch-base), **our own scratch-trained base, not a borrowed foundation model**.
+- **Base:** Qovaryx 50M scratch base. Pretrained from random initialization on 491.5M tokens. **Not SmolLM2. Not Qwen. Not Llama. Not Mistral. Not Phi.** No HuggingFace foundation. No closed-source weights. Every parameter traces back to a Qovaryx training run on Qovaryx hardware.
+- **Tokenizer:** Qovaryx `english_v1` BPE (vocab 32000), built in-house against our own pretraining corpus.
+- **Architecture:** Qovaryx FinanceDecoder — 12 decoder blocks, GQA, RoPE, SwiGLU FFN, RMSNorm, MTP heads, decision head.
+- **Recipe:** Qovaryx crystallization discipline — train the law before replaying the noise.
+- **Runs on CPU.** No GPU required at inference.
 ## Architecture (Qovaryx proprietary)
 - 53.5M parameters
 - Multi-token prediction (MTP) auxiliary heads
 - Decision head for routed-decision tasks
 - Tokenizer: Qovaryx `english_v1` BPE, vocab 32000 (in-house build)
+- Pretrained from `qovaryx-50m-scratch-base` step 60000 — 491.5M tokens
+- Full fine-tune (no LoRA, no QLoRA, no adapter): every parameter was updated on the Qovaryx crystal corpus for this specialist
+## How to load it (Python)
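 ```python
 import torch
 from tokenizers import Tokenizer  # Hugging Face `tokenizers` package
 from bleeding_edge.model.decoder import FinanceDecoder, DecoderConfig
 tok = Tokenizer.from_file("tokenizer.json")
 # weights_only=False unpickles arbitrary objects: only load checkpoints you trust
 ckpt = torch.load("pytorch_model.pt", map_location="cpu", weights_only=False)
+cfg = DecoderConfig(**{k: v for k, v in ckpt["model_cfg"].items() if k in DecoderConfig.__dataclass_fields__})
 cfg.vocab_size = tok.get_vocab_size()
 model = FinanceDecoder(cfg).eval()
 # drop the "_orig_mod." prefix that torch.compile adds to parameter names
 state = {k.removeprefix("_orig_mod."): v for k, v in ckpt["model_state"].items()}
 model.load_state_dict(state)
 prompt = "Define a function `square` that returns x squared."
 ids = tok.encode(prompt).ids
 cur = torch.tensor([ids], dtype=torch.long)
 with torch.no_grad():
+    for _ in range(120):  # cap on generated tokens
         nxt = int(torch.argmax(model(cur, return_decision=False).logits[:, -1, :], dim=-1))
         if nxt == 0: break  # token id 0 is treated as end of output
         cur = torch.cat([cur, torch.tensor([[nxt]])], dim=1)
 print(tok.decode(cur[0].tolist()[len(ids):]))
 ```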
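The generation loop above is plain greedy decoding. At each step it takes the argmax at the last position, stops on token id 0, and caps the output at 120 new tokens. It re-runs the model over the whole sequence at every step (no KV cache), which is acceptable for outputs this short. The sketch below factors the same control flow into a framework-free helper so it can be unit-tested with a stub. `greedy_decode` and its parameter names are hypothetical and are not part of the `bleeding_edge` package.

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token: Callable[[Sequence[int]], int],
                  prompt_ids: Sequence[int],
                  max_new_tokens: int = 120,
                  stop_id: int = 0) -> List[int]:
    """Greedy decoding with the same control flow as the loop above.

    next_token(seq) must return the argmax token id for the position after seq.
    Returns only the newly generated ids; the stop id itself is never included.
    """
    seq = list(prompt_ids)
    for _ in range(max_new_tokens):
        nxt = next_token(seq)
        if nxt == stop_id:
            break
        seq.append(nxt)
    return seq[len(prompt_ids):]


# Stub "model" that emits a fixed script of ids, for testing without torch.
script = iter([5, 6, 7, 0, 9])
print(greedy_decode(lambda seq: next(script), [1, 2]))  # [5, 6, 7]
```

To drive it with the real model under the same assumptions as the loading code, pass something like `lambda seq: int(torch.argmax(model(torch.tensor([seq]), return_decision=False).logits[:, -1, :], dim=-1))` as `next_token`, inside `torch.no_grad()`.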
 ## License & posture
 Apache 2.0 for the published weights, model card, and example code.
+The Qovaryx scratch base build pipeline, the crystallization corpus, the eval gate constants, the cluster routing policy, and the protected runtime entrypoint are **Qovaryx proprietary technology** and are not included in this release. Same posture as every previous Qovaryx public release: ship the weights and the audit, not the recipe.
+## Sibling specialists in the Qovaryx Q-Office-Suite
+All nine specialists share the `qovaryx-50m-scratch-base` and the same audit discipline. Use one directly; use all nine through the cluster shell.
+- [Q-Triage](https://huggingface.co/tjarvis91/Q-Triage-50M-Sovereign) — ticket routing
+- [Q-DocCite](https://huggingface.co/tjarvis91/Q-DocCite-50M-Sovereign) — document citation
+- [Q-Invoice](https://huggingface.co/tjarvis91/Q-Invoice-50M-Sovereign) — invoice extraction
+- [Q-ToolCall](https://huggingface.co/tjarvis91/Q-ToolCall-50M-Sovereign) — agent tool-calls
+- [Q-Meeting](https://huggingface.co/tjarvis91/Q-Meeting-50M-Sovereign) — meeting structuring
+- [Q-FinCite](https://huggingface.co/tjarvis91/Q-FinCite-50M-Sovereign) — 10-K/10-Q citation
+- [Q-CmdSafe](https://huggingface.co/tjarvis91/Q-CmdSafe-50M-Sovereign) — command safety triage
+- [Q-SheetExtract](https://huggingface.co/tjarvis91/Q-SheetExtract-50M-Sovereign) — spreadsheet extraction
+- [Q-Coder](https://huggingface.co/tjarvis91/Q-Coder-50M-Sovereign) — Python code skeletons
 ## Watermark
+This release carries a SHA256 issue fingerprint inside `release.json` for tamper-detection and attribution.
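Downstream users who want to record attribution can read the fingerprint programmatically. The card does not say what the SHA256 is computed over, so this sketch only extracts and format-checks the recorded value; it does not verify the weights. The diff shows `release.json` only from line 13 onward, so the enclosing key is not visible here, and the helper searches recursively. `find_values`, `fingerprint_from` and `read_fingerprint` are hypothetical names. The fingerprint changed between the two revisions in this diff (from one beginning `4c167a5b` to one beginning `9c7358dc`) along with `released_at`, so record the value from the exact revision you downloaded.

```python
import json
import re
from typing import Any, Iterator

HEX64 = re.compile(r"[0-9a-f]{64}")  # a SHA256 digest written as lowercase hex


def find_values(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


def fingerprint_from(data: Any) -> str:
    """Return the single fingerprint in a parsed release.json, or raise ValueError."""
    found = [v for v in find_values(data, "fingerprint") if isinstance(v, str)]
    if len(found) != 1 or not HEX64.fullmatch(found[0]):
        raise ValueError(f"expected one 64-hex-digit fingerprint, found {found!r}")
    return found[0]


def read_fingerprint(path: str = "release.json") -> str:
    with open(path, encoding="utf-8") as fh:
        return fingerprint_from(json.load(fh))
```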
+## Community & support
+- **Research devlog:** https://github.com/thron-j/qovaryx-ai-research
+- **Discord (Qovaryx community):** https://discord.gg/PtuHZDv5ju
+- **Ko-fi (we cover GPU bills):** https://ko-fi.com/tjarvis91
+- **Qovaryx options decoder runtime:** https://huggingface.co/Qovaryx/qovaryx-options-decoder-full-community
+If you find a failure mode this card doesn't cover, open a discussion on this repo or come to the Discord — that's how the next crystal corpus gets written.

release.json

@@ -13,8 +13,8 @@
     "issuer": "Qovaryx AI / Thomas Jarvis",
     "specialist": "q-coder",
     "release_id": "qovaryx-sovereign-2026-06-02",
-    "released_at": "2026-06-02T08:35:45Z",
-    "fingerprint": "4c167a5bdf82bb30a54056021790f74a852db0fff777f0b95daf19230166967a",
     "base_model": "tjarvis91/qovaryx-50m-scratch-base",
     "policy": "This checkpoint is a sovereign Qovaryx specialist. It is full-fine-tuned from qovaryx-50m-scratch-base. Redistribution allowed under Apache 2.0. Fingerprint is for downstream attribution and tamper-detection."
   }

     "issuer": "Qovaryx AI / Thomas Jarvis",
     "specialist": "q-coder",
     "release_id": "qovaryx-sovereign-2026-06-02",
+    "released_at": "2026-06-02T09:15:24Z",
+    "fingerprint": "9c7358dca71cf14e421adb5861dae0404800b931cd3ca8b6a3a8c283eb9d06d5",
     "base_model": "tjarvis91/qovaryx-50m-scratch-base",
     "policy": "This checkpoint is a sovereign Qovaryx specialist. It is full-fine-tuned from qovaryx-50m-scratch-base. Redistribution allowed under Apache 2.0. Fingerprint is for downstream attribution and tamper-detection."
   }