glyph-translator-v7

A small transducer that maps English prose into glyph, a compact, operator-based notation for representing causal and structural claims as graphs of noun-entity nodes connected by a closed inventory of operators (→ ⇒ ⊣ ⊕ ⊖ ↔ ≡ ≠ ⊂ ⊃ ∈ ∋ ∌ ∀ ∃ ⊥ ¬ ∧ ∨ ↑ ↓ ∝ @ : and a few more).

It is part of the APE (Atomic Pumpkin Engine) toolchain. APE ingests prose, runs it through a curator → translator pipeline, and stores the resulting glyph as the canonical knowledge representation. v7 is the translator stage.

What it does

  • Input: curator-shaped English prose (active voice, explicit causation, named entities preserved).
  • Output: glyph, i.e. noun-entity atoms plus closed-inventory operators, with chain/bundle compression.

Lineage and why v7 exists

This is the seventh attempt at a small dedicated translator in the APE line. The short version of the lineage:

  • v4: full fine-tune (FT) of gemma-3-1b-it, chat template + masked assistant labels, 40/40/20 system-message variance (canonical / bare / adversarial). Empirically unbreakable, prompt-less.
  • v5: silently switched to LoRA on attention (q/k/v/o) with a single prompt template. Shipped, but rigid against its training shape.
  • v6: dropped the wrapper and added eos_weight=10× to compensate for the frozen lm_head. Lost ~4-5 pp fidelity vs v5 and still had EOS issues, because LoRA-on-attention can't move the unembedding row for EOS or glyph tokens.
  • v7: explicit revert to v4's training method (full FT, 40/40/20) on v5's noun-only corpus (regen-lambda-20260427-gemma26b-nounonly-v3), plus a v7.5 early-stop noise-band fix that lifted rescored fidelity past the gate.

TL;DR of the decision rationale: full-FT-vs-LoRA and prompt-shape-diversity are separate methodological variables and should not be swapped silently.

Architecture

  • Base: google/gemma-3-1b-it (instruction-tuned).
  • Method: full fine-tune (no adapters).
  • Sequence: chat template with masked assistant labels.
  • System-message variance: 40% canonical glyph instructions, 40% bare (no system message), 20% adversarial system message; the target glyph is the same in all three.
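
The 40/40/20 variance and the assistant-only loss mask can be sketched in a few lines. This is a minimal illustration, not the actual APE training code: the function names and the two placeholder system strings are hypothetical, and the only fixed convention is the label value -100, which PyTorch's cross-entropy loss ignores by default.

```python
import random

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

# Placeholders (assumption): the real canonical and adversarial texts are not given in this card.
CANONICAL_SYSTEM = "<canonical glyph instructions>"
ADVERSARIAL_SYSTEM = "<adversarial system message>"


def pick_variant(rng: random.Random) -> str:
    """Draw a system-message variant with 40/40/20 probabilities."""
    return rng.choices(["canonical", "bare", "adversarial"], weights=[40, 40, 20], k=1)[0]


def build_messages(prose: str, glyph: str, variant: str) -> list:
    """Build one chat example; the assistant target is identical across variants."""
    msgs = []
    if variant == "canonical":
        msgs.append({"role": "system", "content": CANONICAL_SYSTEM})
    elif variant == "adversarial":
        msgs.append({"role": "system", "content": ADVERSARIAL_SYSTEM})
    # "bare" adds no system message at all.
    msgs.append({"role": "user", "content": prose})
    msgs.append({"role": "assistant", "content": glyph})
    return msgs


def mask_labels(prompt_ids: list, target_ids: list) -> tuple:
    """Return (input_ids, labels) so that loss is computed on assistant tokens only."""
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return input_ids, labels
```

Because the target glyph never changes with the variant, the model gets no reward for attending to the system message, which is one plausible reading of why this recipe tolerates bare and adversarial prompts.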

Training data

  • ~23k training pairs from splits.v5-split-v1 (train view) of the APE corpus, materialised from corpus.db.
  • Source pairs are (curator_prose, target_glyph) produced by gemma-4-26b-a4b-it running the curator + noun-only translator prompts.
  • ~3k val / ~3k heldout from the same split.

Evaluation

Gold panel scorer (models/translator/v5/eval/eval_harness.py):

| metric (heldout, rescored) | v7.5 |
|---|---|
| bare-input fidelity | 0.764 |
| injection (under adversarial system) fidelity | 0.768 |
| gate (≥ ≈0.70) | pass |

Other gates measured during run selection: terminator EOS fire rate, n-gram repetition canary, length p50/p95/max distribution match. v7.5's rescored result is what crossed the fidelity gate after an early-stop noise-band fix.
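
As a rough illustration of what these secondary gates measure, the sketch below computes an EOS fire rate, a simple n-gram repetition count, and a p50/p95/max length summary over generated token sequences. The function names, the choice of n=4, and the nearest-rank percentile definition are assumptions made for the example; the actual harness and its thresholds are not described in this card.

```python
import math


def eos_fire_rate(token_seqs, eos_id):
    """Fraction of generations whose last token is EOS (i.e. the terminator fired)."""
    return sum(1 for s in token_seqs if s and s[-1] == eos_id) / len(token_seqs)


def max_ngram_repeats(tokens, n=4):
    """Highest count of any single n-gram in one sequence; 1 means no repetition."""
    counts = {}
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
    return max(counts.values(), default=0)


def length_summary(lengths):
    """p50 / p95 / max of output lengths, using nearest-rank percentiles."""
    ordered = sorted(lengths)

    def pct(p):
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {"p50": pct(50), "p95": pct(95), "max": ordered[-1]}
```

A run would then be compared against the reference (target glyph) distribution on each statistic; a large repetition count or a low EOS fire rate flags degenerate decoding even when fidelity looks acceptable.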

For comparison: gemma-26b teacher (single-pass, same prompt) scored 0.736 on the same panel.

Intended use

  • The translator stage in APE's ingest pipeline. Curator prose in, glyph out.
  • Distillation of larger teacher pipelines into a 1B-parameter inference target.
  • Research on small dedicated transducers for structured output over custom symbol vocabularies.

Out of scope

  • Direct chat / general instruction following: the model is specialised on a narrow prose → glyph mapping.
  • Glyph → prose (decompression): that's a separate model.
  • Inputs without curator-shaped framing: v7 is robust to bare and adversarial system messages by design, but its training distribution is curator output. Quality on raw arbitrary prose will degrade.

Inference

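from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("wimpSquad/glyph-translator-v7")
model = AutoModelForCausalLM.from_pretrained("wimpSquad/glyph-translator-v7")

# Bare input: a single user turn with no system message.
msgs = [{"role": "user", "content": "Compression in v5 was too aggressive; it removed the attribution edges the retrieval layer needed."}]
# return_dict=True yields input_ids plus attention_mask, ready to unpack into generate().
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt")
# Greedy decoding keeps the output deterministic.
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))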

Limitations and known issues

  • Operator coverage is uneven. The containment and specialized_operators categories trail the others; the imbalance is inherited from the operator distribution in the training corpus.
  • Curator dependency. v7 is trained on curator output. Performance on raw prose without the curator stage is not guaranteed.
  • No safety tuning. This is a structured-output transducer, not a general assistant. It has no harm filtering beyond what gemma-3-1b-it ships with.
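
One way to inspect the uneven operator coverage on your own inputs is to count which operators actually appear in a batch of outputs. The sketch below is hypothetical tooling, not part of APE: it only knows the non-ASCII operators listed at the top of this card (the full inventory has a few more), and it leaves out ASCII operators such as @ and : because they also occur in ordinary text.

```python
from collections import Counter

# Partial inventory (assumption): only the non-ASCII operators named in this card.
KNOWN_OPERATORS = "→⇒⊣⊕⊖↔≡≠⊂⊃∈∋∌∀∃⊥¬∧∨↑↓"


def operator_counts(glyph_outputs):
    """Count occurrences of each known operator across a batch of glyph strings."""
    counts = Counter()
    for text in glyph_outputs:
        counts.update(ch for ch in text if ch in KNOWN_OPERATORS)
    return counts


def missing_operators(glyph_outputs):
    """List known operators that never appear in the batch."""
    seen = operator_counts(glyph_outputs)
    return [op for op in KNOWN_OPERATORS if op not in seen]
```

Operators that are rare or absent in a large batch are candidates for the under-covered categories mentioned above, though a character count says nothing about whether each use is correct.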

License

Apache-2.0 for the fine-tune. Base model google/gemma-3-1b-it is governed by the Gemma Terms of Use; they apply transitively.

Weights: Safetensors, 1.0B params, BF16.