FoolDev Claude Opus 4.7 committed on
Commit 25d5454 · 1 Parent(s): 732c3be

feat: scripts/verify_arch.py + make verify-arch

Cross-checks the README "Architecture" forward-pass bullets against
the actual GGUF metadata. After correcting the MTP claim in 732c3be
it was worth confirming no other quiet inaccuracies in the
quantitative bullets. All thirteen claims (block_count, context,
embedding/FFN, attention head counts + key/value lengths, RoPE
dimension count, full_attention_interval, SSM state/dt/group, vocab)
verify clean on both qwen35- and qwen36-stamped bundles.

The script encodes the (claim, expected, gguf-key) mapping as a
reusable artifact — useful after future re-stamps, re-conversions,
or upstream converter updates that might silently drift metadata.
Exit non-zero on mismatch.
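
The (claim, expected, gguf-key) comparison described above can be sketched without a real GGUF. This is a minimal illustration, not the script itself: `count_mismatches` and the two sample metadata dicts are hypothetical, and a plain dict stands in for the values that the real script reads through `GGUFReader`.

```python
from __future__ import annotations

# suffix -> (expected value, README claim); a two-entry excerpt for illustration.
EXPECTED = {
    "block_count": (64, "64 transformer layers"),
    "attention.head_count": (24, "Gated Attention: 24 Q-heads"),
}


def count_mismatches(metadata: dict[str, int], arch: str) -> int:
    """Compare each expected value with metadata[f"{arch}.{suffix}"]."""
    mismatches = 0
    for suffix, (expected, claim) in EXPECTED.items():
        key = f"{arch}.{suffix}"
        actual = metadata.get(key)  # None when the key is absent
        ok = actual == expected
        print(f"{'[ ok ]' if ok else '[FAIL]'} {claim:30s} {key} = {actual}")
        mismatches += not ok
    return mismatches


# Hypothetical metadata, standing in for values read from a real GGUF.
good = {"qwen35.block_count": 64, "qwen35.attention.head_count": 24}
drifted = {"qwen35.block_count": 64, "qwen35.attention.head_count": 32}
```

A caller would turn the count into an exit status, for example `sys.exit(1 if count_mismatches(good, "qwen35") else 0)`. A missing key counts as a mismatch because `None` never equals the expected value.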

Not added to `make check` (which is meant to be cheap + pre-commit
hookable); reading the 17 GB tensor file requires the LFS smudge
and a few seconds of I/O. Exposed as a dedicated `make verify-arch`
target instead. README "What's here" table picks up the new entry;
CHANGELOG documents the audit + script.
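
Because the check depends on the LFS smudge, it can help to tell a real GGUF from an unsmudged pointer file before reading it. The sketch below is an assumption-labelled alternative to the script's own size test (`st_size > 1024`): `is_lfs_pointer` and `is_gguf` are hypothetical helpers, relying only on the Git LFS pointer header line and the 4-byte `GGUF` magic at the start of a GGUF file.

```python
from pathlib import Path

# First line of a Git LFS pointer file (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an unsmudged Git LFS pointer file."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def is_gguf(path: Path) -> bool:
    """Return True if `path` starts with the 4-byte GGUF magic."""
    with path.open("rb") as fh:
        return fh.read(4) == b"GGUF"
```

Either helper reads only a few bytes, so a guard like this would stay cheap even where the full metadata read is not.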

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (4)
  1. CHANGELOG.md +17 -0
  2. Makefile +4 -1
  3. README.md +1 -0
  4. scripts/verify_arch.py +123 -0
CHANGELOG.md CHANGED
@@ -7,6 +7,23 @@ and documentation**, not the underlying base model.
 
 ## [Unreleased]
 
+### Added
+- `scripts/verify_arch.py` + `make verify-arch`. After correcting the
+  MTP bullet (see Fixed below) it was worth confirming the remaining
+  Architecture forward-pass bullets weren't carrying any other quiet
+  inaccuracies. Cross-checked all thirteen quantitative claims
+  (layer count, hidden / FFN dims, head counts, head_dim, RoPE
+  factor, hybrid stack interval, SSM dims, vocab, context) against
+  the actual GGUF metadata keys; every one verified clean on both
+  the qwen35- and qwen36-stamped bundles. The script captures that
+  mapping as a reusable artifact: pass it a path to a Thanatos-27B
+  (or compatible) GGUF and it prints each bullet alongside the
+  metadata key it derives from, exiting non-zero on mismatch.
+  Useful after future re-stamps / re-conversions / upstream
+  converter updates. Not folded into `make check` because it has
+  to read the 17 GB tensor file (LFS smudge required) — run on
+  demand.
+
 ### Fixed
 - README "Multi-token prediction (MTP)" bullet corrected. The
   earlier wording — "MTP head trained for speculative decoding" —
Makefile CHANGED
@@ -26,7 +26,7 @@ MODEL ?= $(TAG)
 
 PRECISION ?= F16
 
-.PHONY: help build load-bundle heal-hf smoke smoke-tools bench check hooks mmproj clean
+.PHONY: help build load-bundle heal-hf smoke smoke-tools bench check hooks mmproj verify-arch clean
 
 help: ## Show this help.
 	@awk 'BEGIN {FS = ":.*##"; printf "Targets:\n"} /^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-12s\033[0m %s\n", $$1, $$2 }' $(MAKEFILE_LIST)
@@ -61,6 +61,9 @@ mmproj: ## Fetch the vision projector for llama.cpp (Ollama vision is broken up
 check: ## Lint shell + python files; block dot-pattern footgun.
 	./scripts/check.sh
 
+verify-arch: ## Cross-check README "Architecture" bullets against GGUF metadata.
+	./scripts/verify_arch.py
+
 hooks: ## Install scripts/check.sh as the git pre-commit hook.
 	./scripts/install-hooks.sh
 
README.md CHANGED
@@ -137,6 +137,7 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
 | `scripts/fetch_vision.sh` | Pulls the vision projector (`mmproj-F16.gguf`) for llama.cpp (Ollama vision is broken upstream — see [Vision](#vision)). Renamed from `fetch_mmproj.sh` because HF's Ollama bridge auto-indexed the script as a vision projector layer (filename pattern match). |
 | `scripts/check.sh` | Local lint: `bash -n`, `pyflakes`, `py_compile`, footgun-grep, plus `Modelfile`-vs-bridge-files sync check |
 | `scripts/check_bridge_sync.py` | Verifies the `Modelfile` `TEMPLATE` / `SYSTEM` / `PARAMETER` directives stay in sync with the root-level `template` / `system` / `params` files. Run as part of `make check`; called from the pre-commit hook. |
+| `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
 | `scripts/install-hooks.sh` | Installs `check.sh` as a git pre-commit hook |
 | `Makefile` | Convenience wrapper — `make help` lists targets |
 | `LICENSE`, `CITATION.cff` | Apache-2.0 license and citation metadata |
scripts/verify_arch.py ADDED
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+"""
+Thanatos-27B — verify the README "Architecture" forward-pass bullets
+against the actual GGUF metadata.
+
+Reads either the qwen35- or qwen36-stamped bundle (or any GGUF that
+declares one of those `general.architecture` values), prints each
+README claim alongside the metadata key it derives from, and exits
+non-zero if any value mismatches the expected README claim. Useful
+as a manual audit after the bundle is re-stamped or after upstream
+re-conversion.
+
+Usage:
+    python3 scripts/verify_arch.py # default bundle
+    python3 scripts/verify_arch.py Thanatos-27B.Q4_K_M.gguf
+    python3 scripts/verify_arch.py /path/to/some-other.gguf
+
+Exit code 0 = all claims verify, 1 = at least one mismatch.
+
+Note: this does NOT verify the 27B parameter count directly (no such
+KV in the GGUF) — that comes from llama.cpp's `case 64: LLM_TYPE_27B`
+branch in `src/models/qwen35.cpp`, not from the file itself.
+"""
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+from gguf import GGUFReader
+
+EXPECTED = {
+    "block_count": (64, "64 transformer layers"),
+    "context_length": (262144, "262 144 native context"),
+    "embedding_length": (5120, "Hidden size 5120"),
+    "feed_forward_length": (17408, "FFN intermediate 17408"),
+    "attention.head_count": (24, "Gated Attention: 24 Q-heads"),
+    "attention.head_count_kv": (4, "Gated Attention: 4 KV-heads (GQA)"),
+    "attention.key_length": (256, "Gated Attention: head_dim 256 (key)"),
+    "attention.value_length": (256, "Gated Attention: head_dim 256 (value)"),
+    "rope.dimension_count": (64, "Partial RoPE: 64 of 256 dims (factor 0.25)"),
+    "full_attention_interval": (4, "Hybrid stack: every 4th layer is full attention (16 cycles)"),
+    "ssm.state_size": (128, "Gated DeltaNet: head_dim 128"),
+    "ssm.time_step_rank": (48, "Gated DeltaNet: 48 V-heads"),
+    "ssm.group_count": (16, "Gated DeltaNet: 16 QK-heads"),
+}
+EXPECTED_VOCAB = 248320
+EXPECTED_ARCHS = {"qwen35", "qwen36"}
+
+
+def read_scalar(reader: GGUFReader, key: str):
+    f = reader.fields.get(key)
+    if f is None:
+        return None
+    arr = f.parts[f.data[0]]
+    val = arr.tolist() if hasattr(arr, "tolist") else arr
+    if isinstance(val, list) and len(val) == 1:
+        return val[0]
+    return val
+
+
+def read_arch(reader: GGUFReader) -> str:
+    f = reader.fields["general.architecture"]
+    return bytes(f.parts[f.data[0]]).decode()
+
+
+def main() -> int:
+    if len(sys.argv) > 2:
+        print(f"usage: {sys.argv[0]} [path/to/Thanatos-27B.Q4_K_M.gguf]", file=sys.stderr)
+        return 2
+    root = Path(__file__).resolve().parent.parent
+    default_paths = [
+        root / "Thanatos-27B.Q4_K_M.qwen35.gguf",
+        root / "Thanatos-27B.Q4_K_M.qwen36.gguf",
+        root / "Thanatos-27B.Q4_K_M.gguf",
+    ]
+    if len(sys.argv) == 2:
+        path = Path(sys.argv[1])
+    else:
+        path = next((p for p in default_paths if p.exists() and p.stat().st_size > 1024), None)
+        if path is None:
+            print("[!] no Thanatos-27B GGUF found in repo root; pass a path explicitly", file=sys.stderr)
+            return 2
+
+    print(f"[*] reading: {path}")
+    reader = GGUFReader(str(path), "r")
+
+    arch = read_arch(reader)
+    if arch not in EXPECTED_ARCHS:
+        print(f"[!] unexpected general.architecture: {arch!r} (expected one of {EXPECTED_ARCHS})", file=sys.stderr)
+        return 1
+    print(f"[*] general.architecture: {arch}")
+    print()
+
+    mismatches = 0
+    fmt = " {marker} {claim:55s} {key:35s} = {actual}"
+    for suffix, (expected, claim) in EXPECTED.items():
+        key = f"{arch}.{suffix}"
+        actual = read_scalar(reader, key)
+        ok = actual == expected
+        marker = "[ ok ]" if ok else "[FAIL]"
+        print(fmt.format(marker=marker, claim=claim, key=key, actual=actual))
+        if not ok:
+            mismatches += 1
+
+    # Vocab count comes from the tokenizer tokens array length, not a scalar KV.
+    f = reader.fields.get("tokenizer.ggml.tokens")
+    vocab_actual = len(f.data) if f is not None else None
+    ok = vocab_actual == EXPECTED_VOCAB
+    marker = "[ ok ]" if ok else "[FAIL]"
+    print(fmt.format(marker=marker, claim=f"Vocab {EXPECTED_VOCAB}", key="tokenizer.ggml.tokens (length)", actual=vocab_actual))
+    if not ok:
+        mismatches += 1
+
+    print()
+    if mismatches:
+        print(f"[!] {mismatches} mismatch(es) — README Architecture claims disagree with GGUF metadata.")
+        return 1
+    print("[+] all Architecture claims verify against GGUF metadata.")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
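
Two of the claim strings in `EXPECTED` carry derived figures ("16 cycles" and "factor 0.25") that the script does not recompute. As a rough sanity check, the arithmetic can be reproduced from the raw values alone. The variable names below are illustrative, and the Q-to-KV head ratio is a derivation added here, not a claim from the README.

```python
# Raw values copied from the EXPECTED table in scripts/verify_arch.py.
block_count = 64
full_attention_interval = 4
key_length = 256
rope_dimension_count = 64
head_count = 24
head_count_kv = 4

# One full-attention layer per interval gives the "16 cycles" figure.
full_attention_layers = block_count // full_attention_interval

# Partial RoPE: fraction of each head's dimensions that are rotated.
rope_factor = rope_dimension_count / key_length

# GQA: number of query heads sharing each KV head (derived, not a README claim).
q_heads_per_kv_head = head_count // head_count_kv

print(full_attention_layers, rope_factor, q_heads_per_kv_head)  # prints: 16 0.25 6
```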