FoolDev Claude Opus 4.7 committed on
Commit 2bf2adf · 1 Parent(s): c336f44

release: v0.6.0

First tagged release under the Thanatos name. Cuts the
two-week burst of changes since [0.5.0] (2026-05-02) into a
versioned section. Headline:

- Project renamed `Janus-27B` -> `Thanatos-27B` (HF repo, GGUFs,
default Ollama tag, system-prompt identity, banner). MoE sibling
`FoolDev/Janus-35B` keeps the Janus name.
- Bundle GGUF re-stamped `qwen35` -> `qwen36` (the
architecturally-honest label; breaks stock loaders until upstream
adds the arch entry; as of 2026-05-19 there's no PR or tracking
issue for it in either ggml-org/llama.cpp or ollama/ollama).
- `make load-bundle`: one-shot bundle -> local Ollama tag.
- `make heal-hf`: rebadge an already-pulled hf.co/... tag in store
+ rewrite its manifest's model-layer digest, so the same tag
loads in place.
- `scripts/rename_arch.py`: generic GGUF arch renamer (metadata
only, tensors byte-identical).
- Tool-calling end-to-end via Ollama (Modelfile TEMPLATE) and HF
bridge (root-level template / system / params; kept in sync via
`scripts/check_bridge_sync.py`).
- `make smoke` / `make smoke-tools` / `make bench` for regression
and tok/s measurement. Three measured Strix Halo data points
logged in the Modelfile and CHANGELOG.
- Vision section restructured around `llama-server --mmproj` (the
only reliably working vision path; Ollama's C++ fallback for
mmproj still lacks the qwen35/qwen35moe arch entries, tracked
in ollama/ollama#15898).
- Bundled GGUF reduced to Q4_K_M only (17 GB); Q3_K_S no longer
redistributed, built locally via `make build QUANT=Q3_K_S`.
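
The "metadata only, tensors byte-identical" rename can be illustrated with a minimal sketch. This is not the contents of `scripts/rename_arch.py`; the function name `rename_arch_in_place` is hypothetical. It assumes that `general.architecture` is the first metadata entry and that the old and new names have the same byte length (as `qwen36` and `qwen35` do), so no offsets shift. A real renamer would likely also need to handle architecture-prefixed metadata keys, which this sketch ignores.

```python
import struct

GGUF_MAGIC = b"GGUF"
STRING_TYPE = 8  # GGUF metadata value type id for strings
ARCH_KEY = b"general.architecture"


def rename_arch_in_place(data: bytes, old: str, new: str) -> bytes:
    """Return a copy of `data` with general.architecture changed from old to new.

    Assumption (sketch only): the key is the first metadata entry and both
    names have the same byte length, so every later byte keeps its offset.
    """
    old_b, new_b = old.encode(), new.encode()
    if len(old_b) != len(new_b):
        raise ValueError("this sketch only supports same-length renames")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")

    # Header after the magic: uint32 version, uint64 tensor count, uint64 KV count.
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    if n_kv < 1:
        raise ValueError("file has no metadata entries")
    pos = 4 + struct.calcsize("<IQQ")

    # First KV: uint64 key length, key bytes, uint32 value type.
    (key_len,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    key = data[pos:pos + key_len]
    pos += key_len
    (value_type,) = struct.unpack_from("<I", data, pos)
    pos += 4
    if key != ARCH_KEY or value_type != STRING_TYPE:
        raise ValueError("general.architecture is not the first metadata entry")

    # String value: uint64 length, then the UTF-8 bytes.
    (val_len,) = struct.unpack_from("<Q", data, pos)
    pos += 8
    if data[pos:pos + val_len] != old_b:
        raise ValueError("unexpected current architecture")

    # Splice in the new name; everything after it is copied unchanged.
    return data[:pos] + new_b + data[pos + val_len:]
```

For a multi-gigabyte file one would patch the bytes on disk rather than copy the whole blob; the in-memory version is only meant to show which bytes change.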

CITATION.cff has no `version` field, so no bump needed there. The
git tag v0.6.0 (added in this commit) is the canonical reference;
the CHANGELOG header uses the conventional last-pre-release commit
sha (`c336f44`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
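
The CHANGELOG entry for `make bench` says the tok/s figures come from Ollama's `eval_count` / `eval_duration`. A minimal sketch of that arithmetic follows, assuming a response dict shaped like Ollama's generate/chat output, where `eval_duration` is in nanoseconds. The function name `tokens_per_second` is hypothetical and this is not the contents of `scripts/bench.sh`.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict) -> float:
    """Compute generation throughput from an Ollama-style response dict.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count * NANOSECONDS_PER_SECOND / eval_duration_ns
```

Prompt-processing time is reported separately by Ollama, so a figure computed this way reflects generation speed only.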

Files changed (1)
  1. CHANGELOG.md +69 -0
CHANGELOG.md CHANGED
@@ -7,6 +7,75 @@ and documentation**, not the underlying base model.
 
  ## [Unreleased]
 
+ ## [0.6.0] - 2026-05-19 - `c336f44`
+
+ Headline changes since 0.5.0 (a two-week burst ending 2026-05-19):
+
+ - **Project renamed `Janus-27B` → `Thanatos-27B`** (HF repo, GGUF
+ filenames, default Ollama tag, system-prompt identity, banner
+ wordmark). MoE 35B sibling `FoolDev/Janus-35B` keeps the Janus
+ name; the dense 27B splits off as its own identity.
+ - **Bundle GGUF re-stamped `general.architecture: 'qwen35'` →
+ `'qwen36'`**, the architecturally-honest label. Breaks
+ `ollama run hf.co/FoolDev/Thanatos-27B` and direct
+ `llama-server -m Thanatos-27B.Q4_K_M.gguf` on every stock loader
+ until upstream adds the arch entry (which, as of 2026-05-19,
+ has no PR or tracking issue in either ggml-org/llama.cpp or
+ ollama/ollama).
+ - **`make load-bundle`**: one-shot bundle → loadable local Ollama
+ tag (LFS smudge + qwen36 → qwen35 rebadge + `ollama create`).
+ - **`make heal-hf`**: heal an already-pulled HF-bridge tag in
+ store by rebadging its model blob qwen36 → qwen35 and rewriting
+ the manifest's model-layer digest, so the same `hf.co/...` tag
+ becomes loadable in place without switching tag names.
+ - **`scripts/rename_arch.py`**: generic GGUF arch renamer
+ (metadata only, tensor data byte-identical). Used by load-bundle
+ and heal-hf; reusable for future arch renames.
+ - **Tool-calling** end-to-end via Ollama (Modelfile TEMPLATE wires
+ `.Tools` / `.ToolCalls` so `ollama show` reports the `tools`
+ capability and `/api/chat` / `/v1/chat/completions` accept tools
+ arrays). HF Ollama bridge gets parallel coverage via root-level
+ `template` / `system` / `params` files (HF's bridge does **not**
+ read `Modelfile`); kept in sync via `scripts/check_bridge_sync.py`,
+ wired into `make check`.
+ - **`scripts/bench.sh` + `make bench`**: 3-prompt tok/s benchmark
+ using Ollama's `eval_count` / `eval_duration`. Three measured
+ data points logged on the Strix Halo reference hardware (Ryzen
+ AI Max+ 395 / Radeon 8060S iGPU): Q3_K_S Vulkan 12.31 tok/s,
+ Q4_K_M Vulkan 9.31 + 9.19 tok/s (two runs), Q3_K_S ROCm
+ 10.14 tok/s (older backend snapshot).
+ - **`scripts/smoke_test.sh` + `make smoke` / `make smoke-tools`**:
+ server reachable, model present, tools capability, round-trip,
+ token-leakage guard, opt-in tool-call round-trip
+ (`TOOLS_TEST=1`).
+ - **Vision section** restructured around llama.cpp / llama-cpp-python
+ (Ollama vision is broken upstream: `qwen35` / `qwen35moe` arch
+ entries are present in the Go engine but missing from the C++ fallback
+ the Ollama runner switches to when mmproj is attached;
+ [ollama/ollama#15898](https://github.com/ollama/ollama/issues/15898)).
+ `llama-server --mmproj` HTTP path is now the lead recommendation.
+ - **Bundled GGUF**: `Thanatos-27B.Q4_K_M.gguf` (~17 GB) is the only
+ redistributed quant, picked by HF's Ollama bridge for the default
+ `:latest` tag. Other quants (Q3_K_S, Q5_K_M, Q6_K, safetensors)
+ pulled from upstream on demand via `make build QUANT=...`.
+
+ Bug-fix highlights:
+
+ - `scripts/fetch_mmproj.sh` renamed to `scripts/fetch_vision.sh`:
+ HF's Ollama bridge was filename-pattern-matching `mmproj*` anywhere
+ in the repo and shipping the 2-KB bash script as the
+ `application/vnd.ollama.image.projector` layer, breaking
+ `ollama show` / `ollama run` with `Error: invalid file magic`.
+ - `Modelfile` PARAMETER stop directives for `<|im_end|>`,
+ `<|endoftext|>`, `<|im_start|>`: Ollama was only picking up
+ `<|im_end|>` from GGUF metadata, so the model kept generating past
+ `<|endoftext|>` and synthesised a fake user turn.
+ - `examples/ollama_chat.py` thinking-trace handling under Ollama
+ 0.24 (now reads `message.thinking`, falls back to `<think>`
+ extraction for older builds).
+
+ See the full per-bullet changelog below.
+
  ### Added
  - `scripts/heal_hf_pull.sh` + `make heal-hf`: heal an already-pulled
  `hf.co/FoolDev/Thanatos-27B:...` tag in-store by rebadging its