FoolDev committed
Commit 50f6684 · 1 Parent(s): b404cc2

Revert "docs: add safetensors mirror of Qwen/Qwen3.6-27B"

This reverts commit b4203787cd51fa6cb79ab320f3702f0253303557.

CHANGELOG.md CHANGED
@@ -7,42 +7,6 @@ and documentation**, not the underlying base model.
  
  ## [Unreleased]
  
- ### Removed (transformers config)
- - **Dropped `config.json`** (`5302d10`) to suppress HF's tag
- auto-detector surfacing `qwen3_5` in the repo header — the
- detector reads `architectures` from `config.json` and the
- surfaced tag was obscuring this card's positioning.
- Consequence: `AutoModelForCausalLM.from_pretrained(
- "FoolDev/Thanatos-27B")` no longer works on its own.
- `examples/transformers_quickstart.py` now pulls `AutoConfig`
- from upstream `Qwen/Qwen3.6-27B` (byte-identical tensors,
- so no behavioural difference) and weights + tokenizer +
- chat template from this repo. README's "What's here"
- table and transformers paragraph updated to match.
-
- ### Added (safetensors mirror)
- - **Mirrored Qwen/Qwen3.6-27B's transformers-loadable safetensors
- set into this repo.** 15 sharded `.safetensors` files (~58 GB
- total) + `model.safetensors.index.json` + tokenizer files
- (`tokenizer.json`, `tokenizer_config.json`, `vocab.json`,
- `merges.txt`) + configs (`configuration.json`,
- `generation_config.json`, `preprocessor_config.json`,
- `video_preprocessor_config.json`) + `chat_template.jinja`.
- (`config.json` was initially mirrored too, then dropped — see
- "Removed (transformers config)" above.) Tensor data
- byte-identical to upstream; the mirror saves a second
- `hf download` for users who want both GGUF + safetensors in
- one place. `.gitignore` was updated separately (commit
- `0c5bee4`) to whitelist the Qwen sharded naming pattern before
- the upload's preupload check ran (HF reads the destination
- repo's `.gitignore` to decide `shouldIgnore` per file).
- - `examples/transformers_quickstart.py` defaults `MODEL_ID`
- to `FoolDev/Thanatos-27B` (weights + tokenizer + chat
- template) with `CONFIG_ID="Qwen/Qwen3.6-27B"` for the
- architecture config — fresh users still need only this
- repo as the entry point, with one auxiliary HF Hub pull
- for `config.json` that transformers handles transparently.
-
  ### Changed (5th round trip — qwen36 → qwen35, retested next-day)
  - **Bundle re-stamped `general.architecture: 'qwen36'` → `'qwen35'`**
  in `hf upload` commit `e03e10e` (HF), 2026-05-20 midday — 8
 
README.md CHANGED
@@ -125,9 +125,6 @@ The 27B is **dense**: every parameter participates in every forward pass. It's s
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
  | `scripts/install-hooks.sh` | Installs `check.sh` as a git pre-commit hook |
  | `Makefile` | Convenience wrapper — `make help` lists targets |
- | `model-*-of-00015.safetensors` (15 files, ~58 GB) + `model.safetensors.index.json` | Transformers-loadable safetensors mirror of `Qwen/Qwen3.6-27B`. Byte-identical to upstream. |
- | `configuration.json`, `generation_config.json`, `preprocessor_config.json`, `video_preprocessor_config.json`, `chat_template.jinja` | Processor + chat-template configs mirrored from upstream. **`config.json` is intentionally not in this repo** — HF's tag auto-detector reads `architectures` from it and surfaces `qwen3_5` in the repo header, which obscures this repo's positioning. Transformers users: pull `config.json` from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B) (see Transformers note below). |
- | `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt` | Tokenizer files mirrored from upstream. |
  | `LICENSE`, `CITATION.cff` | Apache-2.0 license and citation metadata |
  | `CHANGELOG.md` | Versioned tooling/docs changes |
  | `README.md` | This file |
@@ -141,28 +138,7 @@ local-build path applies this repo's `Modelfile`; the `hf.co/...`
  path applies the root-level `template`, `system`, and `params`
  files (kept in sync with the `Modelfile`).
  
- The transformers safetensors set is mirrored in this repo
- (15 sharded `.safetensors` files + index + tokenizer +
- chat template), byte-identical to upstream
- [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
- **`config.json` is not bundled here** — HF auto-detects model
- architecture from it and surfaces a `qwen3_5` repo-level tag
- that obscures this card. To load via transformers, either:
-
- ```python
- # A. Use upstream as the config/architecture source, this repo for weights:
- from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
- cfg = AutoConfig.from_pretrained("Qwen/Qwen3.6-27B", trust_remote_code=True)
- tok = AutoTokenizer.from_pretrained("FoolDev/Thanatos-27B", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained(
-     "FoolDev/Thanatos-27B", config=cfg, trust_remote_code=True,
- )
-
- # B. Or just load upstream directly — tensors are byte-identical:
- model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-27B", trust_remote_code=True)
- ```
-
- `examples/transformers_quickstart.py` uses path A.
  
  ## Architecture
  
@@ -581,7 +557,7 @@ python examples/ollama_chat.py # section 3 runs a real round-trip
  
  | Model | Notes |
  |---|---|
- | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors (this repo mirrors them) |
  | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
 
  | `scripts/verify_arch.py` | Cross-checks the README "Architecture" forward-pass bullets (layer count, head counts, hidden / FFN dims, RoPE factor, SSM dims, vocab, context) against the actual GGUF metadata keys. Run as `make verify-arch`. Handles both `qwen35`- and `qwen36`-stamped bundles; exit non-zero if any value mismatches. Not part of `make check` because it loads the 17 GB GGUF (LFS smudge required); run on demand. |
  | `scripts/install-hooks.sh` | Installs `check.sh` as a git pre-commit hook |
  | `Makefile` | Convenience wrapper — `make help` lists targets |
  | `LICENSE`, `CITATION.cff` | Apache-2.0 license and citation metadata |
  | `CHANGELOG.md` | Versioned tooling/docs changes |
  | `README.md` | This file |
 
  path applies the root-level `template`, `system`, and `params`
  files (kept in sync with the `Modelfile`).
  
+ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
  
  ## Architecture
  
 
  
  | Model | Notes |
  |---|---|
+ | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors |
  | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
  | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
examples/README.md CHANGED
@@ -5,7 +5,7 @@ Four minimal entry points. Pick the one that matches how you run models.
  | File | Backend | When to use |
  |---|---|---|
  | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
- | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the safetensors on GPU (this repo mirrors them from `Qwen/Qwen3.6-27B`), optionally in 4-bit via bitsandbytes. |
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
  | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
  
 
  | File | Backend | When to use |
  |---|---|---|
  | `ollama_chat.py` | Ollama HTTP API | You already have `ollama serve` running and the `thanatos-27b` model created from the project `Modelfile`. **Text + tool calling** — vision via Ollama is broken upstream for this arch. |
+ | `transformers_quickstart.py` | Hugging Face Transformers | You want to run the upstream safetensors (`Qwen/Qwen3.6-27B`) on GPU, optionally in 4-bit via bitsandbytes. |
  | `llama_cpp_quickstart.py` | llama-cpp-python | You want to invoke a local GGUF directly without a daemon (CI, batch jobs, scripts). Text only. |
  | `llama_cpp_vision.py` | llama-cpp-python + mmproj | **Image input.** Loads a text GGUF + `mmproj-F16.gguf` and answers questions about an image. The only working vision path right now. |
  
examples/transformers_quickstart.py CHANGED
@@ -2,17 +2,11 @@
  """
  Thanatos-27B — Hugging Face Transformers quickstart.
  
- Loads the Qwen 3.6 27B safetensors from this repo (a byte-identical
- mirror of Qwen/Qwen3.6-27B's transformers set) and runs a single
- chat turn using its embedded chat template. Applies the same
- Thanatos system prompt the Modelfile / bridge `system` file uses.
-
- `config.json` is intentionally not in this repo (it makes HF's tag
- auto-detector surface a `qwen3_5` repo-level tag), so we source the
- architecture config from upstream `Qwen/Qwen3.6-27B` and only pull
- weights + tokenizer + chat template from this repo. Tensor data is
- byte-identical, so the result is the same model. Set
- `MODEL_ID = "Qwen/Qwen3.6-27B"` to bypass this repo entirely.
  
  Requirements:
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
@@ -34,7 +28,7 @@ import sys
  
  try:
      import torch
-     from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
  except ImportError as e: # pragma: no cover
      sys.exit(
          f"Missing dependency: {e.name}. Install with:\n"
@@ -42,8 +36,7 @@ except ImportError as e: # pragma: no cover
      )
  
  
- MODEL_ID = "FoolDev/Thanatos-27B"
- CONFIG_ID = "Qwen/Qwen3.6-27B" # source of config.json (not bundled in MODEL_ID — see module docstring)
  
  THANATOS_SYSTEM = (
      "You are Thanatos, a precise and capable assistant for reasoning, writing, "
@@ -75,11 +68,8 @@ def load(use_4bit: bool):
          )
          kwargs.pop("torch_dtype", None)
  
-     cfg = AutoConfig.from_pretrained(CONFIG_ID, trust_remote_code=True)
      tok = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
-     model = AutoModelForCausalLM.from_pretrained(
-         MODEL_ID, config=cfg, trust_remote_code=True, **kwargs,
-     )
      return tok, model
  
  
 
  """
  Thanatos-27B — Hugging Face Transformers quickstart.
  
+ Loads the upstream Qwen 3.6 27B safetensors directly and runs a single
+ chat turn using its embedded chat template. Thanatos-27B is a *wrapper*
+ around that base, so for the transformers route there is nothing to
+ download from this repo — point at Qwen/Qwen3.6-27B and apply the same
+ system prompt the Modelfile uses.
  
  Requirements:
  pip install --upgrade "transformers>=4.45" accelerate sentencepiece bitsandbytes
 
  
  try:
      import torch
+     from transformers import AutoModelForCausalLM, AutoTokenizer
  except ImportError as e: # pragma: no cover
      sys.exit(
          f"Missing dependency: {e.name}. Install with:\n"
 
      )
  
  
+ MODEL_ID = "Qwen/Qwen3.6-27B"
  
  THANATOS_SYSTEM = (
      "You are Thanatos, a precise and capable assistant for reasoning, writing, "
 
          )
          kwargs.pop("torch_dtype", None)
  
      tok = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
+     model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, **kwargs)
      return tok, model
  
  
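
Note on the resulting load path: after this revert the quickstart takes the tokenizer, config, and weights from a single upstream ID, so the separate `CONFIG_ID` lookup is no longer needed. The sketch below illustrates that shape only; it is not the file's actual contents. `load_call_args` and `extra_kwargs` are hypothetical names introduced here, and `device_map` in the usage line is just an example of a keyword that `from_pretrained` accepts.

```python
# Sketch of the post-revert load path. MODEL_ID and the two from_pretrained
# calls mirror the "+" lines in the diff; everything else is illustrative.
MODEL_ID = "Qwen/Qwen3.6-27B"


def load_call_args(extra_kwargs=None):
    """Return the (positional, keyword) arguments used for the model load.

    Hypothetical helper: it only assembles arguments, so it can be inspected
    without transformers installed.
    """
    kwargs = {"trust_remote_code": True}
    kwargs.update(extra_kwargs or {})
    return (MODEL_ID,), kwargs


def load(extra_kwargs=None):
    """Load tokenizer and model from the single upstream repo."""
    # Imported lazily so that importing this module does not require
    # transformers; calling load() does.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    args, kwargs = load_call_args(extra_kwargs)
    tok = AutoTokenizer.from_pretrained(*args, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(*args, **kwargs)
    return tok, model
```

Calling `load({"device_map": "auto"})` would download the upstream checkpoint, so it is left out of the block; `load_call_args` can be checked on its own.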