FoolDev Claude Opus 4.7 committed on
Commit ab668b6 · 1 Parent(s): c598204

Fix stale Janus URL: FoolDev/janus → FoolDev/Janus-35B


Sibling badge, intro paragraph, comparison-table header, and
Related-models row all pointed at the old lowercase /FoolDev/janus
repo URL. Updated to /FoolDev/Janus-35B (the current canonical name).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
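
A change like this can also be scripted rather than edited by hand. The following is a minimal sketch, not the method used for this commit: only the two repo paths come from the commit message, while `fix_stale_links`, `OLD_REPO`, `NEW_REPO`, and `STALE` are hypothetical names chosen for illustration.

```python
import re

# Only these two repo paths come from the commit; every other name is illustrative.
OLD_REPO = "FoolDev/janus"
NEW_REPO = "FoolDev/Janus-35B"

# Match the old path only when it is not followed by a letter, digit,
# underscore, or hyphen, so a longer repo name that merely starts with
# "FoolDev/janus" is left alone.
STALE = re.compile(re.escape(OLD_REPO) + r"(?![\w-])")


def fix_stale_links(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return STALE.subn(NEW_REPO, text)
```

The match is case-sensitive, so references that already use `FoolDev/Janus-35B` are not touched, and running the function a second time on its own output makes no further replacements. The returned count can be compared against the number of references you expect to change before writing the file back.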

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -49,7 +49,7 @@ pipeline_tag: image-text-to-text
  [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
  [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
  [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
- [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/janus)
+ [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/Janus-35B)
 
  # Thanatos-27B
 
@@ -58,7 +58,7 @@ pipeline_tag: image-text-to-text
 
  **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled LLM`
 
- A personal sibling to [`FoolDev/janus`](https://huggingface.co/FoolDev/janus). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
+ A personal sibling to [`FoolDev/Janus-35B`](https://huggingface.co/FoolDev/Janus-35B). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
 
  ## TL;DR
 
@@ -94,7 +94,7 @@ The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only
 
  The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) — but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
 
- | | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/janus) |
+ | | Thanatos-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/Janus-35B) |
  |---|---|---|
  | Architecture | Dense transformer | MoE 256 experts, 8 active |
  | Total params | 27 B | 35 B |
@@ -438,7 +438,7 @@ python examples/ollama_chat.py # section 3 runs a real round-trip
  |---|---|
  | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors |
  | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
- | [FoolDev/janus](https://huggingface.co/FoolDev/janus) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
+ | [FoolDev/Janus-35B](https://huggingface.co/FoolDev/Janus-35B) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
  | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
 
  ## Credits