FoolDev committed · Commit b564869 · 1 Parent(s): bfe34c3

Initial release: Janus-27B repo

Sibling distribution package to FoolDev/janus, targeting the dense
Qwen 3.6 27B base instead of the 35B-A3B MoE.

Includes:
- README with arch/hardware/sampling/limitations sections matching the
35B sibling card
- Modelfile that wraps a user-provided Qwen 3.6 27B GGUF
- Tokyo-Night-themed banner (PNG + SVG source) using purple as the
sibling-distinct accent vs the 35B's cyan
- Standard HF .gitattributes for LFS-tracked binary types

This repo does not redistribute weights; users pull from
unsloth/Qwen3.6-27B-GGUF or another community quant.

Files changed (5)
  1. .gitattributes +1 -0
  2. Modelfile +52 -0
  3. README.md +192 -0
  4. banner.png +0 -0
  5. banner.svg +60 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
Modelfile ADDED
@@ -0,0 +1,52 @@
+ # Janus-27B — Ollama wrapper around Qwen 3.6 27B (dense)
+ #
+ # This repo does not redistribute weights. Edit the FROM line below to
+ # point at a local Qwen 3.6 27B GGUF, then:
+ #
+ #   ollama create janus-27b -f Modelfile && ollama run janus-27b
+ #
+ # Recommended GGUF source:
+ #   https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
+ #
+ # Or a community Opus-distilled variant:
+ #   https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
+ #
+ # Replace the path below with wherever you keep the GGUF.
+
+ FROM ./Qwen3.6-27B.Q4_K_M.gguf
+
+ # Sampling tuned for reasoning + general use. See README "Recommended sampling"
+ # for creative/RP alternatives.
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER repeat_penalty 1.05
+ PARAMETER num_ctx 16384
+
+ SYSTEM """You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+
+ Behavior rules:
+ - Answer the user's actual request directly.
+ - Be accurate, complete, and structured.
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+ - Finish with a usable answer, not just planning."""
+
+ # Hardware notes
+ # --------------
+ # Qwen 3.6 27B is *dense* — every parameter participates in every forward pass.
+ # Q4_K_M GGUF is ~16 GB. Practical footprint:
+ #   weights mmap          ~16 GB
+ #   compute graph alloc   ~12 GB  (smaller than 35B-A3B because dense ≠ MoE)
+ #   KV cache @ 16K ctx    ~1 GB   (with OLLAMA_KV_CACHE_TYPE=q8_0)
+ #   total minimum         ~29 GB
+ #
+ # Working configurations:
+ #   ✓ RTX 3090 / 4090 24 GB — full Q4 offload, ~25-40 tok/s
+ #   ✓ RTX 5090 32 GB — full offload at Q5/Q6 quant
+ #   ✓ Mac Studio M2/M3 32 GB+ unified — ~15-25 tok/s
+ #   ✓ Linux box with 32 GB+ RAM (CPU-only) — ~1-3 tok/s
+ #   ⚠ ASUS ROG Flow Z13 (32 GB unified) — borderline, try Q3_K_S quant
+ #     (~12 GB) for headroom
README.md ADDED
@@ -0,0 +1,192 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen3.6-27B
+ datasets:
+ - crownelius/Creative_Writing_ShareGPT_Enhanced
+ - microsoft/rStar-Coder
+ - peteromallet/dataclaw-peteromallet
+ - crownelius/Opus-4.7-Reasoning
+ - openbmb/UltraData-Math
+ - Crownelius/Crow-Heretic-TeichAI-Unified
+ language:
+ - en
+ - zh
+ - ru
+ - es
+ - fr
+ - it
+ - ja
+ - ko
+ - de
+ - ar
+ - tr
+ - pl
+ - sv
+ - nl
+ - he
+ - id
+ - uk
+ - fa
+ - pt
+ - ms
+ - fi
+ - el
+ tags:
+ - qwen3_6
+ - dense
+ - conversational
+ - multimodal
+ - agent
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ ---
+
+ <img src="https://huggingface.co/FoolDev/janus-27b/resolve/main/banner.png" alt="Janus-27B banner" width="100%" />
+
+ [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
+ [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--27B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-27B)
+ [![Architecture](https://img.shields.io/badge/Arch-Dense_27B-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
+ [![Sibling](https://img.shields.io/badge/Sibling-Janus--35B-7dcfff?style=flat&labelColor=1a1b26)](https://huggingface.co/FoolDev/janus)
+
+ # Janus-27B
+
+ > **Dense Reasoning. Friendlier Footprint.**
+ > *Qwen 3.6 27B (dense) repackaged with Claude Opus 4.7 in the teacher slot.*
+
+ **`Architecture:`** `Qwen 3.6 27B (Dense)` | **`Parameters:`** `27B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled LLM`
+
+ A personal sibling to [`FoolDev/janus`](https://huggingface.co/FoolDev/janus). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.
+
+ ## Why a 27B variant?
+
+ The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** — the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
+
+ The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B (no sparse advantage), but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
+
+ | | Janus-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/janus) |
+ |---|---|---|
+ | Architecture | Dense transformer | MoE 256 experts, 8 active |
+ | Total params | 27 B | 35 B |
+ | Active params per token | 27 B | ~3 B |
+ | Layers | 64 | 40 |
+ | Hidden size | 5120 | 2048 |
+ | Q4_K_M GGUF size | ~16 GB | ~19 GB |
+ | Min host memory | ~24 GB | ~38 GB |
+ | Multimodal | Yes (vision) | Yes (vision) |
+ | Max context | 262 144 | 262 144 |
+
+ ## What's here
+
+ | File | Use |
+ |---|---|
+ | `banner.svg` / `banner.png` | Repo header, Tokyo Night themed |
+ | `Modelfile` | Ollama wrapper around the upstream Qwen3.6-27B GGUF |
+ | `README.md` | This file |
+
+ This repo does **not** redistribute weights. Pull the upstream GGUF from [`unsloth/Qwen3.6-27B-GGUF`](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) or any other community quant, point the Modelfile at it, and `ollama create janus-27b -f Modelfile`.
+
+ If you want the safetensors for `transformers`, fetch them from [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B).
+
+ ## Architecture
+
+ - Qwen 3.6 dense, 27B parameters, 64 transformer layers
+ - 24 attention heads, 4 KV heads (GQA), head_dim 256
+ - Hidden size 5120, intermediate size 17408 (~3.4× ratio)
+ - Vocab 248,320 (shared with 35B-A3B sibling)
+ - 262k native context, extensible with YaRN
+ - Vision + video support via upstream `mmproj` (not in this repo)
+
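The attention figures in the Architecture list allow a back-of-the-envelope KV cache estimate. The sketch below is illustrative arithmetic, not a measurement. It assumes that every one of the 64 layers keeps a standard K and V cache for the full context, and the bytes-per-element values (2.0 for f16, 34/32 for a llama.cpp-style q8_0 block of 32 int8 values plus one f16 scale) are assumptions. `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    """Bytes for the K and V caches, assuming every layer caches all n_ctx tokens."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K plus one V
    return int(elems_per_token * n_ctx * bytes_per_elem)


# Assumed storage widths per cached element.
F16 = 2.0
Q8_0 = 34 / 32

# Figures taken from the Architecture list.
LAYERS, KV_HEADS, HEAD_DIM = 64, 4, 256

for label, width in (("f16", F16), ("q8_0", Q8_0)):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, 16384, width)
    print(f"{label}: {size / 2**30:.2f} GiB at 16K context")
```

Treat the result as an upper bound under the stated assumption: a runtime that caches fewer layers, or a shorter `num_ctx`, will report less.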
+ ## Quick start
+
+ ### Ollama
+
+ A ready-to-use `Modelfile` is included. Edit the `FROM` line to point at your local GGUF copy:
+
+ ```bash
+ # After pulling unsloth/Qwen3.6-27B-GGUF or another quant locally:
+ ollama create janus-27b -f Modelfile && ollama run janus-27b
+ ```
+
+ ### Inference (OpenAI-compatible)
+
+ ```bash
+ curl -s http://localhost:11434/v1/chat/completions \
+   -H 'Content-Type: application/json' \
+   -d '{
+     "model": "janus-27b",
+     "messages": [
+       {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
+       {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."}
+     ],
+     "temperature": 0.6
+   }' | jq -r '.choices[0].message.content'
+ ```
+
+ ### Recommended sampling
+
+ | Use | temp | top_p | top_k | repeat_penalty |
+ |---|---:|---:|---:|---:|
+ | Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
+ | Creative / RP | 0.8 | 0.95 | 40 | 1.02 |
+
+ If the model loops inside `<think>` tags, lower the temperature to 0.4-0.6 and raise `repeat_penalty` to 1.08.
+
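The presets in the sampling table can also be applied per request instead of being baked into the Modelfile. A minimal sketch, assuming a local Ollama server on its default port and its native `/api/chat` endpoint, which accepts sampling parameters in an `options` object. The `PRESETS` dict and the `build_chat_request` helper are hypothetical names used for illustration; the code only builds the request and does not send it.

```python
import json
import urllib.request

# Values copied from the "Recommended sampling" table.
PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05},
    "creative": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "repeat_penalty": 1.02},
}


def build_chat_request(prompt, preset="reasoning", host="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": "janus-27b",
        "messages": [{"role": "user", "content": prompt}],
        "options": dict(PRESETS[preset]),  # copy so callers can tweak safely
        "stream": False,
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Write a two-line poem about doors.", preset="creative")
print(req.full_url)
print(req.data.decode("utf-8"))
```

With a server running, passing `req` to `urllib.request.urlopen` sends the request; the non-streaming reply carries the text under `message.content`.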
+ ### System prompt
+
+ Same as the 35B sibling:
+
+ ```text
+ You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+
+ Behavior rules:
+ - Answer the user's actual request directly.
+ - Be accurate, complete, and structured.
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+ - Finish with a usable answer, not just planning.
+ ```
+
+ ## Hardware requirements
+
+ The dense 27B is the easier of the two Janus models to deploy.
+
+ | Hardware | Status |
+ |---|---|
+ | ≥32 GB RAM (CPU-only) | Works, ~1-3 tok/s |
+ | RTX 3090 / 4090 24 GB | Works, full Q4 offload, ~25-40 tok/s |
+ | RTX 5090 32 GB | Works, full offload at higher quant (Q5/Q6), ~30-50 tok/s |
+ | Mac Studio M2/M3 32 GB+ unified | Works, ~15-25 tok/s |
+ | ASUS ROG Flow Z13 (Ryzen AI Max+, 32 GB unified) | Borderline — 16 GB Q4 GGUF + ~16 GB compute graph crowds the 20 GB iGPU pool. Try Q3_K_S (~12 GB) for headroom. |
+
+ ## Chat template
+
+ Identical to the 35B sibling — Qwen 3.x ChatML with `<|im_start|>` / `<|im_end|>` markers, `<think>...</think>` for reasoning traces, XML-style `<tool_call>` for function calling. The template is embedded in the GGUF metadata.
+
+ See the [Janus-35B Chat template section](https://huggingface.co/FoolDev/janus#chat-template) for examples — they apply unchanged here.
+
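For readers who have not seen ChatML, the basic turn framing can be sketched as follows. This is a simplified illustration of the `<|im_start|>` / `<|im_end|>` markers only. The template embedded in the GGUF also handles `<think>` traces and `<tool_call>` blocks, which this sketch omits, and `render_chatml` is a hypothetical helper, not part of this repo or of Ollama.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a plain ChatML prompt."""
    parts = []
    for msg in messages:
        # Each turn is wrapped as: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
    {"role": "user", "content": "Explain the Burrows-Wheeler transform in 200 words."},
])
print(prompt)
```

In normal use Ollama applies the embedded template itself; hand-rendering like this is only relevant when sending a raw prompt to a completion-style endpoint.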
+ ## Known limitations
+
+ - **Slower per token than the 35B-A3B sibling.** The dense 27B runs all 27B parameters for every token, while the MoE activates only ~3B, so if you optimize for tokens-per-second, the MoE wins.
+ - **No mmproj in this release.** Same as the 35B: fetch it upstream for vision input.
+ - **Q4_K_M quality loss** is real. Use Q5_K_M or Q6_K if you have the VRAM (~20-22 GB).
+ - **No formal evaluation in this card.** The throughput and memory figures above are estimates.
+
+ ## Related models
+
+ | Model | Notes |
+ |---|---|
+ | [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) | Upstream base, safetensors |
+ | [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF) | Recommended GGUF source |
+ | [FoolDev/janus](https://huggingface.co/FoolDev/janus) | 35B-A3B MoE sibling. More capacity, more memory pressure. |
+ | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B starter model when 27B/35B is too heavy |
+
+ ## Credits
+
+ - Base model: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) (Alibaba)
+ - Reasoning teacher: Claude Opus 4.7 (Anthropic)
+ - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
+
+ License inherited from upstream: Apache-2.0.
banner.png ADDED
banner.svg ADDED