Commit 64b629a
FoolDev committed · 0 Parent(s)

Duplicate from FoolDev/Janus-35B

Files changed (14)
  1. .gitattributes +36 -0
  2. CHANGELOG.md +111 -0
  3. CITATION.cff +39 -0
  4. Janus-35B-A3B.Q4_K_M.gguf +3 -0
  5. LICENSE +201 -0
  6. Modelfile +121 -0
  7. README.md +335 -0
  8. banner.png +0 -0
  9. banner.svg +97 -0
  10. moe-routing.svg +670 -0
  11. params +12 -0
  12. scripts/check_bridge_sync.py +147 -0
  13. system +10 -0
  14. template +51 -0
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gguf filter=lfs diff=lfs merge=lfs -text
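The patterns above decide which files Git LFS stores as pointers rather than as regular blobs. As a rough illustration that is not part of this commit, the sketch below approximates that matching with Python's `fnmatch`. Real gitattributes globbing has its own rules (for example around `**` and directory prefixes), so treat the result as an approximation. The helper name `is_lfs_tracked` and the shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes above, for illustration only.
LFS_PATTERNS = ["*.gguf", "*.safetensors", "*.bin", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: slash-free patterns match the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["Janus-35B-A3B.Q4_K_M.gguf", "README.md", "banner.png"]:
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the bundled GGUF is LFS-tracked through the final `*.gguf` rule, while `README.md` and `banner.png` are not matched by any listed pattern.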
CHANGELOG.md ADDED
@@ -0,0 +1,111 @@
+ # Changelog
+
+ All notable changes to this repository. Format loosely follows
+ [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). This repo holds
+ a model card, an Ollama Modelfile, the HF Ollama-bridge `template` /
+ `system` / `params` files, and the bundled Q4_K_M GGUF, so versions
+ track the **tooling and documentation**, not the underlying base model.
+
+ ## [Unreleased]
+
+ ### Added
+ - Root-level `template`, `system`, and `params` files for HF's Ollama
+ bridge. The bridge generates Ollama manifests at request time from
+ these three files (NOT from `Modelfile` — confirmed against
+ https://huggingface.co/docs/hub/en/ollama). Without them, `ollama
+ run hf.co/FoolDev/janus` got an auto-generated manifest with the
+ broken `{{ if .Prompt }} .Prompt }}<|im_end|>` template (Ollama's
+ faulty Go-template conversion of the GGUF's embedded jinja),
+ corrupted stop tokens (`".Prompt }}<|im_end|>"` bleed), and no
+ `.Tools` / `.ToolCalls` blocks — so the published Ollama tag
+ advertised `completion` only and rejected any request with a
+ `tools` array. The three files mirror the `Modelfile`'s `TEMPLATE`
+ / `SYSTEM` / `PARAMETER` directives; both routes wire tool calling
+ correctly. Edit them together when changing one. Verified by
+ re-pulling the fresh tag: `ollama show hf.co/FoolDev/janus` now
+ reports `completion`, `tools`, `thinking` and tool calls round-trip
+ end-to-end through `/api/chat`.
+
+ ### Changed
+ - README "Tool / function calling" section: split into explicit
+ Ollama-path and embedded-jinja-path subsections. Earlier wording
+ conflated the two on-the-wire formats. The Ollama path (Modelfile
+ `TEMPLATE` and the new `template` bridge file, both kept in sync)
+ prompts JSON-in-XML — the form Ollama's tool-call extractor parses
+ into a structured `tool_calls` array. The embedded-jinja path
+ (llama.cpp, llama-cpp-python, LM Studio) reads the Qwen 3.6 native
+ chat template baked into the GGUF, which prompts the verbose
+ `<function=name>` / `<parameter=arg>` form the model was trained
+ on. Both are valid; the model adapts to whichever shape the system
+ prompt prescribes. README now shows both formats side by side.
+ - README "Quick start / Ollama" section: documents both pull paths
+ (`hf.co/...` via bridge files vs `make ... -f Modelfile` locally)
+ and explicitly notes that HF's bridge does not read `Modelfile`.
+ - README "Hardware requirements" intro: re-framed the "~38 GB
+ minimum" claim as "~38 GB at default `num_ctx 16384`" and
+ documented that 32 GB hosts can fit the model by trimming context
+ and batch size.
+ - README "Quick start / Ollama" snippet: show both
+ `ollama run hf.co/FoolDev/janus` and the explicit-tag form
+ `ollama run hf.co/FoolDev/janus:Q4_K_M`. Same blob (the default
+ tag maps to Q4_K_M), but parity with the 27B sibling — which lists
+ both `:latest` and `:Q3_K_S` — and removes ambiguity for users
+ scripting against an explicit quant tag. Verified the explicit tag
+ resolves to the same manifest (model SHA `a076aa0d3a1a`, bridge
+ blobs `22c7ade72045` / `84a1a6ac580b` / `f7b1992cf9c1`).
+
+ ### Added (cont'd)
+ - README `## TL;DR` section near the top of the model card, mirroring
+ the 27B sibling. Two paths (HF Ollama bridge / local Modelfile
+ build) with explicit tags and a one-line capability check. Notes
+ the bridge ingests `template` / `system` / `params`, not
+ `Modelfile`, so users skimming the top of the page won't form the
+ wrong mental model of which file gets used when.
+ - `CITATION.cff` for citation metadata (Apache-2.0, references the
+ upstream Qwen3.6-35B-A3B base and the dense Janus-27B sibling).
+ The 27B sibling has had this file since 0.5.0; adding here for
+ parity so academic-style citations work across both repos.
+ - `LICENSE` file containing the full Apache 2.0 text. The model card
+ front-matter has always declared `license: apache-2.0` and the
+ upstream Qwen 3.6 license inherits Apache-2.0, but until now the
+ repo lacked the actual license text file. Same Apache 2.0 text
+ shipped in the 27B sibling.
+ - `scripts/check_bridge_sync.py` — regression guard for the
+ `Modelfile` <-> `template` / `system` / `params` sync invariant.
+ The two configurations are consumed by different code paths
+ (`ollama create -f Modelfile` for local builds vs HF's Ollama
+ bridge for `hf.co/...` pulls — HF does not read `Modelfile`), so
+ drift between them re-introduces the bug fixed in commit 70ccef1
+ where `hf.co/FoolDev/janus` shipped a broken auto-generated
+ template while local builds had the correct one. Script parses
+ the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives,
+ loads the three bridge files, and fails on any mismatch with a
+ per-key diff. Run on demand before pushing edits to either side
+ of the configuration. The 27B sibling wires an equivalent script
+ into a pre-commit hook (commit 5c67b08); this repo stays leaner
+ and runs it manually.
+
+ ### Fixed
+ - README "Chat template" intro previously claimed all loaders handle
+ the embedded jinja automatically. True for llama.cpp / LM Studio /
+ llama-cpp-python; not true for Ollama, which needs an explicit
+ override (the `Modelfile` TEMPLATE block locally, the root-level
+ `template` file when serving via `hf.co/...`).
+ - README "Tool / function calling" earlier said the XML form
+ `<function=name><parameter=arg>` is "not what this model produces".
+ That was wrong: the embedded GGUF jinja prompts exactly that form,
+ and llama.cpp / LM Studio / llama-cpp-python users will see it.
+ The "JSON-in-XML" claim only applies on the Ollama path because
+ that's what the Modelfile TEMPLATE prompt instructs.
+
+ ## [0.1.0] — initial public release
+
+ ### Added
+ - Model card with architecture overview, sampling defaults, hardware
+ table, and `Modelfile` for `ollama create janus -f Modelfile`.
+ - Bundled `Janus-35B-A3B.Q4_K_M.gguf` (~19 GB) via Git LFS so the HF
+ "Use this model" widget surfaces a working `ollama run` snippet.
+ - Tokyo Night themed banner (PNG sourced from the SVG).
+ - Status badges for license, base model, architecture, quant.
+ - Linked sibling `FoolDev/janus-27b` (dense Qwen 3.6 27B base) under
+ Related models.
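The changelog describes `scripts/check_bridge_sync.py` as parsing the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives and comparing them with the three bridge files. The script's contents are not shown in this part of the diff, so the following is only a minimal sketch of that idea and not the repository's implementation. It assumes the `params` bridge file is JSON and that repeated `stop` parameters map to a JSON list. The function names `parse_modelfile` and `find_drift` are hypothetical.

```python
import json
import re


def parse_modelfile(text: str) -> dict:
    """Extract TEMPLATE, SYSTEM and PARAMETER directives from Modelfile text."""

    def block(keyword: str) -> str:
        # Triple-quoted directive body, e.g. TEMPLATE """..."""
        match = re.search(rf'^{keyword}\s+"""(.*?)"""', text, re.S | re.M)
        return match.group(1) if match else ""

    params: dict = {}
    for key, raw in re.findall(r"^PARAMETER\s+(\S+)\s+(.+)$", text, re.M):
        try:
            value = json.loads(raw.strip())  # numbers and quoted strings
        except ValueError:
            value = raw.strip()  # fall back to the raw token
        if key == "stop":
            params.setdefault("stop", []).append(value)  # stop may repeat
        else:
            params[key] = value
    return {"template": block("TEMPLATE"), "system": block("SYSTEM"), "params": params}


def find_drift(modelfile_text: str, template_text: str,
               system_text: str, params_text: str) -> dict:
    """Return {key: (modelfile_value, bridge_value)} for every mismatch."""
    parsed = parse_modelfile(modelfile_text)
    bridge_params = json.loads(params_text)  # assumption: params is JSON
    drift = {}
    for key, bridge_value in (("template", template_text), ("system", system_text)):
        if parsed[key].strip() != bridge_value.strip():
            drift[key] = (parsed[key], bridge_value)
    for key in sorted(set(parsed["params"]) | set(bridge_params)):
        left, right = parsed["params"].get(key), bridge_params.get(key)
        if left != right:
            drift[f"params.{key}"] = (left, right)
    return drift
```

An empty result means the two configurations agree. A caller could print the returned pairs and exit non-zero, which would match the per-key diff behaviour the changelog describes.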
CITATION.cff ADDED
@@ -0,0 +1,39 @@
+ cff-version: 1.2.0
+ title: "Janus-35B: A Mixture-of-Experts Distillation Wrapper for Qwen 3.6 35B-A3B"
+ message: "If you use this model card or its accompanying files, please cite as below."
+ type: software
+ authors:
+   - name: FoolDev
+     website: "https://huggingface.co/FoolDev"
+ repository-code: "https://huggingface.co/FoolDev/janus"
+ url: "https://huggingface.co/FoolDev/janus"
+ abstract: >-
+   Janus-35B is a personal repackaging of the Qwen 3.6 35B-A3B
+   mixture-of-experts base model (35B total / 3B active per token,
+   256 experts, 8 activated) with Claude Opus 4.7 in the reasoning
+   teacher slot. The repository ships an Ollama Modelfile, the HF
+   Ollama-bridge files (template / system / params), sampling defaults,
+   and a bundled Q4_K_M GGUF (~19 GB) so the HF "Use this model" widget
+   surfaces a one-liner Ollama snippet. Other quants and the upstream
+   safetensors (Qwen/Qwen3.6-35B-A3B) are pulled from upstream on demand
+   rather than redistributed.
+ keywords:
+   - qwen
+   - qwen3.6
+   - mixture-of-experts
+   - moe
+   - distillation
+   - reasoning
+   - llm
+ license: Apache-2.0
+ references:
+   - type: software
+     title: "Qwen3.6-35B-A3B"
+     authors:
+       - name: Alibaba Qwen Team
+     url: "https://huggingface.co/Qwen/Qwen3.6-35B-A3B"
+   - type: software
+     title: "Janus-27B (dense sibling)"
+     authors:
+       - name: FoolDev
+     url: "https://huggingface.co/FoolDev/janus-27b"
Janus-35B-A3B.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
+ size 18939312896
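The three added lines are a Git LFS pointer, not the model weights themselves: a spec version, a SHA-256 object id, and the size in bytes. As an illustration only, the sketch below parses that pointer text and converts the size to decimal and binary units. The helper name `parse_lfs_pointer` is hypothetical.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
size 18939312896
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(f"{info['size'] / 1e9:.2f} GB, {info['size'] / 2**30:.2f} GiB")
```

The size works out to about 18.94 GB in decimal units, which is consistent with the "~19 GB" figure used elsewhere in this commit, and the first twelve hex digits of the digest match the model SHA `a076aa0d3a1a` quoted in the changelog.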
LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for describing the origin of the Work and
+ reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may accept and charge a
+ fee for acceptance of support, warranty, indemnity, or other liability
+ obligations and/or rights consistent with this License. However, in
+ accepting such obligations, You may act only on Your own behalf and
+ on Your sole responsibility, not on behalf of any other Contributor,
+ and only if You agree to indemnify, defend, and hold each Contributor
+ harmless for any liability incurred by, or claims asserted against,
+ such Contributor by reason of your accepting any such warranty or
+ additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2025 FoolDev
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
Modelfile ADDED
@@ -0,0 +1,121 @@
+ FROM ./Janus-35B-A3B.Q4_K_M.gguf
+
+ # Chat template — Qwen 3.6 ChatML in Ollama Go-template form, with the
+ # tool-calling blocks Ollama's capability detector looks for. Without a
+ # TEMPLATE that references .Tools and .ToolCalls, /api/chat and
+ # /v1/chat/completions reject any request carrying a `tools` array with
+ # `<model> does not support tools`. Same template as the 27B dense sibling
+ # (FoolDev/janus-27b) — both share the Qwen 3.6 chat format.
+ TEMPLATE """{{- $lastUserIdx := -1 -}}
+ {{- range $idx, $msg := .Messages -}}
+ {{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+ {{- end }}
+ {{- if or .System .Tools }}<|im_start|>system
+ {{ if .System }}{{ .System }}
+
+ {{ end }}
+ {{- if .Tools }}# Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {{- range .Tools }}
+ {"type": "function", "function": {{ .Function }}}
+ {{- end }}
+ </tools>
+
+ For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+ <tool_call>
+ {"name": <function-name>, "arguments": <args-json-object>}
+ </tool_call>
+ {{- end -}}<|im_end|>
+ {{ end }}
+ {{- range $i, $_ := .Messages }}
+ {{- $last := eq (len (slice $.Messages $i)) 1 -}}
+ {{- if eq .Role "user" }}<|im_start|>user
+ {{ .Content }}<|im_end|>
+ {{ else if eq .Role "assistant" }}<|im_start|>assistant
+ {{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+ <think>{{ .Thinking }}</think>
+ {{ end -}}
+ {{ if .Content }}{{ .Content }}{{ end }}
+ {{- if .ToolCalls }}
+ {{- range .ToolCalls }}
+ <tool_call>
+ {"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+ </tool_call>
+ {{- end }}
+ {{- end }}{{ if not $last }}<|im_end|>
+ {{ end }}
+ {{- else if eq .Role "tool" }}<|im_start|>user
+ <tool_response>
+ {{ .Content }}
+ </tool_response><|im_end|>
+ {{ end }}
+ {{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+ <think>
+ {{ end }}
+ {{- end }}"""
+
+ # Sampling tuned for reasoning + general use. See README "Recommended sampling"
+ # for creative/RP alternatives.
+ PARAMETER temperature 0.6
+ PARAMETER top_p 0.95
+ PARAMETER top_k 20
+ PARAMETER repeat_penalty 1.05
+ PARAMETER num_ctx 16384
+
+ # Stop tokens. Without these, Ollama only honors <|im_end|> from the GGUF
+ # metadata; the model occasionally emits <|endoftext|> instead and Ollama
+ # keeps generating past it (synthesising a fake new user turn). Listing
+ # both — plus <|im_start|> as a belt-and-braces guard against the same
+ # loop — keeps responses cleanly terminated. Same fix the 27B sibling
+ # (FoolDev/janus-27b) shipped in commit 6672746.
+ PARAMETER stop "<|im_end|>"
+ PARAMETER stop "<|endoftext|>"
+ PARAMETER stop "<|im_start|>"
+
+ SYSTEM """You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+
+ Behavior rules:
+ - Answer the user's actual request directly.
+ - Be accurate, complete, and structured.
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+ - Finish with a usable answer, not just planning."""
+
+ # Hardware notes
+ # --------------
+ # This Q4_K_M is ~19 GB on disk. Real footprint at runtime:
+ # weights mmap ~19 GB
+ # compute graph alloc ~19 GB (Ollama log: device.go:272 "total memory")
+ # KV cache @ 16K ctx ~1 GB (with OLLAMA_KV_CACHE_TYPE=q8_0)
+ # total minimum ~38 GB
+ #
+ # Working configurations (verified or documented):
+ # ✓ Single H100 80GB / A100 80GB — full GPU offload
+ # ✓ RTX 5090 32GB / RTX 4090 24GB — partial offload, ~15-25 tok/s
+ # ✓ Mac Studio M2/M3 Ultra 64GB+ — unified memory, ~20+ tok/s
+ # ✓ Linux box with 64GB+ RAM (CPU-only) — ~3-6 tok/s
+ # ⚠ ASUS ROG Flow Z13 (Ryzen AI Max+, 32GB) — OOMs at default num_ctx 16384;
+ # fits with num_ctx ≤ 4096 and
+ # num_batch ≤ 256 (verified)
+ #
+ # Measured data point (ASUS ROG Flow Z13 GZ302EA-RU004W, Ryzen AI Max+ 395 +
+ # Radeon 8060S iGPU, 32 GB unified, ROCm gfx1151, OLLAMA_FLASH_ATTENTION=1,
+ # OLLAMA_KV_CACHE_TYPE=q8_0, num_ctx 4096, num_batch 256):
+ # Q4_K_M, 3-prompt mix → 28.71 tok/s aggregate
+ # (717 tokens / 25.0 s; 29.55 / 29.24 / 28.57 short/medium/long).
+ # ~97% of layers offload to the iGPU via ROCm. Compute split per
+ # `ollama ps` shows 3% CPU / 97% GPU at 4096 ctx.
+ #
+ # To run on a 32 GB unified-memory laptop, override these in your local
+ # Modelfile copy (or pass via -o on `ollama run`):
+ # PARAMETER num_ctx 4096
+ # PARAMETER num_batch 256
+ #
+ # If you have ≥48 GB RAM but want partial GPU offload, set:
+ # PARAMETER num_gpu 24 # offload most layers (model has 40)
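The `TEMPLATE` above instructs the model to wrap each function call as a JSON object inside `<tool_call></tool_call>` tags. To make that wire format concrete, here is a minimal sketch of how a client could pull such calls out of raw model text. It is a hypothetical stand-in and not Ollama's own extractor, whose code is not shown in this commit. The function name `extract_tool_calls` and the sample `get_weather` call are assumptions made for illustration.

```python
import json
import re

# JSON object wrapped in <tool_call> tags, as the TEMPLATE prompt requests.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.S)


def extract_tool_calls(text: str) -> list[dict]:
    """Return the well-formed tool calls found in a model response."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole response
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


if __name__ == "__main__":
    sample = (
        "Let me check.\n"
        "<tool_call>\n"
        '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
        "</tool_call>"
    )
    print(extract_tool_calls(sample))
```

Malformed payloads are skipped in this sketch, so one bad call does not discard the others. A stricter client might prefer to raise an error instead.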
README.md ADDED
@@ -0,0 +1,335 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model:
4
+ - Qwen/Qwen3.6-35B-A3B
5
+ datasets:
6
+ - crownelius/Creative_Writing_ShareGPT_Enhanced
7
+ - microsoft/rStar-Coder
8
+ - peteromallet/dataclaw-peteromallet
9
+ - crownelius/Opus-4.7-Reasoning
10
+ - openbmb/UltraData-Math
11
+ - Crownelius/Crow-Heretic-TeichAI-Unified
12
+ language:
13
+ - en
14
+ - zh
15
+ - ru
16
+ - es
17
+ - fr
18
+ - it
19
+ - ja
20
+ - ko
21
+ - de
22
+ - ar
23
+ - tr
24
+ - pl
25
+ - sv
26
+ - nl
27
+ - he
28
+ - id
29
+ - uk
30
+ - fa
31
+ - pt
32
+ - ms
33
+ - fi
34
+ - el
35
+ tags:
36
+ - qwen3_6
37
+ - moe
38
+ - conversational
39
+ - multimodal
40
+ - agent
41
+ - gguf
42
+ library_name: transformers
43
+ pipeline_tag: image-text-to-text
44
+ ---
45
+
46
+ <img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/banner.svg" alt="Janus-35B banner" width="100%" />
47
+
48
+ [![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
49
+ [![Base Model](https://img.shields.io/badge/Base-Qwen3.6--35B--A3B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
50
+ [![Architecture](https://img.shields.io/badge/Arch-MoE_35B/3B_active-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
51
+ [![Quant](https://img.shields.io/badge/GGUF-Q4__K__M-9ece6a?style=flat&labelColor=1a1b26)](#whats-here)
52
+ [![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
53
+
54
+ # Janus-35B
55
+
56
+ > **Flagship Reasoning. Sparse Footprint.**
57
+ > *Qwen 3.6 35B-A3B repackaged with Claude Opus 4.7 in the teacher slot.*
58
+
59
+ **`Architecture:`** `Qwen 3.6 35B-A3B (MoE)` | **`Total Params:`** `35B` | **`Active Params:`** `3B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled MoE LLM`
60
+
61
+ A personal fork of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) — a 35B-total / 3B-active mixture-of-experts multimodal model — repackaged as Janus-35B with Claude Opus 4.7 reasoning data in the teacher slot.
62
+
63
+ ## TL;DR
64
+
65
+ One-liner via Hugging Face (pulls a GGUF + this repo's root-level
66
+ `template` / `system` / `params` files, including the tool-calling
67
+ template — HF's Ollama bridge ingests those three files, not
68
+ `Modelfile`):
69
+
70
+ ```bash
71
+ ollama run hf.co/FoolDev/Janus-35B # default ~19 GB Q4_K_M
72
+ ollama run hf.co/FoolDev/Janus-35B:Q4_K_M # same blob, explicit tag
73
+ ```
74
+
75
+ Or build locally (uses this repo's `Modelfile`, kept in sync with the
76
+ three bridge files):
77
+
78
+ ```bash
79
+ git clone https://huggingface.co/FoolDev/Janus-35B && cd Janus-35B
80
+ ollama create janus -f Modelfile && ollama run janus
81
+ ```
82
+
83
+ After either path, `ollama show janus` lists `completion`, `tools`,
84
+ and `thinking` under Capabilities. Hardware: ~38 GB RAM at default
85
+ `num_ctx 16384`, or trim ctx + batch to fit 32 GB hosts (see
86
+ [Hardware requirements](#hardware-requirements)).
87
+
88
+ ## What's here
89
+
90
+ | File | Use |
91
+ |---|---|
92
+ | `Janus-35B-A3B.Q4_K_M.gguf` | Recommended default, ~19 GB |
93
+ | `Modelfile` | Ollama wrapper for **local** builds (`ollama create janus -f Modelfile`) — overrides the GGUF's embedded template with one that exposes `.Tools` / `.ToolCalls` to Ollama's capability detector. |
94
+ | `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Janus-35B` directly. The bridge does **not** read `Modelfile` (see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)); it ingests these three root-level files instead. Kept in sync with the `Modelfile`'s `TEMPLATE` / `SYSTEM` / `PARAMETER` directives. |
95
+ | `scripts/check_bridge_sync.py` | Run before pushing a `Modelfile` / `template` / `system` / `params` edit to verify that the `Modelfile` and the three bridge files remain in sync. Exits 0 if in sync, 1 with a per-key diff if not. |
96
+
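To illustrate what "in sync" means for the `params` file, the sketch below renders a `params`-style JSON object as the equivalent Modelfile `PARAMETER` lines. The helper name `params_to_modelfile_lines` is hypothetical and not part of this repo; `scripts/check_bridge_sync.py` remains the actual check.

```python
import json


def params_to_modelfile_lines(params: dict) -> list[str]:
    """Render a params-style dict as Modelfile PARAMETER directives."""
    lines = []
    for key, value in params.items():
        if key == "stop":
            # Each stop sequence gets its own quoted PARAMETER line.
            lines.extend(f'PARAMETER stop "{token}"' for token in value)
        else:
            lines.append(f"PARAMETER {key} {value}")
    return lines


# Same values as the repo's root-level `params` file.
params = json.loads("""
{
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "repeat_penalty": 1.05,
  "num_ctx": 16384,
  "stop": ["<|im_end|>", "<|endoftext|>", "<|im_start|>"]
}
""")

modelfile_lines = params_to_modelfile_lines(params)
for line in modelfile_lines:
    print(line)
```

If the printed lines differ from the `PARAMETER` directives in `Modelfile`, the two configurations have drifted.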
97
+ GGUF-only release. Pull the upstream safetensors from `Qwen/Qwen3.6-35B-A3B` if you need the `transformers` tree.
98
+
99
+ ## Architecture
100
+
101
+ <p align="left">
102
+ <img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/moe-routing.svg" alt="animated MoE routing visualization: 16x16 grid of 256 expert dots with 8 lit at any time, cycling through 8 routing patterns" width="640" />
103
+ </p>
104
+
105
+ - Qwen 3.6, 35B total / 3B active, MoE (256 experts, 8 activated per token)
106
+ - 40 layers, 10 × (3 × DeltaNet → MoE / 1 × Gated Attention → MoE)
107
+ - 262k native context, extensible to ~1M with YaRN
108
+ - Vision + video supported by upstream (mmproj not included in this release)
109
+ - Vocab 248,320
110
+
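As a rough illustration of "256 experts, 8 activated per token", here is a generic top-k routing sketch in plain Python. It assumes a softmax over the selected experts' router logits; that is a common MoE convention, not a confirmed description of Qwen's router, and the function name `route_token` is made up for this example.

```python
import math
import random

NUM_EXPERTS = 256  # expert count from the architecture list
TOP_K = 8          # experts activated per token


def route_token(router_logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Select the top_k experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: w / total for i, w in exps.items()}


rng = random.Random(0)  # fixed seed so the demo is repeatable
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)
print(f"{len(weights)} experts active, {TOP_K / NUM_EXPERTS:.1%} of the pool")
```

Only the selected experts run for a given token, which is why the active parameter count (3B) is far below the total (35B).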
111
+ ## Quick start
112
+
113
+ ### llama.cpp / LM Studio
114
+
115
+ Drop the GGUF into your loader of choice. The chat template is embedded in the GGUF metadata, so llama.cpp (which takes the template from the model's metadata by default) and LM Studio's GGUF auto-detection handle plain conversation correctly.
116
+
117
+ ### Ollama
118
+
119
+ The chat template baked into the GGUF is **not sufficient on Ollama** — it lacks the `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so a naive `ollama pull` reports `does not support tools` and rejects any request carrying a `tools` array. Two paths fix this:
120
+
121
+ ```bash
122
+ # A. Pull straight from HF (uses the root-level template/system/params files):
123
+ ollama run hf.co/FoolDev/Janus-35B # default tag, ~19 GB Q4_K_M
124
+ ollama run hf.co/FoolDev/Janus-35B:Q4_K_M # same blob, explicit tag
125
+ # Note: HF's Ollama bridge does NOT read Modelfile; it reads template/system/params.
126
+
127
+ # B. Build locally (uses Modelfile, which is kept in sync with the three above):
128
+ ollama create janus -f Modelfile && ollama run janus
129
+ ```
130
+
131
+ After either path, `ollama show <name>` (`janus` for the local build, `hf.co/FoolDev/Janus-35B` for the HF pull) should list `completion`, `tools`, and `thinking` under Capabilities.
132
+
133
+ ### Inference examples
134
+
135
+ Once the model is loaded (via `ollama run janus`, `lms server start`, or `llama-server`), standard OpenAI-compatible clients work. The examples assume the loader is listening on `http://localhost:11434` (the Ollama default); adjust the port for LM Studio (`:1234`) or llama.cpp (`:8080`).
136
+
137
+ #### curl
138
+
139
+ ```bash
140
+ curl -s http://localhost:11434/v1/chat/completions \
141
+ -H 'Content-Type: application/json' \
142
+ -d '{
143
+ "model": "janus",
144
+ "messages": [
145
+ {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
146
+ {"role": "user", "content": "Sketch an algorithm to detect cycles in a directed graph."}
147
+ ],
148
+ "temperature": 0.6,
149
+ "max_tokens": 800
150
+ }' | jq -r '.choices[0].message.content'
151
+ ```
152
+
153
+ #### Python (openai-compat)
154
+
155
+ ```python
156
+ from openai import OpenAI
157
+
158
+ client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")
159
+
160
+ resp = client.chat.completions.create(
161
+ model="janus",
162
+ messages=[
163
+ {"role": "user", "content": "Write a haiku about a stack overflow."}
164
+ ],
165
+ temperature=0.8,
166
+ top_p=0.95,
167
+ )
168
+ print(resp.choices[0].message.content)
169
+ ```
170
+
171
+ #### Streaming
172
+
173
+ ```python
174
+ stream = client.chat.completions.create(
175
+ model="janus",
176
+ messages=[{"role": "user", "content": "Explain RoPE briefly."}],
177
+ stream=True,
178
+ )
179
+ for chunk in stream:
180
+ delta = chunk.choices[0].delta.content or ""
181
+ print(delta, end="", flush=True)
182
+ ```
183
+
184
+ ### Recommended sampling
185
+
186
+ | Use | temp | top_p | top_k | repeat_penalty |
187
+ |---|---:|---:|---:|---:|
188
+ | Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
189
+ | Creative / RP | 0.8 | 0.95 | 40 | 1.02 |
190
+
191
+ If the model loops inside `<think>` tags, lower the temperature to 0.4–0.6 and raise `repeat_penalty` to 1.08.
192
+
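`top_k` and `repeat_penalty` are not standard OpenAI chat parameters, so one way to apply the table is through the `options` object of Ollama's native `/api/chat` endpoint. The sketch below only builds the request body and sends nothing; the preset names and the `build_chat_payload` helper are illustrative assumptions.

```python
import json

# Values copied from the sampling table; the preset keys are made up here.
SAMPLING_PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05},
    "creative": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "repeat_penalty": 1.02},
}


def build_chat_payload(prompt: str, preset: str = "reasoning", model: str = "janus") -> dict:
    """Build a request body for POST http://localhost:11434/api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": dict(SAMPLING_PRESETS[preset]),  # copy so callers can tweak it
        "stream": False,
    }


payload = build_chat_payload("Explain RoPE briefly.", preset="creative")
print(json.dumps(payload, indent=2))
```

Per-request `options` override the defaults baked into the model, so the same `janus` model can serve both presets.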
193
+ ### System prompt
194
+
195
+ ```text
196
+ You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
197
+
198
+ Behavior rules:
199
+ - Answer the user's actual request directly.
200
+ - Be accurate, complete, and structured.
201
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
202
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
203
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
204
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
205
+ - Finish with a usable answer, not just planning.
206
+ ```
207
+
208
+ ## Hardware requirements
209
+
210
+ This is an 18.9 GB Q4_K_M GGUF. Ollama's runtime footprint at default settings is **roughly 2× the model file** (weights mmap + compute graph allocation), plus KV cache, so ~38 GB total memory at `num_ctx 16384`. The compute-graph allocation scales with context and batch size, so 32 GB hosts can fit the model by trimming both (see the 32 GB unified-memory row in the table).
211
+
212
+ | Hardware | Status |
213
+ |---|---|
214
+ | ≥48 GB RAM (CPU-only) | Works, ~3-6 tok/s |
215
+ | Single H100/A100 80 GB | Works, full offload, ~30+ tok/s |
216
+ | RTX 4090 24 GB / 5090 32 GB + 32 GB RAM | Works, partial offload, ~15-25 tok/s |
217
+ | Mac Studio M2/M3 Ultra 64 GB+ unified | Works, ~20+ tok/s |
218
+ | 32 GB unified-memory laptops (Ryzen AI Max+, Apple M-series) | Works with `num_ctx ≤ 4096` and `num_batch ≤ 256` to fit the compute graph; default 16K ctx OOMs. Measured 28.71 tok/s on ASUS ROG Flow Z13 GZ302EA at Q4_K_M (Radeon 8060S iGPU via ROCm gfx1151). |
219
+
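The rule of thumb above is simple arithmetic and can be written down as a quick pre-flight check. This is a back-of-the-envelope sketch at default settings only: the 2× factor is the card's approximation, and the function names are hypothetical.

```python
MODEL_FILE_GB = 18.9   # size of the Q4_K_M GGUF stated in this section
OVERHEAD_FACTOR = 2.0  # "roughly 2x the model file" at default num_ctx 16384


def estimated_footprint_gb(model_file_gb: float = MODEL_FILE_GB,
                           factor: float = OVERHEAD_FACTOR) -> float:
    """Approximate total memory needed at default Ollama settings."""
    return model_file_gb * factor


def fits_at_defaults(host_memory_gb: float) -> bool:
    """True if the host clears the estimate without trimming ctx or batch."""
    return host_memory_gb >= estimated_footprint_gb()


for host_gb in (32, 48, 64):
    verdict = "fits at defaults" if fits_at_defaults(host_gb) else "trim num_ctx / num_batch"
    print(f"{host_gb} GB host: {verdict} (estimate {estimated_footprint_gb():.1f} GB)")
```

A `False` result does not mean the model cannot run; as the table notes, 32 GB hosts work once `num_ctx` and `num_batch` are reduced.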
220
+ ## Chat template
221
+
222
+ The model uses the standard Qwen 3.x ChatML format with `<|im_start|>` / `<|im_end|>` role markers. The template is embedded in the GGUF metadata for plain conversation use, but Ollama users should rely on the `TEMPLATE` block in the included `Modelfile` — that version exposes the tool-calling scaffolding Ollama's capability detector requires (the embedded template alone is insufficient; see [Ollama](#ollama) above).
223
+
224
+ ### Plain conversation
225
+
226
+ ```text
227
+ <|im_start|>system
228
+ You are Janus, a precise and capable assistant…<|im_end|>
229
+ <|im_start|>user
230
+ What is the time complexity of mergesort?<|im_end|>
231
+ <|im_start|>assistant
232
+ ```
233
+
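For clients that build prompts by hand, the layout above can be produced with a few lines of Python. This is a minimal sketch covering plain conversation only; it ignores tools and thinking, which the real templates handle, and `render_chatml` is a hypothetical helper name.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages using <|im_start|> / <|im_end|> markers."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus."},
    {"role": "user", "content": "What is the time complexity of mergesort?"},
])
print(prompt)
```

In normal use the loader applies the template for you; rendering by hand is mainly useful for raw completion endpoints or for debugging.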
234
+ ### With reasoning trace
235
+
236
+ When the model decides to think, the assistant turn contains a `<think>…</think>` block followed by the visible answer:
237
+
238
+ ```text
239
+ <|im_start|>assistant
240
+ <think>
241
+ The user is asking about mergesort. Mergesort divides the array, recursively sorts each half, then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
242
+ </think>
243
+
244
+ Mergesort runs in **O(n log n)** time in the worst, average, and best cases. The recurrence is T(n) = 2T(n/2) + O(n), which solves to Θ(n log n) by the master theorem.<|im_end|>
245
+ ```
246
+
247
+ Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by default and show only the final answer. If your client doesn't, set its "show reasoning" toggle off.
248
+
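If a client returns the raw assistant text, the reasoning can be separated from the answer with a small helper. The sketch below assumes at most one `<think>` block per turn and also tolerates a missing opening tag, since a template may pre-fill `<think>` in the generation prompt; `split_reasoning` is a hypothetical name.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw assistant output."""
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        # No reasoning block at all: everything is the visible answer.
        return "", text.strip()
    # Drop the opening tag if present; it may have been pre-filled upstream.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = (
    "<think>\nMergesort splits, sorts each half, then merges.\n</think>\n\n"
    "Mergesort runs in O(n log n) time."
)
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Anything that appears before `</think>` is treated as reasoning, which keeps the helper simple at the cost of folding any stray preamble into the trace.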
249
+ ### Tool / function calling
250
+
251
+ The wire format depends on which path you take. **Both are valid** — the model adapts to whichever format the system prompt specifies.
252
+
253
+ **Ollama path** (this repo's `Modelfile`). The TEMPLATE advertises tools inside `<tools>…</tools>` and asks the model to reply in JSON-in-XML — the form Ollama's tool-call extractor parses into a structured `tool_calls` array on `/api/chat` and `/v1/chat/completions`:
254
+
255
+ ```text
256
+ <tool_call>
257
+ {"name": "get_weather", "arguments": {"city": "Tokyo"}}
258
+ </tool_call>
259
+ ```
260
+
261
+ **Embedded-jinja path** (llama.cpp, llama-cpp-python, LM Studio). The Qwen 3.6 native chat template baked into the GGUF instructs the model to emit a more verbose XML form. This is the shape you'll see if you talk to `llama-server` or LM Studio directly:
262
+
263
+ ```text
264
+ <tool_call>
265
+ <function=get_weather>
266
+ <parameter=city>
267
+ Tokyo
268
+ </parameter>
269
+ </function>
270
+ </tool_call>
271
+ ```
272
+
273
+ Pick the parser shape that matches your loader. Don't mix.
274
+
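If you parse raw model output yourself rather than relying on the loader, the two shapes need different parsers. The following is a regex-based sketch under the assumption that the output looks exactly like the two examples in this section; the function names are hypothetical, and the verbose-XML parser returns every parameter value as a string.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>\s*(.*?)\s*</parameter>", re.DOTALL)


def parse_json_in_xml(text: str) -> list[dict]:
    """Parse the JSON-in-XML shape used on the Ollama path."""
    return [json.loads(body) for body in TOOL_CALL_RE.findall(text)]


def parse_verbose_xml(text: str) -> list[dict]:
    """Parse the <function=...>/<parameter=...> shape from the embedded template."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        for name, inner in FUNCTION_RE.findall(body):
            arguments = dict(PARAMETER_RE.findall(inner))  # values stay strings
            calls.append({"name": name, "arguments": arguments})
    return calls


ollama_style = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
jinja_style = (
    "<tool_call>\n<function=get_weather>\n<parameter=city>\nTokyo\n"
    "</parameter>\n</function>\n</tool_call>"
)
print(parse_json_in_xml(ollama_style))
print(parse_verbose_xml(jinja_style))
```

Both parsers normalize to the same `{"name": ..., "arguments": ...}` dictionary, so downstream dispatch code does not need to know which loader produced the call.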
275
+ #### Example (Ollama, OpenAI-compatible API)
276
+
277
+ ```python
278
+ from openai import OpenAI
279
+
280
+ client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")
281
+
282
+ resp = client.chat.completions.create(
283
+ model="janus",
284
+ messages=[
285
+ {"role": "user", "content": "Call get_weather for Tokyo. Respond ONLY with the tool call."}
286
+ ],
287
+ tools=[{
288
+ "type": "function",
289
+ "function": {
290
+ "name": "get_weather",
291
+ "description": "Get current weather for a city",
292
+ "parameters": {
293
+ "type": "object",
294
+ "properties": {"city": {"type": "string"}},
295
+ "required": ["city"],
296
+ },
297
+ },
298
+ }],
299
+ temperature=0.3,
300
+ )
301
+ print(resp.choices[0].message.tool_calls)
302
+ # [ToolCall(id='call_xxx', type='function',
303
+ # function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]
304
+ ```
305
+
306
+ #### Tips
307
+
308
+ - Use direct prompts ("Call X for Y") rather than soft hints ("Use the tool"). The model thinks before committing to a call, and weak prompts can exhaust `num_predict` inside the `<think>` block before the call is emitted.
309
+ - Allow at least `num_predict: 1024` (or `max_tokens: 1024`) for tool-calling turns, more if the schemas are large.
310
+ - The Modelfile's JSON-in-XML format is what Ollama's tool-call extractor understands; if you swap loaders, swap the parser to match (see "Embedded-jinja path" above).
311
+
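Once a tool call comes back, the client still has to run the function and return the result as a `tool` message. A minimal dispatch sketch follows, assuming the OpenAI-style message shape; `get_weather` here is a placeholder implementation, and `TOOL_REGISTRY` and `run_tool_call` are hypothetical names.

```python
import json


def get_weather(city: str) -> dict:
    """Placeholder tool; a real one would query a weather service."""
    return {"city": city, "conditions": "unknown (placeholder)"}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and wrap the result as a `tool` message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        # The API delivers arguments as a JSON string, so decode first.
        result = handler(**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}


tool_message = run_tool_call("call_xxx", "get_weather", '{"city":"Tokyo"}')
print(tool_message)
```

Append the assistant message and this `tool` message to `messages` and call the API again to get the final answer. This repo's `template` renders `tool` messages inside `<tool_response>` tags.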
312
+ ## Known limitations
313
+
314
+ - **No mmproj in this release.** The base Qwen3.6 supports image and video input via a separate `mmproj` file, which is not included here. Text-only inference works out of the box; multimodal inference requires fetching a matching `mmproj` GGUF for this base model from upstream.
315
+ - **Quantization-induced quality loss.** Q4_K_M is a strong general-purpose quant but does measurably degrade math and code accuracy compared to BF16. If you need maximum quality, run the upstream safetensors on a GPU that fits BF16 (~70 GB).
316
+ - **MoE expert utilization is uneven.** Stock Qwen3.6-35B-A3B routes 8 of 256 experts per token. On narrow domains (e.g. only one programming language) a small subset of experts dominates; load-balance loss was a training-time concern, not a runtime guarantee.
317
+ - **Thinking traces can loop.** Like most reasoning-distilled models, Janus-35B occasionally gets stuck repeating itself inside `<think>` tags. Mitigations: lower temperature to 0.4-0.6, raise `repeat_penalty` to 1.08, or set a `<think>`-token budget cap if your loader supports it.
318
+ - **Not aligned with any specific safety policy.** This is a personal repackage of an open-weight base model with reasoning-focused distillation. There is no RLHF refusal layer beyond what Qwen 3.6 ships with; downstream safety is the operator's responsibility.
319
+ - **No formal evaluation in this card.** Apart from the single measured throughput figure in the 32 GB laptop row, numbers in the hardware table are estimates, not measurements. If you produce real benchmarks (MMLU, HumanEval, etc.) and want them included, file a PR.
320
+
321
+ ## Related models
322
+
323
+ | Model | Size | Notes |
324
+ |---|---|---|
325
+ | [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | 35B / 3B active | Upstream base model. `transformers`-native multimodal weights. |
326
+ | [FoolDev/Thanatos-27B](https://huggingface.co/FoolDev/Thanatos-27B) | 27B dense | Dense sibling on the Qwen 3.6 27B base. Same teacher (Opus 4.7), same dataset family, smaller memory footprint, no MoE quirks. |
327
+ | [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B dense | Heretic-flavored fine-tune of the same Qwen 3.5 9B base used as a smaller starting point. Useful as a fast first-pass model when 35B is too heavy for the host. |
328
+
329
+ ## Credits
330
+
331
+ - Base model: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (Alibaba)
332
+ - Reasoning teacher: Claude Opus 4.7 (Anthropic)
333
+ - Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
334
+
335
+ License inherited from upstream: Apache-2.0.
banner.png ADDED
banner.svg ADDED
moe-routing.svg ADDED
params ADDED
@@ -0,0 +1,12 @@
1
+ {
2
+ "temperature": 0.6,
3
+ "top_p": 0.95,
4
+ "top_k": 20,
5
+ "repeat_penalty": 1.05,
6
+ "num_ctx": 16384,
7
+ "stop": [
8
+ "<|im_end|>",
9
+ "<|endoftext|>",
10
+ "<|im_start|>"
11
+ ]
12
+ }
scripts/check_bridge_sync.py ADDED
@@ -0,0 +1,147 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Janus-35B — verify Modelfile and HF Ollama bridge files stay in sync.
4
+
5
+ The repo ships two parallel Ollama configurations:
6
+
7
+ - ``Modelfile`` is consumed by the local-build path
8
+ (``ollama create janus -f Modelfile``). It contains
9
+ ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
10
+ - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
11
+ Ollama bridge when users ``ollama run hf.co/FoolDev/Janus-35B`` directly. HF
12
+ does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
13
+
14
+ If the two configurations drift apart, ``hf.co/...`` users and local-build
15
+ users get different behaviour — exactly the bug fixed in commit 70ccef1
16
+ ("Add HF Ollama bridge files (template/system/params)"). This script is
17
+ the regression guard: it parses the Modelfile, loads the three bridge
18
+ files, and fails on any mismatch.
19
+
20
+ Usage:
21
+ python3 scripts/check_bridge_sync.py
22
+ # exit 0 if in sync, 1 (with diff details) if not.
23
+
24
+ Run this manually before pushing a Modelfile / bridge-file edit. The 27B
25
+ sibling repo wires an equivalent script into scripts/check.sh and a
26
+ pre-commit hook; this repo intentionally stays leaner and runs it
27
+ on demand.
28
+ """
29
+ from __future__ import annotations
30
+
31
+ import json
32
+ import re
33
+ import sys
34
+ from pathlib import Path
35
+
36
+ ROOT = Path(__file__).resolve().parent.parent
37
+
38
+ # Ollama Modelfile reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
39
+ TEMPLATE_RE = re.compile(r'^TEMPLATE\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
40
+ SYSTEM_RE = re.compile(r'^SYSTEM\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
41
+ PARAMETER_RE = re.compile(r'^PARAMETER\s+(\S+)\s+(.*?)\s*$', re.MULTILINE)
42
+
43
+
44
+ def parse_modelfile(text: str) -> tuple[str, str, dict[str, object]]:
45
+ """Extract TEMPLATE, SYSTEM, and PARAMETER blocks from a Modelfile."""
46
+ tpl_match = TEMPLATE_RE.search(text)
47
+ if not tpl_match:
48
+ die("Modelfile has no TEMPLATE block")
49
+ template = tpl_match.group(1)
50
+
51
+ sys_match = SYSTEM_RE.search(text)
52
+ if not sys_match:
53
+ die("Modelfile has no SYSTEM block")
54
+ system = sys_match.group(1)
55
+
56
+ params: dict[str, object] = {}
57
+ stops: list[str] = []
58
+ for key, raw in PARAMETER_RE.findall(text):
59
+ # Strip outer quotes if present.
60
+ value: object = raw.strip()
61
+ if isinstance(value, str) and len(value) >= 2 and value[0] == value[-1] == '"':
62
+ value = value[1:-1]
63
+ # Stop tokens accumulate; everything else is scalar.
64
+ if key == "stop":
65
+ stops.append(value) # type: ignore[arg-type]
66
+ continue
67
+ # Cast known numeric params.
68
+ if key in {"temperature", "top_p", "top_k", "repeat_penalty",
69
+ "num_ctx", "num_predict", "num_gpu", "num_batch", "seed"}:
70
+ try:
71
+ value = float(value) if "." in str(value) else int(value) # type: ignore[arg-type]
72
+ except (TypeError, ValueError):
73
+ pass
74
+ params[key] = value
75
+
76
+ if stops:
77
+ params["stop"] = stops
78
+
79
+ return template, system, params
80
+
81
+
82
+ def die(msg: str) -> None:
83
+ print(f"[FAIL] {msg}", file=sys.stderr)
84
+ sys.exit(1)
85
+
86
+
87
+ def diff_strings(label: str, expected: str, actual: str) -> bool:
88
+ if expected == actual:
89
+ return True
90
+ print(f"[FAIL] {label} drift detected", file=sys.stderr)
91
+ print(f" Modelfile len={len(expected)} bridge file len={len(actual)}", file=sys.stderr)
92
+ # Show the first diverging line for quick orientation.
93
+ e_lines = expected.splitlines()
94
+ a_lines = actual.splitlines()
95
+ for i, (e, a) in enumerate(zip(e_lines, a_lines)):
96
+ if e != a:
97
+ print(f" first diff at line {i + 1}:", file=sys.stderr)
98
+ print(f" modelfile : {e!r}", file=sys.stderr)
99
+ print(f" bridge : {a!r}", file=sys.stderr)
100
+ return False
101
+ if len(e_lines) != len(a_lines):
102
+ print(f" line count differs: modelfile={len(e_lines)} bridge={len(a_lines)}",
103
+ file=sys.stderr)
104
+ return False
105
+
106
+
107
+ def main() -> int:
108
+ modelfile = (ROOT / "Modelfile").read_text()
109
+ bridge_template = (ROOT / "template").read_text()
110
+ bridge_system = (ROOT / "system").read_text()
111
+ bridge_params = json.loads((ROOT / "params").read_text())
112
+
113
+ mf_template, mf_system, mf_params = parse_modelfile(modelfile)
114
+
115
+ ok = True
116
+
117
+ # 1. TEMPLATE: byte-for-byte.
118
+ ok &= diff_strings("TEMPLATE", mf_template, bridge_template)
119
+
120
+ # 2. SYSTEM: trim trailing whitespace on both ends. The bridge file
121
+ # typically has a trailing newline; the Modelfile block doesn't.
122
+ ok &= diff_strings("SYSTEM", mf_system.strip(), bridge_system.strip())
123
+
124
+ # 3. PARAMETER vs params JSON: compare normalized dicts.
125
+ if mf_params != bridge_params:
126
+ print("[FAIL] params drift detected", file=sys.stderr)
127
+ for k in sorted(set(mf_params) | set(bridge_params)):
128
+ mv = mf_params.get(k, "<missing>")
129
+ bv = bridge_params.get(k, "<missing>")
130
+ if mv != bv:
131
+ print(f" {k}: modelfile={mv!r} bridge={bv!r}", file=sys.stderr)
132
+ ok = False
133
+
134
+ if not ok:
135
+ print("\n[!] Modelfile and bridge files are out of sync.", file=sys.stderr)
136
+ print(" Edit them together: any change to TEMPLATE / SYSTEM /",
137
+ file=sys.stderr)
138
+ print(" PARAMETER must be reflected in template / system / params.",
139
+ file=sys.stderr)
140
+ return 1
141
+
142
+ print("[ ok ] Modelfile <-> bridge files in sync")
143
+ return 0
144
+
145
+
146
+ if __name__ == "__main__":
147
+ sys.exit(main())
system ADDED
@@ -0,0 +1,10 @@
1
+ You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
2
+
3
+ Behavior rules:
4
+ - Answer the user's actual request directly.
5
+ - Be accurate, complete, and structured.
6
+ - Think before answering, but do not get stuck in repetitive loops or meta-commentary.
7
+ - If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
8
+ - If the user wants creative writing, preserve tone, continuity, and character consistency.
9
+ - If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
10
+ - Finish with a usable answer, not just planning.
template ADDED
@@ -0,0 +1,51 @@
1
+ {{- $lastUserIdx := -1 -}}
2
+ {{- range $idx, $msg := .Messages -}}
3
+ {{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
4
+ {{- end }}
5
+ {{- if or .System .Tools }}<|im_start|>system
6
+ {{ if .System }}{{ .System }}
7
+
8
+ {{ end }}
9
+ {{- if .Tools }}# Tools
10
+
11
+ You may call one or more functions to assist with the user query.
12
+
13
+ You are provided with function signatures within <tools></tools> XML tags:
14
+ <tools>
15
+ {{- range .Tools }}
16
+ {"type": "function", "function": {{ .Function }}}
17
+ {{- end }}
18
+ </tools>
19
+
20
+ For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
21
+ <tool_call>
22
+ {"name": <function-name>, "arguments": <args-json-object>}
23
+ </tool_call>
24
+ {{- end -}}<|im_end|>
25
+ {{ end }}
26
+ {{- range $i, $_ := .Messages }}
27
+ {{- $last := eq (len (slice $.Messages $i)) 1 -}}
28
+ {{- if eq .Role "user" }}<|im_start|>user
29
+ {{ .Content }}<|im_end|>
30
+ {{ else if eq .Role "assistant" }}<|im_start|>assistant
31
+ {{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
32
+ <think>{{ .Thinking }}</think>
33
+ {{ end -}}
34
+ {{ if .Content }}{{ .Content }}{{ end }}
35
+ {{- if .ToolCalls }}
36
+ {{- range .ToolCalls }}
37
+ <tool_call>
38
+ {"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
39
+ </tool_call>
40
+ {{- end }}
41
+ {{- end }}{{ if not $last }}<|im_end|>
42
+ {{ end }}
43
+ {{- else if eq .Role "tool" }}<|im_start|>user
44
+ <tool_response>
45
+ {{ .Content }}
46
+ </tool_response><|im_end|>
47
+ {{ end }}
48
+ {{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
49
+ <think>
50
+ {{ end }}
51
+ {{- end }}