Instructions to use FoolDev/Janus-35B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FoolDev/Janus-35B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="FoolDev/Janus-35B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("FoolDev/Janus-35B", dtype="auto")

llama-cpp-python

How to use FoolDev/Janus-35B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="FoolDev/Janus-35B",
	filename="Janus-35B-A3B.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

llama.cpp

How to use FoolDev/Janus-35B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf FoolDev/Janus-35B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf FoolDev/Janus-35B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Use Docker

docker model run hf.co/FoolDev/Janus-35B:Q4_K_M

vLLM

How to use FoolDev/Janus-35B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FoolDev/Janus-35B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
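
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is reachable at `http://localhost:8000/v1` and returns the usual OpenAI-style `choices[0].message.content` field. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url=None):
    """Build an OpenAI-style chat payload with one user turn."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Same content-part shape as the curl example above.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/chat/completions and return the reply text.

    Assumes an OpenAI-compatible response body (hypothetical helper).
    """
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# payload = build_chat_request("FoolDev/Janus-35B", "Describe this image in one sentence.",
#                              image_url="https://example.com/picture.jpg")
# print(chat("http://localhost:8000/v1", payload))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request without a running server. The same payload shape applies to the SGLang server below if the base URL is changed to port 30000.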

Use Docker

docker model run hf.co/FoolDev/Janus-35B:Q4_K_M

SGLang

How to use FoolDev/Janus-35B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FoolDev/Janus-35B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FoolDev/Janus-35B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use FoolDev/Janus-35B with Ollama:
```
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M
```

Unsloth Studio

How to use FoolDev/Janus-35B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Janus-35B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Janus-35B to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for FoolDev/Janus-35B to start chatting

Pi

How to use FoolDev/Janus-35B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Janus-35B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "FoolDev/Janus-35B:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use FoolDev/Janus-35B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Janus-35B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default FoolDev/Janus-35B:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use FoolDev/Janus-35B with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FoolDev/Janus-35B:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "FoolDev/Janus-35B:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use FoolDev/Janus-35B with Docker Model Runner:
```
docker model run hf.co/FoolDev/Janus-35B:Q4_K_M
```

Lemonade

How to use FoolDev/Janus-35B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull FoolDev/Janus-35B:Q4_K_M

Run and chat with the model

lemonade run user.Janus-35B-Q4_K_M

List all available models

lemonade list

FoolDev committed on May 20

Commit

64b629a

0 Parent(s):

Duplicate from FoolDev/Janus-35B

Files changed (14)

.gitattributes +36 -0
CHANGELOG.md +111 -0
CITATION.cff +39 -0
Janus-35B-A3B.Q4_K_M.gguf +3 -0
LICENSE +201 -0
Modelfile +121 -0
README.md +335 -0
banner.png +0 -0
banner.svg +97 -0
moe-routing.svg +670 -0
params +12 -0
scripts/check_bridge_sync.py +147 -0
system +10 -0
template +51 -0

.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text

CHANGELOG.md ADDED

	@@ -0,0 +1,111 @@

+# Changelog
+All notable changes to this repository. Format loosely follows
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/). This repo holds
+a model card, an Ollama Modelfile, the HF Ollama-bridge `template` /
+`system` / `params` files, and the bundled Q4_K_M GGUF, so versions
+track the **tooling and documentation**, not the underlying base model.
+## [Unreleased]
+### Added
+- Root-level `template`, `system`, and `params` files for HF's Ollama
+  bridge. The bridge generates Ollama manifests at request time from
+  these three files (NOT from `Modelfile` — confirmed against
+  https://huggingface.co/docs/hub/en/ollama). Without them, `ollama
+  run hf.co/FoolDev/janus` got an auto-generated manifest with the
+  broken `{{ if .Prompt }} .Prompt }}<|im_end|>` template (Ollama's
+  faulty Go-template conversion of the GGUF's embedded jinja),
+  corrupted stop tokens (`".Prompt }}<|im_end|>"` bleed), and no
+  `.Tools` / `.ToolCalls` blocks — so the published Ollama tag
+  advertised `completion` only and rejected any request with a
+  `tools` array. The three files mirror the `Modelfile`'s `TEMPLATE`
+  / `SYSTEM` / `PARAMETER` directives; both routes wire tool calling
+  correctly. Edit them together when changing one. Verified by
+  re-pulling the fresh tag: `ollama show hf.co/FoolDev/janus` now
+  reports `completion`, `tools`, `thinking` and tool calls round-trip
+  end-to-end through `/api/chat`.
+### Changed
+- README "Tool / function calling" section: split into explicit
+  Ollama-path and embedded-jinja-path subsections. Earlier wording
+  conflated the two on-the-wire formats. The Ollama path (Modelfile
+  `TEMPLATE` and the new `template` bridge file, both kept in sync)
+  prompts JSON-in-XML — the form Ollama's tool-call extractor parses
+  into a structured `tool_calls` array. The embedded-jinja path
+  (llama.cpp, llama-cpp-python, LM Studio) reads the Qwen 3.6 native
+  chat template baked into the GGUF, which prompts the verbose
+  `<function=name>` / `<parameter=arg>` form the model was trained
+  on. Both are valid; the model adapts to whichever shape the system
+  prompt prescribes. README now shows both formats side by side.
+- README "Quick start / Ollama" section: documents both pull paths
+  (`hf.co/...` via bridge files vs `ollama create ... -f Modelfile` locally)
+  and explicitly notes that HF's bridge does not read `Modelfile`.
+- README "Hardware requirements" intro: re-framed the "~38 GB
+  minimum" claim as "~38 GB at default `num_ctx 16384`" and
+  documented that 32 GB hosts can fit the model by trimming context
+  and batch size.
+- README "Quick start / Ollama" snippet: show both
+  `ollama run hf.co/FoolDev/janus` and the explicit-tag form
+  `ollama run hf.co/FoolDev/janus:Q4_K_M`. Same blob (the default
+  tag maps to Q4_K_M), but parity with the 27B sibling — which lists
+  both `:latest` and `:Q3_K_S` — and removes ambiguity for users
+  scripting against an explicit quant tag. Verified the explicit tag
+  resolves to the same manifest (model SHA `a076aa0d3a1a`, bridge
+  blobs `22c7ade72045` / `84a1a6ac580b` / `f7b1992cf9c1`).
+### Added (cont'd)
+- README `## TL;DR` section near the top of the model card, mirroring
+  the 27B sibling. Two paths (HF Ollama bridge / local Modelfile
+  build) with explicit tags and a one-line capability check. Notes
+  the bridge ingests `template` / `system` / `params`, not
+  `Modelfile`, so users skimming the top of the page won't form the
+  wrong mental model of which file gets used when.
+- `CITATION.cff` for citation metadata (Apache-2.0, references the
+  upstream Qwen3.6-35B-A3B base and the dense Janus-27B sibling).
+  The 27B sibling has had this file since 0.5.0; adding here for
+  parity so academic-style citations work across both repos.
+- `LICENSE` file containing the full Apache 2.0 text. The model card
+  front-matter has always declared `license: apache-2.0` and the
+  upstream Qwen 3.6 license inherits Apache-2.0, but until now the
+  repo lacked the actual license text file. Same Apache 2.0 text
+  shipped in the 27B sibling.
+- `scripts/check_bridge_sync.py` — regression guard for the
+  `Modelfile` <-> `template` / `system` / `params` sync invariant.
+  The two configurations are consumed by different code paths
+  (`ollama create -f Modelfile` for local builds vs HF's Ollama
+  bridge for `hf.co/...` pulls — HF does not read `Modelfile`), so
+  drift between them re-introduces the bug fixed in commit 70ccef1
+  where `hf.co/FoolDev/janus` shipped a broken auto-generated
+  template while local builds had the correct one. Script parses
+  the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives,
+  loads the three bridge files, and fails on any mismatch with a
+  per-key diff. Run on demand before pushing edits to either side
+  of the configuration. The 27B sibling wires an equivalent script
+  into a pre-commit hook (commit 5c67b08); this repo stays leaner
+  and runs it manually.
+### Fixed
+- README "Chat template" intro previously claimed all loaders handle
+  the embedded jinja automatically. True for llama.cpp / LM Studio /
+  llama-cpp-python; not true for Ollama, which needs an explicit
+  override (the `Modelfile` TEMPLATE block locally, the root-level
+  `template` file when serving via `hf.co/...`).
+- README "Tool / function calling" earlier said the XML form
+  `<function=name><parameter=arg>` is "not what this model produces".
+  That was wrong: the embedded GGUF jinja prompts exactly that form,
+  and llama.cpp / LM Studio / llama-cpp-python users will see it.
+  The "JSON-in-XML" claim only applies on the Ollama path because
+  that's what the Modelfile TEMPLATE prompt instructs.
+## [0.1.0] — initial public release
+### Added
+- Model card with architecture overview, sampling defaults, hardware
+  table, and `Modelfile` for `ollama create janus -f Modelfile`.
+- Bundled `Janus-35B-A3B.Q4_K_M.gguf` (~19 GB) via Git LFS so the HF
+  "Use this model" widget surfaces a working `ollama run` snippet.
+- Tokyo Night themed banner (PNG sourced from the SVG).
+- Status badges for license, base model, architecture, quant.
+- Linked sibling `FoolDev/janus-27b` (dense Qwen 3.6 27B base) under
+  Related models.
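
The changelog above describes `scripts/check_bridge_sync.py` as parsing the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives and comparing them with the three bridge files. The script itself is not shown here, so the following is only a rough sketch of that idea, not the repository's implementation. It assumes triple-quoted `TEMPLATE` and `SYSTEM` blocks, one `PARAMETER` per line, and a `params` file in JSON form; the function names are hypothetical.

```python
import json
import re

# Triple-quoted TEMPLATE / SYSTEM blocks that start at the beginning of a line.
BLOCK_RE = re.compile(r'^(TEMPLATE|SYSTEM)\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
# Single-line "PARAMETER <key> <value>" directives.
PARAM_RE = re.compile(r'^PARAMETER[ \t]+(\S+)[ \t]+(.+)$', re.MULTILINE)


def _coerce(raw):
    """Turn a raw PARAMETER value into str, int, or float."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_modelfile(text):
    """Extract template, system prompt, and parameters from Modelfile text."""
    blocks = {name.lower(): body for name, body in BLOCK_RE.findall(text)}
    params = {}
    for key, raw in PARAM_RE.findall(text):
        value = _coerce(raw)
        if key in params:
            # A repeated key (for example several stop tokens) becomes a list.
            existing = params[key]
            params[key] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            params[key] = value
    return {"template": blocks.get("template"), "system": blocks.get("system"), "params": params}


def diff_bridge(modelfile_text, template, system, params_json):
    """Return the names of entries that differ between Modelfile and bridge files."""
    parsed = parse_modelfile(modelfile_text)
    bridge_params = json.loads(params_json)
    problems = []
    if parsed["template"] != template:
        problems.append("template")
    if parsed["system"] != system:
        problems.append("system")
    for key in sorted(set(parsed["params"]) | set(bridge_params)):
        if parsed["params"].get(key) != bridge_params.get(key):
            problems.append(f"params.{key}")
    return problems
```

An empty list from `diff_bridge` means the two configurations agree under these assumptions. A real guard would also need to decide how to treat whitespace differences and a single `stop` value written as a string on one side and a one-element list on the other.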

CITATION.cff ADDED

	@@ -0,0 +1,39 @@

+cff-version: 1.2.0
+title: "Janus-35B: A Mixture-of-Experts Distillation Wrapper for Qwen 3.6 35B-A3B"
+message: "If you use this model card or its accompanying files, please cite as below."
+type: software
+authors:
+  - name: FoolDev
+    website: "https://huggingface.co/FoolDev"
+repository-code: "https://huggingface.co/FoolDev/janus"
+url: "https://huggingface.co/FoolDev/janus"
+abstract: >-
+  Janus-35B is a personal repackaging of the Qwen 3.6 35B-A3B
+  mixture-of-experts base model (35B total / 3B active per token,
+  256 experts, 8 activated) with Claude Opus 4.7 in the reasoning
+  teacher slot. The repository ships an Ollama Modelfile, the HF
+  Ollama-bridge files (template / system / params), sampling defaults,
+  and a bundled Q4_K_M GGUF (~19 GB) so the HF "Use this model" widget
+  surfaces a one-liner Ollama snippet. Other quants and the upstream
+  safetensors (Qwen/Qwen3.6-35B-A3B) are pulled from upstream on demand
+  rather than redistributed.
+keywords:
+  - qwen
+  - qwen3.6
+  - mixture-of-experts
+  - moe
+  - distillation
+  - reasoning
+  - llm
+license: Apache-2.0
+references:
+  - type: software
+    title: "Qwen3.6-35B-A3B"
+    authors:
+      - name: Alibaba Qwen Team
+    url: "https://huggingface.co/Qwen/Qwen3.6-35B-A3B"
+  - type: software
+    title: "Janus-27B (dense sibling)"
+    authors:
+      - name: FoolDev
+    url: "https://huggingface.co/FoolDev/janus-27b"

Janus-35B-A3B.Q4_K_M.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
+size 18939312896
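
The three lines above are a Git LFS pointer, not the model weights: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a small illustration, the sketch below parses a pointer of this shape and converts the size to decimal and binary gigabytes. The helper name `parse_lfs_pointer` is made up for this example.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
size 18939312896
"""

info = parse_lfs_pointer(POINTER)
size_gb = info["size"] / 1e9      # decimal gigabytes
size_gib = info["size"] / 2**30   # binary gibibytes
print(f"{info['algo']}:{info['digest'][:12]}  {size_gb:.2f} GB  {size_gib:.2f} GiB")
```

The size works out to about 18.94 GB (17.64 GiB), which matches the "~19 GB" figure quoted in the changelog, and the first twelve hex digits of the digest match the model SHA `a076aa0d3a1a` mentioned there.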

LICENSE ADDED

	@@ -0,0 +1,201 @@

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for describing the origin of the Work and
+      reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may accept and charge a
+      fee for acceptance of support, warranty, indemnity, or other liability
+      obligations and/or rights consistent with this License. However, in
+      accepting such obligations, You may act only on Your own behalf and
+      on Your sole responsibility, not on behalf of any other Contributor,
+      and only if You agree to indemnify, defend, and hold each Contributor
+      harmless for any liability incurred by, or claims asserted against,
+      such Contributor by reason of your accepting any such warranty or
+      additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright 2025 FoolDev
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.

Modelfile ADDED

	@@ -0,0 +1,121 @@

+FROM ./Janus-35B-A3B.Q4_K_M.gguf
+# Chat template — Qwen 3.6 ChatML in Ollama Go-template form, with the
+# tool-calling blocks Ollama's capability detector looks for. Without a
+# TEMPLATE that references .Tools and .ToolCalls, /api/chat and
+# /v1/chat/completions reject any request carrying a `tools` array with
+# `<model> does not support tools`. Same template as the 27B dense sibling
+# (FoolDev/janus-27b) — both share the Qwen 3.6 chat format.
+TEMPLATE """{{- $lastUserIdx := -1 -}}
+{{- range $idx, $msg := .Messages -}}
+{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+{{- end }}
+{{- if or .System .Tools }}<|im_start|>system
+{{ if .System }}{{ .System }}
+{{ end }}
+{{- if .Tools }}# Tools
+You may call one or more functions to assist with the user query.
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{{- range .Tools }}
+{"type": "function", "function": {{ .Function }}}
+{{- end }}
+</tools>
+For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+<tool_call>
+{"name": <function-name>, "arguments": <args-json-object>}
+</tool_call>
+{{- end -}}<|im_end|>
+{{ end }}
+{{- range $i, $_ := .Messages }}
+{{- $last := eq (len (slice $.Messages $i)) 1 -}}
+{{- if eq .Role "user" }}<|im_start|>user
+{{ .Content }}<|im_end|>
+{{ else if eq .Role "assistant" }}<|im_start|>assistant
+{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+<think>{{ .Thinking }}</think>
+{{ end -}}
+{{ if .Content }}{{ .Content }}{{ end }}
+{{- if .ToolCalls }}
+{{- range .ToolCalls }}
+<tool_call>
+{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+</tool_call>
+{{- end }}
+{{- end }}{{ if not $last }}<|im_end|>
+{{ end }}
+{{- else if eq .Role "tool" }}<|im_start|>user
+<tool_response>
+{{ .Content }}
+</tool_response><|im_end|>
+{{ end }}
+{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+<think>
+{{ end }}
+{{- end }}"""
+# Sampling tuned for reasoning + general use. See README "Recommended sampling"
+# for creative/RP alternatives.
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER top_k 20
+PARAMETER repeat_penalty 1.05
+PARAMETER num_ctx 16384
+# Stop tokens. Without these, Ollama only honors <|im_end|> from the GGUF
+# metadata; the model occasionally emits <|endoftext|> instead and Ollama
+# keeps generating past it (synthesising a fake new user turn). Listing
+# both — plus <|im_start|> as a belt-and-braces guard against the same
+# loop — keeps responses cleanly terminated. Same fix the 27B sibling
+# (FoolDev/janus-27b) shipped in commit 6672746.
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
+PARAMETER stop "<|im_start|>"
+SYSTEM """You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+Behavior rules:
+- Answer the user's actual request directly.
+- Be accurate, complete, and structured.
+- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+- If the user wants creative writing, preserve tone, continuity, and character consistency.
+- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+- Finish with a usable answer, not just planning."""
+# Hardware notes
+# --------------
+# This Q4_K_M is ~19 GB on disk. Real footprint at runtime:
+#   weights mmap          ~19 GB
+#   compute graph alloc   ~19 GB  (Ollama log: device.go:272 "total memory")
+#   KV cache @ 16K ctx     ~1 GB  (with OLLAMA_KV_CACHE_TYPE=q8_0)
+#   total minimum          ~38 GB
+#
+# Working configurations (verified or documented):
+#   ✓ Single H100 80GB / A100 80GB              — full GPU offload
+#   ✓ RTX 5090 32GB / RTX 4090 24GB             — partial offload, ~15-25 tok/s
+#   ✓ Mac Studio M2/M3 Ultra 64GB+              — unified memory, ~20+ tok/s
+#   ✓ Linux box with 64GB+ RAM (CPU-only)       — ~3-6 tok/s
+#   ⚠ ASUS ROG Flow Z13 (Ryzen AI Max+, 32GB)   — OOMs at default num_ctx 16384;
+#                                                 fits with num_ctx ≤ 4096 and
+#                                                 num_batch ≤ 256 (verified)
+#
+# Measured data point (ASUS ROG Flow Z13 GZ302EA-RU004W, Ryzen AI Max+ 395 +
+# Radeon 8060S iGPU, 32 GB unified, ROCm gfx1151, OLLAMA_FLASH_ATTENTION=1,
+# OLLAMA_KV_CACHE_TYPE=q8_0, num_ctx 4096, num_batch 256):
+#   Q4_K_M, 3-prompt mix → 28.71 tok/s aggregate
+#     (717 tokens / 25.0 s; 29.55 / 29.24 / 28.57 short/medium/long).
+#   ~97% of layers offload to the iGPU via ROCm. Compute split per
+#   `ollama ps` shows 3% CPU / 97% GPU at 4096 ctx.
+#
+# To run on a 32 GB unified-memory laptop, override these in your local
+# Modelfile copy (or use `/set parameter` inside an `ollama run` session):
+#   PARAMETER num_ctx 4096
+#   PARAMETER num_batch 256
+#
+# If you have ≥48 GB RAM but want partial GPU offload, set:
+#   PARAMETER num_gpu 24    # offload most layers (model has 40)
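Review note: the footprint arithmetic in the hardware comments can be sketched as below. The component sizes are the approximate figures quoted above (18.9 GB is the file size given in the README, and the compute-graph figure is assumed equal to it per the "roughly 2x" rule). `estimate_footprint_gb` and `fits` are hypothetical helper names, and the sketch does not model how the compute-graph allocation shrinks with a smaller `num_ctx` / `num_batch`.

```python
def estimate_footprint_gb(weights_gb: float = 18.9,
                          graph_gb: float = 18.9,
                          kv_cache_gb: float = 1.0) -> float:
    """Sum the three runtime components listed in the hardware notes."""
    return weights_gb + graph_gb + kv_cache_gb


def fits(host_memory_gb: float, footprint_gb: float) -> bool:
    """True if the estimated footprint fits in the host's memory."""
    return footprint_gb <= host_memory_gb


total = estimate_footprint_gb()
print(f"estimated footprint: {total:.1f} GB")
for host in (32, 48, 64, 80):
    verdict = "fits" if fits(host, total) else "does not fit"
    print(f"{host} GB host: {verdict}")
```

With the default figures a 32 GB host does not fit, which is consistent with the Z13 entry needing a reduced context and batch size.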

README.md ADDED Viewed

	@@ -0,0 +1,335 @@

+---
+license: apache-2.0
+base_model:
+  - Qwen/Qwen3.6-35B-A3B
+datasets:
+  - crownelius/Creative_Writing_ShareGPT_Enhanced
+  - microsoft/rStar-Coder
+  - peteromallet/dataclaw-peteromallet
+  - crownelius/Opus-4.7-Reasoning
+  - openbmb/UltraData-Math
+  - Crownelius/Crow-Heretic-TeichAI-Unified
+language:
+  - en
+  - zh
+  - ru
+  - es
+  - fr
+  - it
+  - ja
+  - ko
+  - de
+  - ar
+  - tr
+  - pl
+  - sv
+  - nl
+  - he
+  - id
+  - uk
+  - fa
+  - pt
+  - ms
+  - fi
+  - el
+tags:
+  - qwen3_6
+  - moe
+  - conversational
+  - multimodal
+  - agent
+  - gguf
+library_name: transformers
+pipeline_tag: image-text-to-text
+---
+<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/banner.svg" alt="Janus-35B banner" width="100%" />
+[![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
+[![Base Model](https://img.shields.io/badge/Base-Qwen3.6--35B--A3B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
+[![Architecture](https://img.shields.io/badge/Arch-MoE_35B/3B_active-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
+[![Quant](https://img.shields.io/badge/GGUF-Q4__K__M-9ece6a?style=flat&labelColor=1a1b26)](#whats-here)
+[![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)
+# Janus-35B
+> **Flagship Reasoning. Sparse Footprint.**
+> *Qwen 3.6 35B-A3B repackaged with Claude Opus 4.7 in the teacher slot.*
+**`Architecture:`** `Qwen 3.6 35B-A3B (MoE)` | **`Total Params:`** `35B` | **`Active Params:`** `3B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled MoE LLM`
+A personal fork of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) — a 35B-total / 3B-active mixture-of-experts multimodal model — repackaged as Janus-35B with Claude Opus 4.7 reasoning data in the teacher slot.
+## TL;DR
+One-liner via Hugging Face (pulls a GGUF + this repo's root-level
+`template` / `system` / `params` files, including the tool-calling
+template — HF's Ollama bridge ingests those three files, not
+`Modelfile`):
+```bash
+ollama run hf.co/FoolDev/Janus-35B               # default ~19 GB Q4_K_M
+ollama run hf.co/FoolDev/Janus-35B:Q4_K_M        # same blob, explicit tag
+```
+Or build locally (uses this repo's `Modelfile`, kept in sync with the
+three bridge files):
+```bash
+git clone https://huggingface.co/FoolDev/Janus-35B && cd Janus-35B
+ollama create janus -f Modelfile && ollama run janus
+```
+After either path, `ollama show <model>` (`janus` for the local build,
+`hf.co/FoolDev/Janus-35B` for the HF pull) lists `completion`, `tools`,
+and `thinking` under Capabilities. Hardware: ~38 GB RAM at default
+`num_ctx 16384`, or trim ctx + batch to fit 32 GB hosts (see
+[Hardware requirements](#hardware-requirements)).
+## What's here
+| File | Use |
+|---|---|
+| `Janus-35B-A3B.Q4_K_M.gguf` | Recommended default, ~19 GB |
+| `Modelfile` | Ollama wrapper for **local** builds (`ollama create janus -f Modelfile`) — overrides the GGUF's embedded template with one that exposes `.Tools` / `.ToolCalls` to Ollama's capability detector. |
+| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Janus-35B` directly. The bridge does **not** read `Modelfile` (see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)); it ingests these three root-level files instead. Kept in sync with the `Modelfile`'s `TEMPLATE` / `SYSTEM` / `PARAMETER` directives. |
+| `scripts/check_bridge_sync.py` | Run before pushing a `Modelfile` / `template` / `system` / `params` edit to verify the four files remain in sync. Exits 0 if in sync, 1 with a per-key diff if not. |
+GGUF-only release. Pull the upstream safetensors from `Qwen/Qwen3.6-35B-A3B` if you need the `transformers` tree.
+## Architecture
+<p align="left">
+  <img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/moe-routing.svg" alt="animated MoE routing visualization: 16x16 grid of 256 expert dots with 8 lit at any time, cycling through 8 routing patterns" width="640" />
+</p>
+- Qwen 3.6, 35B total / 3B active, MoE (256 experts, 8 activated per token)
+- 40 layers, 10 × (3 × DeltaNet → MoE / 1 × Gated Attention → MoE)
+- 262k native context, extensible to ~1M with YaRN
+- Vision + video supported by upstream (mmproj not included in this release)
+- Vocab 248,320
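Review note: the layer pattern in the list above can be expanded mechanically, which makes the 40-layer total easy to check. This is only a sketch of the stated layout; the string labels and the `layer_layout` name are illustrative assumptions, not identifiers from the upstream model code.

```python
def layer_layout(groups: int = 10) -> list[str]:
    """Expand the repeating block: 3 DeltaNet layers, then 1 gated-attention layer.

    Per the architecture list, every layer is followed by an MoE block,
    so only the token-mixing layer type is recorded here.
    """
    block = ["deltanet"] * 3 + ["gated_attention"]
    return block * groups


layers = layer_layout()
print(len(layers))                       # total layer count
print(layers.count("gated_attention"))   # number of gated-attention layers
print(layers[:4])                        # one repeating block

# Fraction of experts that run for any single token (8 of 256).
active_fraction = 8 / 256
print(f"{active_fraction:.4%} of experts active per token")
```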
+## Quick start
+### llama.cpp / LM Studio
+Drop the GGUF into your loader of choice. The chat template is embedded in the GGUF metadata, so llama.cpp's `--chat-template auto` and LM Studio's GGUF auto-detection handle plain conversation correctly.
+### Ollama
+The chat template baked into the GGUF is **not sufficient on Ollama** — it lacks the `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so a naive `ollama pull` reports `does not support tools` and rejects any request carrying a `tools` array. Two paths fix this:
+```bash
+# A. Pull straight from HF (uses the root-level template/system/params files):
+ollama run hf.co/FoolDev/Janus-35B               # default tag, ~19 GB Q4_K_M
+ollama run hf.co/FoolDev/Janus-35B:Q4_K_M        # same blob, explicit tag
+# Note: HF's Ollama bridge does NOT read Modelfile; it reads template/system/params.
+# B. Build locally (uses Modelfile, which is kept in sync with the three above):
+ollama create janus -f Modelfile && ollama run janus
+```
+After either path, `ollama show <model>` (`janus` for path B, `hf.co/FoolDev/Janus-35B` for path A) should list `completion`, `tools`, and `thinking` under Capabilities.
+### Inference examples
+Once the model is loaded (via `ollama run janus`, `lms server`, or `llama-server`), all the standard OpenAI-compatible clients work. Examples assume the loader is listening on `http://localhost:11434` (Ollama default) — adjust the port for LM Studio (`:1234`) or llama.cpp (`:8080`).
+#### curl
+```bash
+curl -s http://localhost:11434/v1/chat/completions \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "model": "janus",
+    "messages": [
+      {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
+      {"role": "user", "content": "Sketch an algorithm to detect cycles in a directed graph."}
+    ],
+    "temperature": 0.6,
+    "max_tokens": 800
+  }' | jq -r '.choices[0].message.content'
+```
+#### Python (openai-compat)
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")
+resp = client.chat.completions.create(
+    model="janus",
+    messages=[
+        {"role": "user", "content": "Write a haiku about a stack overflow."}
+    ],
+    temperature=0.8,
+    top_p=0.95,
+)
+print(resp.choices[0].message.content)
+```
+#### Streaming
+```python
+stream = client.chat.completions.create(
+    model="janus",
+    messages=[{"role": "user", "content": "Explain RoPE briefly."}],
+    stream=True,
+)
+for chunk in stream:
+    delta = chunk.choices[0].delta.content or ""
+    print(delta, end="", flush=True)
+```
+### Recommended sampling
+| Use | temp | top_p | top_k | repeat_penalty |
+|---|---:|---:|---:|---:|
+| Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
+| Creative / RP | 0.8 | 0.95 | 40 | 1.02 |
+Lower temperature (0.4–0.6) and bump `repeat_penalty` to 1.08 if it loops inside `<think>` tags.
+### System prompt
+```text
+You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+Behavior rules:
+- Answer the user's actual request directly.
+- Be accurate, complete, and structured.
+- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+- If the user wants creative writing, preserve tone, continuity, and character consistency.
+- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+- Finish with a usable answer, not just planning.
+```
+## Hardware requirements
+This is an 18.9 GB Q4_K_M GGUF. Ollama's runtime footprint at default settings is **roughly 2× the model file** (weights mmap + compute graph allocation), plus KV cache, for ~38 GB total memory at `num_ctx 16384`. The compute-graph allocation scales with context and batch size, so 32 GB hosts can fit the model by trimming both (see the 32 GB unified-memory row in the table).
+| Hardware | Status |
+|---|---|
+| ≥48 GB RAM (CPU-only) | Works, ~3-6 tok/s |
+| Single H100/A100 80 GB | Works, full offload, ~30+ tok/s |
+| RTX 4090 24 GB / 5090 32 GB + 32 GB RAM | Works, partial offload, ~15-25 tok/s |
+| Mac Studio M2/M3 Ultra 64 GB+ unified | Works, ~20+ tok/s |
+| 32 GB unified-memory laptops (Ryzen AI Max+, Apple M-series) | Works with `num_ctx ≤ 4096` and `num_batch ≤ 256` to fit the compute graph; default 16K ctx OOMs. Measured 28.71 tok/s on ASUS ROG Flow Z13 GZ302EA at Q4_K_M (Radeon 8060S iGPU via ROCm gfx1151). |
+## Chat template
+The model uses the standard Qwen 3.x ChatML format with `<|im_start|>` / `<|im_end|>` role markers. The template is embedded in the GGUF metadata for plain conversation use, but Ollama users should rely on the `TEMPLATE` block in the included `Modelfile` — that version exposes the tool-calling scaffolding Ollama's capability detector requires (the embedded template alone is insufficient; see [Ollama](#ollama) above).
+### Plain conversation
+```text
+<|im_start|>system
+You are Janus, a precise and capable assistant…<|im_end|>
+<|im_start|>user
+What is the time complexity of mergesort?<|im_end|>
+<|im_start|>assistant
+```
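Review note: for readers who want to see how the plain-conversation layout above is assembled, here is a minimal sketch. It covers text-only turns and assumes each message is a dict with `role` and `content` keys; `render_chatml` is a hypothetical helper, and the real templates in this repo additionally handle tools and thinking.

```python
def render_chatml(messages: list[dict[str, str]],
                  add_generation_prompt: bool = True) -> str:
    """Render role/content messages into ChatML text."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus."},
    {"role": "user", "content": "What is the time complexity of mergesort?"},
])
print(prompt)
```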
+### With reasoning trace
+When the model decides to think, the assistant turn contains a `<think>…</think>` block followed by the visible answer:
+```text
+<|im_start|>assistant
+<think>
+The user is asking about mergesort. Mergesort divides the array, recursively sorts each half, then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
+</think>
+Mergesort runs in **O(n log n)** time in the worst, average, and best cases. The recurrence is T(n) = 2T(n/2) + O(n), which solves to Θ(n log n) by the master theorem.<|im_end|>
+```
+Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by default and show only the final answer. If your client doesn't, set its "show reasoning" toggle off.
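Review note: if a client does not separate the reasoning for you, it can be split off in a few lines. The sketch below assumes the raw assistant text contains at most one reasoning block; it also tolerates a missing opening `<think>` tag, since the Ollama template in this repo already emits `<think>` as part of the generation prompt. `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw assistant turn."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: treat the whole text as the visible answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>\nDivide, sort halves, merge.\n</think>\nMergesort runs in O(n log n) time."
reasoning, answer = split_reasoning(raw)
print(answer)
```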
+### Tool / function calling
+The wire format depends on which path you take. **Both are valid** — the model adapts to whichever format the system prompt specifies.
+**Ollama path** (this repo's `Modelfile`). The TEMPLATE advertises tools inside `<tools>…</tools>` and asks the model to reply in JSON-in-XML — the form Ollama's tool-call extractor parses into a structured `tool_calls` array on `/api/chat` and `/v1/chat/completions`:
+```text
+<tool_call>
+{"name": "get_weather", "arguments": {"city": "Tokyo"}}
+</tool_call>
+```
+**Embedded-jinja path** (llama.cpp, llama-cpp-python, LM Studio). The Qwen 3.6 native chat template baked into the GGUF instructs the model to emit a more verbose XML form. This is the shape you'll see if you talk to `llama-server` or LM Studio directly:
+```text
+<tool_call>
+<function=get_weather>
+<parameter=city>
+Tokyo
+</parameter>
+</function>
+</tool_call>
+```
+Pick the parser shape that matches your loader. Don't mix.
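Review note: the difference between the two wire formats is easiest to see in a parser. The sketch below recognises both shapes side by side purely for illustration; in practice you would keep only the branch your loader emits. It assumes well-formed output, keeps the verbose form's parameter values as strings, and uses hypothetical names (`parse_tool_calls` and the three regexes).

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\s*(.*?)\s*</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls from raw model text in either wire format."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        if body.startswith("{"):
            # JSON-in-XML form (Ollama path).
            obj = json.loads(body)
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
            continue
        fn = FUNCTION_RE.search(body)
        if fn:
            # Verbose XML form (embedded-jinja path); values stay strings.
            args = dict(PARAM_RE.findall(fn.group(2)))
            calls.append({"name": fn.group(1), "arguments": args})
    return calls


json_form = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
xml_form = (
    "<tool_call>\n<function=get_weather>\n<parameter=city>\nTokyo\n"
    "</parameter>\n</function>\n</tool_call>"
)
print(parse_tool_calls(json_form))
print(parse_tool_calls(xml_form))
```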
+#### Example (Ollama, OpenAI-compatible API)
+```python
+from openai import OpenAI
+client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")
+resp = client.chat.completions.create(
+    model="janus",
+    messages=[
+        {"role": "user", "content": "Call get_weather for Tokyo. Respond ONLY with the tool call."}
+    ],
+    tools=[{
+        "type": "function",
+        "function": {
+            "name": "get_weather",
+            "description": "Get current weather for a city",
+            "parameters": {
+                "type": "object",
+                "properties": {"city": {"type": "string"}},
+                "required": ["city"],
+            },
+        },
+    }],
+    temperature=0.3,
+)
+print(resp.choices[0].message.tool_calls)
+# [ToolCall(id='call_xxx', type='function',
+#           function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]
+```
+#### Tips
+- Use direct prompts ("Call X for Y") rather than soft hints ("Use the tool"). The model thinks before committing to a call, and weak prompts can exhaust `num_predict` inside the `<think>` block before the call is emitted.
+- Allow at least `num_predict: 1024` (or `max_tokens: 1024`) for tool-calling turns, more if the schemas are large.
+- The Modelfile's JSON-in-XML format is what Ollama's tool-call extractor understands; if you swap loaders, swap the parser to match (see "Embedded-jinja path" above).
+## Known limitations
+- **No mmproj in this release.** The base Qwen3.6 supports image and video input via a separate `mmproj` file, which is not included here. Text-only inference works out of the box; multimodal inference requires fetching an `mmproj` GGUF built for the Qwen3.6-35B-A3B base from upstream.
+- **Quantization-induced quality loss.** Q4_K_M is a strong general-purpose quant but does measurably degrade math and code accuracy compared to BF16. If you need maximum quality, run the upstream safetensors on a GPU that fits BF16 (~70 GB).
+- **MoE expert utilization is uneven.** Stock Qwen3.6-35B-A3B routes 8 of 256 experts per token. On narrow domains (e.g. only one programming language) a small subset of experts dominates; load-balance loss was a training-time concern, not a runtime guarantee.
+- **Thinking traces can loop.** Like most reasoning-distilled models, Janus-35B occasionally gets stuck repeating itself inside `<think>` tags. Mitigations: lower temperature to 0.4-0.6, raise `repeat_penalty` to 1.08, or set a `<think>`-token budget cap if your loader supports it.
+- **Not aligned with any specific safety policy.** This is a personal repackage of an open-weight base model with reasoning-focused distillation. There is no RLHF refusal layer beyond what Qwen 3.6 ships with; downstream safety is the operator's responsibility.
+- **No formal evaluation in this card.** Throughput numbers in the hardware table are estimates, except the ASUS ROG Flow Z13 figure, which was measured. If you produce real benchmarks (MMLU, HumanEval, etc.) and want them included, file a PR.
+## Related models
+| Model | Size | Notes |
+|---|---|---|
+| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | 35B / 3B active | Upstream base model. `transformers`-native multimodal weights. |
+| [FoolDev/Thanatos-27B](https://huggingface.co/FoolDev/Thanatos-27B) | 27B dense | Dense sibling on the Qwen 3.6 27B base. Same teacher (Opus 4.7), same dataset family, smaller memory footprint, no MoE quirks. |
+| [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B dense | Heretic-flavored fine-tune of the same Qwen 3.5 9B base used as a smaller starting point. Useful as a fast first-pass model when 35B is too heavy for the host. |
+## Credits
+- Base model: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (Alibaba)
+- Reasoning teacher: Claude Opus 4.7 (Anthropic)
+- Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)
+License inherited from upstream: Apache-2.0.

banner.png ADDED Viewed

banner.svg ADDED Viewed

moe-routing.svg ADDED Viewed

params ADDED Viewed

	@@ -0,0 +1,12 @@

+{
+  "temperature": 0.6,
+  "top_p": 0.95,
+  "top_k": 20,
+  "repeat_penalty": 1.05,
+  "num_ctx": 16384,
+  "stop": [
+    "<|im_end|>",
+    "<|endoftext|>",
+    "<|im_start|>"
+  ]
+}
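Review note: the `params` JSON above and the Modelfile's `PARAMETER` lines carry the same settings in two syntaxes. A minimal sketch of the mapping, assuming every key other than `stop` is a scalar; `params_to_modelfile` is a hypothetical helper and not part of this repo.

```python
import json


def params_to_modelfile(params: dict) -> list[str]:
    """Render a params dict as Modelfile PARAMETER lines."""
    lines = []
    for key, value in params.items():
        if key == "stop":
            # Each stop sequence becomes its own quoted PARAMETER line.
            lines.extend(f'PARAMETER stop "{s}"' for s in value)
        else:
            lines.append(f"PARAMETER {key} {value}")
    return lines


params = json.loads("""
{
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "repeat_penalty": 1.05,
  "num_ctx": 16384,
  "stop": ["<|im_end|>", "<|endoftext|>", "<|im_start|>"]
}
""")
print("\n".join(params_to_modelfile(params)))
```

Regenerating the lines this way and comparing them with the Modelfile is one way to spot drift by eye before running the sync script.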

scripts/check_bridge_sync.py ADDED Viewed

	@@ -0,0 +1,147 @@

+#!/usr/bin/env python3
+"""
+Janus-35B — verify Modelfile and HF Ollama bridge files stay in sync.
+The repo ships two parallel Ollama configurations:
+  - ``Modelfile`` is consumed by the local-build path
+    (``ollama create janus -f Modelfile``). It contains
+    ``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
+  - ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
+    Ollama bridge when users ``ollama run hf.co/FoolDev/Janus-35B`` directly. HF
+    does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
+If the two configurations drift apart, ``hf.co/...`` users and local-build
+users get different behaviour — exactly the bug fixed in commit 70ccef1
+("Add HF Ollama bridge files (template/system/params)"). This script is
+the regression guard: it parses the Modelfile, loads the three bridge
+files, and fails on any mismatch.
+Usage:
+    python3 scripts/check_bridge_sync.py
+    # exit 0 if in sync, 1 (with diff details) if not.
+Run this manually before pushing a Modelfile / bridge-file edit. The 27B
+sibling repo wires an equivalent script into scripts/check.sh and a
+pre-commit hook; this repo intentionally stays leaner and runs it
+on demand.
+"""
+from __future__ import annotations
+import json
+import re
+import sys
+from pathlib import Path
+ROOT = Path(__file__).resolve().parent.parent
+# Ollama Modelfile reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
+TEMPLATE_RE = re.compile(r'^TEMPLATE\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
+SYSTEM_RE = re.compile(r'^SYSTEM\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
+PARAMETER_RE = re.compile(r'^PARAMETER\s+(\S+)\s+(.*?)\s*$', re.MULTILINE)
+def parse_modelfile(text: str) -> tuple[str, str, dict[str, object]]:
+    """Extract TEMPLATE, SYSTEM, and PARAMETER blocks from a Modelfile."""
+    tpl_match = TEMPLATE_RE.search(text)
+    if not tpl_match:
+        die("Modelfile has no TEMPLATE block")
+    template = tpl_match.group(1)
+    sys_match = SYSTEM_RE.search(text)
+    if not sys_match:
+        die("Modelfile has no SYSTEM block")
+    system = sys_match.group(1)
+    params: dict[str, object] = {}
+    stops: list[str] = []
+    for key, raw in PARAMETER_RE.findall(text):
+        # Strip outer quotes if present.
+        value: object = raw.strip()
+        if isinstance(value, str) and len(value) >= 2 and value[0] == value[-1] == '"':
+            value = value[1:-1]
+        # Stop tokens accumulate; everything else is scalar.
+        if key == "stop":
+            stops.append(value)  # type: ignore[arg-type]
+            continue
+        # Cast known numeric params.
+        if key in {"temperature", "top_p", "top_k", "repeat_penalty",
+                   "num_ctx", "num_predict", "num_gpu", "num_batch", "seed"}:
+            try:
+                value = float(value) if "." in str(value) else int(value)  # type: ignore[arg-type]
+            except (TypeError, ValueError):
+                pass
+        params[key] = value
+    if stops:
+        params["stop"] = stops
+    return template, system, params
+def die(msg: str) -> None:
+    print(f"[FAIL] {msg}", file=sys.stderr)
+    sys.exit(1)
+def diff_strings(label: str, expected: str, actual: str) -> bool:
+    if expected == actual:
+        return True
+    print(f"[FAIL] {label} drift detected", file=sys.stderr)
+    print(f"  Modelfile len={len(expected)}  bridge file len={len(actual)}", file=sys.stderr)
+    # Show the first diverging line for quick orientation.
+    e_lines = expected.splitlines()
+    a_lines = actual.splitlines()
+    for i, (e, a) in enumerate(zip(e_lines, a_lines)):
+        if e != a:
+            print(f"  first diff at line {i + 1}:", file=sys.stderr)
+            print(f"    modelfile : {e!r}", file=sys.stderr)
+            print(f"    bridge    : {a!r}", file=sys.stderr)
+            return False
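+    if len(e_lines) != len(a_lines):
+        print(f"  line count differs: modelfile={len(e_lines)} bridge={len(a_lines)}",
+              file=sys.stderr)
+    else:
+        # Every line matched, so only line endings or a trailing newline differ.
+        print("  lines match; difference is in line endings or a trailing newline",
+              file=sys.stderr)
+    return False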
+def main() -> int:
+    modelfile = (ROOT / "Modelfile").read_text()
+    bridge_template = (ROOT / "template").read_text()
+    bridge_system = (ROOT / "system").read_text()
+    bridge_params = json.loads((ROOT / "params").read_text())
+    mf_template, mf_system, mf_params = parse_modelfile(modelfile)
+    ok = True
+    # 1. TEMPLATE: byte-for-byte.
+    ok &= diff_strings("TEMPLATE", mf_template, bridge_template)
+    # 2. SYSTEM: strip leading and trailing whitespace on both sides. The
+    #    bridge file typically has a trailing newline; the Modelfile block doesn't.
+    ok &= diff_strings("SYSTEM", mf_system.strip(), bridge_system.strip())
+    # 3. PARAMETER vs params JSON: compare normalized dicts.
+    if mf_params != bridge_params:
+        print("[FAIL] params drift detected", file=sys.stderr)
+        for k in sorted(set(mf_params) | set(bridge_params)):
+            mv = mf_params.get(k, "<missing>")
+            bv = bridge_params.get(k, "<missing>")
+            if mv != bv:
+                print(f"  {k}: modelfile={mv!r}  bridge={bv!r}", file=sys.stderr)
+        ok = False
+    if not ok:
+        print("\n[!] Modelfile and bridge files are out of sync.", file=sys.stderr)
+        print("    Edit them together: any change to TEMPLATE / SYSTEM /",
+              file=sys.stderr)
+        print("    PARAMETER must be reflected in template / system / params.",
+              file=sys.stderr)
+        return 1
+    print("[ ok ] Modelfile <-> bridge files in sync")
+    return 0
+if __name__ == "__main__":
+    sys.exit(main())
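Review note: `diff_strings` reports only the first diverging line. If a fuller report is ever wanted, the standard library's `difflib.unified_diff` could produce one. The sketch below is an optional alternative and not what the script does; `unified_drift` and the sample strings are illustrative assumptions.

```python
import difflib


def unified_drift(label: str, modelfile_text: str, bridge_text: str) -> list[str]:
    """Return unified-diff lines between a Modelfile block and its bridge file."""
    return list(difflib.unified_diff(
        modelfile_text.splitlines(),
        bridge_text.splitlines(),
        fromfile=f"Modelfile:{label}",
        tofile=f"bridge:{label.lower()}",
        lineterm="",
    ))


drift = unified_drift(
    "SYSTEM",
    "You are Janus.\n- Be accurate.",
    "You are Janus.\n- Be precise.",
)
print("\n".join(drift))
```

An empty list means the two texts have identical lines, so the result can double as the pass/fail signal.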

system ADDED Viewed

	@@ -0,0 +1,10 @@

+You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
+Behavior rules:
+- Answer the user's actual request directly.
+- Be accurate, complete, and structured.
+- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
+- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
+- If the user wants creative writing, preserve tone, continuity, and character consistency.
+- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
+- Finish with a usable answer, not just planning.

template ADDED Viewed

	@@ -0,0 +1,51 @@

+{{- $lastUserIdx := -1 -}}
+{{- range $idx, $msg := .Messages -}}
+{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
+{{- end }}
+{{- if or .System .Tools }}<|im_start|>system
+{{ if .System }}{{ .System }}
+{{ end }}
+{{- if .Tools }}# Tools
+You may call one or more functions to assist with the user query.
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{{- range .Tools }}
+{"type": "function", "function": {{ .Function }}}
+{{- end }}
+</tools>
+For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
+<tool_call>
+{"name": <function-name>, "arguments": <args-json-object>}
+</tool_call>
+{{- end -}}<|im_end|>
+{{ end }}
+{{- range $i, $_ := .Messages }}
+{{- $last := eq (len (slice $.Messages $i)) 1 -}}
+{{- if eq .Role "user" }}<|im_start|>user
+{{ .Content }}<|im_end|>
+{{ else if eq .Role "assistant" }}<|im_start|>assistant
+{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
+<think>{{ .Thinking }}</think>
+{{ end -}}
+{{ if .Content }}{{ .Content }}{{ end }}
+{{- if .ToolCalls }}
+{{- range .ToolCalls }}
+<tool_call>
+{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
+</tool_call>
+{{- end }}
+{{- end }}{{ if not $last }}<|im_end|>
+{{ end }}
+{{- else if eq .Role "tool" }}<|im_start|>user
+<tool_response>
+{{ .Content }}
+</tool_response><|im_end|>
+{{ end }}
+{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
+<think>
+{{ end }}
+{{- end }}