Commit 64b629a · 0 parent(s)
Duplicate from FoolDev/Janus-35B
Files changed:
- .gitattributes +36 -0
- CHANGELOG.md +111 -0
- CITATION.cff +39 -0
- Janus-35B-A3B.Q4_K_M.gguf +3 -0
- LICENSE +201 -0
- Modelfile +121 -0
- README.md +335 -0
- banner.png +0 -0
- banner.svg +97 -0
- moe-routing.svg +670 -0
- params +12 -0
- scripts/check_bridge_sync.py +147 -0
- system +10 -0
- template +51 -0
.gitattributes
ADDED
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.gguf filter=lfs diff=lfs merge=lfs -text
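Each `.gitattributes` rule above routes matching paths through Git LFS. As a rough illustration of which files in this commit those rules would catch, here is a minimal sketch that approximates the matching with Python's `fnmatch`. This is an assumption-laden approximation: real gitattributes matching follows gitignore-style rules (including `**` handling), and the helper name `is_lfs_tracked` and the pattern subset are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes diff above.
LFS_PATTERNS = ["*.gguf", "*.safetensors", "*.bin", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?

    Patterns without a slash are compared against the basename only.
    fnmatch is an approximation of git's own matcher, not a replacement.
    """
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "Janus-35B-A3B.Q4_K_M.gguf",
        "README.md",
        "scripts/check_bridge_sync.py",
    ):
        print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute git itself would apply.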
CHANGELOG.md
ADDED
@@ -0,0 +1,111 @@
+# Changelog
+
+All notable changes to this repository. Format loosely follows
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/). This repo holds
+a model card, an Ollama Modelfile, the HF Ollama-bridge `template` /
+`system` / `params` files, and the bundled Q4_K_M GGUF, so versions
+track the **tooling and documentation**, not the underlying base model.
+
+## [Unreleased]
+
+### Added
+- Root-level `template`, `system`, and `params` files for HF's Ollama
+bridge. The bridge generates Ollama manifests at request time from
+these three files (NOT from `Modelfile` — confirmed against
+https://huggingface.co/docs/hub/en/ollama). Without them, `ollama
+run hf.co/FoolDev/janus` got an auto-generated manifest with the
+broken `{{ if .Prompt }} .Prompt }}<|im_end|>` template (Ollama's
+faulty Go-template conversion of the GGUF's embedded jinja),
+corrupted stop tokens (`".Prompt }}<|im_end|>"` bleed), and no
+`.Tools` / `.ToolCalls` blocks — so the published Ollama tag
+advertised `completion` only and rejected any request with a
+`tools` array. The three files mirror the `Modelfile`'s `TEMPLATE`
+/ `SYSTEM` / `PARAMETER` directives; both routes wire tool calling
+correctly. Edit them together when changing one. Verified by
+re-pulling the fresh tag: `ollama show hf.co/FoolDev/janus` now
+reports `completion`, `tools`, `thinking` and tool calls round-trip
+end-to-end through `/api/chat`.
+
+### Changed
+- README "Tool / function calling" section: split into explicit
+Ollama-path and embedded-jinja-path subsections. Earlier wording
+conflated the two on-the-wire formats. The Ollama path (Modelfile
+`TEMPLATE` and the new `template` bridge file, both kept in sync)
+prompts JSON-in-XML — the form Ollama's tool-call extractor parses
+into a structured `tool_calls` array. The embedded-jinja path
+(llama.cpp, llama-cpp-python, LM Studio) reads the Qwen 3.6 native
+chat template baked into the GGUF, which prompts the verbose
+`<function=name>` / `<parameter=arg>` form the model was trained
+on. Both are valid; the model adapts to whichever shape the system
+prompt prescribes. README now shows both formats side by side.
+- README "Quick start / Ollama" section: documents both pull paths
+(`hf.co/...` via bridge files vs `make ... -f Modelfile` locally)
+and explicitly notes that HF's bridge does not read `Modelfile`.
+- README "Hardware requirements" intro: re-framed the "~38 GB
+minimum" claim as "~38 GB at default `num_ctx 16384`" and
+documented that 32 GB hosts can fit the model by trimming context
+and batch size.
+- README "Quick start / Ollama" snippet: show both
+`ollama run hf.co/FoolDev/janus` and the explicit-tag form
+`ollama run hf.co/FoolDev/janus:Q4_K_M`. Same blob (the default
+tag maps to Q4_K_M), but parity with the 27B sibling — which lists
+both `:latest` and `:Q3_K_S` — and removes ambiguity for users
+scripting against an explicit quant tag. Verified the explicit tag
+resolves to the same manifest (model SHA `a076aa0d3a1a`, bridge
+blobs `22c7ade72045` / `84a1a6ac580b` / `f7b1992cf9c1`).
+
+### Added (cont'd)
+- README `## TL;DR` section near the top of the model card, mirroring
+the 27B sibling. Two paths (HF Ollama bridge / local Modelfile
+build) with explicit tags and a one-line capability check. Notes
+the bridge ingests `template` / `system` / `params`, not
+`Modelfile`, so users skimming the top of the page won't form the
+wrong mental model of which file gets used when.
+- `CITATION.cff` for citation metadata (Apache-2.0, references the
+upstream Qwen3.6-35B-A3B base and the dense Janus-27B sibling).
+The 27B sibling has had this file since 0.5.0; adding here for
+parity so academic-style citations work across both repos.
+- `LICENSE` file containing the full Apache 2.0 text. The model card
+front-matter has always declared `license: apache-2.0` and the
+upstream Qwen 3.6 license inherits Apache-2.0, but until now the
+repo lacked the actual license text file. Same Apache 2.0 text
+shipped in the 27B sibling.
+- `scripts/check_bridge_sync.py` — regression guard for the
+`Modelfile` <-> `template` / `system` / `params` sync invariant.
+The two configurations are consumed by different code paths
+(`ollama create -f Modelfile` for local builds vs HF's Ollama
+bridge for `hf.co/...` pulls — HF does not read `Modelfile`), so
+drift between them re-introduces the bug fixed in commit 70ccef1
+where `hf.co/FoolDev/janus` shipped a broken auto-generated
+template while local builds had the correct one. Script parses
+the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives,
+loads the three bridge files, and fails on any mismatch with a
+per-key diff. Run on demand before pushing edits to either side
+of the configuration. The 27B sibling wires an equivalent script
+into a pre-commit hook (commit 5c67b08); this repo stays leaner
+and runs it manually.
+
+### Fixed
+- README "Chat template" intro previously claimed all loaders handle
+the embedded jinja automatically. True for llama.cpp / LM Studio /
+llama-cpp-python; not true for Ollama, which needs an explicit
+override (the `Modelfile` TEMPLATE block locally, the root-level
+`template` file when serving via `hf.co/...`).
+- README "Tool / function calling" earlier said the XML form
+`<function=name><parameter=arg>` is "not what this model produces".
+That was wrong: the embedded GGUF jinja prompts exactly that form,
+and llama.cpp / LM Studio / llama-cpp-python users will see it.
+The "JSON-in-XML" claim only applies on the Ollama path because
+that's what the Modelfile TEMPLATE prompt instructs.
+
+## [0.1.0] — initial public release
+
+### Added
+- Model card with architecture overview, sampling defaults, hardware
+table, and `Modelfile` for `ollama create janus -f Modelfile`.
+- Bundled `Janus-35B-A3B.Q4_K_M.gguf` (~19 GB) via Git LFS so the HF
+"Use this model" widget surfaces a working `ollama run` snippet.
+- Tokyo Night themed banner (PNG sourced from the SVG).
+- Status badges for license, base model, architecture, quant.
+- Linked sibling `FoolDev/janus-27b` (dense Qwen 3.6 27B base) under
+Related models.
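The `scripts/check_bridge_sync.py` entry in the changelog above describes a guard that parses the Modelfile's `TEMPLATE` / `SYSTEM` / `PARAMETER` directives, loads the three bridge files, and reports a per-key mismatch. The script itself is not shown in this part of the diff, so the following is only a minimal sketch of that idea, not the repo's implementation. It assumes triple-quoted `TEMPLATE` and `SYSTEM` blocks and a JSON `params` file; the function names `parse_modelfile` and `find_drift` and the sample values are hypothetical.

```python
import json
import re


def _normalize(value):
    """Compare numbers numerically and everything else as stripped text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value).strip().strip('"')


def parse_modelfile(text: str) -> dict:
    """Extract TEMPLATE, SYSTEM, and PARAMETER directives from a Modelfile."""
    parsed = {"template": None, "system": None, "params": {}}
    for key in ("template", "system"):
        # Assumes the directive body is wrapped in triple double quotes.
        match = re.search(rf'^{key.upper()}\s+"""(.*?)"""', text, re.S | re.M)
        if match:
            parsed[key] = match.group(1).strip()
    # A parameter such as `stop` may repeat, so collect values in a list.
    for name, value in re.findall(r"^PARAMETER\s+(\S+)\s+(.+)$", text, re.M):
        parsed["params"].setdefault(name, []).append(_normalize(value))
    return parsed


def find_drift(modelfile_text, template_text, system_text, params_text) -> list:
    """Return the keys whose values differ between the two configurations."""
    modelfile = parse_modelfile(modelfile_text)
    drift = []
    if modelfile["template"] != template_text.strip():
        drift.append("template")
    if modelfile["system"] != system_text.strip():
        drift.append("system")
    bridge_params = {}
    for name, value in json.loads(params_text).items():
        values = value if isinstance(value, list) else [value]
        bridge_params[name] = [_normalize(item) for item in values]
    for name in sorted(set(modelfile["params"]) | set(bridge_params)):
        if modelfile["params"].get(name) != bridge_params.get(name):
            drift.append(f"params.{name}")
    return drift


if __name__ == "__main__":
    # Toy inputs for illustration only; not the repo's real files.
    sample_modelfile = '''FROM ./model.gguf
TEMPLATE """{{ .Prompt }}"""
SYSTEM """Be brief."""
PARAMETER temperature 0.6
PARAMETER stop "<|im_end|>"
'''
    sample_params = '{"temperature": 0.6, "stop": ["<|im_end|>"]}'
    print(find_drift(sample_modelfile, "{{ .Prompt }}", "Be brief.", sample_params))
```

Returning a list of drifting keys, rather than a single boolean, is what makes a per-key report possible: a caller can exit non-zero whenever the list is non-empty and print each entry.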
CITATION.cff
ADDED
@@ -0,0 +1,39 @@
+cff-version: 1.2.0
+title: "Janus-35B: A Mixture-of-Experts Distillation Wrapper for Qwen 3.6 35B-A3B"
+message: "If you use this model card or its accompanying files, please cite as below."
+type: software
+authors:
+  - name: FoolDev
+    website: "https://huggingface.co/FoolDev"
+repository-code: "https://huggingface.co/FoolDev/janus"
+url: "https://huggingface.co/FoolDev/janus"
+abstract: >-
+  Janus-35B is a personal repackaging of the Qwen 3.6 35B-A3B
+  mixture-of-experts base model (35B total / 3B active per token,
+  256 experts, 8 activated) with Claude Opus 4.7 in the reasoning
+  teacher slot. The repository ships an Ollama Modelfile, the HF
+  Ollama-bridge files (template / system / params), sampling defaults,
+  and a bundled Q4_K_M GGUF (~19 GB) so the HF "Use this model" widget
+  surfaces a one-liner Ollama snippet. Other quants and the upstream
+  safetensors (Qwen/Qwen3.6-35B-A3B) are pulled from upstream on demand
+  rather than redistributed.
+keywords:
+  - qwen
+  - qwen3.6
+  - mixture-of-experts
+  - moe
+  - distillation
+  - reasoning
+  - llm
+license: Apache-2.0
+references:
+  - type: software
+    title: "Qwen3.6-35B-A3B"
+    authors:
+      - name: Alibaba Qwen Team
+    url: "https://huggingface.co/Qwen/Qwen3.6-35B-A3B"
+  - type: software
+    title: "Janus-27B (dense sibling)"
+    authors:
+      - name: FoolDev
+    url: "https://huggingface.co/FoolDev/janus-27b"
Janus-35B-A3B.Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
+size 18939312896
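The three lines added for the GGUF are a Git LFS pointer, not the model weights: they record the spec version, the SHA-256 of the real blob, and its size in bytes. The first 12 hex digits of the `oid` match the model SHA `a076aa0d3a1a` quoted in the CHANGELOG. A minimal sketch of checking a downloaded file against such a pointer follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader downloaded.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a076aa0d3a1aab0bbfa24eb6a5163f6c8eebf6fc156f81c5820ae65dc4d19fc7
size 18939312896
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against the pointer."""
    # The size check is cheap, so do it before hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algorithm"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    # Size in decimal gigabytes, consistent with the "~19 GB" in the changelog.
    print(round(pointer["size"] / 1e9, 2))
```

Hashing in fixed-size chunks keeps memory use flat, which matters for a file of this size.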
LICENSE
ADDED
@@ -0,0 +1,201 @@
+Apache License
+Version 2.0, January 2004
+http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+"License" shall mean the terms and conditions for use, reproduction,
+and distribution as defined by Sections 1 through 9 of this document.
+
+"Licensor" shall mean the copyright owner or entity authorized by
+the copyright owner that is granting the License.
+
+"Legal Entity" shall mean the union of the acting entity and all
+other entities that control, are controlled by, or are under common
+control with that entity. For the purposes of this definition,
+"control" means (i) the power, direct or indirect, to cause the
+direction or management of such entity, whether by contract or
+otherwise, or (ii) ownership of fifty percent (50%) or more of the
+outstanding shares, or (iii) beneficial ownership of such entity.
+
+"You" (or "Your") shall mean an individual or Legal Entity
+exercising permissions granted by this License.
+
+"Source" form shall mean the preferred form for making modifications,
+including but not limited to software source code, documentation
+source, and configuration files.
+
+"Object" form shall mean any form resulting from mechanical
+transformation or translation of a Source form, including but
+not limited to compiled object code, generated documentation,
+and conversions to other media types.
+
+"Work" shall mean the work of authorship, whether in Source or
+Object form, made available under the License, as indicated by a
+copyright notice that is included in or attached to the work
+(an example is provided in the Appendix below).
+
+"Derivative Works" shall mean any work, whether in Source or Object
+form, that is based on (or derived from) the Work and for which the
+editorial revisions, annotations, elaborations, or other modifications
+represent, as a whole, an original work of authorship. For the purposes
+of this License, Derivative Works shall not include works that remain
+separable from, or merely link (or bind by name) to the interfaces of,
+the Work and Derivative Works thereof.
+
+"Contribution" shall mean any work of authorship, including
+the original version of the Work and any modifications or additions
+to that Work or Derivative Works thereof, that is intentionally
+submitted to Licensor for inclusion in the Work by the copyright owner
+or by an individual or Legal Entity authorized to submit on behalf of
+the copyright owner. For the purposes of this definition, "submitted"
+means any form of electronic, verbal, or written communication sent
+to the Licensor or its representatives, including but not limited to
+communication on electronic mailing lists, source code control systems,
+and issue tracking systems that are managed by, or on behalf of, the
+Licensor for the purpose of discussing and improving the Work, but
+excluding communication that is conspicuously marked or otherwise
+designated in writing by the copyright owner as "Not a Contribution."
+
+"Contributor" shall mean Licensor and any individual or Legal Entity
+on behalf of whom a Contribution has been received by Licensor and
+subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+this License, each Contributor hereby grants to You a perpetual,
+worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+copyright license to reproduce, prepare Derivative Works of,
+publicly display, publicly perform, sublicense, and distribute the
+Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+this License, each Contributor hereby grants to You a perpetual,
+worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+(except as stated in this section) patent license to make, have made,
+use, offer to sell, sell, import, and otherwise transfer the Work,
+where such license applies only to those patent claims licensable
+by such Contributor that are necessarily infringed by their
+Contribution(s) alone or by combination of their Contribution(s)
+with the Work to which such Contribution(s) was submitted. If You
+institute patent litigation against any entity (including a
+cross-claim or counterclaim in a lawsuit) alleging that the Work
+or a Contribution incorporated within the Work constitutes direct
+or contributory patent infringement, then any patent licenses
+granted to You under this License for that Work shall terminate
+as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+Work or Derivative Works thereof in any medium, with or without
+modifications, and in Source or Object form, provided that You
+meet the following conditions:
+
+(a) You must give any other recipients of the Work or
+Derivative Works a copy of this License; and
+
+(b) You must cause any modified files to carry prominent notices
+stating that You changed the files; and
+
+(c) You must retain, in the Source form of any Derivative Works
+that You distribute, all copyright, patent, trademark, and
+attribution notices from the Source form of the Work,
+excluding those notices that do not pertain to any part of
+the Derivative Works; and
+
+(d) If the Work includes a "NOTICE" text file as part of its
+distribution, then any Derivative Works that You distribute must
+include a readable copy of the attribution notices contained
+within such NOTICE file, excluding those notices that do not
+pertain to any part of the Derivative Works, in at least one
+of the following places: within a NOTICE text file distributed
+as part of the Derivative Works; within the Source form or
+documentation, if provided along with the Derivative Works; or,
+within a display generated by the Derivative Works, if and
+wherever such third-party notices normally appear. The contents
+of the NOTICE file are for informational purposes only and
+do not modify the License. You may add Your own attribution
+notices within Derivative Works that You distribute, alongside
+or as an addendum to the NOTICE text from the Work, provided
+that such additional attribution notices cannot be construed
+as modifying the License.
+
+You may add Your own copyright statement to Your modifications and
+may provide additional or different license terms and conditions
+for use, reproduction, or distribution of Your modifications, or
+for any such Derivative Works as a whole, provided Your use,
+reproduction, and distribution of the Work otherwise complies with
+the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+any Contribution intentionally submitted for inclusion in the Work
+by You to the Licensor shall be under the terms and conditions of
+this License, without any additional terms or conditions.
+Notwithstanding the above, nothing herein shall supersede or modify
+the terms of any separate license agreement you may have executed
+with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+names, trademarks, service marks, or product names of the Licensor,
+except as required for describing the origin of the Work and
+reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+agreed to in writing, Licensor provides the Work (and each
+Contributor provides its Contributions) on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+implied, including, without limitation, any warranties or conditions
+of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+PARTICULAR PURPOSE. You are solely responsible for determining the
+appropriateness of using or redistributing the Work and assume any
+risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+whether in tort (including negligence), contract, or otherwise,
+unless required by applicable law (such as deliberate and grossly
+negligent acts) or agreed to in writing, shall any Contributor be
+liable to You for damages, including any direct, indirect, special,
+incidental, or consequential damages of any character arising as a
+result of this License or out of the use or inability to use the
+Work (including but not limited to damages for loss of goodwill,
+work stoppage, computer failure or malfunction, or any and all
+other commercial damages or losses), even if such Contributor
+has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+the Work or Derivative Works thereof, You may accept and charge a
+fee for acceptance of support, warranty, indemnity, or other liability
+obligations and/or rights consistent with this License. However, in
+accepting such obligations, You may act only on Your own behalf and
+on Your sole responsibility, not on behalf of any other Contributor,
+and only if You agree to indemnify, defend, and hold each Contributor
+harmless for any liability incurred by, or claims asserted against,
+such Contributor by reason of your accepting any such warranty or
+additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+To apply the Apache License to your work, attach the following
+boilerplate notice, with the fields enclosed by brackets "[]"
+replaced with your own identifying information. (Don't include
+the brackets!) The text should be enclosed in the appropriate
+comment syntax for the file format. We also recommend that a
+file or class name and description of purpose be included on the
+same "printed page" as the copyright notice for easier
+identification within third-party archives.
+
+Copyright 2025 FoolDev
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
Modelfile
ADDED
|
@@ -0,0 +1,121 @@
FROM ./Janus-35B-A3B.Q4_K_M.gguf

# Chat template: Qwen 3.6 ChatML in Ollama Go-template form, with the
# tool-calling blocks Ollama's capability detector looks for. Without a
# TEMPLATE that references .Tools and .ToolCalls, /api/chat and
# /v1/chat/completions reject any request carrying a `tools` array with
# `<model> does not support tools`. Same template as the 27B dense sibling
# (FoolDev/janus-27b); both share the Qwen 3.6 chat format.
TEMPLATE """{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
{{- end }}
{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}

{{ end }}
{{- if .Tools }}# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{ end -}}
{{ if .Content }}{{ .Content }}{{ end }}
{{- if .ToolCalls }}
{{- range .ToolCalls }}
<tool_call>
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
</tool_call>
{{- end }}
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
<think>
{{ end }}
{{- end }}"""

# Sampling tuned for reasoning + general use. See README "Recommended sampling"
# for creative/RP alternatives.
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 16384

# Stop tokens. Without these, Ollama only honors <|im_end|> from the GGUF
# metadata; the model occasionally emits <|endoftext|> instead and Ollama
# keeps generating past it (synthesising a fake new user turn). Listing
# both (plus <|im_start|> as a belt-and-braces guard against the same
# loop) keeps responses cleanly terminated. Same fix the 27B sibling
# (FoolDev/janus-27b) shipped in commit 6672746.
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_start|>"

SYSTEM """You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.

Behavior rules:
- Answer the user's actual request directly.
- Be accurate, complete, and structured.
- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
- If the user wants creative writing, preserve tone, continuity, and character consistency.
- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
- Finish with a usable answer, not just planning."""

# Hardware notes
# --------------
# This Q4_K_M is ~19 GB on disk. Real footprint at runtime:
#   weights mmap          ~19 GB
#   compute graph alloc   ~19 GB  (Ollama log: device.go:272 "total memory")
#   KV cache @ 16K ctx    ~1 GB   (with OLLAMA_KV_CACHE_TYPE=q8_0)
#   total minimum         ~38 GB
#
# Working configurations (verified or documented):
#   ✓ Single H100 80GB / A100 80GB: full GPU offload
#   ✓ RTX 5090 32GB / RTX 4090 24GB: partial offload, ~15-25 tok/s
#   ✓ Mac Studio M2/M3 Ultra 64GB+: unified memory, ~20+ tok/s
#   ✓ Linux box with 64GB+ RAM (CPU-only): ~3-6 tok/s
#   ⚠ ASUS ROG Flow Z13 (Ryzen AI Max+, 32GB): OOMs at default num_ctx 16384;
#     fits with num_ctx ≤ 4096 and num_batch ≤ 256 (verified)
#
# Measured data point (ASUS ROG Flow Z13 GZ302EA-RU004W, Ryzen AI Max+ 395 +
# Radeon 8060S iGPU, 32 GB unified, ROCm gfx1151, OLLAMA_FLASH_ATTENTION=1,
# OLLAMA_KV_CACHE_TYPE=q8_0, num_ctx 4096, num_batch 256):
#   Q4_K_M, 3-prompt mix → 28.71 tok/s aggregate
#   (717 tokens / 25.0 s; 29.55 / 29.24 / 28.57 short/medium/long).
#   ~97% of layers offload to the iGPU via ROCm. Compute split per
#   `ollama ps` shows 3% CPU / 97% GPU at 4096 ctx.
#
# To run on a 32 GB unified-memory laptop, override these in your local
# Modelfile copy (or pass via -o on `ollama run`):
#   PARAMETER num_ctx 4096
#   PARAMETER num_batch 256
#
# If you have ≥48 GB RAM but want partial GPU offload, set:
#   PARAMETER num_gpu 24   # offload most layers (model has 40)
|
README.md
ADDED
|
@@ -0,0 +1,335 @@
---
license: apache-2.0
base_model:
- Qwen/Qwen3.6-35B-A3B
datasets:
- crownelius/Creative_Writing_ShareGPT_Enhanced
- microsoft/rStar-Coder
- peteromallet/dataclaw-peteromallet
- crownelius/Opus-4.7-Reasoning
- openbmb/UltraData-Math
- Crownelius/Crow-Heretic-TeichAI-Unified
language:
- en
- zh
- ru
- es
- fr
- it
- ja
- ko
- de
- ar
- tr
- pl
- sv
- nl
- he
- id
- uk
- fa
- pt
- ms
- fi
- el
tags:
- qwen3_6
- moe
- conversational
- multimodal
- agent
- gguf
library_name: transformers
pipeline_tag: image-text-to-text
---

<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/banner.svg" alt="Janus-35B banner" width="100%" />

[](https://opensource.org/licenses/Apache-2.0)
[](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
[](#architecture)
[](#whats-here)
[](https://buymeacoffee.com/cardoffoolm)

# Janus-35B

> **Flagship Reasoning. Sparse Footprint.**
> *Qwen 3.6 35B-A3B repackaged with Claude Opus 4.7 in the teacher slot.*

**`Architecture:`** `Qwen 3.6 35B-A3B (MoE)` | **`Total Params:`** `35B` | **`Active Params:`** `3B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled MoE LLM`

A personal fork of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), a 35B-total / 3B-active mixture-of-experts multimodal model, repackaged as Janus-35B with Claude Opus 4.7 reasoning data in the teacher slot.

## TL;DR

One-liner via Hugging Face (pulls a GGUF + this repo's root-level
`template` / `system` / `params` files, including the tool-calling
template; HF's Ollama bridge ingests those three files, not
`Modelfile`):

```bash
ollama run hf.co/FoolDev/Janus-35B          # default ~19 GB Q4_K_M
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M   # same blob, explicit tag
```

Or build locally (uses this repo's `Modelfile`, kept in sync with the
three bridge files):

```bash
git clone https://huggingface.co/FoolDev/Janus-35B && cd Janus-35B
ollama create janus -f Modelfile && ollama run janus
```

After either path, `ollama show <name>` (`janus` for the local build,
`hf.co/FoolDev/Janus-35B` for the direct pull) lists `completion`,
`tools`, and `thinking` under Capabilities. Hardware: ~38 GB RAM at
default `num_ctx 16384`, or trim ctx + batch to fit 32 GB hosts (see
[Hardware requirements](#hardware-requirements)).
## What's here

| File | Use |
|---|---|
| `Janus-35B-A3B.Q4_K_M.gguf` | Recommended default, ~19 GB |
| `Modelfile` | Ollama wrapper for **local** builds (`ollama create janus -f Modelfile`). Overrides the GGUF's embedded template with one that exposes `.Tools` / `.ToolCalls` to Ollama's capability detector. |
| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Janus-35B` directly. The bridge does **not** read `Modelfile` (see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)); it ingests these three root-level files instead. Kept in sync with the `Modelfile`'s `TEMPLATE` / `SYSTEM` / `PARAMETER` directives. |
| `scripts/check_bridge_sync.py` | Run before pushing a `Modelfile` / `template` / `system` / `params` edit to verify the four files remain in sync. Exits 0 if in sync, 1 with a per-key diff if not. |

GGUF-only release. Pull the upstream safetensors from `Qwen/Qwen3.6-35B-A3B` if you need the `transformers` tree.

## Architecture

<p align="left">
<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/moe-routing.svg" alt="animated MoE routing visualization: 16x16 grid of 256 expert dots with 8 lit at any time, cycling through 8 routing patterns" width="640" />
</p>

- Qwen 3.6, 35B total / 3B active, MoE (256 experts, 8 activated per token)
- 40 layers, 10 × (3 × DeltaNet → MoE / 1 × Gated Attention → MoE)
- 262k native context, extensible to ~1M with YaRN
- Vision + video supported by upstream (mmproj not included in this release)
- Vocab 248,320
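
The layer layout and expert counts in the list above can be sanity-checked with a few lines of Python. This is only a counting sketch: `layer_pattern` and `route_top_k` are hypothetical helper names, and picking the 8 highest-scoring experts is an assumed stand-in for the upstream router, whose actual scoring is not described in this card.

```python
import random

NUM_BLOCKS = 10       # "10 x (...)" from the architecture list
NUM_EXPERTS = 256     # experts per MoE layer
ACTIVE_EXPERTS = 8    # experts activated per token


def layer_pattern():
    """Return the layer sequence: 3 DeltaNet layers then 1 gated-attention layer, repeated per block."""
    block = ["deltanet"] * 3 + ["gated_attention"]
    return block * NUM_BLOCKS


def route_top_k(scores, k=ACTIVE_EXPERTS):
    """Pick the indices of the k highest scores (assumed top-k routing, for illustration only)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


layers = layer_pattern()
rng = random.Random(0)
scores = [rng.random() for _ in range(NUM_EXPERTS)]  # made-up router scores
chosen = route_top_k(scores)
print(len(layers), layers.count("gated_attention"), len(chosen))
```

The counts line up with the card: 40 layers in total, 10 of them gated attention, and 8 of 256 experts selected for the token.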
## Quick start

### llama.cpp / LM Studio

Drop the GGUF into your loader of choice. The chat template is embedded in the GGUF metadata, so llama.cpp's `--chat-template auto` and LM Studio's GGUF auto-detection handle plain conversation correctly.

### Ollama

The chat template baked into the GGUF is **not sufficient on Ollama**: it lacks the `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so a naive `ollama pull` reports `does not support tools` and rejects any request carrying a `tools` array. Two paths fix this:

```bash
# A. Pull straight from HF (uses the root-level template/system/params files):
ollama run hf.co/FoolDev/Janus-35B          # default tag, ~19 GB Q4_K_M
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M   # same blob, explicit tag
# Note: HF's Ollama bridge does NOT read Modelfile; it reads template/system/params.

# B. Build locally (uses Modelfile, which is kept in sync with the three above):
ollama create janus -f Modelfile && ollama run janus
```

After either path, `ollama show <name>` (`janus` for path B, `hf.co/FoolDev/Janus-35B` for path A) should list `completion`, `tools`, and `thinking` under Capabilities.
### Inference examples

Once the model is loaded (via `ollama run janus`, `lms server start`, or `llama-server`), all the standard OpenAI-compatible clients work. Examples assume the loader is listening on `http://localhost:11434` (Ollama default); adjust the port for LM Studio (`:1234`) or llama.cpp (`:8080`).

#### curl

```bash
curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "janus",
    "messages": [
      {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
      {"role": "user", "content": "Sketch an algorithm to detect cycles in a directed graph."}
    ],
    "temperature": 0.6,
    "max_tokens": 800
  }' | jq -r '.choices[0].message.content'
```

#### Python (openai-compat)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

resp = client.chat.completions.create(
    model="janus",
    messages=[
        {"role": "user", "content": "Write a haiku about a stack overflow."}
    ],
    temperature=0.8,
    top_p=0.95,
)
print(resp.choices[0].message.content)
```

#### Streaming

```python
stream = client.chat.completions.create(
    model="janus",
    messages=[{"role": "user", "content": "Explain RoPE briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```
### Recommended sampling

| Use | temp | top_p | top_k | repeat_penalty |
|---|---:|---:|---:|---:|
| Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
| Creative / RP | 0.8 | 0.95 | 40 | 1.02 |

Lower temperature (0.4–0.6) and bump `repeat_penalty` to 1.08 if it loops inside `<think>` tags.
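
As a rough sketch of how these presets could be applied per request, the snippet below builds a request body for Ollama's native `/api/chat` endpoint, which accepts sampler settings under `options`. The helper names `sampling_options` and `build_chat_payload` are hypothetical, and using 0.4 (the low end of the suggested range) as the loop-mitigation temperature is an assumption. Nothing is sent over the network here.

```python
# Presets copied from the sampling table.
PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05},
    "creative": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "repeat_penalty": 1.02},
}


def sampling_options(use="reasoning", looping=False):
    """Return a copy of the preset, adjusted if the model loops inside <think> tags."""
    options = dict(PRESETS[use])
    if looping:
        options["temperature"] = 0.4      # assumed: low end of the 0.4-0.6 range
        options["repeat_penalty"] = 1.08
    return options


def build_chat_payload(prompt, use="reasoning", looping=False, model="janus"):
    """Build a JSON-serialisable body for POST /api/chat (request is not sent here)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": sampling_options(use, looping),
        "stream": False,
    }


payload = build_chat_payload("Explain RoPE briefly.", use="creative")
print(payload["options"])
```

Keeping the presets in one dictionary makes it easy to switch a single request to the loop-mitigation settings without editing the `Modelfile` defaults.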
### System prompt

```text
You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.

Behavior rules:
- Answer the user's actual request directly.
- Be accurate, complete, and structured.
- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
- If the user wants creative writing, preserve tone, continuity, and character consistency.
- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
- Finish with a usable answer, not just planning.
```
## Hardware requirements

This is an 18.9 GB Q4_K_M GGUF. Ollama's runtime footprint at default settings is **roughly 2× the model file** (weights mmap + compute graph allocation), plus KV cache, so ~38 GB total memory at `num_ctx 16384`. The compute-graph allocation scales with context and batch size, so 32 GB hosts can fit the model by trimming both (see the last row of the table).

| Hardware | Status |
|---|---|
| ≥48 GB RAM (CPU-only) | Works, ~3-6 tok/s |
| Single H100/A100 80 GB | Works, full offload, ~30+ tok/s |
| RTX 4090 24 GB / 5090 32 GB + 32 GB RAM | Works, partial offload, ~15-25 tok/s |
| Mac Studio M2/M3 Ultra 64 GB+ unified | Works, ~20+ tok/s |
| 32 GB unified-memory laptops (Ryzen AI Max+, Apple M-series) | Works with `num_ctx ≤ 4096` and `num_batch ≤ 256` to fit the compute graph; default 16K ctx OOMs. Measured 28.71 tok/s on ASUS ROG Flow Z13 GZ302EA at Q4_K_M (Radeon 8060S iGPU via ROCm gfx1151). |
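
The "roughly 2× the model file, plus KV cache" rule of thumb can be written down as a small back-of-the-envelope calculator. This is a sketch under stated assumptions: `estimate_footprint_gb`, `fits`, and the `graph_factor` knob are hypothetical, the ~1 GB KV-cache figure is the one quoted for a q8_0 cache at 16K context, and the card gives no formula for how much the compute graph shrinks at smaller `num_ctx` / `num_batch`, so no value is claimed for that case.

```python
def estimate_footprint_gb(model_file_gb=18.9, kv_cache_gb=1.0, graph_factor=1.0):
    """Estimate runtime memory: mmap'd weights + compute graph + KV cache.

    graph_factor=1.0 reproduces the "compute graph is about the size of the
    model file" rule at default settings; smaller values are a guess knob.
    """
    weights = model_file_gb
    compute_graph = model_file_gb * graph_factor
    return weights + compute_graph + kv_cache_gb


def fits(host_memory_gb, **kwargs):
    """Return True if the estimate is within the host's memory."""
    return estimate_footprint_gb(**kwargs) <= host_memory_gb


print(round(estimate_footprint_gb(), 1))  # default-settings estimate in GB
print(fits(32.0), fits(64.0))
```

With the defaults the estimate lands just under 39 GB, consistent with the ~38 GB figure and with the observation that a 32 GB host runs out of memory at the default context.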
## Chat template

The model uses the standard Qwen 3.x ChatML format with `<|im_start|>` / `<|im_end|>` role markers. The template is embedded in the GGUF metadata for plain conversation use, but Ollama users should rely on the `TEMPLATE` block in the included `Modelfile`: that version exposes the tool-calling scaffolding Ollama's capability detector requires (the embedded template alone is insufficient; see [Ollama](#ollama) above).

### Plain conversation

```text
<|im_start|>system
You are Janus, a precise and capable assistant…<|im_end|>
<|im_start|>user
What is the time complexity of mergesort?<|im_end|>
<|im_start|>assistant
```
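
For loaders that take a raw prompt string, the plain-conversation layout can be reproduced with a few lines of Python. This is a minimal sketch: `render_chatml` is a hypothetical helper, it ignores tools and reasoning traces, and it does not replicate the extra `<think>` prefill that the Ollama `TEMPLATE` adds after the final assistant marker.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus, a precise and capable assistant."},
    {"role": "user", "content": "What is the time complexity of mergesort?"},
])
print(prompt)
```

In practice the chat-completions endpoints apply the template for you; a helper like this is mainly useful for debugging what the model actually sees.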
### With reasoning trace

When the model decides to think, the assistant turn contains a `<think>…</think>` block followed by the visible answer:

```text
<|im_start|>assistant
<think>
The user is asking about mergesort. Mergesort divides the array, recursively sorts each half, then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
</think>

Mergesort runs in **O(n log n)** time in the worst, average, and best cases. The recurrence is T(n) = 2T(n/2) + O(n), which solves to Θ(n log n) by the master theorem.<|im_end|>
```

Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by default and show only the final answer. If your client doesn't, set its "show reasoning" toggle off.
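
If you consume raw completions yourself, the reasoning trace can be separated from the visible answer with a small helper. The sketch below assumes the text contains at most one complete `<think>…</think>` pair, as in the example above; `split_reasoning` is a hypothetical name, and outputs where the opening tag was prefilled by the template (so only `</think>` appears) would need extra handling.

```python
import re

THINK_BLOCK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from an assistant turn with an optional <think> block."""
    match = THINK_BLOCK_RE.search(text)
    reasoning = match.group(1).strip() if match else ""
    # Drop the first think block (and trailing whitespace) to get the visible answer.
    answer = THINK_BLOCK_RE.sub("", text, count=1).strip()
    return reasoning, answer


raw = "<think>\nMergesort splits, sorts, merges.\n</think>\n\nMergesort runs in O(n log n) time."
reasoning, answer = split_reasoning(raw)
print(answer)
```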
### Tool / function calling

The wire format depends on which path you take. **Both are valid**; the model adapts to whichever format the system prompt specifies.

**Ollama path** (this repo's `Modelfile`). The TEMPLATE advertises tools inside `<tools>…</tools>` and asks the model to reply in JSON-in-XML, the form Ollama's tool-call extractor parses into a structured `tool_calls` array on `/api/chat` and `/v1/chat/completions`:

```text
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call>
```

**Embedded-jinja path** (llama.cpp, llama-cpp-python, LM Studio). The Qwen 3.6 native chat template baked into the GGUF instructs the model to emit a more verbose XML form. This is the shape you'll see if you talk to `llama-server` or LM Studio directly:

```text
<tool_call>
<function=get_weather>
<parameter=city>
Tokyo
</parameter>
</function>
</tool_call>
```

Pick the parser shape that matches your loader. Don't mix.
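
To make the difference between the two shapes concrete, here is a minimal sketch of one parser per wire format. The function names are hypothetical and the regular expressions only cover the two examples shown above; this is not how Ollama or llama.cpp parse tool calls internally. Note that the verbose XML form carries no type information, so every parameter comes back as a string.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>]+)>\s*(.*?)\s*</parameter>", re.DOTALL)


def parse_json_tool_calls(text):
    """Parse the JSON-in-XML form used on the Ollama path."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        payload = json.loads(body)
        calls.append({"name": payload["name"], "arguments": payload["arguments"]})
    return calls


def parse_xml_tool_calls(text):
    """Parse the verbose <function=...><parameter=...> form of the embedded template."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        for name, inner in FUNCTION_RE.findall(body):
            arguments = {key: value for key, value in PARAMETER_RE.findall(inner)}
            calls.append({"name": name, "arguments": arguments})
    return calls


json_form = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
xml_form = (
    "<tool_call>\n<function=get_weather>\n<parameter=city>\nTokyo\n"
    "</parameter>\n</function>\n</tool_call>"
)
print(parse_json_tool_calls(json_form))
print(parse_xml_tool_calls(xml_form))
```

Both calls yield the same structure for the `get_weather` example, which is why downstream code can stay loader-agnostic once the matching parser has run.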
#### Example (Ollama, OpenAI-compatible API)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

resp = client.chat.completions.create(
    model="janus",
    messages=[
        {"role": "user", "content": "Call get_weather for Tokyo. Respond ONLY with the tool call."}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    temperature=0.3,
)
print(resp.choices[0].message.tool_calls)
# [ToolCall(id='call_xxx', type='function',
#   function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]
```

#### Tips

- Use direct prompts ("Call X for Y") rather than soft hints ("Use the tool"). The model thinks before committing to a call, and weak prompts can exhaust `num_predict` inside the `<think>` block before the call is emitted.
- Allow at least `num_predict: 1024` (or `max_tokens: 1024`) for tool-calling turns, more if the schemas are large.
- The Modelfile's JSON-in-XML format is what Ollama's tool-call extractor understands; if you swap loaders, swap the parser to match (see "Embedded-jinja path" above).
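
Once a structured tool call comes back, the client still has to run the function and return the result as a `tool` message. The sketch below shows one way to do that step without any network access. `get_weather`, `TOOLS`, and `run_tool_call` are hypothetical stand-ins; a real tool would query an actual weather service, and the returned dictionary follows the OpenAI-style `tool` message shape (`role`, `tool_call_id`, `content`).

```python
import json


def get_weather(city):
    """Hypothetical local tool; returns placeholder data instead of calling a service."""
    return {"city": city, "forecast": "unknown"}


# Registry mapping tool names (as advertised in `tools=[...]`) to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_call(tool_call_id, name, arguments_json):
    """Execute one tool call and build the follow-up `tool` role message."""
    arguments = json.loads(arguments_json)   # the API delivers arguments as a JSON string
    result = TOOLS[name](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }


message = run_tool_call("call_1", "get_weather", '{"city": "Tokyo"}')
print(message)
```

Appending this message to the conversation and calling the chat endpoint again lets the model turn the tool result into a final answer.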
## Known limitations

- **No mmproj in this release.** The base Qwen3.6 supports image and video input via a separate `mmproj` file, which is not included here. Text-only inference works out of the box; multimodal inference requires fetching `Qwen2.5-VL-*-mmproj-*.gguf` (or equivalent) from upstream.
- **Quantization-induced quality loss.** Q4_K_M is a strong general-purpose quant but does measurably degrade math and code accuracy compared to BF16. If you need maximum quality, run the upstream safetensors on a GPU that fits BF16 (~70 GB).
- **MoE expert utilization is uneven.** Stock Qwen3.6-35B-A3B routes 8 of 256 experts per token. On narrow domains (e.g. only one programming language) a small subset of experts dominates; load-balance loss was a training-time concern, not a runtime guarantee.
- **Thinking traces can loop.** Like most reasoning-distilled models, Janus-35B occasionally gets stuck repeating itself inside `<think>` tags. Mitigations: lower temperature to 0.4-0.6, raise `repeat_penalty` to 1.08, or set a `<think>`-token budget cap if your loader supports it.
- **Not aligned with any specific safety policy.** This is a personal repackage of an open-weight base model with reasoning-focused distillation. There is no RLHF refusal layer beyond what Qwen 3.6 ships with; downstream safety is the operator's responsibility.
- **No formal evaluation in this card.** Apart from the measured ASUS ROG Flow Z13 figure, throughput numbers in the hardware table are estimates, not measurements. If you produce real benchmarks (MMLU, HumanEval, etc.) and want them included, file a PR.

## Related models

| Model | Size | Notes |
|---|---|---|
| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | 35B / 3B active | Upstream base model. `transformers`-native multimodal weights. |
| [FoolDev/Thanatos-27B](https://huggingface.co/FoolDev/Thanatos-27B) | 27B dense | Dense sibling on the Qwen 3.6 27B base. Same teacher (Opus 4.7), same dataset family, smaller memory footprint, no MoE quirks. |
| [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B dense | Heretic-flavored fine-tune of the same Qwen 3.5 9B base used as a smaller starting point. Useful as a fast first-pass model when 35B is too heavy for the host. |

## Credits

- Base model: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (Alibaba)
- Reasoning teacher: Claude Opus 4.7 (Anthropic)
- Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)

License inherited from upstream: Apache-2.0.
|
banner.png
ADDED
|
banner.svg
ADDED
|
|
moe-routing.svg
ADDED
|
|
params
ADDED
|
@@ -0,0 +1,12 @@
{
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "repeat_penalty": 1.05,
  "num_ctx": 16384,
  "stop": [
    "<|im_end|>",
    "<|endoftext|>",
    "<|im_start|>"
  ]
}
|
scripts/check_bridge_sync.py
ADDED
|
@@ -0,0 +1,147 @@
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Janus-35B — verify Modelfile and HF Ollama bridge files stay in sync.
|
| 4 |
+
|
| 5 |
+
The repo ships two parallel Ollama configurations:
|
| 6 |
+
|
| 7 |
+
- ``Modelfile`` is consumed by the local-build path
|
| 8 |
+
(``ollama create janus -f Modelfile``). It contains
|
| 9 |
+
``TEMPLATE`` / ``SYSTEM`` / ``PARAMETER`` directives.
|
| 10 |
+
- ``template`` / ``system`` / ``params`` at the repo root are consumed by HF's
|
| 11 |
+
Ollama bridge when users ``ollama run hf.co/FoolDev/janus`` directly. HF
|
| 12 |
+
does NOT read the Modelfile (per https://huggingface.co/docs/hub/en/ollama).
|
| 13 |
+
|
| 14 |
+
If the two configurations drift apart, ``hf.co/...`` users and local-build
|
| 15 |
+
users get different behaviour — exactly the bug fixed in commit 70ccef1
|
| 16 |
+
("Add HF Ollama bridge files (template/system/params)"). This script is
|
| 17 |
+
the regression guard: it parses the Modelfile, loads the three bridge
|
| 18 |
+
files, and fails on any mismatch.
|
| 19 |
+
|
| 20 |
+
Usage:
|
| 21 |
+
python3 scripts/check_bridge_sync.py
|
| 22 |
+
# exit 0 if in sync, 1 (with diff details) if not.
|
| 23 |
+
|
| 24 |
+
Run this manually before pushing a Modelfile / bridge-file edit. The 27B
|
| 25 |
+
sibling repo wires an equivalent script into scripts/check.sh and a
|
| 26 |
+
pre-commit hook; this repo intentionally stays leaner and runs it
|
| 27 |
+
on demand.
|
| 28 |
+
"""
|
| 29 |
+
from __future__ import annotations
|
| 30 |
+
|
| 31 |
+
import json
|
| 32 |
+
import re
|
| 33 |
+
import sys
|
| 34 |
+
from pathlib import Path
|
| 35 |
+
|
| 36 |
+
ROOT = Path(__file__).resolve().parent.parent
|
| 37 |
+
|
| 38 |
+
# Ollama Modelfile reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
|
| 39 |
+
TEMPLATE_RE = re.compile(r'^TEMPLATE\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
|
| 40 |
+
SYSTEM_RE = re.compile(r'^SYSTEM\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)
|
| 41 |
+
PARAMETER_RE = re.compile(r'^PARAMETER\s+(\S+)\s+(.*?)\s*$', re.MULTILINE)
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
def parse_modelfile(text: str) -> tuple[str, str, dict[str, object]]:
    """Extract TEMPLATE, SYSTEM, and PARAMETER blocks from a Modelfile."""
    tpl_match = TEMPLATE_RE.search(text)
    if not tpl_match:
        die("Modelfile has no TEMPLATE block")
    template = tpl_match.group(1)

    sys_match = SYSTEM_RE.search(text)
    if not sys_match:
        die("Modelfile has no SYSTEM block")
    system = sys_match.group(1)

    params: dict[str, object] = {}
    stops: list[str] = []
    for key, raw in PARAMETER_RE.findall(text):
        # Strip outer quotes if present.
        value: object = raw.strip()
        if isinstance(value, str) and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        # Stop tokens accumulate; everything else is scalar.
        if key == "stop":
            stops.append(value)  # type: ignore[arg-type]
            continue
        # Cast known numeric params.
        if key in {"temperature", "top_p", "top_k", "repeat_penalty",
                   "num_ctx", "num_predict", "num_gpu", "num_batch", "seed"}:
            try:
                value = float(value) if "." in str(value) else int(value)  # type: ignore[arg-type]
            except (TypeError, ValueError):
                pass
        params[key] = value

    if stops:
        params["stop"] = stops

    return template, system, params


def die(msg: str) -> None:
    print(f"[FAIL] {msg}", file=sys.stderr)
    sys.exit(1)


def diff_strings(label: str, expected: str, actual: str) -> bool:
    if expected == actual:
        return True
    print(f"[FAIL] {label} drift detected", file=sys.stderr)
    print(f" Modelfile len={len(expected)} bridge file len={len(actual)}", file=sys.stderr)
    # Show the first diverging line for quick orientation.
    e_lines = expected.splitlines()
    a_lines = actual.splitlines()
    for i, (e, a) in enumerate(zip(e_lines, a_lines)):
        if e != a:
            print(f" first diff at line {i + 1}:", file=sys.stderr)
            print(f" modelfile : {e!r}", file=sys.stderr)
            print(f" bridge : {a!r}", file=sys.stderr)
            return False
    if len(e_lines) != len(a_lines):
        print(f" line count differs: modelfile={len(e_lines)} bridge={len(a_lines)}",
              file=sys.stderr)
    # The strings differ even if no line-level difference was found (for
    # example, only a trailing newline differs), so always report failure.
    return False


def main() -> int:
    modelfile = (ROOT / "Modelfile").read_text()
    bridge_template = (ROOT / "template").read_text()
    bridge_system = (ROOT / "system").read_text()
    bridge_params = json.loads((ROOT / "params").read_text())

    mf_template, mf_system, mf_params = parse_modelfile(modelfile)

    ok = True

    # 1. TEMPLATE: byte-for-byte.
    ok &= diff_strings("TEMPLATE", mf_template, bridge_template)

    # 2. SYSTEM: strip leading and trailing whitespace on both sides. The
    #    bridge file typically has a trailing newline; the Modelfile block doesn't.
    ok &= diff_strings("SYSTEM", mf_system.strip(), bridge_system.strip())

    # 3. PARAMETER vs params JSON: compare normalized dicts.
    if mf_params != bridge_params:
        print("[FAIL] params drift detected", file=sys.stderr)
        for k in sorted(set(mf_params) | set(bridge_params)):
            mv = mf_params.get(k, "<missing>")
            bv = bridge_params.get(k, "<missing>")
            if mv != bv:
                print(f" {k}: modelfile={mv!r} bridge={bv!r}", file=sys.stderr)
        ok = False

    if not ok:
        print("\n[!] Modelfile and bridge files are out of sync.", file=sys.stderr)
        print(" Edit them together: any change to TEMPLATE / SYSTEM /",
              file=sys.stderr)
        print(" PARAMETER must be reflected in template / system / params.",
              file=sys.stderr)
        return 1

    print("[ ok ] Modelfile <-> bridge files in sync")
    return 0


if __name__ == "__main__":
    sys.exit(main())
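As a rough illustration of check 3 in ``main()``, the sketch below shows how ``PARAMETER`` lines might be normalized into the dict shape that the ``params`` JSON file is compared against. Only the regular expression is taken from the script; the helper name ``normalize_params``, the reduced set of numeric keys, and the sample Modelfile and JSON values are hypothetical and do not reflect this repo's real settings.

```python
import json
import re

# Same pattern as PARAMETER_RE in scripts/check_bridge_sync.py.
PARAMETER_RE = re.compile(r'^PARAMETER\s+(\S+)\s+(.*?)\s*$', re.MULTILINE)

# Assumption: a reduced subset of the numeric keys, enough for the demo.
NUMERIC_KEYS = {"temperature", "top_p", "top_k", "repeat_penalty", "num_ctx"}


def normalize_params(modelfile_text: str) -> dict[str, object]:
    """Hypothetical helper: collect PARAMETER lines into a params-style dict."""
    params: dict[str, object] = {}
    stops: list[str] = []
    for key, raw in PARAMETER_RE.findall(modelfile_text):
        value: object = raw
        # Drop one pair of surrounding double quotes, if present.
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            value = raw[1:-1]
        if key == "stop":
            # Repeated stop lines are gathered into a single list.
            stops.append(str(value))
            continue
        if key in NUMERIC_KEYS:
            value = float(raw) if "." in raw else int(raw)
        params[key] = value
    if stops:
        params["stop"] = stops
    return params


# Made-up sample data for illustration only.
SAMPLE_MODELFILE = '''PARAMETER temperature 0.6
PARAMETER top_k 20
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
'''
SAMPLE_PARAMS_JSON = (
    '{"temperature": 0.6, "top_k": 20, '
    '"stop": ["<|im_end|>", "<|im_start|>"]}'
)

parsed = normalize_params(SAMPLE_MODELFILE)
in_sync = parsed == json.loads(SAMPLE_PARAMS_JSON)
print(in_sync)
```

Because the comparison is a plain dict equality, a numeric value written as a quoted string in one file and as a number in the other would count as drift under this sketch.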
system
ADDED
@@ -0,0 +1,10 @@
You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.

Behavior rules:
- Answer the user's actual request directly.
- Be accurate, complete, and structured.
- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
- If the user wants creative writing, preserve tone, continuity, and character consistency.
- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
- Finish with a usable answer, not just planning.
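The ``system`` file above is compared against the Modelfile's ``SYSTEM`` block after whitespace is stripped from both. A minimal sketch of that comparison follows, assuming the same ``SYSTEM_RE`` pattern as ``scripts/check_bridge_sync.py``; the helper name ``system_in_sync``, the ``FROM`` line, and the shortened prompt text are hypothetical.

```python
import re

# Same pattern as SYSTEM_RE in scripts/check_bridge_sync.py.
SYSTEM_RE = re.compile(r'^SYSTEM\s+"""(.*?)"""', re.DOTALL | re.MULTILINE)

# Made-up, shortened Modelfile for illustration only.
SAMPLE_MODELFILE = '''FROM ./model.gguf
SYSTEM """You are Janus.

Behavior rules:
- Answer the user's actual request directly."""
'''

# The bridge file usually ends with a newline that the block lacks.
SAMPLE_BRIDGE_SYSTEM = (
    "You are Janus.\n\nBehavior rules:\n"
    "- Answer the user's actual request directly.\n"
)


def system_in_sync(modelfile_text: str, bridge_text: str) -> bool:
    """Hypothetical helper: compare the SYSTEM block with the bridge file."""
    match = SYSTEM_RE.search(modelfile_text)
    if match is None:
        return False
    # Surrounding whitespace is ignored; everything else must match.
    return match.group(1).strip() == bridge_text.strip()


same = system_in_sync(SAMPLE_MODELFILE, SAMPLE_BRIDGE_SYSTEM)
edited = system_in_sync(SAMPLE_MODELFILE, SAMPLE_BRIDGE_SYSTEM + "- Extra rule.\n")
print(same, edited)
```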
template
ADDED
@@ -0,0 +1,51 @@
{{- $lastUserIdx := -1 -}}
{{- range $idx, $msg := .Messages -}}
{{- if eq $msg.Role "user" }}{{ $lastUserIdx = $idx }}{{ end -}}
{{- end }}
{{- if or .System .Tools }}<|im_start|>system
{{ if .System }}{{ .System }}

{{ end }}
{{- if .Tools }}# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end -}}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if (and $.IsThinkSet (and .Thinking (or $last (gt $i $lastUserIdx)))) -}}
<think>{{ .Thinking }}</think>
{{ end -}}
{{ if .Content }}{{ .Content }}{{ end }}
{{- if .ToolCalls }}
{{- range .ToolCalls }}
<tool_call>
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
</tool_call>
{{- end }}
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
<think>
{{ end }}
{{- end }}
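The condition that decides whether an assistant turn's ``<think>`` block is rendered can be hard to read in Go template syntax. The Python sketch below is one reading of that condition, not a reimplementation of Ollama's renderer: the ``Message`` class and the ``thinking_rendered`` function are hypothetical names, and ``is_think_set`` stands in for the template's ``$.IsThinkSet`` value.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for one entry of the template's .Messages."""
    role: str
    content: str = ""
    thinking: str = ""


def thinking_rendered(messages: list[Message], is_think_set: bool) -> list[bool]:
    """Return, per message, whether a <think> block would be emitted."""
    # First pass: index of the last user message (-1 if there is none),
    # mirroring the $lastUserIdx loop at the top of the template.
    last_user_idx = -1
    for idx, msg in enumerate(messages):
        if msg.role == "user":
            last_user_idx = idx

    flags = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1  # the template's $last
        flags.append(
            msg.role == "assistant"
            and is_think_set
            and bool(msg.thinking)
            and (is_last or i > last_user_idx)
        )
    return flags


# Made-up conversation: only thinking after the last user turn survives.
chat = [
    Message("user", "Hi"),
    Message("assistant", "Hello", thinking="greet back"),
    Message("user", "What is 2 + 2?"),
    Message("assistant", "4", thinking="simple sum"),
]
flags_on = thinking_rendered(chat, is_think_set=True)
flags_off = thinking_rendered(chat, is_think_set=False)
print(flags_on, flags_off)
```

Under this reading, thinking from earlier assistant turns is dropped once a newer user message exists, while assistant turns that follow the last user message (for example around tool calls) keep theirs.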