Instructions to use FoolDev/Janus-35B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FoolDev/Janus-35B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="FoolDev/Janus-35B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("FoolDev/Janus-35B", dtype="auto")

llama-cpp-python

How to use FoolDev/Janus-35B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="FoolDev/Janus-35B",
	filename="Janus-35B-A3B.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use FoolDev/Janus-35B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf FoolDev/Janus-35B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf FoolDev/Janus-35B:Q4_K_M

Use Docker

docker model run hf.co/FoolDev/Janus-35B:Q4_K_M

vLLM

How to use FoolDev/Janus-35B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FoolDev/Janus-35B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/FoolDev/Janus-35B:Q4_K_M

SGLang

How to use FoolDev/Janus-35B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FoolDev/Janus-35B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FoolDev/Janus-35B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FoolDev/Janus-35B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use FoolDev/Janus-35B with Ollama:
```
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M
```

Unsloth Studio

How to use FoolDev/Janus-35B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Janus-35B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for FoolDev/Janus-35B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for FoolDev/Janus-35B to start chatting

Pi

How to use FoolDev/Janus-35B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf FoolDev/Janus-35B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "FoolDev/Janus-35B:Q4_K_M"
        }
      ]
    }
  }
}
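
The provider entry above can also be merged into an existing configuration programmatically. This is a minimal sketch, assuming `models.json` is plain JSON with a top-level `providers` object shaped like the snippet above; the helper name `merge_llama_cpp_provider` is hypothetical and not part of Pi.

```python
import copy


def merge_llama_cpp_provider(config: dict, model_id: str,
                             base_url: str = "http://localhost:8080/v1") -> dict:
    """Return a copy of `config` with `model_id` registered under the llama-cpp provider.

    Hypothetical helper: the field names mirror the models.json snippet above.
    """
    merged = copy.deepcopy(config)
    providers = merged.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [],
    })
    models = provider.setdefault("models", [])
    # Avoid duplicate entries when the helper is run more than once.
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})
    return merged
```

Loading the existing file with `json.load`, passing the result through this function, and writing it back with `json.dump(..., indent=2)` keeps any other providers already configured.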

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use FoolDev/Janus-35B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf FoolDev/Janus-35B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default FoolDev/Janus-35B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use FoolDev/Janus-35B with Docker Model Runner:
```
docker model run hf.co/FoolDev/Janus-35B:Q4_K_M
```

Lemonade

How to use FoolDev/Janus-35B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull FoolDev/Janus-35B:Q4_K_M

Run and chat with the model

lemonade run user.Janus-35B-Q4_K_M

List all available models

lemonade list

Janus-35B / README.md

FoolDev

Update sibling link: Thanatos-27B → Thanatos-27B-Heretic

52b76d1 6 days ago


	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen3.6-35B-A3B
	datasets:
	- crownelius/Creative_Writing_ShareGPT_Enhanced
	- microsoft/rStar-Coder
	- peteromallet/dataclaw-peteromallet
	- crownelius/Opus-4.7-Reasoning
	- openbmb/UltraData-Math
	- Crownelius/Crow-Heretic-TeichAI-Unified
	language:
	- en
	- zh
	- ru
	- es
	- fr
	- it
	- ja
	- ko
	- de
	- ar
	- tr
	- pl
	- sv
	- nl
	- he
	- id
	- uk
	- fa
	- pt
	- ms
	- fi
	- el
	tags:
	- qwen3_6
	- moe
	- conversational
	- multimodal
	- agent
	- gguf
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/banner.svg" alt="Janus-35B banner" width="100%" />

	[![License](https://img.shields.io/badge/License-Apache_2.0-7aa2f7?style=flat&labelColor=1a1b26)](https://opensource.org/licenses/Apache-2.0)
	[![Base Model](https://img.shields.io/badge/Base-Qwen3.6--35B--A3B-bb9af7?style=flat&labelColor=1a1b26)](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
	[![Architecture](https://img.shields.io/badge/Arch-MoE_35B/3B_active-ff9e64?style=flat&labelColor=1a1b26)](#architecture)
	[![Quant](https://img.shields.io/badge/GGUF-Q4__K__M-9ece6a?style=flat&labelColor=1a1b26)](#whats-here)
	[![Buy me a coffee](https://img.shields.io/badge/%E2%98%95%20Buy_me_a_coffee-e0af68?style=flat&logo=buymeacoffee&logoColor=1a1b26&labelColor=1a1b26)](https://buymeacoffee.com/cardoffoolm)

	# Janus-35B

	> Flagship Reasoning. Sparse Footprint.
	> Qwen 3.6 35B-A3B repackaged with Claude Opus 4.7 in the teacher slot.

	`Architecture:` `Qwen 3.6 35B-A3B (MoE)` | `Total Params:` `35B` | `Active Params:` `3B` | `Teacher:` `Claude Opus 4.7` | `Type:` `Distilled MoE LLM`

	A personal fork of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) — a 35B-total / 3B-active mixture-of-experts multimodal model — repackaged as Janus-35B with Claude Opus 4.7 reasoning data in the teacher slot.

	## TL;DR

	One-liner via Hugging Face (pulls a GGUF + this repo's root-level
	`template` / `system` / `params` files, including the tool-calling
	template — HF's Ollama bridge ingests those three files, not
	`Modelfile`):

	```bash
	ollama run hf.co/FoolDev/Janus-35B # default ~19 GB Q4_K_M
	ollama run hf.co/FoolDev/Janus-35B:Q4_K_M # same blob, explicit tag
	```

	Or build locally (uses this repo's `Modelfile`, kept in sync with the
	three bridge files):

	```bash
	git clone https://huggingface.co/FoolDev/Janus-35B && cd Janus-35B
	ollama create janus -f Modelfile && ollama run janus
	```

	After either path, `ollama show <model>` lists `completion`, `tools`,
	and `thinking` under Capabilities. Use `janus` as `<model>` for the local
	build, or `hf.co/FoolDev/Janus-35B` for the direct pull. Hardware: ~38 GB
	RAM at default `num_ctx 16384`, or trim ctx + batch to fit 32 GB hosts (see
	[Hardware requirements](#hardware-requirements)).

	## What's here

	| File | Use |
	|---|---|
	| `Janus-35B-A3B.Q4_K_M.gguf` | Recommended default, ~19 GB |
	| `Modelfile` | Ollama wrapper for local builds (`ollama create janus -f Modelfile`). Overrides the GGUF's embedded template with one that exposes `.Tools` / `.ToolCalls` to Ollama's capability detector. |
	| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Janus-35B` directly. The bridge does not read `Modelfile` (see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)); it ingests these three root-level files instead. Kept in sync with the `Modelfile`'s `TEMPLATE` / `SYSTEM` / `PARAMETER` directives. |
	| `scripts/check_bridge_sync.py` | Run before pushing a `Modelfile` / `template` / `system` / `params` edit to verify the four configurations remain in sync. Exits 0 if in sync, 1 with a per-key diff if not. |
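
To illustrate what such a sync check involves, here is a minimal sketch. It is not the repo's `scripts/check_bridge_sync.py`, whose contents are not shown here. It assumes the `Modelfile` uses triple-quoted `TEMPLATE` and `SYSTEM` blocks with one `PARAMETER key value` per line, and that `params` is a JSON object; the function names are hypothetical.

```python
import json
import re


def parse_modelfile(text: str) -> dict:
    """Extract TEMPLATE, SYSTEM, and PARAMETER directives from Modelfile text."""
    parsed = {"template": None, "system": None, "params": {}}
    for key in ("TEMPLATE", "SYSTEM"):
        # Assumes the block itself never contains a literal triple quote.
        match = re.search(rf'^{key}\s+"""(.*?)"""', text, flags=re.S | re.M)
        if match:
            parsed[key.lower()] = match.group(1).strip()
    for name, value in re.findall(r"^PARAMETER\s+(\S+)\s+(.+)$", text, flags=re.M):
        # A key such as `stop` may repeat, so collect values in a list.
        parsed["params"].setdefault(name, []).append(value.strip().strip('"'))
    return parsed


def diff_bridge_files(modelfile: str, template: str, system: str, params_json: str) -> list:
    """Return a list of human-readable differences; an empty list means in sync."""
    mf = parse_modelfile(modelfile)
    problems = []
    if mf["template"] != template.strip():
        problems.append("template differs")
    if mf["system"] != system.strip():
        problems.append("system differs")
    # Normalise JSON values to lists of strings so they compare with Modelfile text.
    expected = {
        key: [str(v) for v in (value if isinstance(value, list) else [value])]
        for key, value in json.loads(params_json).items()
    }
    for key in sorted(set(expected) | set(mf["params"])):
        if expected.get(key) != mf["params"].get(key):
            problems.append(
                f"params.{key}: Modelfile={mf['params'].get(key)} params={expected.get(key)}"
            )
    return problems
```

A command-line wrapper would read the four files, print the returned list, and call `sys.exit(1 if problems else 0)` to match the exit codes described in the table. The string comparison is deliberately strict, so `0.6` and `0.60` would be reported as a difference.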

	GGUF-only release. Pull the upstream safetensors from `Qwen/Qwen3.6-35B-A3B` if you need the `transformers` tree.

	## Architecture

	<p align="left">
	<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/moe-routing.svg" alt="animated MoE routing visualization: 16x16 grid of 256 expert dots with 8 lit at any time, cycling through 8 routing patterns" width="640" />
	</p>

	- Qwen 3.6, 35B total / 3B active, MoE (256 experts, 8 activated per token)
	- 40 layers, 10 × (3 × DeltaNet → MoE / 1 × Gated Attention → MoE)
	- 262k native context, extensible to ~1M with YaRN
	- Vision + video supported by upstream (mmproj not included in this release)
	- Vocab 248,320
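
The figures in the list above fit together arithmetically. The short check below uses only numbers quoted in the list; the parameter counts are the rounded 35B / 3B values, so the active-parameter fraction is approximate.

```python
# Layer layout: 10 repeating groups, each with 3 DeltaNet blocks and 1 gated-attention block.
GROUPS = 10
DELTANET_PER_GROUP = 3
ATTENTION_PER_GROUP = 1

group = ["deltanet"] * DELTANET_PER_GROUP + ["gated_attention"] * ATTENTION_PER_GROUP
layout = group * GROUPS

total_layers = len(layout)                           # 10 * (3 + 1) = 40
attention_layers = layout.count("gated_attention")   # 10

# Sparsity: 8 of 256 experts are activated per token.
expert_fraction = 8 / 256                            # 0.03125

# Rounded headline figures: 3B active out of 35B total parameters.
active_param_fraction = 3 / 35

print(total_layers, attention_layers, expert_fraction, round(active_param_fraction, 3))
```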

	## Quick start

	### llama.cpp / LM Studio

	Drop the GGUF into your loader of choice. The chat template is embedded in the GGUF metadata, so llama.cpp's `--chat-template auto` and LM Studio's GGUF auto-detection handle plain conversation correctly.

	### Ollama

	The chat template baked into the GGUF is not sufficient on Ollama — it lacks the `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so a naive `ollama pull` reports `does not support tools` and rejects any request carrying a `tools` array. Two paths fix this:

	```bash
	# A. Pull straight from HF (uses the root-level template/system/params files):
	ollama run hf.co/FoolDev/Janus-35B # default tag, ~19 GB Q4_K_M
	ollama run hf.co/FoolDev/Janus-35B:Q4_K_M # same blob, explicit tag
	# Note: HF's Ollama bridge does NOT read Modelfile; it reads template/system/params.

	# B. Build locally (uses Modelfile, which is kept in sync with the three above):
	ollama create janus -f Modelfile && ollama run janus
	```

	After either path, `ollama show <model>` should list `completion`, `tools`, and `thinking` under Capabilities (`<model>` is `janus` for the local build, `hf.co/FoolDev/Janus-35B` for the direct pull).

	### Inference examples

	Once the model is loaded (via `ollama run janus`, `lms server start`, or `llama-server`), any standard OpenAI-compatible client works. The examples assume the loader is listening on `http://localhost:11434` (Ollama default); adjust the port for LM Studio (`:1234`) or llama.cpp (`:8080`).
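
If you switch between loaders often, the port differences can be kept in one place. This is a small sketch using the default ports named above; the `DEFAULT_PORTS` table and the `base_url` helper are illustrative names, not part of any client library.

```python
# Default ports quoted above for each loader's OpenAI-compatible endpoint.
DEFAULT_PORTS = {
    "ollama": 11434,
    "lm-studio": 1234,
    "llama.cpp": 8080,
}


def base_url(loader: str, host: str = "localhost") -> str:
    """Build the OpenAI-compatible base URL for a loader running on `host`."""
    try:
        port = DEFAULT_PORTS[loader]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_PORTS))
        raise ValueError(f"unknown loader {loader!r}; expected one of: {known}") from None
    return f"http://{host}:{port}/v1"
```

The result can be passed as `base_url=` when constructing the `OpenAI` client in the examples that follow.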

	#### curl

	```bash
	curl -s http://localhost:11434/v1/chat/completions \
	  -H 'Content-Type: application/json' \
	  -d '{
	    "model": "janus",
	    "messages": [
	      {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
	      {"role": "user", "content": "Sketch an algorithm to detect cycles in a directed graph."}
	    ],
	    "temperature": 0.6,
	    "max_tokens": 800
	  }' | jq -r '.choices[0].message.content'
	```

	#### Python (openai-compat)

	```python
	from openai import OpenAI

	client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

	resp = client.chat.completions.create(
	    model="janus",
	    messages=[
	        {"role": "user", "content": "Write a haiku about a stack overflow."}
	    ],
	    temperature=0.8,
	    top_p=0.95,
	)
	print(resp.choices[0].message.content)
	```

	#### Streaming

	```python
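	# Reuses the `client` created in the previous example.
	stream = client.chat.completions.create(
	    model="janus",
	    messages=[{"role": "user", "content": "Explain RoPE briefly."}],
	    stream=True,
	)
	for chunk in stream:
	    # Each chunk carries an incremental piece of the reply; content may be None.
	    delta = chunk.choices[0].delta.content or ""
	    print(delta, end="", flush=True)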
	```

	### Recommended sampling

	| Use | temp | top_p | top_k | repeat_penalty |
	|---|---:|---:|---:|---:|
	| Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
	| Creative / RP | 0.8 | 0.95 | 40 | 1.02 |

	If the model loops inside `<think>` tags, lower the temperature to 0.4–0.6 and raise `repeat_penalty` to 1.08.
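
As a sketch of how these presets might be applied, the snippet below builds a request body for Ollama's native `/api/chat` endpoint, which accepts sampler settings such as `top_k` and `repeat_penalty` under `options`. The preset names, the `ollama_chat_payload` helper, and the `anti_loop` switch are illustrative assumptions; the 0.5 fallback temperature is simply a value inside the 0.4–0.6 range given above.

```python
SAMPLING_PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05},
    "creative": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "repeat_penalty": 1.02},
}


def ollama_chat_payload(prompt: str, preset: str = "reasoning",
                        model: str = "janus", anti_loop: bool = False) -> dict:
    """Build a JSON-serialisable body for POST /api/chat using a sampling preset."""
    options = dict(SAMPLING_PRESETS[preset])  # copy so the preset table stays untouched
    if anti_loop:
        # Mitigation for loops inside <think>: cooler sampling, stronger repeat penalty.
        options["temperature"] = min(options["temperature"], 0.5)
        options["repeat_penalty"] = 1.08
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "stream": False,
    }
```

Sending the payload is then a single POST to `http://localhost:11434/api/chat` with any HTTP client.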

	### System prompt

	```text
	You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.

	Behavior rules:
	- Answer the user's actual request directly.
	- Be accurate, complete, and structured.
	- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
	- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
	- If the user wants creative writing, preserve tone, continuity, and character consistency.
	- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
	- Finish with a usable answer, not just planning.
	```

	## Hardware requirements

	This is an 18.9 GB Q4_K_M GGUF. Ollama's runtime footprint at default settings is roughly 2× the model file (weights mmap + compute graph allocation), plus KV cache — so ~38 GB total memory at `num_ctx 16384`. The compute-graph allocation scales with context and batch size, so 32 GB hosts can fit the model by trimming both (see Z13 row in the table).

	| Hardware | Status |
	|---|---|
	| ≥48 GB RAM (CPU-only) | Works, ~3-6 tok/s |
	| Single H100/A100 80 GB | Works, full offload, ~30+ tok/s |
	| RTX 4090 24 GB / 5090 32 GB + 32 GB RAM | Works, partial offload, ~15-25 tok/s |
	| Mac Studio M2/M3 Ultra 64 GB+ unified | Works, ~20+ tok/s |
	| 32 GB unified-memory laptops (Ryzen AI Max+, Apple M-series) | Works with `num_ctx ≤ 4096` and `num_batch ≤ 256` to fit the compute graph; default 16K ctx OOMs. Measured 28.71 tok/s on ASUS ROG Flow Z13 GZ302EA at Q4_K_M (Radeon 8060S iGPU via ROCm gfx1151). |
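
The ~38 GB figure follows from the rule of thumb stated above (roughly 2× the model file at default settings). The sketch below reproduces that arithmetic, plus the ~70 GB BF16 estimate for 35B parameters at 2 bytes each. It is a back-of-the-envelope check, not a memory model: the factor of 2 is the card's approximation, and the function names are illustrative.

```python
def estimate_ollama_memory_gb(gguf_size_gb: float, overhead_factor: float = 2.0) -> float:
    """Rule-of-thumb footprint: model file size times an overhead factor."""
    return gguf_size_gb * overhead_factor


def bf16_weights_gb(total_params_billions: float) -> float:
    """BF16 stores 2 bytes per parameter, so 1B parameters take about 2 GB."""
    return total_params_billions * 2


print(round(estimate_ollama_memory_gb(18.9), 1))  # 37.8, quoted above as ~38 GB
print(bf16_weights_gb(35))                        # 70, the ~70 GB BF16 figure
```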

	## Chat template

	The model uses the standard Qwen 3.x ChatML format with `<|im_start|>` / `<|im_end|>` role markers. The template is embedded in the GGUF metadata for plain conversation use, but Ollama users should rely on the `TEMPLATE` block in the included `Modelfile`: that version exposes the tool-calling scaffolding Ollama's capability detector requires (the embedded template alone is insufficient; see [Ollama](#ollama) above).

	### Plain conversation

	```text
	<|im_start|>system
	You are Janus, a precise and capable assistant…<|im_end|>
	<|im_start|>user
	What is the time complexity of mergesort?<|im_end|>
	<|im_start|>assistant
	```
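
For clients that build raw prompts themselves, the layout above can be produced with a few lines of Python. This is a simplified sketch of plain ChatML rendering only; the template embedded in the GGUF also handles tools and reasoning, which this hypothetical `render_chatml` helper does not attempt.

```python
def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Render role/content messages as ChatML text with <|im_start|>/<|im_end|> markers."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus."},
    {"role": "user", "content": "What is the time complexity of mergesort?"},
])
print(prompt)
```

Most loaders apply the template for you, so a helper like this is only relevant when calling a raw completion endpoint.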

	### With reasoning trace

	When the model decides to think, the assistant turn contains a `<think>…</think>` block followed by the visible answer:

	```text
	<|im_start|>assistant
	<think>
	The user is asking about mergesort. Mergesort divides the array, recursively sorts each half, then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
	</think>

	Mergesort runs in O(n log n) time in the worst, average, and best cases. The recurrence is T(n) = 2T(n/2) + O(n), which solves to Θ(n log n) by the master theorem.<|im_end|>
	```

	Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by default and show only the final answer. If your client shows it, turn off its "show reasoning" toggle.
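
If you consume raw completions without such a client, the reasoning trace can be separated from the visible answer yourself. The following is a minimal sketch, assuming at most one well-formed `<think>…</think>` block per assistant turn; `split_reasoning` is an illustrative name.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.S)


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer) for an assistant turn; reasoning is '' if absent."""
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

An unterminated `<think>` block (for example when generation was cut off by the token limit) is returned unchanged as the answer, so callers may want to detect that case separately.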

	### Tool / function calling

	The wire format depends on which path you take. Both are valid — the model adapts to whichever format the system prompt specifies.

	Ollama path (this repo's `Modelfile`). The TEMPLATE advertises tools inside `<tools>…</tools>` and asks the model to reply in JSON-in-XML — the form Ollama's tool-call extractor parses into a structured `tool_calls` array on `/api/chat` and `/v1/chat/completions`:

	```text
	<tool_call>
	{"name": "get_weather", "arguments": {"city": "Tokyo"}}
	</tool_call>
	```

	Embedded-jinja path (llama.cpp, llama-cpp-python, LM Studio). The Qwen 3.6 native chat template baked into the GGUF instructs the model to emit a more verbose XML form. This is the shape you'll see if you talk to `llama-server` or LM Studio directly:

	```text
	<tool_call>
	<function=get_weather>
	<parameter=city>
	Tokyo
	</parameter>
	</function>
	</tool_call>
	```

	Pick the parser shape that matches your loader. Don't mix.
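
For callers that receive raw text rather than a structured `tool_calls` array, each wire format needs its own parser. The two functions below are a sketch based only on the example shapes shown above; the function names are hypothetical, and a production parser would need to handle malformed output.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", flags=re.S)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", flags=re.S)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\s*(.*?)\s*</parameter>", flags=re.S)


def parse_json_tool_calls(text: str) -> list:
    """Parse the JSON-in-XML form: <tool_call>{"name": ..., "arguments": {...}}</tool_call>."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        payload = json.loads(body)
        calls.append({"name": payload["name"], "arguments": payload.get("arguments", {})})
    return calls


def parse_xml_tool_calls(text: str) -> list:
    """Parse the verbose form: <function=NAME><parameter=KEY>VALUE</parameter></function>."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        for name, inner in FUNCTION_RE.findall(body):
            # Values arrive as plain strings; this form carries no type information.
            arguments = {key: value for key, value in PARAM_RE.findall(inner)}
            calls.append({"name": name, "arguments": arguments})
    return calls
```

Both return the same `{"name": ..., "arguments": {...}}` structure, so downstream dispatch code does not need to know which loader produced the call.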

	#### Example (Ollama, OpenAI-compatible API)

	```python
	from openai import OpenAI

	client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

	resp = client.chat.completions.create(
	    model="janus",
	    messages=[
	        {"role": "user", "content": "Call get_weather for Tokyo. Respond ONLY with the tool call."}
	    ],
	    tools=[{
	        "type": "function",
	        "function": {
	            "name": "get_weather",
	            "description": "Get current weather for a city",
	            "parameters": {
	                "type": "object",
	                "properties": {"city": {"type": "string"}},
	                "required": ["city"],
	            },
	        },
	    }],
	    temperature=0.3,
	)
	print(resp.choices[0].message.tool_calls)
	# [ToolCall(id='call_xxx', type='function',
	#   function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]
	```

	#### Tips

	- Use direct prompts ("Call X for Y") rather than soft hints ("Use the tool"). The model thinks before committing to a call, and weak prompts can exhaust `num_predict` inside the `<think>` block before the call is emitted.
	- Allow at least `num_predict: 1024` (or `max_tokens: 1024`) for tool-calling turns, more if the schemas are large.
	- The Modelfile's JSON-in-XML format is what Ollama's tool-call extractor understands; if you swap loaders, swap the parser to match (see "Embedded-jinja path" above).
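
Once a structured tool call comes back, the client still has to run the tool and return the result to the model. A minimal dispatch sketch follows, assuming OpenAI-style calls in which `arguments` is a JSON string; `get_weather` here is a stand-in implementation and `run_tool_call` is a hypothetical helper.

```python
import json


def get_weather(city: str) -> dict:
    """Stand-in tool; a real implementation would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Registry mapping tool names advertised to the model onto local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and wrap the result as a `tool` role message."""
    if name not in TOOLS:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = TOOLS[name](**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

The returned message is appended to the conversation after the assistant's tool-call turn, and the request is sent again so the model can produce its final answer.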

	## Known limitations

	- No mmproj in this release. The base Qwen3.6 supports image and video input via a separate `mmproj` file, which is not included here. Text-only inference works out of the box; multimodal inference requires fetching `Qwen2.5-VL--mmproj-.gguf` (or equivalent) from upstream.
	- Quantization-induced quality loss. Q4_K_M is a strong general-purpose quant but does measurably degrade math and code accuracy compared to BF16. If you need maximum quality, run the upstream safetensors on a GPU that fits BF16 (~70 GB).
	- MoE expert utilization is uneven. Stock Qwen3.6-35B-A3B routes 8 of 256 experts per token. On narrow domains (e.g. only one programming language) a small subset of experts dominates; load-balance loss was a training-time concern, not a runtime guarantee.
	- Thinking traces can loop. Like most reasoning-distilled models, Janus-35B occasionally gets stuck repeating itself inside `<think>` tags. Mitigations: lower temperature to 0.4-0.6, raise `repeat_penalty` to 1.08, or set a `<think>`-token budget cap if your loader supports it.
	- Not aligned with any specific safety policy. This is a personal repackage of an open-weight base model with reasoning-focused distillation. There is no RLHF refusal layer beyond what Qwen 3.6 ships with; downstream safety is the operator's responsibility.
	- No formal evaluation in this card. Throughput figures in the hardware table are estimates, apart from the Z13 row, which is a measured value. If you produce real benchmarks (MMLU, HumanEval, etc.) and want them included, file a PR.

	## Related models

	| Model | Size | Notes |
	|---|---|---|
	| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | 35B / 3B active | Upstream base model. `transformers`-native multimodal weights. |
	| [FoolDev/Thanatos-27B-Heretic](https://huggingface.co/FoolDev/Thanatos-27B-Heretic) | 27B dense | Dense sibling, now based on `llmfan46/Qwen3.6-27B-uncensored-heretic-v2` (Heretic-style abliteration of the Qwen 3.6 27B base). Same teacher (Opus 4.7), same dataset family, smaller memory footprint, no MoE quirks, uncensored. (Renamed from `FoolDev/Thanatos-27B`; HF serves a 307 from the old path.) |
	| [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B dense | Heretic-flavored fine-tune of the same Qwen 3.5 9B base used as a smaller starting point. Useful as a fast first-pass model when 35B is too heavy for the host. |

	## Credits

	- Base model: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (Alibaba)
	- Reasoning teacher: Claude Opus 4.7 (Anthropic)
	- Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)

	License inherited from upstream: Apache-2.0.