---
license: apache-2.0
base_model:
- Qwen/Qwen3.6-35B-A3B
datasets:
- crownelius/Creative_Writing_ShareGPT_Enhanced
- microsoft/rStar-Coder
- peteromallet/dataclaw-peteromallet
- crownelius/Opus-4.7-Reasoning
- openbmb/UltraData-Math
- Crownelius/Crow-Heretic-TeichAI-Unified
language:
- en
- zh
- ru
- es
- fr
- it
- ja
- ko
- de
- ar
- tr
- pl
- sv
- nl
- he
- id
- uk
- fa
- pt
- ms
- fi
- el
tags:
- qwen3_6
- moe
- conversational
- multimodal
- agent
- gguf
library_name: transformers
pipeline_tag: image-text-to-text
---

<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/banner.svg" alt="Janus-35B banner" width="100%" />

# Janus-35B

> **Flagship Reasoning. Sparse Footprint.**
> *Qwen 3.6 35B-A3B repackaged with Claude Opus 4.7 in the teacher slot.*

**`Architecture:`** `Qwen 3.6 35B-A3B (MoE)` | **`Total Params:`** `35B` | **`Active Params:`** `3B` | **`Teacher:`** `Claude Opus 4.7` | **`Type:`** `Distilled MoE LLM`

A personal fork of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B), a 35B-total / 3B-active mixture-of-experts multimodal model, repackaged as Janus-35B with Claude Opus 4.7 reasoning data in the teacher slot.

## TL;DR

One-liner via Hugging Face. This pulls a GGUF plus this repo's root-level `template` / `system` / `params` files, including the tool-calling template (HF's Ollama bridge ingests those three files, not `Modelfile`):

```bash
ollama run hf.co/FoolDev/Janus-35B          # default ~19 GB Q4_K_M
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M   # same blob, explicit tag
```

Or build locally (uses this repo's `Modelfile`, kept in sync with the three bridge files):

```bash
git clone https://huggingface.co/FoolDev/Janus-35B && cd Janus-35B
ollama create janus -f Modelfile && ollama run janus
```

After either path, `ollama show <name>` lists `completion`, `tools`, and `thinking` under Capabilities, where `<name>` is `janus` for the local build or `hf.co/FoolDev/Janus-35B` for the HF pull. Hardware: ~38 GB RAM at default `num_ctx 16384`, or trim ctx + batch to fit 32 GB hosts (see [Hardware requirements](#hardware-requirements)).

## What's here

| File | Use |
|---|---|
| `Janus-35B-A3B.Q4_K_M.gguf` | Recommended default, ~19 GB |
| `Modelfile` | Ollama wrapper for **local** builds (`ollama create janus -f Modelfile`). Overrides the GGUF's embedded template with one that exposes `.Tools` / `.ToolCalls` to Ollama's capability detector. |
| `template`, `system`, `params` | Used by HF's Ollama bridge when users `ollama run hf.co/FoolDev/Janus-35B` directly. The bridge does **not** read `Modelfile` (see [HF Ollama docs](https://huggingface.co/docs/hub/en/ollama)); it ingests these three root-level files instead. Kept in sync with the `Modelfile`'s `TEMPLATE` / `SYSTEM` / `PARAMETER` directives. |
| `scripts/check_bridge_sync.py` | Run before pushing a `Modelfile` / `template` / `system` / `params` edit to verify the four configurations remain in sync. Exits 0 if in sync, 1 with a per-key diff if not. |

GGUF-only release. Pull the upstream safetensors from `Qwen/Qwen3.6-35B-A3B` if you need the `transformers` tree.

## Architecture

<p align="left">
<img src="https://huggingface.co/FoolDev/Janus-35B/resolve/main/moe-routing.svg" alt="animated MoE routing visualization: 16x16 grid of 256 expert dots with 8 lit at any time, cycling through 8 routing patterns" width="640" />
</p>

- Qwen 3.6, 35B total / 3B active, MoE (256 experts, 8 activated per token)
- 40 layers in 10 repeating groups: 3 × (DeltaNet → MoE) followed by 1 × (Gated Attention → MoE)
- 262k native context, extensible to ~1M with YaRN
- Vision + video supported by upstream (mmproj not included in this release)
- Vocab 248,320
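
To make the "256 experts, 8 activated per token" figure concrete, here is a minimal sketch of generic top-k gating in plain Python. It is an illustration only, not code from the Qwen source: the `route_token` helper and the random router logits are hypothetical, and the real router operates on learned projections of each token's hidden state.

```python
import math
import random

NUM_EXPERTS = 256  # expert count from the architecture list above
TOP_K = 8          # experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k experts.

    Generic top-k gating: keep the k largest logits, then softmax over
    only those k so the mixing weights sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:top_k]
    peak = router_logits[top[0]]  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for a single token (seeded for repeatability).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route_token(logits)
print(len(chosen), "of", NUM_EXPERTS, "experts active")
print("mixing weights sum to", round(sum(w for _, w in chosen), 6))
```

Only the selected experts run for a given token, which is why the active parameter count (3B) is far below the total (35B).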

## Quick start

### llama.cpp / LM Studio

Drop the GGUF into your loader of choice. The chat template is embedded in the GGUF metadata, so llama.cpp's `--chat-template auto` and LM Studio's GGUF auto-detection handle plain conversation correctly.

### Ollama

The chat template baked into the GGUF is **not sufficient on Ollama**: it lacks the `.Tools` / `.ToolCalls` blocks Ollama's capability detector requires, so a naive `ollama pull` reports `does not support tools` and rejects any request carrying a `tools` array. Two paths fix this:

```bash
# A. Pull straight from HF (uses the root-level template/system/params files):
ollama run hf.co/FoolDev/Janus-35B          # default tag, ~19 GB Q4_K_M
ollama run hf.co/FoolDev/Janus-35B:Q4_K_M   # same blob, explicit tag
# Note: HF's Ollama bridge does NOT read Modelfile; it reads template/system/params.

# B. Build locally (uses Modelfile, which is kept in sync with the three above):
ollama create janus -f Modelfile && ollama run janus
```

After either path, `ollama show <name>` should list `completion`, `tools`, and `thinking` under Capabilities. Use `janus` as the name for path B and `hf.co/FoolDev/Janus-35B` for path A.
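
The capability check can also be scripted. The sketch below assumes a recent Ollama build whose `POST /api/show` response includes a `capabilities` array; the `missing_capabilities` helper and the two sample responses are hypothetical and exist only to show the comparison.

```python
# Capabilities this card expects after either install path.
REQUIRED = {"completion", "tools", "thinking"}


def missing_capabilities(show_response):
    """Return the required capabilities absent from an /api/show response dict."""
    present = set(show_response.get("capabilities", []))
    return sorted(REQUIRED - present)


# Hypothetical responses, shaped like the parsed JSON body of POST /api/show.
with_template = {"capabilities": ["completion", "tools", "thinking"]}
embedded_only = {"capabilities": ["completion"]}

print(missing_capabilities(with_template))  # []
print(missing_capabilities(embedded_only))  # ['thinking', 'tools']
```

To run it against a live server, fetch the response with any HTTP client (for example a POST to `http://localhost:11434/api/show` with a JSON body naming the model) and pass the parsed dict to `missing_capabilities`.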

### Inference examples

Once the model is loaded (via `ollama run janus`, `lms server start`, or `llama-server`), all the standard OpenAI-compatible clients work. Examples assume the loader is listening on `http://localhost:11434` (Ollama default); adjust the port for LM Studio (`:1234`) or llama.cpp (`:8080`). They also use the model name `janus` from the local Ollama build; if you pulled from HF instead, pass `hf.co/FoolDev/Janus-35B`.

#### curl

```bash
curl -s http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "janus",
    "messages": [
      {"role": "system", "content": "You are Janus, a precise reasoning assistant."},
      {"role": "user", "content": "Sketch an algorithm to detect cycles in a directed graph."}
    ],
    "temperature": 0.6,
    "max_tokens": 800
  }' | jq -r '.choices[0].message.content'
```

#### Python (openai-compat)

```python
from openai import OpenAI

# Local servers ignore the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

resp = client.chat.completions.create(
    model="janus",
    messages=[
        {"role": "user", "content": "Write a haiku about a stack overflow."}
    ],
    temperature=0.8,
    top_p=0.95,
)
print(resp.choices[0].message.content)
```

#### Streaming

```python
# Reuses `client` from the previous example.
stream = client.chat.completions.create(
    model="janus",
    messages=[{"role": "user", "content": "Explain RoPE briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
```

### Recommended sampling

| Use | temp | top_p | top_k | repeat_penalty |
|---|---:|---:|---:|---:|
| Reasoning / general | 0.6 | 0.95 | 20 | 1.05 |
| Creative / RP | 0.8 | 0.95 | 40 | 1.02 |

If the model loops inside `<think>` tags, lower temperature to 0.4–0.6 and raise `repeat_penalty` to 1.08.
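
The table maps directly onto the `options` object of Ollama's native `/api/chat` endpoint, which accepts `top_k` and `repeat_penalty` alongside `temperature` and `top_p`. The sketch below only builds the request body; the `PRESETS` dict and `build_chat_payload` helper are hypothetical names, and the 0.5 fallback temperature is an assumed midpoint of the 0.4–0.6 range.

```python
import json

# Sampling presets copied from the table above.
PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.05},
    "creative": {"temperature": 0.8, "top_p": 0.95, "top_k": 40, "repeat_penalty": 1.02},
}


def build_chat_payload(prompt, preset="reasoning", model="janus", looping=False):
    """Build a JSON body for POST /api/chat using one of the presets."""
    options = dict(PRESETS[preset])  # copy so the preset itself is not mutated
    if looping:
        # Mitigation for <think> loops: cooler sampling, stronger repeat penalty.
        options["temperature"] = min(options["temperature"], 0.5)  # assumed midpoint
        options["repeat_penalty"] = 1.08
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": options,
    }


payload = build_chat_payload("Explain RoPE briefly.", looping=True)
print(json.dumps(payload["options"], indent=2))
```

Send the resulting dict as the JSON body of a POST to `http://localhost:11434/api/chat`.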

### System prompt

```text
You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
Behavior rules:
- Answer the user's actual request directly.
- Be accurate, complete, and structured.
- Think before answering, but do not get stuck in repetitive loops or meta-commentary.
- If the request is ambiguous or incomplete, state what is missing and make the smallest reasonable assumption needed to continue.
- If the user wants creative writing, preserve tone, continuity, and character consistency.
- If the user wants analysis or technical help, prefer concrete steps, examples, and decisions over fluff.
- Finish with a usable answer, not just planning.
```

## Hardware requirements

This is an 18.9 GB Q4_K_M GGUF. Ollama's runtime footprint at default settings is **roughly 2× the model file** (weights mmap + compute graph allocation), plus KV cache, so ~38 GB total memory at `num_ctx 16384`. The compute-graph allocation scales with context and batch size, so 32 GB hosts can fit the model by trimming both (see the Z13 row in the table).

| Hardware | Status |
|---|---|
| ≥48 GB RAM (CPU-only) | Works, ~3-6 tok/s |
| Single H100/A100 80 GB | Works, full offload, ~30+ tok/s |
| RTX 4090 24 GB / 5090 32 GB + 32 GB RAM | Works, partial offload, ~15-25 tok/s |
| Mac Studio M2/M3 Ultra 64 GB+ unified | Works, ~20+ tok/s |
| 32 GB unified-memory laptops (Ryzen AI Max+, Apple M-series) | Works with `num_ctx ≤ 4096` and `num_batch ≤ 256` to fit the compute graph; default 16K ctx OOMs. Measured 28.71 tok/s on ASUS ROG Flow Z13 GZ302EA at Q4_K_M (Radeon 8060S iGPU via ROCm gfx1151). |
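
The sizing rule above reduces to a little arithmetic. The sketch below is a rough illustration, assuming the "2× the model file" rule of thumb holds and ignoring KV cache growth and operating-system overhead; `default_footprint_gb` and `options_for_host` are hypothetical helpers, and the cutoff is an estimate rather than a measured boundary.

```python
MODEL_FILE_GB = 18.9   # Q4_K_M GGUF size stated in this card
OVERHEAD_FACTOR = 2.0  # "roughly 2x the model file" at default settings


def default_footprint_gb(model_file_gb=MODEL_FILE_GB, factor=OVERHEAD_FACTOR):
    """Estimate Ollama's memory use at the default num_ctx of 16384."""
    return model_file_gb * factor


def options_for_host(total_memory_gb):
    """Pick Ollama options: defaults on large hosts, trimmed ctx/batch otherwise."""
    if total_memory_gb >= default_footprint_gb():
        return {"num_ctx": 16384}
    # Upper limits from the 32 GB row of the hardware table.
    return {"num_ctx": 4096, "num_batch": 256}


print(round(default_footprint_gb(), 1), "GB at default settings")
print("32 GB host:", options_for_host(32))
print("64 GB host:", options_for_host(64))
```

The returned dict can be passed as the `options` field of an Ollama API request, or translated into `PARAMETER` lines in a Modelfile.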

## Chat template

The model uses the standard Qwen 3.x ChatML format with `<|im_start|>` / `<|im_end|>` role markers. The template is embedded in the GGUF metadata for plain conversation use, but Ollama users should rely on the `TEMPLATE` block in the included `Modelfile`. That version exposes the tool-calling scaffolding Ollama's capability detector requires (the embedded template alone is insufficient; see [Ollama](#ollama) above).

### Plain conversation

```text
<|im_start|>system
You are Janus, a precise and capable assistant…<|im_end|>
<|im_start|>user
What is the time complexity of mergesort?<|im_end|>
<|im_start|>assistant
```
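
For raw-completion endpoints that bypass the loader's template, the plain-conversation layout can be reproduced by hand. This is a minimal sketch covering only the simple case shown above; the `render_chatml` helper is hypothetical, and the real template also handles tools and reasoning blocks.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Janus, a precise and capable assistant."},
    {"role": "user", "content": "What is the time complexity of mergesort?"},
])
print(prompt)
```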

### With reasoning trace

When the model decides to think, the assistant turn contains a `<think>…</think>` block followed by the visible answer:

```text
<|im_start|>assistant
<think>
The user is asking about mergesort. Mergesort divides the array, recursively sorts each half, then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
</think>
Mergesort runs in **O(n log n)** time in the worst, average, and best cases. The recurrence is T(n) = 2T(n/2) + O(n), which solves to Θ(n log n) by the master theorem.<|im_end|>
```

Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by default and show only the final answer. If your client shows it, turn off its "show reasoning" toggle.
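
If you consume raw completions yourself, the reasoning block can be separated from the answer with a small helper. The sketch below is one possible approach, assuming the `<think>` tags arrive verbatim in the text; `split_reasoning` is a hypothetical name, and the truncated-output branch covers the case where generation stops before `</think>`.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no <think> block exists."""
    match = THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[:match.start()] + text[match.end():]).strip()
        return reasoning, answer
    if "<think>" in text:
        # Output was cut off mid-thought: everything after the tag is reasoning.
        before, _, after = text.partition("<think>")
        return after.strip(), before.strip()
    return "", text.strip()


raw = "<think>\nT(n) = 2T(n/2) + O(n)\n</think>\nMergesort runs in O(n log n) time."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

An empty answer paired with non-empty reasoning is a useful signal that the token budget ran out inside the `<think>` block.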

### Tool / function calling

The wire format depends on which path you take. **Both are valid**: the model adapts to whichever format the system prompt specifies.

**Ollama path** (this repo's `Modelfile`). The TEMPLATE advertises tools inside `<tools>…</tools>` and asks the model to reply in JSON-in-XML, the form Ollama's tool-call extractor parses into a structured `tool_calls` array on `/api/chat` and `/v1/chat/completions`:

```text
<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call>
```

**Embedded-jinja path** (llama.cpp, llama-cpp-python, LM Studio). The Qwen 3.6 native chat template baked into the GGUF instructs the model to emit a more verbose XML form. This is the shape you'll see if you talk to `llama-server` or LM Studio directly:

```text
<tool_call>
<function=get_weather>
<parameter=city>
Tokyo
</parameter>
</function>
</tool_call>
```

Pick the parser shape that matches your loader. Don't mix.
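
When a loader hands back raw text instead of a structured `tool_calls` array, either shape can be parsed with the standard library. The sketch below handles only the two layouts shown above; `parse_tool_calls` is a hypothetical helper, and in the XML form every parameter value comes back as a string because that format carries no type information.

```python
import json
import re

CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FUNC_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\s*(.*?)\s*</parameter>", re.DOTALL)


def parse_tool_calls(text):
    """Extract tool calls from either the JSON-in-XML or the verbose XML form."""
    calls = []
    for body in CALL_RE.findall(text):
        if body.startswith("{"):
            # Ollama path: a JSON object inside <tool_call> tags.
            obj = json.loads(body)
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
            continue
        func = FUNC_RE.search(body)
        if func:
            # Embedded-jinja path: <function=...> with <parameter=...> children.
            args = dict(PARAM_RE.findall(func.group(2)))
            calls.append({"name": func.group(1), "arguments": args})
    return calls


json_form = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
xml_form = (
    "<tool_call>\n<function=get_weather>\n<parameter=city>\nTokyo\n"
    "</parameter>\n</function>\n</tool_call>"
)
print(parse_tool_calls(json_form))
print(parse_tool_calls(xml_form))
```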

#### Example (Ollama, OpenAI-compatible API)

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ignored")

resp = client.chat.completions.create(
    model="janus",
    messages=[
        {"role": "user", "content": "Call get_weather for Tokyo. Respond ONLY with the tool call."}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    temperature=0.3,
)
print(resp.choices[0].message.tool_calls)
# Expected shape (class names vary by openai version):
# [ChatCompletionMessageToolCall(id='call_xxx', type='function',
#    function=Function(name='get_weather', arguments='{"city":"Tokyo"}'))]
```

#### Tips

- Use direct prompts ("Call X for Y") rather than soft hints ("Use the tool"). The model thinks before committing to a call, and weak prompts can exhaust `num_predict` inside the `<think>` block before the call is emitted.
- Allow at least `num_predict: 1024` (or `max_tokens: 1024`) for tool-calling turns, more if the schemas are large.
- The Modelfile's JSON-in-XML format is what Ollama's tool-call extractor understands; if you swap loaders, swap the parser to match (see "Embedded-jinja path" above).
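
Once a call has been emitted, the usual next step in the OpenAI-compatible format is to run the tool and send its result back in a `tool` role message. The sketch below builds those messages from plain dicts; `get_weather`, `TOOLS`, and `tool_result_messages` are hypothetical stand-ins, and the weather lookup returns a placeholder rather than real data.

```python
import json


def get_weather(city):
    # Hypothetical stand-in; a real tool would query a weather service.
    return {"city": city, "condition": "unknown"}


TOOLS = {"get_weather": get_weather}


def tool_result_messages(tool_calls):
    """Run each requested tool and build the matching `tool` role messages."""
    messages = []
    for call in tool_calls:
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = TOOLS[fn["name"]](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages


# Dict form of one tool call, shaped like the structured output in the example above.
calls = [{
    "id": "call_xxx",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city":"Tokyo"}'},
}]
print(tool_result_messages(calls))
```

Append the assistant message that carried the calls, then these `tool` messages, to the conversation and request another completion so the model can write its final answer.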

## Known limitations

- **No mmproj in this release.** The base Qwen3.6 supports image and video input via a separate `mmproj` file, which is not included here. Text-only inference works out of the box; multimodal inference requires fetching a matching `mmproj` GGUF for Qwen3.6-35B-A3B from upstream.
- **Quantization-induced quality loss.** Q4_K_M is a strong general-purpose quant but does measurably degrade math and code accuracy compared to BF16. If you need maximum quality, run the upstream safetensors on a GPU that fits BF16 (~70 GB).
- **MoE expert utilization is uneven.** Stock Qwen3.6-35B-A3B routes 8 of 256 experts per token. On narrow domains (e.g. only one programming language) a small subset of experts dominates; load-balance loss was a training-time concern, not a runtime guarantee.
- **Thinking traces can loop.** Like most reasoning-distilled models, Janus-35B occasionally gets stuck repeating itself inside `<think>` tags. Mitigations: lower temperature to 0.4-0.6, raise `repeat_penalty` to 1.08, or set a `<think>`-token budget cap if your loader supports it.
- **Not aligned with any specific safety policy.** This is a personal repackage of an open-weight base model with reasoning-focused distillation. There is no RLHF refusal layer beyond what Qwen 3.6 ships with; downstream safety is the operator's responsibility.
- **No formal evaluation in this card.** Throughput numbers in the hardware table are estimates, except the Z13 row, which was measured. If you produce real benchmarks (MMLU, HumanEval, etc.) and want them included, file a PR.

## Related models

| Model | Size | Notes |
|---|---|---|
| [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | 35B / 3B active | Upstream base model. `transformers`-native multimodal weights. |
| [FoolDev/Thanatos-27B-Heretic](https://huggingface.co/FoolDev/Thanatos-27B-Heretic) | 27B dense | Dense sibling, now based on `llmfan46/Qwen3.6-27B-uncensored-heretic-v2` (Heretic-style abliteration of the Qwen 3.6 27B base). Same teacher (Opus 4.7), same dataset family, smaller memory footprint, no MoE quirks, uncensored. (Renamed from `FoolDev/Thanatos-27B`; HF serves a 307 redirect from the old path.) |
| [Crownelius/Crow-9B-HERETIC-4.6](https://huggingface.co/Crownelius/Crow-9B-HERETIC-4.6) | 9B dense | Heretic-flavored fine-tune of a Qwen 3.5 9B base, a smaller starting point. Useful as a fast first-pass model when 35B is too heavy for the host. |

## Credits

- Base model: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) (Alibaba)
- Reasoning teacher: Claude Opus 4.7 (Anthropic)
- Distillation lineage and dataset curation: [Crownelius](https://huggingface.co/Crownelius)

License inherited from upstream: Apache-2.0.