FoolDev committed on
Commit f605870 · 1 Parent(s): e4beea4

README polish: TL;DR up top, inline chat template

Three small documentation improvements:

1. Move the TL;DR (3-line 'just install and run' block) to right after
the intro paragraph, before 'Why a 27B variant'. Someone landing on
the model card now sees a working command before any rationale.

2. Inline the chat-template section instead of pointing readers at the
35B sibling's README. Three short examples: plain conversation,
reasoning trace with <think>, and an XML <tool_call>. README now
stands on its own without a clickthrough.

3. Drop the 'Same as the 35B sibling:' lead-in above the system prompt
(the prompt was already inlined; the lead-in was just clutter).

Files changed (2)
  1. CHANGELOG.md +13 -0
  2. README.md +72 -3
CHANGELOG.md CHANGED
@@ -7,6 +7,19 @@ and documentation**, not the underlying base model.

  ## [Unreleased]

+ ### Changed
+ - README: added a TL;DR section right after the intro paragraph so
+ someone scanning the page gets a working command without scrolling
+ past Why / What's here / Architecture.
+ - README chat-template section: replaced the cross-reference to the
+ 35B sibling card with inlined examples (plain conversation,
+ `<think>` reasoning trace, XML tool call). The README now stands
+ on its own.
+ - Minor: dropped the "Same as the 35B sibling:" lead-in above the
+ system prompt block; the prompt was already inlined.
+
+ ## [0.5.0] - 2026-05-02 — `e4beea4`
+
  ### Added
  - `examples/llama_cpp_vision.py` — image-text-to-text via
  `llama-cpp-python` + a separate `mmproj-F16.gguf`. Currently the only
README.md CHANGED
@@ -58,6 +58,27 @@ pipeline_tag: image-text-to-text

  A personal sibling to [`FoolDev/janus`](https://huggingface.co/FoolDev/janus). Same teacher (Claude Opus 4.7), same dataset family, but built on the **dense** [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) base instead of the 35B-A3B MoE. Smaller, easier to deploy, no expert-routing surprises.

+ ## TL;DR
+
+ If you have Ollama and 24 GB of RAM (or a 24 GB GPU):
+
+ ```bash
+ git clone https://huggingface.co/FoolDev/janus-27b && cd janus-27b
+ make build # downloads ~17 GB GGUF and creates the model
+ ollama run janus-27b
+ ```
+
+ If you're on a 32 GB unified-memory laptop (Mac M-series, Z13, etc.) use
+ the smaller profile:
+
+ ```bash
+ make build PROFILE=z13 QUANT=Q3_K_S # ~12 GB GGUF, fits in ~17 GB total
+ ollama run janus-27b-z13
+ ```
+
+ For image input use llama.cpp directly — Ollama vision is broken for
+ this architecture upstream (see [Vision](#vision)).
+
  ## Why a 27B variant?

  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** — the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
@@ -167,7 +188,8 @@ Lower temperature (0.4-0.6) and bump `repeat_penalty` to 1.08 if it loops inside

  ### System prompt

- Same as the 35B sibling:
+ The Modelfile bakes this in. Override per-request via the `system` role
+ in your client:

  ```text
  You are Janus, a precise and capable assistant for reasoning, writing, coding, and long-form dialogue.
@@ -240,9 +262,56 @@ The dense 27B is the easier of the two Janus models to deploy.

  ## Chat template

- Identical to the 35B sibling — Qwen 3.x ChatML with `<|im_start|>` / `<|im_end|>` markers, `<think>...</think>` for reasoning traces, XML-style `<tool_call>` for function calling. The template is embedded in the GGUF metadata.
+ Standard Qwen 3.x ChatML with `<|im_start|>` / `<|im_end|>` role markers,
+ `<think>...</think>` blocks for reasoning traces, and XML-style
+ `<tool_call>` for function calling. The template is embedded in the GGUF
+ metadata, so loaders that read it (llama.cpp, Ollama, LM Studio) handle
+ the formatting automatically.
+
+ #### Plain conversation
+
+ ```text
+ <|im_start|>system
+ You are Janus, a precise and capable assistant…<|im_end|>
+ <|im_start|>user
+ What is the time complexity of mergesort?<|im_end|>
+ <|im_start|>assistant
+ ```
+
+ #### With reasoning trace
+
+ ```text
+ <|im_start|>assistant
+ <think>
+ The user asked about mergesort. It splits, recursively sorts each half,
+ then merges. The recurrence T(n) = 2T(n/2) + O(n) solves to O(n log n).
+ </think>
+
+ Mergesort runs in **O(n log n)** time in the worst, average, and best
+ cases.<|im_end|>
+ ```
+
+ Most clients (Open WebUI, LibreChat, etc.) hide the `<think>` block by
+ default and surface only the visible answer. Strip it manually with
+ `re.sub(r"<think>.*?</think>\s*", "", content, flags=re.DOTALL)` if your
+ client doesn't.
+
+ #### Tool / function calling
+
+ The embedded template uses Qwen's XML format:
+
+ ```text
+ <tool_call>
+ <function=get_current_weather>
+ <parameter=city>Paris</parameter>
+ <parameter=unit>celsius</parameter>
+ </function>
+ </tool_call>
+ ```

- See the [Janus-35B Chat template section](https://huggingface.co/FoolDev/janus#chat-template) for examples they apply unchanged here.
+ Most OpenAI-compatible servers (Ollama, LM Studio, vLLM) translate
+ between this and the JSON `tool_calls` shape automatically. See
+ `examples/ollama_chat.py:tool_round_trip` for a working round-trip.

  ## Known limitations

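
Editorial sketch (not part of the commit): the three formats inlined in the chat-template hunk above can be exercised client-side with a few lines of Python. This is a minimal illustration, not the repository's code and not the template embedded in the GGUF. The helper names `render_chatml`, `strip_think`, and `parse_tool_calls` are hypothetical. The regex in `strip_think` is the one quoted in the README. The tool-call parser assumes the exact XML layout shown in the diff and treats every parameter value as a string; the output mirrors the general OpenAI `tool_calls` shape but omits the `id` field a server would add.

```python
import json
import re

# Same pattern the README quotes for hiding reasoning traces.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

# Assumed layout: <tool_call><function=NAME> ...parameters... </function></tool_call>
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    flags=re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", flags=re.DOTALL)


def render_chatml(messages):
    """Render role/content dicts as ChatML and open an assistant turn.

    Hypothetical helper: loaders that read the embedded template do this
    for you; it is only useful when building a raw prompt by hand.
    """
    turns = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    return "".join(turns) + "<|im_start|>assistant\n"


def strip_think(content):
    """Drop <think>...</think> blocks and the whitespace that follows them."""
    return THINK_RE.sub("", content)


def parse_tool_calls(content):
    """Convert XML tool calls into OpenAI-style dicts (values kept as strings)."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(content):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append(
            {
                "type": "function",
                # OpenAI-style clients expect arguments as a JSON-encoded string.
                "function": {"name": name, "arguments": json.dumps(args)},
            }
        )
    return calls


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "What is the time complexity of mergesort?"}]))
    print(strip_think("<think>\nsplit, sort, merge\n</think>\n\nO(n log n)."))
    print(parse_tool_calls(
        "<tool_call>\n<function=get_current_weather>\n"
        "<parameter=city>Paris</parameter>\n</function>\n</tool_call>"
    ))
```

In practice the servers named in the diff (Ollama, LM Studio, vLLM) are described as doing this translation themselves, so a parser like this would only matter when talking to a raw completion endpoint.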