trans2

Mayo committed on Apr 26

Commit fefd132 (unverified)

Parent: 45f8398

feat: Qwen3.6

Files changed (9)

.cargo/config.toml +2 -1
README.md +5 -3
docs/en-US/explanation/models-and-providers.md +3 -1
docs/ja-JP/explanation/models-and-providers.md +3 -1
docs/pt-BR/explanation/models-and-providers.md +3 -1
docs/zh-CN/explanation/models-and-providers.md +3 -1
koharu-llm/src/lib.rs +67 -18
koharu-llm/src/safe/context.rs +0 -5
koharu-llm/src/safe/mtmd.rs +9 -1

.cargo/config.toml CHANGED

@@ -1,7 +1,8 @@
 [env]
 # refer: https://stackoverflow.com/questions/43577885/is-there-a-cargo-environment-variable-for-the-workspace-directory
 CARGO_WORKSPACE_DIR = { value = "", relative = true }
-LLAMA_CPP_TAG = "b8665"
 # CUDA 13.0 requires C++17
 NVCC_PREPEND_FLAGS = "-std=c++17"
 # override nvidia-smi compute capability

 [env]
 # refer: https://stackoverflow.com/questions/43577885/is-there-a-cargo-environment-variable-for-the-workspace-directory
 CARGO_WORKSPACE_DIR = { value = "", relative = true }
+# llama.cpp release tag
+LLAMA_CPP_TAG = "b8935"
 # CUDA 13.0 requires C++17
 NVCC_PREPEND_FLAGS = "-std=c++17"
 # override nvidia-smi compute capability
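
Cargo passes entries from the `[env]` table to the build scripts and compiler processes it spawns, so a build script can read `LLAMA_CPP_TAG` like any other environment variable. The sketch below is illustrative only: `resolve_tag` and `release_page` are hypothetical helpers, not code from this repository, and the URL follows GitHub's general release-tag layout.

```rust
use std::env;

/// Fallback tag, copied from the `.cargo/config.toml` hunk in this commit.
const DEFAULT_LLAMA_CPP_TAG: &str = "b8935";

/// Pick the configured tag, falling back to the default when the
/// variable is unset or blank. (Hypothetical helper.)
fn resolve_tag(configured: Option<String>) -> String {
    configured
        .map(|tag| tag.trim().to_string())
        .filter(|tag| !tag.is_empty())
        .unwrap_or_else(|| DEFAULT_LLAMA_CPP_TAG.to_string())
}

/// Build the GitHub release page URL for a tag. (Hypothetical helper;
/// assumes the standard `releases/tag/<tag>` layout.)
fn release_page(tag: &str) -> String {
    format!("https://github.com/ggml-org/llama.cpp/releases/tag/{tag}")
}

fn main() {
    // In a build script, Cargo would have exported LLAMA_CPP_TAG already.
    let tag = resolve_tag(env::var("LLAMA_CPP_TAG").ok());
    println!("llama.cpp tag: {tag}");
    println!("release page: {}", release_page(&tag));
}
```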

README.md CHANGED

@@ -229,6 +229,7 @@ These are broad instruct models that work well when you want one local model for
 - Gemma 4 instruct: [gemma4-e2b-it](https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF), [gemma4-e4b-it](https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF), [gemma4-26b-a4b-it](https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF), [gemma4-31b-it](https://huggingface.co/unsloth/gemma-4-31B-it-GGUF)
 - Qwen 3.5: [qwen3.5-0.8b](https://huggingface.co/unsloth/Qwen3.5-0.8B-GGUF), [qwen3.5-2b](https://huggingface.co/unsloth/Qwen3.5-2B-GGUF), [qwen3.5-4b](https://huggingface.co/unsloth/Qwen3.5-4B-GGUF), [qwen3.5-9b](https://huggingface.co/unsloth/Qwen3.5-9B-GGUF), [qwen3.5-27b](https://huggingface.co/unsloth/Qwen3.5-27B-GGUF), [qwen3.5-35b-a3b](https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF)
 #### NSFW-Capable Local Models
@@ -236,17 +237,18 @@ These variants relax the safety tuning applied to the corresponding base instruc
 - Gemma 4 uncensored: [gemma4-e2b-uncensored](https://huggingface.co/HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive), [gemma4-e4b-uncensored](https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive)
 - Qwen 3.5 uncensored: [qwen3.5-2b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-2B-Uncensored-HauhauCS-Aggressive), [qwen3.5-4b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive), [qwen3.5-9b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive), [qwen3.5-27b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive), [qwen3.5-35b-a3b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive)
 #### Fine-Tuned Translation Models
 These models are more specialized for translation quality, language coverage, or lower-resource setups.
-- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0, best when translation quality matters more than speed or memory use
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): a smaller multilingual instruct model that is easier to run on CPUs or low-memory GPUs
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) and [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): larger translation-oriented options when you have more VRAM or RAM available
-- [sakura-galtransl-7b-v3.7](https://huggingface.co/SakuraLLM/Sakura-GalTransl-7B-v3.7): around 6.3 GB, a good balance of quality and speed on 8 GB GPUs
 - [sakura-1.5b-qwen2.5-v1.0](https://huggingface.co/shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX): lighter and faster, useful on mid-range GPUs or CPU-only setups
-- [hunyuan-mt-7b](https://huggingface.co/Mungert/Hunyuan-MT-7B-GGUF): around 6.3 GB, with broad multilingual translation coverage
 LLMs are downloaded on demand when you activate a model. For constrained memory environments, start with a smaller model. When VRAM or RAM permits, 7B and 8B class models generally provide better translation quality.

 - Gemma 4 instruct: [gemma4-e2b-it](https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF), [gemma4-e4b-it](https://huggingface.co/unsloth/gemma-4-E4B-it-GGUF), [gemma4-26b-a4b-it](https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF), [gemma4-31b-it](https://huggingface.co/unsloth/gemma-4-31B-it-GGUF)
 - Qwen 3.5: [qwen3.5-0.8b](https://huggingface.co/unsloth/Qwen3.5-0.8B-GGUF), [qwen3.5-2b](https://huggingface.co/unsloth/Qwen3.5-2B-GGUF), [qwen3.5-4b](https://huggingface.co/unsloth/Qwen3.5-4B-GGUF), [qwen3.5-9b](https://huggingface.co/unsloth/Qwen3.5-9B-GGUF), [qwen3.5-27b](https://huggingface.co/unsloth/Qwen3.5-27B-GGUF), [qwen3.5-35b-a3b](https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF)
+- Qwen 3.6: [qwen3.6-27b](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF), [qwen3.6-35b-a3b](https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF)
 #### NSFW-Capable Local Models
 - Gemma 4 uncensored: [gemma4-e2b-uncensored](https://huggingface.co/HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive), [gemma4-e4b-uncensored](https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive)
 - Qwen 3.5 uncensored: [qwen3.5-2b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-2B-Uncensored-HauhauCS-Aggressive), [qwen3.5-4b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive), [qwen3.5-9b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive), [qwen3.5-27b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive), [qwen3.5-35b-a3b-uncensored](https://huggingface.co/HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive)
+- Qwen 3.6 uncensored: [qwen3.6-27b-uncensored](https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Balanced), [qwen3.6-35b-a3b-uncensored](https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive)
 #### Fine-Tuned Translation Models
 These models are more specialized for translation quality, language coverage, or lower-resource setups.
+- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): a Q5_K_M GGUF, best when translation quality matters more than speed or memory use
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): a smaller multilingual instruct model that is easier to run on CPUs or low-memory GPUs
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) and [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): larger translation-oriented options when you have more VRAM or RAM available
+- [sakura-galtransl-7b-v3.7](https://huggingface.co/SakuraLLM/Sakura-GalTransl-7B-v3.7): a smaller IQ4_XS GGUF, a good balance of quality and speed on 8 GB GPUs
 - [sakura-1.5b-qwen2.5-v1.0](https://huggingface.co/shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX): lighter and faster, useful on mid-range GPUs or CPU-only setups
+- [hunyuan-mt-7b](https://huggingface.co/Mungert/Hunyuan-MT-7B-GGUF): a Q4_K_M GGUF with broad multilingual translation coverage
 LLMs are downloaded on demand when you activate a model. For constrained memory environments, start with a smaller model. When VRAM or RAM permits, 7B and 8B class models generally provide better translation quality.
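
The on-demand download described above needs a repository and a file name for each model, which is what the `repo` and `filename` props in `koharu-llm/src/lib.rs` carry. A minimal sketch of that lookup follows. It assumes a plain `match` instead of the strum props, lists only two entries taken from this commit, and assumes the file is fetched from the `main` revision through Hugging Face's `resolve` URL. `ModelFile`, `model_file`, and `download_url` are hypothetical names.

```rust
/// A (repo, filename) pair, mirroring the `repo` and `filename` props
/// attached to each `ModelId` variant in this commit.
struct ModelFile {
    repo: &'static str,
    filename: &'static str,
}

/// Look up a model by its serialized id. Only two entries are listed
/// here; the real enum has many more. (Hypothetical helper.)
fn model_file(id: &str) -> Option<ModelFile> {
    match id {
        "qwen3.6-27b" => Some(ModelFile {
            repo: "unsloth/Qwen3.6-27B-GGUF",
            filename: "Qwen3.6-27B-IQ4_XS.gguf",
        }),
        "hunyuan-mt-7b" => Some(ModelFile {
            repo: "Mungert/Hunyuan-MT-7B-GGUF",
            filename: "Hunyuan-MT-7B-q4_k_m.gguf",
        }),
        _ => None,
    }
}

/// Build a direct download URL. Assumes the `main` revision.
fn download_url(file: &ModelFile) -> String {
    format!(
        "https://huggingface.co/{}/resolve/main/{}",
        file.repo, file.filename
    )
}

fn main() {
    match model_file("qwen3.6-27b") {
        Some(file) => println!("{}", download_url(&file)),
        None => println!("unknown model id"),
    }
}
```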

docs/en-US/explanation/models-and-providers.md CHANGED

@@ -57,7 +57,7 @@ In practice, the local models are usually quantized decoder-only transformers. G
 ### Translation-focused built-in local models for English output
-- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): around 8.5 GB in Q8_0 form, best when translation quality matters most
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): a smaller multilingual instruct option for low-memory systems or faster iteration
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) and [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): larger translation-oriented choices when you want more headroom
@@ -78,6 +78,8 @@ The local picker also includes general-purpose families that are not translation
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
 ## Remote providers

 ### Translation-focused built-in local models for English output
+- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): a Q5_K_M GGUF, best when translation quality matters most
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): a smaller multilingual instruct option for low-memory systems or faster iteration
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) and [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): larger translation-oriented choices when you want more headroom
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
+- Qwen 3.6: `qwen3.6-27b`, `qwen3.6-35b-a3b`
+- Qwen 3.6 uncensored: `qwen3.6-27b-uncensored`, `qwen3.6-35b-a3b-uncensored`
 ## Remote providers

docs/ja-JP/explanation/models-and-providers.md CHANGED

@@ -57,7 +57,7 @@ Koharu は [llama.cpp](https://github.com/ggml-org/llama.cpp) を通じてロー
 ### 英語出力向けの翻訳特化組み込みローカルモデル
-- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): Q8_0 で約 8.5 GB。翻訳品質を優先するなら有力
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): 低メモリ環境や高速な試行に向く小型の多言語 instruction モデル
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) と [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): より多くの VRAM / RAM を使える環境向けの大型翻訳寄りモデル
@@ -78,6 +78,8 @@ LLM ピッカーには、翻訳専用ではない汎用ファミリも含まれ
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
 ## リモートプロバイダ

 ### 英語出力向けの翻訳特化組み込みローカルモデル
+- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): Q5_K_M GGUF。翻訳品質を優先するなら有力
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): 低メモリ環境や高速な試行に向く小型の多言語 instruction モデル
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) と [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): より多くの VRAM / RAM を使える環境向けの大型翻訳寄りモデル
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
+- Qwen 3.6: `qwen3.6-27b`, `qwen3.6-35b-a3b`
+- Qwen 3.6 uncensored: `qwen3.6-27b-uncensored`, `qwen3.6-35b-a3b-uncensored`
 ## リモートプロバイダ

docs/pt-BR/explanation/models-and-providers.md CHANGED

@@ -57,7 +57,7 @@ Na prática, os modelos locais geralmente são transformers decoder-only quantiz
 ### Modelos locais internos focados em tradução para saída em inglês
-- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): cerca de 8,5 GB na forma Q8_0, melhor quando a qualidade da tradução importa mais
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): uma opção menor multilíngue do tipo instruct para sistemas com pouca memória ou iteração mais rápida
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) e [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): escolhas maiores orientadas para tradução quando você quer mais folga
@@ -78,6 +78,8 @@ O seletor local também inclui famílias de propósito geral que não são espec
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
 ## Provedores remotos

 ### Modelos locais internos focados em tradução para saída em inglês
+- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf): um GGUF Q5_K_M, melhor quando a qualidade da tradução importa mais
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF): uma opção menor multilíngue do tipo instruct para sistemas com pouca memória ou iteração mais rápida
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) e [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF): escolhas maiores orientadas para tradução quando você quer mais folga
 - Gemma 4 uncensored: `gemma4-e2b-uncensored`, `gemma4-e4b-uncensored`
 - Qwen 3.5: `qwen3.5-0.8b`, `qwen3.5-2b`, `qwen3.5-4b`, `qwen3.5-9b`, `qwen3.5-27b`, `qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored: `qwen3.5-2b-uncensored`, `qwen3.5-4b-uncensored`, `qwen3.5-9b-uncensored`, `qwen3.5-27b-uncensored`, `qwen3.5-35b-a3b-uncensored`
+- Qwen 3.6: `qwen3.6-27b`, `qwen3.6-35b-a3b`
+- Qwen 3.6 uncensored: `qwen3.6-27b-uncensored`, `qwen3.6-35b-a3b-uncensored`
 ## Provedores remotos

docs/zh-CN/explanation/models-and-providers.md CHANGED

@@ -57,7 +57,7 @@ Koharu 通过 [llama.cpp](https://github.com/ggml-org/llama.cpp) 支持本地 GG
 ### 面向英文输出的翻译型内置本地模型
-- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf)：Q8_0 约 8.5 GB，更适合追求翻译质量
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF)：更小的多语言 instruction 模型，适合低内存机器或更快的迭代
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) 和 [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF)：更大的翻译取向模型，适合有更多 VRAM / RAM 的环境
@@ -78,6 +78,8 @@ Koharu 通过 [llama.cpp](https://github.com/ggml-org/llama.cpp) 支持本地 GG
 - Gemma 4 uncensored：`gemma4-e2b-uncensored`、`gemma4-e4b-uncensored`
 - Qwen 3.5：`qwen3.5-0.8b`、`qwen3.5-2b`、`qwen3.5-4b`、`qwen3.5-9b`、`qwen3.5-27b`、`qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored：`qwen3.5-2b-uncensored`、`qwen3.5-4b-uncensored`、`qwen3.5-9b-uncensored`、`qwen3.5-27b-uncensored`、`qwen3.5-35b-a3b-uncensored`
 ## 远程提供商

 ### 面向英文输出的翻译型内置本地模型
+- [vntl-llama3-8b-v2](https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-gguf)：Q5_K_M GGUF，更适合追求翻译质量
 - [lfm2.5-1.2b-instruct](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF)：更小的多语言 instruction 模型，适合低内存机器或更快的迭代
 - [sugoi-14b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-14B-Ultra-GGUF) 和 [sugoi-32b-ultra](https://huggingface.co/sugoitoolkit/Sugoi-32B-Ultra-GGUF)：更大的翻译取向模型，适合有更多 VRAM / RAM 的环境
 - Gemma 4 uncensored：`gemma4-e2b-uncensored`、`gemma4-e4b-uncensored`
 - Qwen 3.5：`qwen3.5-0.8b`、`qwen3.5-2b`、`qwen3.5-4b`、`qwen3.5-9b`、`qwen3.5-27b`、`qwen3.5-35b-a3b`
 - Qwen 3.5 uncensored：`qwen3.5-2b-uncensored`、`qwen3.5-4b-uncensored`、`qwen3.5-9b-uncensored`、`qwen3.5-27b-uncensored`、`qwen3.5-35b-a3b-uncensored`
+- Qwen 3.6：`qwen3.6-27b`、`qwen3.6-35b-a3b`
+- Qwen 3.6 uncensored：`qwen3.6-27b-uncensored`、`qwen3.6-35b-a3b-uncensored`
 ## 远程提供商

koharu-llm/src/lib.rs CHANGED

@@ -51,7 +51,7 @@ pub enum ModelId {
         serialize = "vntl-llama3-8b-v2",
         props(
             repo = "lmg-anon/vntl-llama3-8b-v2-gguf",
-            filename = "vntl-llama3-8b-v2-hf-q8_0.gguf",
             languages = "en-US"
         )
     )]
@@ -60,7 +60,7 @@ pub enum ModelId {
         serialize = "lfm2.5-1.2b-instruct",
         props(
             repo = "LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
-            filename = "LFM2.5-1.2B-Instruct-Q8_0.gguf",
             languages = "en-US,ar-SA,zh-CN,fr-FR,de-DE,ja-JP,ko-KR,pt-PT,es-ES"
         )
     )]
@@ -69,7 +69,7 @@ pub enum ModelId {
         serialize = "sakura-galtransl-7b-v3.7",
         props(
             repo = "SakuraLLM/Sakura-GalTransl-7B-v3.7",
-            filename = "Sakura-Galtransl-7B-v3.7.gguf",
             languages = "zh-CN"
         )
     )]
@@ -87,7 +87,7 @@ pub enum ModelId {
         serialize = "hunyuan-mt-7b",
         props(
             repo = "Mungert/Hunyuan-MT-7B-GGUF",
-            filename = "Hunyuan-MT-7B-q6_k_m.gguf",
             languages = "zh-CN,en-US,fr-FR,pt-PT,pt-BR,es-ES,ja-JP,tr-TR,ru-RU,ar-SA,ko-KR,th-TH,it-IT,de-DE,vi-VN,ms-MY,id-ID,fil-PH,hi-IN,zh-TW,pl-PL,cs-CZ,nl-NL,km-KH,my-MM,fa-IR,gu-IN,ur-PK,te-IN,mr-IN,he-IL,bn-BD,ta-IN,uk-UA,bo-CN,kk-KZ,mn-MN,ug-CN,yue-HK"
         )
     )]
@@ -96,7 +96,7 @@ pub enum ModelId {
         serialize = "sugoi-14b-ultra",
         props(
             repo = "sugoitoolkit/Sugoi-14B-Ultra-GGUF",
-            filename = "Sugoi-14B-Ultra-Q8_0.gguf",
             languages = "en-US"
         )
     )]
@@ -114,7 +114,7 @@ pub enum ModelId {
         serialize = "gemma4-e2b-it",
         props(
             repo = "unsloth/gemma-4-E2B-it-GGUF",
-            filename = "gemma-4-e2b-it-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -123,7 +123,7 @@ pub enum ModelId {
         serialize = "gemma4-e4b-it",
         props(
             repo = "unsloth/gemma-4-E4B-it-GGUF",
-            filename = "gemma-4-e4b-it-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -132,7 +132,7 @@ pub enum ModelId {
         serialize = "gemma4-26b-a4b-it",
         props(
             repo = "unsloth/gemma-4-26B-A4B-it-GGUF",
-            filename = "gemma-4-26B-A4B-it-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -150,7 +150,7 @@ pub enum ModelId {
         serialize = "gemma4-e2b-uncensored",
         props(
             repo = "HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive",
-            filename = "Gemma-4-E2B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf",
             languages = "*"
         )
     )]
@@ -168,7 +168,7 @@ pub enum ModelId {
         serialize = "qwen3.5-0.8b",
         props(
             repo = "unsloth/Qwen3.5-0.8B-GGUF",
-            filename = "Qwen3.5-0.8B-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -177,7 +177,7 @@ pub enum ModelId {
         serialize = "qwen3.5-2b",
         props(
             repo = "unsloth/Qwen3.5-2B-GGUF",
-            filename = "Qwen3.5-2B-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -186,7 +186,7 @@ pub enum ModelId {
         serialize = "qwen3.5-4b",
         props(
             repo = "unsloth/Qwen3.5-4B-GGUF",
-            filename = "Qwen3.5-4B-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -195,7 +195,7 @@ pub enum ModelId {
         serialize = "qwen3.5-9b",
         props(
             repo = "unsloth/Qwen3.5-9B-GGUF",
-            filename = "Qwen3.5-9B-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -213,16 +213,34 @@ pub enum ModelId {
         serialize = "qwen3.5-35b-a3b",
         props(
             repo = "unsloth/Qwen3.5-35B-A3B-GGUF",
-            filename = "Qwen3.5-35B-A3B-Q8_0.gguf",
             languages = "*"
         )
     )]
     Qwen3_5_35bA3b,
     #[strum(
         serialize = "qwen3.5-2b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-2B-Uncensored-HauhauCS-Aggressive",
-            filename = "Qwen3.5-2B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -231,7 +249,7 @@ pub enum ModelId {
         serialize = "qwen3.5-4b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive",
-            filename = "Qwen3.5-4B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -240,7 +258,7 @@ pub enum ModelId {
         serialize = "qwen3.5-9b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive",
-            filename = "Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf",
             languages = "*"
         )
     )]
@@ -258,11 +276,29 @@ pub enum ModelId {
         serialize = "qwen3.5-35b-a3b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
-            filename = "Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf",
             languages = "*"
         )
     )]
     Qwen3_5_35bA3bUncensored,
 }
 impl ModelId {
@@ -319,6 +355,19 @@ impl ModelId {
                 repeat_penalty: 1.0,
                 ..Default::default()
             },
             // Sugoi: temp=0.1, top_k=40, top_p=0.95, min_p=0.05, repeat=1.1
             Self::Sugoi14bUltra | Self::Sugoi32bUltra => GenerateOptions {
                 temperature: 0.1,

         serialize = "vntl-llama3-8b-v2",
         props(
             repo = "lmg-anon/vntl-llama3-8b-v2-gguf",
+            filename = "vntl-llama3-8b-v2-hf-q5_k_m.gguf",
             languages = "en-US"
         )
     )]
         serialize = "lfm2.5-1.2b-instruct",
         props(
             repo = "LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
+            filename = "LFM2.5-1.2B-Instruct-Q4_K_M.gguf",
             languages = "en-US,ar-SA,zh-CN,fr-FR,de-DE,ja-JP,ko-KR,pt-PT,es-ES"
         )
     )]
         serialize = "sakura-galtransl-7b-v3.7",
         props(
             repo = "SakuraLLM/Sakura-GalTransl-7B-v3.7",
+            filename = "Sakura-Galtransl-7B-v3.7-IQ4_XS.gguf",
             languages = "zh-CN"
         )
     )]
         serialize = "hunyuan-mt-7b",
         props(
             repo = "Mungert/Hunyuan-MT-7B-GGUF",
+            filename = "Hunyuan-MT-7B-q4_k_m.gguf",
             languages = "zh-CN,en-US,fr-FR,pt-PT,pt-BR,es-ES,ja-JP,tr-TR,ru-RU,ar-SA,ko-KR,th-TH,it-IT,de-DE,vi-VN,ms-MY,id-ID,fil-PH,hi-IN,zh-TW,pl-PL,cs-CZ,nl-NL,km-KH,my-MM,fa-IR,gu-IN,ur-PK,te-IN,mr-IN,he-IL,bn-BD,ta-IN,uk-UA,bo-CN,kk-KZ,mn-MN,ug-CN,yue-HK"
         )
     )]
         serialize = "sugoi-14b-ultra",
         props(
             repo = "sugoitoolkit/Sugoi-14B-Ultra-GGUF",
+            filename = "Sugoi-14B-Ultra-Q4_K_M.gguf",
             languages = "en-US"
         )
     )]
         serialize = "gemma4-e2b-it",
         props(
             repo = "unsloth/gemma-4-E2B-it-GGUF",
+            filename = "gemma-4-E2B-it-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "gemma4-e4b-it",
         props(
             repo = "unsloth/gemma-4-E4B-it-GGUF",
+            filename = "gemma-4-E4B-it-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "gemma4-26b-a4b-it",
         props(
             repo = "unsloth/gemma-4-26B-A4B-it-GGUF",
+            filename = "gemma-4-26B-A4B-it-UD-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "gemma4-e2b-uncensored",
         props(
             repo = "HauhauCS/Gemma-4-E2B-Uncensored-HauhauCS-Aggressive",
+            filename = "Gemma-4-E2B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-0.8b",
         props(
             repo = "unsloth/Qwen3.5-0.8B-GGUF",
+            filename = "Qwen3.5-0.8B-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-2b",
         props(
             repo = "unsloth/Qwen3.5-2B-GGUF",
+            filename = "Qwen3.5-2B-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-4b",
         props(
             repo = "unsloth/Qwen3.5-4B-GGUF",
+            filename = "Qwen3.5-4B-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-9b",
         props(
             repo = "unsloth/Qwen3.5-9B-GGUF",
+            filename = "Qwen3.5-9B-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-35b-a3b",
         props(
             repo = "unsloth/Qwen3.5-35B-A3B-GGUF",
+            filename = "Qwen3.5-35B-A3B-Q4_K_M.gguf",
             languages = "*"
         )
     )]
     Qwen3_5_35bA3b,
+    #[strum(
+        serialize = "qwen3.6-27b",
+        props(
+            repo = "unsloth/Qwen3.6-27B-GGUF",
+            filename = "Qwen3.6-27B-IQ4_XS.gguf",
+            languages = "*"
+        )
+    )]
+    Qwen3_6_27b,
+    #[strum(
+        serialize = "qwen3.6-35b-a3b",
+        props(
+            repo = "unsloth/Qwen3.6-35B-A3B-GGUF",
+            filename = "Qwen3.6-35B-A3B-UD-IQ4_XS.gguf",
+            languages = "*"
+        )
+    )]
+    Qwen3_6_35bA3b,
     #[strum(
         serialize = "qwen3.5-2b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-2B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.5-2B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-4b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.5-4B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-9b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
             languages = "*"
         )
     )]
         serialize = "qwen3.5-35b-a3b-uncensored",
         props(
             repo = "HauhauCS/Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
             languages = "*"
         )
     )]
     Qwen3_5_35bA3bUncensored,
+    #[strum(
+        serialize = "qwen3.6-27b-uncensored",
+        props(
+            repo = "HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf",
+            languages = "*"
+        )
+    )]
+    Qwen3_6_27bUncensored,
+    #[strum(
+        serialize = "qwen3.6-35b-a3b-uncensored",
+        props(
+            repo = "HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive",
+            filename = "Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf",
+            languages = "*"
+        )
+    )]
+    Qwen3_6_35bA3bUncensored,
 }
 impl ModelId {
                 repeat_penalty: 1.0,
                 ..Default::default()
             },
+            // Qwen3.6 non-thinking: temp=0.7, top_k=20, top_p=0.8, presence=1.5
+            Self::Qwen3_6_27b
+            | Self::Qwen3_6_35bA3b
+            | Self::Qwen3_6_27bUncensored
+            | Self::Qwen3_6_35bA3bUncensored => GenerateOptions {
+                temperature: 0.7,
+                top_k: Some(20),
+                top_p: Some(0.8),
+                min_p: Some(0.0),
+                presence_penalty: 1.5,
+                repeat_penalty: 1.0,
+                ..Default::default()
+            },
             // Sugoi: temp=0.1, top_k=40, top_p=0.95, min_p=0.05, repeat=1.1
             Self::Sugoi14bUltra | Self::Sugoi32bUltra => GenerateOptions {
                 temperature: 0.1,
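
The Qwen3.6 arm above sets six sampling fields and leaves the rest to `..Default::default()`. The sketch below shows that struct-update pattern in isolation. The `GenerateOptions` here is a stand-in: its field types, its default values, and the extra `max_tokens` field are assumptions made for illustration, not the crate's actual definition.

```rust
/// Stand-in for the crate's `GenerateOptions`. Field types, defaults,
/// and `max_tokens` are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct GenerateOptions {
    temperature: f32,
    top_k: Option<i32>,
    top_p: Option<f32>,
    min_p: Option<f32>,
    presence_penalty: f32,
    repeat_penalty: f32,
    max_tokens: usize,
}

impl Default for GenerateOptions {
    fn default() -> Self {
        Self {
            temperature: 1.0,
            top_k: None,
            top_p: None,
            min_p: None,
            presence_penalty: 0.0,
            repeat_penalty: 1.0,
            max_tokens: 512, // placeholder value
        }
    }
}

/// Values copied from the Qwen3.6 arm in this commit; every field not
/// named here comes from `Default`.
fn qwen3_6_options() -> GenerateOptions {
    GenerateOptions {
        temperature: 0.7,
        top_k: Some(20),
        top_p: Some(0.8),
        min_p: Some(0.0),
        presence_penalty: 1.5,
        repeat_penalty: 1.0,
        ..Default::default()
    }
}

fn main() {
    println!("{:?}", qwen3_6_options());
}
```

Because of the struct-update syntax, a field added to the options struct later picks up its default for every model preset without touching each match arm.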

koharu-llm/src/safe/context.rs CHANGED

@@ -363,11 +363,6 @@ impl<'model> LlamaContext<'model> {
         tracing::debug!("Remove lora adapter");
         Ok(())
     }
-    /// Print a breakdown of per-device memory use to the default logger.
-    pub fn print_memory_breakdown(&self) {
-        unsafe { crate::sys::llama_memory_breakdown_print(self.context.as_ptr()) }
-    }
 }
 impl Drop for LlamaContext<'_> {

         tracing::debug!("Remove lora adapter");
         Ok(())
     }
 }
 impl Drop for LlamaContext<'_> {

koharu-llm/src/safe/mtmd.rs CHANGED

@@ -190,7 +190,15 @@ impl MtmdContext {
     /// Check whether non-causal attention mask is needed before `llama_decode`.
     #[must_use]
     pub fn decode_use_non_causal(&self) -> bool {
-        unsafe { crate::sys::mtmd_decode_use_non_causal(self.context.as_ptr()) }
     }
     /// Check whether the current model uses M-RoPE for `llama_decode`.

     /// Check whether non-causal attention mask is needed before `llama_decode`.
     #[must_use]
     pub fn decode_use_non_causal(&self) -> bool {
+        unsafe { crate::sys::mtmd_decode_use_non_causal(self.context.as_ptr(), std::ptr::null()) }
+    }
+    /// Check whether non-causal attention mask is needed before decoding a chunk.
+    #[must_use]
+    pub fn decode_use_non_causal_for_chunk(&self, chunk: &MtmdInputChunk) -> bool {
+        unsafe {
+            crate::sys::mtmd_decode_use_non_causal(self.context.as_ptr(), chunk.chunk.as_ptr())
+        }
     }
     /// Check whether the current model uses M-RoPE for `llama_decode`.
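
The two wrappers above differ only in whether they pass a null pointer or a chunk pointer as the second argument. A common way to model such a nullable C parameter on the Rust side is `Option<&T>`, converted to a raw pointer at the call boundary. The sketch below shows that conversion against a mock function. `Chunk`, `mock_use_non_causal`, and the mock's behavior are invented for illustration and say nothing about what `mtmd_decode_use_non_causal` actually returns.

```rust
use std::ptr;

/// Stand-in for an opaque FFI chunk type. (Hypothetical.)
#[repr(C)]
struct Chunk {
    is_image: bool,
}

/// Mock of a C-style function taking a nullable chunk pointer.
/// The logic is invented: null falls back to a context-wide answer.
unsafe fn mock_use_non_causal(context_default: bool, chunk: *const Chunk) -> bool {
    if chunk.is_null() {
        context_default
    } else {
        // SAFETY: the caller guarantees non-null pointers are valid.
        unsafe { (*chunk).is_image }
    }
}

/// Safe wrapper: `None` becomes a null pointer, `Some` a borrowed one.
fn use_non_causal(context_default: bool, chunk: Option<&Chunk>) -> bool {
    let raw = chunk.map_or(ptr::null(), |c| c as *const Chunk);
    // SAFETY: `raw` is either null or derived from a live reference.
    unsafe { mock_use_non_causal(context_default, raw) }
}

fn main() {
    let image = Chunk { is_image: true };
    println!("no chunk:   {}", use_non_causal(false, None));
    println!("with chunk: {}", use_non_causal(false, Some(&image)));
}
```

The commit keeps two separate methods instead, which leaves the signature of the existing `decode_use_non_causal` unchanged for current callers.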