KeyVoice Refiner 2B (v0.1)

A lightweight text-refinement LLM fine-tuned for cleaning up speech-to-text output. It is specialized for punctuation, polite-form conversion, multilingual translation, summarization, and filler-word removal, and is not a general-purpose chat model.

To try the model inside a finished app right away, use KeyVoice, a macOS voice-input app.

Model Details

  • Base model: Qwen/Qwen3.5-2B
  • Teacher model: Qwen/Qwen3.5-9B (sequence-level distillation)
  • Training method: LoRA (rank=32) + distillation (trained on the teacher's refined outputs)
  • Architecture: qwen3_5 (Dense)
  • Parameters: 2.27B
  • Quantization: Q4_K_M GGUF (refiner-2b-v0.1-Q4_K_M.gguf, 1.3 GB)
  • License: Apache 2.0 (inherited)

Intended Use

Designed for

  • Punctuation and line-break formatting on transcribed speech
  • Polite-form / honorific conversion (e.g. Japanese desu/masu tone)
  • Multilingual translation (validated across 46 languages)
  • Summarization / compression of verbose text
  • Filler-word removal (e.g. "um", "uh", or Japanese fillers like "eeto", "ano")
  • Custom prompt rewriting

Not designed for

  • General-purpose chat (refinement-specialized; conversational ability is weak)
  • Code generation or mathematical reasoning
  • Creative or long-form writing (use the Qwen3.5-9B teacher directly instead)
  • Safety-critical domains such as medical, legal, or financial advice

Performance

Internal evaluation (Phase C data-augmentation set):

  • Inference speed: ~104 tok/s (Apple Silicon M4, llama.cpp Q4_K_M)
  • Refinement-task evaluation: 46 languages, 100% pass rate
  • Memory: ~2 GB at Q4_K_M
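
As a back-of-the-envelope reading of the throughput figure above, generation time grows linearly with the number of output tokens. The sketch below assumes decoding stays near the reported ~104 tok/s and ignores prompt processing and model load time. The function name and the 200-token example are illustrative, not measurements.

```python
REPORTED_TOK_PER_S = 104.0  # throughput reported above (Apple Silicon M4, llama.cpp, Q4_K_M)


def estimated_decode_seconds(output_tokens: int,
                             tok_per_s: float = REPORTED_TOK_PER_S) -> float:
    """Rough generation time in seconds; ignores prompt evaluation and model loading."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


if __name__ == "__main__":
    # A hypothetical 200-token refined passage.
    print(f"{estimated_decode_seconds(200):.2f} s")
```

Actual latency on other hardware or at other quantization levels may differ substantially from this estimate.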

How to Use

llama.cpp / llama-cpp-python

huggingface-cli download okayuji/KeyVoice-Refiner-2B-v0.1 \
  refiner-2b-v0.1-Q4_K_M.gguf \
  --local-dir ./models
./llama-cli -m ./models/refiner-2b-v0.1-Q4_K_M.gguf -p "..."
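
For llama-cpp-python, a minimal sketch might look like the following. It assumes the file was downloaded to ./models as above and that the installed llama-cpp-python build supports this model's architecture. The system prompt is the one shown in the Chat template section. The helper names, the way the instruction and transcript are joined, and the values for n_ctx, max_tokens, and temperature are illustrative assumptions, not settings recommended by the author.

```python
from pathlib import Path

# Path produced by the download command above.
MODEL_PATH = Path("./models/refiner-2b-v0.1-Q4_K_M.gguf")
# System prompt shown in the Chat template section.
SYSTEM_PROMPT = "You are an expert at refining transcribed text."


def build_messages(instruction: str, text: str) -> list[dict[str, str]]:
    """Pack a task instruction and a raw transcript into chat messages.

    Joining the two with a blank line is an assumption, not a documented format.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{instruction}\n\n{text}"},
    ]


def load_model(n_ctx: int = 2048):
    """Load the GGUF file; n_ctx=2048 is an illustrative context size."""
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama
    return Llama(model_path=str(MODEL_PATH), n_ctx=n_ctx, verbose=False)


def refine(llm, instruction: str, text: str, max_tokens: int = 256) -> str:
    """Run one refinement request through the chat template stored in the GGUF."""
    result = llm.create_chat_completion(
        messages=build_messages(instruction, text),
        max_tokens=max_tokens,
        temperature=0.2,  # assumed low temperature for stable rewrites
    )
    return result["choices"][0]["message"]["content"].strip()


if __name__ == "__main__":
    if MODEL_PATH.exists():
        llm = load_model()
        print(refine(llm, "Add punctuation and remove filler words.",
                     "um so i think we should uh ship it on friday"))
```

Loading the model once and reusing the `llm` object across requests avoids paying the load time on every call.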

Chat template

Uses the same chat template as the base model (Qwen3.5-2B):

<|im_start|>system
You are an expert at refining transcribed text.
<|im_end|>
<|im_start|>user
{user instruction}
<|im_end|>
<|im_start|>assistant
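
When a runtime does not apply the chat template itself, the prompt can be assembled by hand. The sketch below mirrors the layout printed above line for line. The exact whitespace around the special tokens is an assumption taken from that layout, and the template stored with the base model remains authoritative. `render_prompt` is a hypothetical helper name.

```python
SYSTEM_PROMPT = "You are an expert at refining transcribed text."


def render_prompt(user_instruction: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Render a single-turn prompt that ends with the open assistant turn."""
    return (
        "<|im_start|>system\n"
        f"{system_prompt}\n"
        "<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_instruction}\n"
        "<|im_end|>\n"
        "<|im_start|>assistant\n"  # generation continues from here
    )


if __name__ == "__main__":
    print(render_prompt("Remove filler words: um so the meeting is uh at three"))
```

Runtimes that read the template from the model file, such as the chat-completion interface in llama-cpp-python, do not need this step.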

Training Data

The training data was generated internally: the teacher (Qwen3.5-9B) produced refined outputs for the following tasks:

  • Transcribed speech → punctuation and line-break formatting
  • Plain text → polite form (desu/masu register)
  • Source text → translation across 46 languages
  • Verbose text → summarization / compression
  • Spoken text → filler-word removal

The exact dataset is not published (small-scale LoRA distillation by a single developer).
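
Since the actual pipeline is not published, the following is only a generic sketch of how sequence-level distillation data of this kind is commonly assembled: the teacher's output for each input becomes the student's training target. Every name here (`build_distillation_records`, `to_jsonl`, the stub teacher, the record layout) is hypothetical and does not describe the author's tooling.

```python
import json
from typing import Callable, Iterable

SYSTEM_PROMPT = "You are an expert at refining transcribed text."


def build_distillation_records(
    pairs: Iterable[tuple[str, str]],
    teacher: Callable[[str], str],
) -> list[dict]:
    """Turn (instruction, source_text) pairs into chat-style training records.

    `teacher` stands in for a call to the teacher model; its refined output
    is stored as the assistant turn the student is trained to reproduce.
    """
    records = []
    for instruction, source_text in pairs:
        user_content = f"{instruction}\n\n{source_text}"  # assumed joining format
        records.append({
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_content},
                {"role": "assistant", "content": teacher(user_content)},
            ]
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records one JSON object per line, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Stub teacher for demonstration only; a real run would query the 9B model.
    def stub_teacher(prompt: str) -> str:
        return "So, I think we should ship it on Friday."

    demo = build_distillation_records(
        [("Remove filler words.", "um so i think we should uh ship it on friday")],
        stub_teacher,
    )
    print(to_jsonl(demo))
```

Records in this shape could then feed a LoRA fine-tuning run on the student, which is the step the Model Details section describes.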

Limitations and Bias

  • Inherits the limitations of the base model: the knowledge cutoff and any biases of Qwen3.5-2B carry over.
  • Optimized for refinement tasks, so general-knowledge and reasoning ability may be lower than the base model's in some cases.
  • Small model (2B): very complex instructions may be misinterpreted.
  • For Japanese honorifics, the model targets the polite (desu/masu) register; nuanced use of higher honorifics (sonkeigo and kenjogo, the respectful and humble forms) is weaker than in the 9B teacher.

Citation

@misc{keyvoice-refiner-2b-v0.1,
  author = {okayuji},
  title = {KeyVoice Refiner 2B (v0.1)},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/okayuji/KeyVoice-Refiner-2B-v0.1}}
}

Base model citation

@misc{qwen3.5,
  title = {Qwen3.5},
  author = {Qwen Team},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Qwen/Qwen3.5-2B}}
}

License

This model is released under the Apache License 2.0, inheriting the license of the base model Qwen/Qwen3.5-2B and the teacher model Qwen/Qwen3.5-9B.

See LICENSE for full text.
