QwenPaw-Flash-9B-heretic

F32 safetensors weights for QwenPaw-Flash-9B-heretic, a 9B dense model fine-tuned on Qwen3.5-9B with the Heretic methodology.

Model Details

  • Base model: Qwen3.5-9B
  • Precision: F32 (float32 safetensors)
  • Parameters: ~9B
  • Shards: model-00001 through model-00008 (8 files, F32 main weights)
  • Additional: model-00009 (BF16, Multi-Token Prediction head extracted from Qwen3.5-9B)
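The mixed precision described above (F32 main shards, BF16 MTP shard) can be checked without loading any weights, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype. The following is a minimal sketch using only the standard library; the helper name `summarize_dtypes` is hypothetical and not part of this repository.

```python
import json
import struct
from collections import Counter


def summarize_dtypes(path):
    """Count tensors per dtype by reading only the safetensors JSON header.

    Hypothetical helper for illustration; it never reads the tensor data.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )
```

If the description above holds, calling `summarize_dtypes("model-00009-of-00009.safetensors")` on a downloaded copy should report only BF16 tensors, while the first eight shards should report F32.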

MTP (Multi-Token Prediction)

model-00009-of-00009.safetensors contains the MTP head weights extracted from Qwen3.5-9B. MTP enables the model to predict multiple future tokens in a single forward pass, improving generation speed via speculative decoding.

  • MTP acceptance rate: ~43%
  • Speedup: ~1.5-1.9x decode throughput
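As a rough sanity check on how an acceptance rate relates to throughput, the usual idealized speculative-decoding estimate can be sketched as below. It assumes each drafted token is accepted independently with probability `accept_rate`, that `draft_len` tokens are drafted per step, and that drafting costs nothing. None of these assumptions come from this card, which does not state the draft length or how the figures were measured.

```python
def expected_tokens_per_step(accept_rate, draft_len):
    """Idealized expected tokens emitted per verification pass.

    Assumes independent acceptance with probability `accept_rate` for each of
    `draft_len` drafted tokens, plus one token from the verifying model:
    (1 - a^(k+1)) / (1 - a).
    """
    if not 0.0 <= accept_rate < 1.0:
        raise ValueError("accept_rate must be in [0, 1)")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Illustrative draft lengths only; the card does not specify one.
for k in (1, 2, 3):
    print(k, round(expected_tokens_per_step(0.43, k), 2))
```

Under these assumptions, an acceptance rate of 0.43 gives about 1.43, 1.61, and 1.69 tokens per pass for draft lengths 1, 2, and 3. Measured speedups depend on the actual setup and need not match this simple model.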

For MTP-enabled GGUF inference, see the MTP GGUF repo below.

GGUF Quantized Versions

For inference with llama.cpp / Ollama / LM Studio, use the GGUF versions:

Usage

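import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in float32; device_map="auto" spreads layers across the
# available devices (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    "SC117/QwenPaw-Flash-9B-heretic",
    torch_dtype=torch.float32,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SC117/QwenPaw-Flash-9B-heretic")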
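
Loading in float32 is memory-hungry, so it can help to estimate the weight footprint first. The sketch below is plain arithmetic on the approximate 9B parameter count given above; the helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations, the KV cache, and the small BF16 MTP shard.

```python
# Bytes per parameter for the dtypes mentioned in this card.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2}


def weight_memory_gib(num_params, dtype="F32"):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# ~9B parameters is an approximation taken from the model details.
print(f"F32:  {weight_memory_gib(9e9, 'F32'):.1f} GiB")
print(f"BF16: {weight_memory_gib(9e9, 'BF16'):.1f} GiB")
```

By this estimate the F32 weights need roughly 33.5 GiB. Passing `torch_dtype=torch.bfloat16` to `from_pretrained` would roughly halve that, at the cost of reduced precision.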

License

Same as base model (Qwen3.5-9B).
