fix: add missing use_deterministic_attn parameter to MoonViT3dEncoder

#22

MoonViT3dEncoder.__init__ references self.use_deterministic_attn on line 575
when constructing the MoonViTEncoderLayer blocks, but the attribute is never
set on self. Loading the model via AutoModelForCausalLM with
trust_remote_code=True raises:

AttributeError: 'MoonViT3dEncoder' object has no attribute 'use_deterministic_attn'

The sibling class MoonViTEncoderLayer already accepts use_deterministic_attn
as a keyword parameter with default False, so the attribute on the parent
3d-encoder was clearly intended to plumb through the same flag. Restore the
missing parameter with the same default.
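
The shape of the fix can be sketched as follows. This is a simplified illustration, not the actual modeling code: the classes below are plain-Python stand-ins (no torch), and `hidden_dim` and `num_layers` are hypothetical constructor arguments chosen only to keep the example runnable. The one part taken from this PR is the `use_deterministic_attn` keyword with default False being stored on self before the layer blocks are built.

```python
class MoonViTEncoderLayer:
    """Stand-in for the real layer; only the flag handling is modeled."""

    def __init__(self, hidden_dim, use_deterministic_attn=False):
        self.hidden_dim = hidden_dim  # hypothetical argument
        self.use_deterministic_attn = use_deterministic_attn


class MoonViT3dEncoder:
    """Stand-in for the real encoder; the real signature has more arguments."""

    def __init__(self, hidden_dim, num_layers, use_deterministic_attn=False):
        # The fix: accept the flag and store it on self *before* it is read
        # below. Without this assignment, the list comprehension raises
        # AttributeError when it touches self.use_deterministic_attn.
        self.use_deterministic_attn = use_deterministic_attn
        self.blocks = [
            MoonViTEncoderLayer(
                hidden_dim,
                use_deterministic_attn=self.use_deterministic_attn,
            )
            for _ in range(num_layers)
        ]


encoder = MoonViT3dEncoder(hidden_dim=8, num_layers=2)
deterministic_encoder = MoonViT3dEncoder(
    hidden_dim=8, num_layers=2, use_deterministic_attn=True
)
```

Because the default is False, existing callers that never pass the flag keep their current behavior.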

Production serving paths (vLLM's Kimi-K25 model executor) bypass the HF
custom modeling init and construct the vision tower differently, so this
bug is invisible at serving time but blocks transformers-based workflows
like ModelOpt NVFP4 quantization and HF-native fine-tuning.

Identical fix already merged in Kimi-K2.5 PR #91 (by @katuni4ka, approved
by @fxmarty-amd). This mirrors it to K2.6 byte-for-byte.

Minimal repro:

from transformers import AutoModelForCausalLM
AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2.6", trust_remote_code=True, torch_dtype="auto",
)

This resolved my issue. @bigeagle, can we merge this in so other users don't have to update manually?

Moonshot AI org

thanks for your contribution!

Moonshot AI org

@bdellabe @ace-coreweave Hi, I've also added some code to fix the weight initialization issue. AutoModelForCausalLM.from_pretrained now works on my end. However, this doesn't mean transformers inference is fully supported. If you plan to implement Kimi K2.6 inference in other frameworks, please mainly refer to the vLLM/SGLang implementation.

bigmoyan changed pull request status to merged

@bigmoyan Any plan to have compatibility with Transformers v5?

  File "/root/.cache/huggingface/modules/transformers_modules/moonshotai/Kimi_hyphen_K2_dot_6/2755962d07cb42aa2d988a35bcb65cd4a9c2de82/modeling_deepseek.py", line 47, in <module>
    from transformers.utils.import_utils import is_torch_fx_available
ImportError: cannot import name 'is_torch_fx_available' from 'transformers.utils.import_utils' (/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py). Did you mean: 'is_torch_available'?
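
One possible local workaround, until the remote code is updated, is to guard the import. This is only a sketch under assumptions: it would have to be applied to a local copy of modeling_deepseek.py, the fallback that always returns False is my own choice rather than anything from the maintainers, and other incompatibilities with Transformers v5 may remain beyond this one import.

```python
try:
    # Present in the transformers releases the remote code was written for.
    from transformers.utils.import_utils import is_torch_fx_available
except ImportError:
    # Assumption: the helper is missing in the installed transformers
    # (or transformers is not installed). Fall back to a stub that
    # reports torch.fx as unavailable so FX-specific branches are skipped.
    def is_torch_fx_available():
        return False


# The module can keep calling the helper exactly as before.
fx_available = is_torch_fx_available()
```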
