fix(hf-space): drop trust_remote_code=True (Phi-4-mini native phi3 path)
Root cause of the persistent ZeroGPU 'ImportError' identified via HF
Space dev-mode shell:

    File "/home/user/.cache/huggingface/modules/transformers_modules/microsoft/Phi_hyphen_4_hyphen_mini_hyphen_instruct/cfbefacb99257ffa30c83adab238a50856ac3083/modeling_phi3.py", line 37, in <module>
      from transformers.utils import (
    ImportError: cannot import name 'LossKwargs' from 'transformers.utils' (/usr/local/lib/python3.10/site-packages/transformers/utils/__init__.py)

The custom modeling_phi3.py that ships with Phi-4-mini-instruct on the HF
Hub was written against an older transformers version that exported
LossKwargs from transformers.utils; transformers 4.57+ removed it.
Loading with trust_remote_code=True downloads and executes that custom
code, which fails at import time and bricks the @spaces.GPU worker.
That is why every call failed with 'ImportError' in under 3 s.

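Because the failure is a single missing name in `transformers.utils`, it can be detected before any remote code runs. A minimal sketch of a startup preflight check, assuming you want one; `module_exports` is a hypothetical helper, not part of transformers or the `spaces` package.

```python
import importlib


def module_exports(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical preflight helper. With module_name="transformers.utils" and
    attr="LossKwargs" it answers whether the one import that fails in the
    traceback would succeed on the installed transformers version.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its parent packages) is not installed.
        return False
    # hasattr returns False when the name is absent from the module.
    return hasattr(module, attr)
```

Per the traceback above, `module_exports("transformers.utils", "LossKwargs")` should return False on transformers 4.57+. It only probes that one import; other imports in the custom file could still fail.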
The fix: drop trust_remote_code=True entirely. Phi-4-mini-instruct's
architecture is `phi3`, which transformers 4.46+ supports natively via
`Phi3ForCausalLM`. No custom code download is needed; the native
implementation is already CUDA-ready and includes the same attention and
activation configuration the custom code provides (Phi-4-mini is a dense
model, so there is no MoE path involved).

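Roughly, the choice between the two code paths depends on whether the repo's `config.json` carries an `auto_map` entry pointing at its own classes and whether its `model_type` has a built-in implementation. The sketch below is a simplified illustration of that decision, not transformers' actual resolution code; `pick_model_code` and the example config dict are hypothetical.

```python
def pick_model_code(config: dict, native_types: set[str], trust_remote_code: bool) -> str:
    """Simplified, illustrative model of how an Auto* loader picks its code path.

    Returns "remote" when the caller opted in and the repo maps
    AutoModelForCausalLM to its own module, "native" when model_type has a
    built-in implementation, and raises ValueError otherwise.
    """
    auto_map = config.get("auto_map", {})
    if trust_remote_code and "AutoModelForCausalLM" in auto_map:
        # The repo's own modeling file would be downloaded and executed.
        return "remote"
    if config.get("model_type") in native_types:
        # A built-in class (e.g. Phi3ForCausalLM for "phi3") would be used.
        return "native"
    raise ValueError("no built-in implementation; trust_remote_code=True required")


# Hypothetical config resembling a repo that ships its own modeling_phi3.py.
example_config = {
    "model_type": "phi3",
    "auto_map": {"AutoModelForCausalLM": "modeling_phi3.Phi3ForCausalLM"},
}
```

With `trust_remote_code=False`, the example config resolves to the native path, which is the behavior this commit relies on.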
Tradeoff: any future Phi-4 features that depend on the custom code (if
Microsoft ever ships architecture extensions only in modeling_phi3.py)
won't be picked up. None observed today; if it becomes an issue we can
either (a) pin transformers older than the LossKwargs removal, or (b)
swap ZEROGPU_MODEL_ID to a non-custom-code model like Qwen2.5-3B-Instruct.
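Option (a) amounts to keeping transformers below the release that dropped `LossKwargs`. A minimal sketch of a guard that only allows `trust_remote_code=True` below that boundary, assuming the 4.57 cutoff stated above; the helper names are hypothetical, and the naive parser keeps only the leading numeric parts of a version string.

```python
import re

# Cutoff taken from this commit message (LossKwargs gone in transformers 4.57+).
LOSSKWARGS_REMOVED_IN = (4, 57)


def version_tuple(version: str) -> tuple:
    """Parse leading numeric components, e.g. "4.57.1" -> (4, 57, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def remote_code_supported(transformers_version: str) -> bool:
    """True if this transformers version predates the LossKwargs removal."""
    return version_tuple(transformers_version)[:2] < LOSSKWARGS_REMOVED_IN
```

At runtime the installed version is available as `transformers.__version__`; `packaging.version.Version` is a more robust parser if that dependency is acceptable.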

Verified the diagnostic chain via the VS Code-in-browser terminal on the
deployed Space (dev mode). Tests: 63 passed, 1 skipped (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@@ -366,20 +366,24 @@ def _load_zerogpu_model():
     """Load the model + tokenizer once. Called lazily on first request
     so module import stays fast (the model weights are tens of GB).
 
-
-
-
+    We deliberately do NOT pass `trust_remote_code=True`. Phi-4-mini-instruct's
+    architecture is `phi3`, which transformers 4.46+ supports natively via
+    `Phi3ForCausalLM` — no custom code download required. The custom
+    modeling code that ships with the model on HF Hub
+    (`modeling_phi3.py`) imports `LossKwargs` from `transformers.utils`,
+    which was removed in transformers 4.57+ — so loading WITH
+    `trust_remote_code=True` fails with `ImportError: cannot import
+    name 'LossKwargs' from 'transformers.utils'` and bricks the
+    `@spaces.GPU` worker. Sticking to the native phi3 implementation
+    avoids the upstream pin-mismatch entirely.
+    """
     global _zerogpu_model, _zerogpu_tokenizer
     if _zerogpu_model is not None:
         return
-    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(
-        ZEROGPU_MODEL_ID,
-        trust_remote_code=True,
-    )
+    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(ZEROGPU_MODEL_ID)
     _zerogpu_model = _AutoModelForCausalLM.from_pretrained(
         ZEROGPU_MODEL_ID,
         torch_dtype=_torch.bfloat16,
-        trust_remote_code=True,
         device_map="auto",
     )
 