apingali Claude Opus 4.7 (1M context) committed on
Commit 75a8a07 · 1 Parent(s): 415ec9b

fix(hf-space): drop trust_remote_code=True (Phi-4-mini native phi3 path)

Root cause of the persistent ZeroGPU 'ImportError' identified via HF
Space dev-mode shell:

File "/home/user/.cache/huggingface/modules/transformers_modules/
microsoft/Phi_hyphen_4_hyphen_mini_hyphen_instruct/
cfbefacb99257ffa30c83adab238a50856ac3083/modeling_phi3.py",
line 37, in <module>
from transformers.utils import (
ImportError: cannot import name 'LossKwargs' from 'transformers.utils'
(/usr/local/lib/python3.10/site-packages/transformers/utils/__init__.py)

The custom modeling_phi3.py that ships with Phi-4-mini-instruct on HF
Hub was written against an older transformers version that exported
LossKwargs from transformers.utils. transformers 4.57+ removed it.
Loading with trust_remote_code=True downloads and executes that custom
code, which fails at import time and bricks the @spaces.GPU worker —
hence the sub-3s 'ImportError' on every call.
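
A minimal sketch of the check behind this diagnosis, assuming only the
standard library. The helper name `exports_name` is hypothetical and is
not part of app.py; it reports whether a module exposes a given name, so
it can be run in the dev-mode shell before loading the model.

```python
import importlib


def exports_name(module_name: str, name: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `name`.

    Hypothetical helper for illustration; not part of app.py.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, name)


if __name__ == "__main__":
    # On the affected Space this is expected to report False, which
    # matches the ImportError raised by the Hub's modeling_phi3.py.
    print(exports_name("transformers.utils", "LossKwargs"))
```

If this reports False for the installed transformers, any remote code
that does `from transformers.utils import LossKwargs` will fail at
import time in the same way.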

The fix: drop trust_remote_code=True entirely. Phi-4-mini-instruct's
architecture is `phi3`, which transformers 4.46+ supports natively via
Phi3ForCausalLM. No custom code download is needed; the native
implementation runs on CUDA as-is and covers the same attention and
activation configuration as the custom code (Phi-4-mini is a dense
model, so no MoE layers are involved).
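
One way to sanity-check this reasoning for another model is to look at
its config.json: `model_type` names the architecture, and an `auto_map`
entry is what points the Auto classes at repo-supplied code. The sketch
below assumes that layout. The helper names, the trimmed example config,
and the `NATIVE_MODEL_TYPES` set are illustrative assumptions, not
values read from the Hub or from transformers.

```python
import json

# Assumption: architectures the installed transformers handles natively.
# In practice this would come from the library, not a hand-written set.
NATIVE_MODEL_TYPES = {"phi3"}


def ships_remote_code(config: dict) -> bool:
    """True if the config maps Auto classes to code stored in the repo."""
    return bool(config.get("auto_map"))


def can_skip_remote_code(config: dict, native_types: set) -> bool:
    """True if the architecture is supported without trust_remote_code."""
    return config.get("model_type") in native_types


# Trimmed, illustrative config; not the full file from the model repo.
example_config = json.loads(
    """
    {
      "model_type": "phi3",
      "auto_map": {
        "AutoModelForCausalLM": "modeling_phi3.Phi3ForCausalLM"
      }
    }
    """
)

if __name__ == "__main__":
    print(ships_remote_code(example_config))
    print(can_skip_remote_code(example_config, NATIVE_MODEL_TYPES))
```

A repo can ship remote code and still be loadable natively, which is the
situation described above: the `auto_map` entry is only followed when
trust_remote_code=True is passed.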

Tradeoff: any future Phi-4 features that depend on the custom code (if
Microsoft ever ships architecture extensions only in modeling_phi3.py)
won't be picked up. None observed today; if it becomes an issue we can
either (a) pin transformers older than the LossKwargs removal, or (b)
swap ZEROGPU_MODEL_ID to a non-custom-code model like Qwen2.5-3B-Instruct.
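
If option (a) is ever needed, a startup guard along these lines could
confirm that the pin took effect. This is a sketch using only the
standard library. The cutoff "4.57" is taken from this commit message
rather than from the transformers changelog, and the helper names are
hypothetical.

```python
from importlib import metadata
from typing import Optional


def release_tuple(version: str) -> tuple:
    """Leading numeric components of a version string.

    '4.56.2' -> (4, 56, 2); '4.57.0.dev0' -> (4, 57, 0).
    Anything from the first non-numeric component onward is dropped.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def predates(installed: str, cutoff: str) -> bool:
    """True if `installed` is strictly older than `cutoff`."""
    return release_tuple(installed) < release_tuple(cutoff)


def installed_version(package: str) -> Optional[str]:
    """Installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("transformers")
    # Assumed cutoff from the commit message; verify before relying on it.
    if version is not None and not predates(version, "4.57"):
        print(f"transformers {version} is at or past the assumed cutoff")
```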

Verified the diagnostic chain via VS Code-in-browser terminal on the
deployed Space (dev mode). Tests: 63 passed, 1 skipped (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. app.py +12 -8
app.py CHANGED
@@ -366,20 +366,24 @@ def _load_zerogpu_model():
     """Load the model + tokenizer once. Called lazily on first request
     so module import stays fast (the model weights are tens of GB).
 
-    `trust_remote_code=True` is required for models tagged `custom_code`
-    on HuggingFace Hub (e.g. Phi-4-mini-instruct uses a custom Phi3-based
-    config that ships modeling files in the repo)."""
+    We deliberately do NOT pass `trust_remote_code=True`. Phi-4-mini-instruct's
+    architecture is `phi3`, which transformers 4.46+ supports natively via
+    `Phi3ForCausalLM`; no custom code download required. The custom
+    modeling code that ships with the model on HF Hub
+    (`modeling_phi3.py`) imports `LossKwargs` from `transformers.utils`,
+    which was removed in transformers 4.57+, so loading WITH
+    `trust_remote_code=True` fails with `ImportError: cannot import
+    name 'LossKwargs' from 'transformers.utils'` and bricks the
+    `@spaces.GPU` worker. Sticking to the native phi3 implementation
+    avoids the upstream pin-mismatch entirely.
+    """
     global _zerogpu_model, _zerogpu_tokenizer
     if _zerogpu_model is not None:
         return
-    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(
-        ZEROGPU_MODEL_ID,
-        trust_remote_code=True,
-    )
+    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(ZEROGPU_MODEL_ID)
     _zerogpu_model = _AutoModelForCausalLM.from_pretrained(
         ZEROGPU_MODEL_ID,
         torch_dtype=_torch.bfloat16,
-        trust_remote_code=True,
         device_map="auto",
     )
 