Space: AshwinP/compounding-test

apingali and Claude Opus 4.7 (1M context) committed 8 days ago

Commit 75a8a07 (1 parent: 415ec9b)

fix(hf-space): drop trust_remote_code=True (Phi-4-mini native phi3 path)

Root cause of the persistent ZeroGPU 'ImportError' identified via HF
Space dev-mode shell:

  File "/home/user/.cache/huggingface/modules/transformers_modules/microsoft/Phi_hyphen_4_hyphen_mini_hyphen_instruct/cfbefacb99257ffa30c83adab238a50856ac3083/modeling_phi3.py", line 37, in <module>
    from transformers.utils import (
ImportError: cannot import name 'LossKwargs' from 'transformers.utils' (/usr/local/lib/python3.10/site-packages/transformers/utils/__init__.py)

The custom modeling_phi3.py that ships with Phi-4-mini-instruct on HF
Hub was written against an older transformers version that exported
LossKwargs from transformers.utils. transformers 4.57+ removed it.
Loading with trust_remote_code=True downloads and executes that custom
code, which fails at import time and bricks the @spaces.GPU worker. That
is why every call failed with 'ImportError' in under 3s.
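One way to catch this class of failure before it reaches the @spaces.GPU worker is a preflight probe that checks whether a module exposes a name the remote code expects. This is a minimal sketch, assuming a stdlib-only check is enough: `module_exports` is a hypothetical helper (not part of transformers or of app.py), and `hasattr` on the imported module only approximates what `from transformers.utils import LossKwargs` needs.

```python
import importlib


def module_exports(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`.

    Hypothetical preflight helper: approximates whether a statement like
    `from <module_name> import <attr>` in remote modeling code would succeed.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too: the package is missing or its
        # own imports fail, so the remote code could not import from it.
        return False
    # The module loaded; check whether it still exports the expected name.
    return hasattr(module, attr)
```

Run in the Space's dev-mode shell, `module_exports("transformers.utils", "LossKwargs")` would be expected to return `False` on the transformers version that produced the traceback above, since that is the exact import that failed.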

The fix: drop trust_remote_code=True entirely. Phi-4-mini-instruct's
architecture is `phi3`, which transformers 4.46+ supports natively via
Phi3ForCausalLM, so no custom code download is needed. The native
implementation already runs on CUDA and covers the same attention and
activation configuration the custom code provides.
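The decision this fix encodes (skip custom code when transformers already implements the architecture) can be sketched as a small check over a repo's config.json. This assumes the usual Hub convention that repos shipping their own modeling files list them under an `auto_map` key; `needs_remote_code` and its `native_model_types` argument are hypothetical names, and the set of natively supported types would have to come from whatever the installed transformers version implements.

```python
from typing import Any, Collection, Mapping


def needs_remote_code(config: Mapping[str, Any],
                      native_model_types: Collection[str]) -> bool:
    """Decide whether a checkpoint must be loaded with trust_remote_code=True.

    Hypothetical helper. `config` is the parsed config.json of a Hub repo;
    `native_model_types` is an assumed set of model_type strings that the
    installed transformers supports with built-in classes.
    """
    if "auto_map" not in config:
        # The repo references no custom modeling code at all.
        return False
    # Custom code is advertised, but it is only required when transformers
    # has no native implementation for this model_type.
    return config.get("model_type") not in native_model_types
```

For an abridged, illustrative config such as `{"model_type": "phi3", "auto_map": {...}}`, a `native_model_types` set containing `"phi3"` makes the function return `False`, which matches the choice made in this commit.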

Tradeoff: any future Phi-4 features that depend on the custom code (if
Microsoft ever ships architecture extensions only in modeling_phi3.py)
won't be picked up. None observed today; if it becomes an issue we can
either (a) pin transformers older than the LossKwargs removal, or (b)
swap ZEROGPU_MODEL_ID to a non-custom-code model like Qwen2.5-3B-Instruct.
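Option (a) amounts to keeping transformers below the release that dropped `LossKwargs`. A minimal version gate, assuming the 4.57 threshold stated in this commit message is accurate, could look like the following; `parse_major_minor` and `remote_code_importable` are hypothetical helpers, not part of app.py.

```python
import re
from typing import Tuple

# Threshold taken from this commit message (assumed accurate):
# transformers 4.57+ no longer exports LossKwargs from transformers.utils.
LOSSKWARGS_REMOVED_IN = (4, 57)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '4.56.2' or '4.58.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def remote_code_importable(transformers_version: str) -> bool:
    """True if this version predates the LossKwargs removal (option (a))."""
    return parse_major_minor(transformers_version) < LOSSKWARGS_REMOVED_IN
```

In the Space the input would be `transformers.__version__`; the matching dependency pin would be something like `transformers<4.57`, assuming dependencies are pinned in a requirements file.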

Verified the diagnostic chain via VS Code-in-browser terminal on the
deployed Space (dev mode). Tests: 63 passed, 1 skipped (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)

app.py +12 -8

app.py CHANGED

@@ -366,20 +366,24 @@ def _load_zerogpu_model():
     """Load the model + tokenizer once. Called lazily on first request
     so module import stays fast (the model weights are tens of GB).
-    `trust_remote_code=True` is required for models tagged `custom_code`
-    on HuggingFace Hub (e.g. Phi-4-mini-instruct uses a custom Phi3-based
-    config that ships modeling files in the repo)."""
+    We deliberately do NOT pass `trust_remote_code=True`. Phi-4-mini-instruct's
+    architecture is `phi3`, which transformers 4.46+ supports natively via
+    `Phi3ForCausalLM` — no custom code download required. The custom
+    modeling code that ships with the model on HF Hub
+    (`modeling_phi3.py`) imports `LossKwargs` from `transformers.utils`,
+    which was removed in transformers 4.57+ — so loading WITH
+    `trust_remote_code=True` fails with `ImportError: cannot import
+    name 'LossKwargs' from 'transformers.utils'` and bricks the
+    `@spaces.GPU` worker. Sticking to the native phi3 implementation
+    avoids the upstream pin-mismatch entirely.
+    """
     global _zerogpu_model, _zerogpu_tokenizer
     if _zerogpu_model is not None:
         return
-    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(
-        ZEROGPU_MODEL_ID,
-        trust_remote_code=True,
-    )
+    _zerogpu_tokenizer = _AutoTokenizer.from_pretrained(ZEROGPU_MODEL_ID)
     _zerogpu_model = _AutoModelForCausalLM.from_pretrained(
         ZEROGPU_MODEL_ID,
         torch_dtype=_torch.bfloat16,
-        trust_remote_code=True,
         device_map="auto",
     )
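The change keeps the existing load-once pattern in `_load_zerogpu_model`: module-level globals filled on the first request, with an early `return` on later calls. A generic version of that pattern is sketched below as an illustration, not the Space's code. `LazyResource` is a hypothetical class, and the lock is an addition the original function does not have; it guards against two concurrent first requests both starting a multi-GB load.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Build an expensive object on first use, then reuse it.

    Hypothetical sketch of the lazy "load once" pattern, with
    double-checked locking added so concurrent first calls load only once.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory          # e.g. a function wrapping from_pretrained(...)
        self._value: Optional[T] = None  # None means "not loaded yet"
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._value is None:              # fast path: no lock once loaded
            with self._lock:
                if self._value is None:      # re-check: another thread may have loaded it
                    self._value = self._factory()
        return self._value
```

The unlocked first check keeps the steady-state path cheap; the second check inside the lock is what prevents a duplicate load. A factory that returns `None` would be called again on every `get()`, so the sketch assumes the loader always returns an object.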