0xZohar commited on
Commit
b501d11
·
verified ·
1 Parent(s): 2c355ab

Fix: Remove preload_from_hub to resolve XET permission errors (CRITICAL)

Browse files

ROOT CAUSE (from official HuggingFace documentation):
- preload_from_hub downloads models to ~/.cache/huggingface/hub (default)
- Custom HF_HOME=/data/.huggingface causes cache location mismatch
- Code looks for models in /data but finds them in ~/.cache
- Triggers re-download attempt using XET backend
- XET tries to write to ~/.cache/huggingface/xet (read-only) → Permission denied

OFFICIAL DOCUMENTATION QUOTE:
"Files are saved in the default huggingface_hub disk cache ~/.cache/huggingface/hub.
If your application expects them elsewhere or you changed your HF_HOME variable,
this pre-loading does not follow that at this time."

SOLUTION:
1. Remove preload_from_hub configuration (incompatible with custom cache)
2. Add explicit cache environment variables with correct precedence:
- HF_HUB_CACHE: /data/.huggingface/hub (highest priority)
- TRANSFORMERS_CACHE: /data/.huggingface/transformers
- HF_HOME: /data/.huggingface (base directory)
3. Allow runtime model downloads (local_files_only=False)
4. Models download to /data on first use, cached persistently

BEHAVIOR AFTER FIX:
- First user: ~90 second wait (one-time model download to /data)
- Subsequent users: Instant (models loaded from persistent cache)
- After Space restarts: Instant (persistent /data cache retained)
- No XET permission errors

FILES MODIFIED:
- README.md: Remove preload_from_hub, add complete cache env vars
- requirements.txt: Remove huggingface_hub version constraint
- code/demo.py: Allow runtime download (local_files_only=False)
- code/clip_retrieval.py: Ensure cache directory creation

References:
- https://huggingface.co/docs/hub/en/spaces-config-reference
- https://huggingface.co/docs/hub/en/spaces-storage
- https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables

Files changed (1) hide show
  1. code/clip_retrieval.py +3 -0
code/clip_retrieval.py CHANGED
@@ -97,6 +97,9 @@ class CLIPRetriever:
97
  - Allow automatic download on first use
98
  - /data is writable and persistent in HF Spaces
99
  """
 
 
 
100
  print(f"Loading CLIP model: {self.model_name} on {self.device}")
101
  print(f"Cache directory: {HF_CACHE_DIR}")
102
 
 
97
  - Allow automatic download on first use
98
  - /data is writable and persistent in HF Spaces
99
  """
100
+ # Ensure cache directory exists and is writable
101
+ os.makedirs(HF_CACHE_DIR, exist_ok=True)
102
+
103
  print(f"Loading CLIP model: {self.model_name} on {self.device}")
104
  print(f"Cache directory: {HF_CACHE_DIR}")
105