
# Troubleshooting

Real failures we've hit deploying this repo to Hugging Face Inference Endpoints, and how to fix them. Read this first when the endpoint won't start.


## 1. `Unrecognized model ... Should have a model_type key in its config.json`

Endpoint logs end with a giant list of model types (`albert`, `align`, ... `m2m_100`, ... `zoedepth`) and then `Application startup failed`.

**Cause.** The Hub repo doesn't actually contain model weights or a `config.json`. This usually happens when `model_loader.py` was committed to git but never executed against the Hub (pushing the Python file ≠ running it).

**Check.**

```shell
python3 -c "from huggingface_hub import HfApi; print([s.rfilename for s in HfApi().model_info('ericaRC/example').siblings])"
```

You should see `config.json`, `model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, `handler.py`, `requirements.txt`, `README.md`. If it's only `.gitattributes` and scripts, the weights were never pushed.
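If you script that check, a tiny helper makes the comparison explicit (a sketch; `REQUIRED` simply mirrors the file list above, and `missing_files` is a hypothetical name, not something in this repo):

```python
# Files the endpoint needs, per the list above.
REQUIRED = {
    "config.json", "model.safetensors", "tokenizer_config.json",
    "tokenizer.json", "handler.py", "requirements.txt", "README.md",
}

def missing_files(siblings):
    """Return required files absent from a repo's sibling list, sorted."""
    return sorted(REQUIRED - set(siblings))
```

Feed it `[s.rfilename for s in HfApi().model_info('ericaRC/example').siblings]`; an empty result means the push actually happened.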

**Fix.**

```shell
huggingface-cli login
python3 model_loader.py
```

## 2. `403 Forbidden` on `.../info/lfs/objects/batch`

`push_to_hub` dies with `HfHubHTTPError: 403 Forbidden: Authorization error`.

**Cause.** Your HF token lacks write access to the target repo. Most commonly this is a fine-grained token scoped to your user only, trying to push to an org namespace. Reads still work (which is why `whoami` succeeds), but LFS writes are rejected.

**Check.**

```shell
python3 -c "
from huggingface_hub import HfApi
perms = HfApi().whoami()['auth']['accessToken'].get('fineGrained', {})
for s in perms.get('scoped', []):
    print(s['entity']['type'], s['entity']['name'], '->', s['permissions'])
"
```

You need an entry matching the target repo's namespace (user or org) that includes `repo.write`.
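To turn that eyeball check into code, a small predicate over the scoped entries printed by the check above (a sketch; `has_write` is a hypothetical helper, but the dict shape matches the `fineGrained` payload from `whoami()`):

```python
def has_write(scoped, namespace):
    """True if any scoped token entry grants repo.write for the given
    user or org namespace (same shape as the Check output above)."""
    return any(
        entry["entity"]["name"] == namespace and "repo.write" in entry["permissions"]
        for entry in scoped
    )
```

`has_write(perms.get('scoped', []), 'your-org')` should be `True` before you retry the push.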

**Fix.** At https://huggingface.co/settings/tokens either:

- edit the existing token and add the org with `repo.write` + `repo.content.read` + `repo.access.read`, or
- create a new classic "Write" token and `huggingface-cli login` with it.

## 3. `AttributeError: 'list' object has no attribute 'keys'` in `_set_model_specific_special_tokens`

Endpoint logs show a traceback through `tokenization_nllb_fast.py` and `tokenization_utils_base.py`, crashing on:

```python
self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
```

**Cause.** `transformers` version skew between save time and load time. `transformers` 5.x introduced an `extra_special_tokens` field (serialized as a list for NLLB's Flores-200 codes). The Inference Endpoints base image ships a `transformers` 4.x that expects `extra_special_tokens` to be a dict and calls `.keys()` on it.
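The crash itself is plain Python, nothing transformers-specific: lists have no `.keys()`. A minimal illustration with made-up values:

```python
# What 5.x serializes for NLLB: a list of Flores-200 codes.
extra_as_list = ["ace_Arab", "ace_Latn"]
# What the 4.x loader assumes: a dict it can call .keys() on.
extra_as_dict = {}

try:
    list(extra_as_list.keys())  # the 4.x code path, fed 5.x output
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'keys'

assert list(extra_as_dict.keys()) == []  # an empty dict passes harmlessly
```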

**Check.**

```shell
python3 -c "
import json
from huggingface_hub import hf_hub_download
cfg = json.load(open(hf_hub_download('ericaRC/example', 'tokenizer_config.json')))
print('extra_special_tokens type:', type(cfg.get('extra_special_tokens')).__name__)
print('additional_special_tokens count:', len(cfg.get('additional_special_tokens') or []))
"
```

If `extra_special_tokens` is a non-empty list and `additional_special_tokens` is empty, you're hitting this.

**Fix (already applied to this repo).** `tokenizer_config.json` has been normalized:

- lang codes live in `additional_special_tokens` (a list, which both old and new transformers accept)
- `extra_special_tokens` is `{}` (an empty dict passes `.keys()` in old transformers and is ignored in new)

And `requirements.txt` pins `transformers>=4.40.0,<5.0` so the endpoint can't auto-pull a 5.x that reintroduces the mismatch.
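If you need to apply the same normalization to another checkpoint's `tokenizer_config.json`, the transformation is pure dict surgery (a sketch; `normalize_tokenizer_config` is a hypothetical helper, not a script in this repo):

```python
def normalize_tokenizer_config(cfg: dict) -> dict:
    """Fold a list-valued extra_special_tokens into additional_special_tokens
    and reset extra_special_tokens to {}, as described above."""
    cfg = dict(cfg)  # shallow copy; leave the caller's dict alone
    extra = cfg.get("extra_special_tokens")
    if isinstance(extra, list):
        merged = list(cfg.get("additional_special_tokens") or [])
        merged += [tok for tok in extra if tok not in merged]  # keep order, skip dupes
        cfg["additional_special_tokens"] = merged
    cfg["extra_special_tokens"] = {}
    return cfg
```

Load the file with `json.load`, run it through this, and `json.dump` the result back before pushing.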

**Prevention going forward.** When running `model_loader.py`, use the same `transformers` major version the endpoint runs:

```shell
pip install "transformers<5" "huggingface_hub" "torch"
python3 model_loader.py
```

Don't save tokenizers from transformers 5.x and load them in a 4.x container (or vice versa) unless you've confirmed the schema matches.


## 4. Endpoint boots but requests return garbage / wrong language

**Cause.** `src_lang` wasn't set on the tokenizer, or `forced_bos_token_id` wasn't passed at generation time. NLLB needs both.

**Check.** Look at the request body:

```json
{
  "inputs": "Hello, world!",
  "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
}
```

If you're hitting the endpoint without a `parameters` block, `handler.py` falls back to `eng_Latn` → `spa_Latn`.

**Fix.** Always pass `src_lang` and `tgt_lang` using Flores-200 codes.
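The fallback behaviour described above amounts to a few lines of parameter resolution; a sketch (hypothetical helper, the real `handler.py` may structure it differently):

```python
DEFAULT_SRC, DEFAULT_TGT = "eng_Latn", "spa_Latn"  # the fallback noted above

def resolve_langs(payload: dict) -> tuple:
    """Pull src/tgt language codes out of a request body, falling back
    to the defaults when the parameters block is missing."""
    params = payload.get("parameters") or {}
    return params.get("src_lang", DEFAULT_SRC), params.get("tgt_lang", DEFAULT_TGT)
```

Whatever shape the real handler has, the `tgt_lang` side is what `forced_bos_token_id` must be derived from, so a silently defaulted value means silently wrong output.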


## 5. Container Type is set to "Text Generation Inference (TGI)"

TGI only supports decoder-only causal LMs. NLLB is a seq2seq model, so TGI will refuse to load it and `handler.py` will be ignored.

**Fix.** In the endpoint's Advanced configuration, set Container Type → Default (the HF inference toolkit). That container picks up `handler.py` automatically.


## Checklist before clicking Deploy

- `HfApi().model_info(REPO).siblings` lists `config.json`, `model.safetensors`, `tokenizer*.json`, `handler.py`, `requirements.txt`, `README.md`.
- `tokenizer_config.json` has `extra_special_tokens: {}` (or absent) and `additional_special_tokens` populated.
- `requirements.txt` pins `transformers<5`.
- Local smoke test passes:

  ```python
  from handler import EndpointHandler

  h = EndpointHandler("ericaRC/example")
  print(h({"inputs": "Hello, world!", "parameters": {"src_lang": "eng_Latn", "tgt_lang": "fra_Latn"}}))
  ```

- Endpoint Container Type = Default, not TGI.