# Troubleshooting
Real failures we've hit deploying this repo to Hugging Face Inference Endpoints, and how to fix them. Read this first when the endpoint won't start.
---
## 1. `Unrecognized model ... Should have a model_type key in its config.json`
Endpoint logs end with a giant list of model types (`albert, align, ... m2m_100, ... zoedepth`) and `Application startup failed`.
**Cause.** The Hub repo doesn't actually contain model weights / `config.json`. Usually happens when `model_loader.py` was committed to git but never *executed* against the Hub (pushing the Python file ≠ running it).
**Check.**
```bash
python3 -c "from huggingface_hub import HfApi; print([s.rfilename for s in HfApi().model_info('ericaRC/example').siblings])"
```
You should see `config.json`, `model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, `handler.py`, `requirements.txt`, `README.md`. If it's only `.gitattributes` and scripts, the weights were never pushed.
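The one-liner above can be made repeatable with a small helper. The required-file set mirrors the list just given; the repo id is the same example used throughout, and `check_repo` is an illustrative name, not something shipped in this repo:

```python
def missing_files(filenames):
    """Return the required endpoint files absent from an iterable of repo filenames."""
    required = {
        "config.json", "model.safetensors", "tokenizer_config.json",
        "tokenizer.json", "handler.py", "requirements.txt", "README.md",
    }
    return sorted(required - set(filenames))

def check_repo(repo_id="ericaRC/example"):
    # Imported lazily so the pure helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi
    files = [s.rfilename for s in HfApi().model_info(repo_id).siblings]
    gone = missing_files(files)
    if gone:
        print("Missing from the Hub repo (run model_loader.py):", gone)
    else:
        print("All required files present.")
```

If `missing_files` returns anything, you are in this failure mode: the endpoint container has nothing to load.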
**Fix.**
```bash
huggingface-cli login
python3 model_loader.py
```
---
## 2. `403 Forbidden` on `.../info/lfs/objects/batch`
`push_to_hub` dies with `HfHubHTTPError: 403 Forbidden: Authorization error.`
**Cause.** Your HF token lacks write access to the target repo. Most commonly: a fine-grained token scoped to your user only, trying to push to an org namespace. Reading works (which is why `whoami` succeeds) but LFS writes are rejected.
**Check.**
```bash
python3 -c "
from huggingface_hub import HfApi
perms = HfApi().whoami()['auth']['accessToken'].get('fineGrained', {})
for s in perms.get('scoped', []):
    print(s['entity']['type'], s['entity']['name'], '->', s['permissions'])
"
```
You need an entry matching the target repo's namespace (user or org) that includes `repo.write`.
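That rule can be written as a small predicate over the scoped-permissions list printed by the check above (the payload shape is assumed from that check; `can_write` is an illustrative name):

```python
def can_write(scoped_perms, namespace):
    """True if any scoped token entry covers `namespace` with repo.write.

    `scoped_perms` is the list under
    whoami()['auth']['accessToken']['fineGrained']['scoped']
    for a fine-grained token, as printed by the check above.
    """
    for entry in scoped_perms:
        if entry["entity"]["name"] == namespace and "repo.write" in entry["permissions"]:
            return True
    return False
```

If this returns `False` for the org (or user) that owns the target repo, the 403 on the LFS batch endpoint is expected.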
**Fix.** At https://huggingface.co/settings/tokens either:
- Edit the existing token and add the org with `repo.write` + `repo.content.read` + `repo.access.read`, **or**
- Create a new classic "Write" token and `huggingface-cli login` with it.
---
## 3. `AttributeError: 'list' object has no attribute 'keys'` in `_set_model_specific_special_tokens`
Endpoint logs show a traceback through `tokenization_nllb_fast.py` → `tokenization_utils_base.py` and crash on:
```
self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
```
**Cause.** Transformers-version skew between save time and load time. `transformers` 5.x introduced an `extra_special_tokens` field (serialized as a list for NLLB's Flores-200 codes). The Inference Endpoints base image ships a `transformers` 4.x that expects `extra_special_tokens` to be a dict and calls `.keys()` on it.
**Check.**
```bash
python3 -c "
import json
from huggingface_hub import hf_hub_download
cfg = json.load(open(hf_hub_download('ericaRC/example', 'tokenizer_config.json')))
print('extra_special_tokens type:', type(cfg.get('extra_special_tokens')).__name__)
print('additional_special_tokens count:', len(cfg.get('additional_special_tokens') or []))
"
```
If `extra_special_tokens` is a non-empty `list` and `additional_special_tokens` is empty, you're hitting this.
**Fix (already applied to this repo).** `tokenizer_config.json` has been normalized:
- lang codes live in `additional_special_tokens` (list — old *and* new transformers accept this)
- `extra_special_tokens` is `{}` (empty dict — passes `.keys()` in old transformers, ignored in new)
And `requirements.txt` pins `transformers>=4.40.0,<5.0` to prevent the endpoint from auto-pulling a 5.x that re-introduces the mismatch.
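The normalization itself is a small transformation of the loaded `tokenizer_config.json` dict. A sketch of the rewrite (field names are the real ones from the traceback above; the merge logic is illustrative and assumes the list entries are plain token strings):

```python
def normalize_tokenizer_config(cfg):
    """Make tokenizer_config.json loadable by both transformers 4.x and 5.x.

    Moves a list-valued extra_special_tokens into additional_special_tokens
    and leaves extra_special_tokens as an empty dict, so 4.x's .keys() call
    succeeds and 5.x still sees every token.
    """
    extra = cfg.get("extra_special_tokens")
    if isinstance(extra, list):
        additional = cfg.get("additional_special_tokens") or []
        # Preserve order, drop duplicates.
        cfg["additional_special_tokens"] = list(dict.fromkeys(additional + extra))
        cfg["extra_special_tokens"] = {}
    return cfg
```

Apply it to the JSON file, re-push, and the 4.x container loads the tokenizer cleanly.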
**Prevention going forward.** When running `model_loader.py`, use the same `transformers` major version the endpoint runs:
```bash
pip install "transformers<5" "huggingface_hub" "torch"
python3 model_loader.py
```
Don't save tokenizers from `transformers` 5.x and load them in a 4.x container (or vice versa) unless you've confirmed the schema matches.
---
## 4. Endpoint boots but requests return garbage / wrong language
**Cause.** `src_lang` wasn't set on the tokenizer, or `forced_bos_token_id` wasn't passed at generation time. NLLB needs both.
**Check.** Look at the request body:
```json
{
  "inputs": "Hello, world!",
  "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
}
```
If you're hitting the endpoint without a `parameters` block, `handler.py` falls back to `eng_Latn → spa_Latn`.
**Fix.** Always pass `src_lang` and `tgt_lang` using [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).
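A minimal stdlib-only client sketch, for reference. The payload shape matches the request body shown above; the endpoint URL and token are placeholders you must supply, and `translate` is an illustrative name:

```python
import json
import urllib.request

def build_payload(text, src_lang, tgt_lang):
    """Request body in the shape handler.py expects (see above)."""
    return {"inputs": text,
            "parameters": {"src_lang": src_lang, "tgt_lang": tgt_lang}}

def translate(url, token, text, src_lang="eng_Latn", tgt_lang="fra_Latn"):
    """POST to an Inference Endpoint; url/token are yours, not this repo's."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text, src_lang, tgt_lang)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Omitting `parameters` silently falls back to `eng_Latn → spa_Latn`, which is exactly the "wrong language" symptom above.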
---
## 5. Container Type is set to "Text Generation Inference (TGI)"
TGI only supports decoder-only causal LMs. NLLB is seq2seq, so TGI will refuse to load it and `handler.py` will be ignored.
**Fix.** In the endpoint's Advanced configuration, set **Container Type → Default** (the HF inference toolkit). That container picks up `handler.py` automatically.
---
## Checklist before clicking Deploy
- [ ] `HfApi().model_info(REPO).siblings` lists `config.json`, `model.safetensors`, `tokenizer*.json`, `handler.py`, `requirements.txt`, `README.md`.
- [ ] `tokenizer_config.json` has `extra_special_tokens: {}` (or absent) and `additional_special_tokens` populated.
- [ ] `requirements.txt` pins `transformers<5`.
- [ ] Local smoke test passes:
```python
from handler import EndpointHandler
h = EndpointHandler("ericaRC/example")
print(h({"inputs": "Hello, world!", "parameters": {"src_lang": "eng_Latn", "tgt_lang": "fra_Latn"}}))
```
- [ ] Endpoint Container Type = **Default**, not TGI.