Add TROUBLESHOOTING.md documenting real deploy failures
| 1 |
+
# Troubleshooting
|
| 2 |
+
|
| 3 |
+
Real failures we've hit deploying this repo to Hugging Face Inference Endpoints, and how to fix them. Read this first when the endpoint won't start.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## 1. `Unrecognized model ... Should have a model_type key in its config.json`

Endpoint logs end with a giant list of model types (`albert, align, ... m2m_100, ... zoedepth`) and `Application startup failed`.

**Cause.** The Hub repo doesn't actually contain model weights / `config.json`. Usually happens when `model_loader.py` was committed to git but never *executed* against the Hub (pushing the Python file ≠ running it).

**Check.**

```bash
python3 -c "from huggingface_hub import HfApi; print([s.rfilename for s in HfApi().model_info('ericaRC/example').siblings])"
```

You should see `config.json`, `model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, `handler.py`, `requirements.txt`, `README.md`. If it's only `.gitattributes` and scripts, the weights were never pushed.

**Fix.**

```bash
huggingface-cli login
python3 model_loader.py
```

---
## 2. `403 Forbidden` on `.../info/lfs/objects/batch`

`push_to_hub` dies with `HfHubHTTPError: 403 Forbidden: Authorization error.`

**Cause.** Your HF token lacks write access to the target repo. Most commonly: a fine-grained token scoped to your user only, trying to push to an org namespace. Reading works (which is why `whoami` succeeds) but LFS writes are rejected.

**Check.**

```bash
python3 -c "
from huggingface_hub import HfApi
perms = HfApi().whoami()['auth']['accessToken'].get('fineGrained', {})
for s in perms.get('scoped', []):
    print(s['entity']['type'], s['entity']['name'], '->', s['permissions'])
"
```

You need an entry matching the target repo's namespace (user or org) that includes `repo.write`.
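
Reading that output by eye is error-prone, so here is a minimal sketch that turns it into a pass/fail. The helper name is ours, and it assumes the fine-grained payload has the shape printed above:

```python
def has_write(scoped: list, namespace: str) -> bool:
    # True if any scoped entry covers `namespace` and includes repo.write.
    # (Hypothetical helper; assumes each entry looks like
    #  {"entity": {"type": ..., "name": ...}, "permissions": [...]}.)
    return any(
        s["entity"]["name"] == namespace and "repo.write" in s["permissions"]
        for s in scoped
    )
```

If this returns `False` for the namespace you're pushing to, the 403 is expected and the fix below applies.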

**Fix.** At https://huggingface.co/settings/tokens either:

- Edit the existing token and add the org with `repo.write` + `repo.content.read` + `repo.access.read`, **or**
- Create a new classic "Write" token and `huggingface-cli login` with it.

---
## 3. `AttributeError: 'list' object has no attribute 'keys'` in `_set_model_specific_special_tokens`

Endpoint logs show a traceback through `tokenization_nllb_fast.py` → `tokenization_utils_base.py` and a crash on:

```
self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
```

**Cause.** Transformers-version skew between save time and load time. `transformers` 5.x introduced an `extra_special_tokens` field (serialized as a list for NLLB's Flores-200 codes). The Inference Endpoints base image ships a `transformers` 4.x that expects `extra_special_tokens` to be a dict and calls `.keys()` on it.

**Check.**

```bash
python3 -c "
import json
from huggingface_hub import hf_hub_download
cfg = json.load(open(hf_hub_download('ericaRC/example', 'tokenizer_config.json')))
print('extra_special_tokens type:', type(cfg.get('extra_special_tokens')).__name__)
print('additional_special_tokens count:', len(cfg.get('additional_special_tokens') or []))
"
```

If `extra_special_tokens` is a non-empty `list` and `additional_special_tokens` is empty, you're hitting this.

**Fix (already applied to this repo).** `tokenizer_config.json` has been normalized:

- lang codes live in `additional_special_tokens` (list — old *and* new transformers accept this)
- `extra_special_tokens` is `{}` (empty dict — passes `.keys()` in old transformers, ignored in new)

And `requirements.txt` pins `transformers>=4.40.0,<5.0` to prevent the endpoint from auto-pulling a 5.x that re-introduces the mismatch.
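
If another repo hits the same mismatch, the normalization can be sketched roughly like this. It assumes the lang codes are serialized as plain strings; a config that stores full token objects would need more handling:

```python
import json

def normalize_tokenizer_config(cfg: dict) -> dict:
    """Move a list-valued extra_special_tokens into additional_special_tokens,
    then leave extra_special_tokens as {} so transformers 4.x can call .keys().
    (Sketch of the normalization described above, not code from this repo.)"""
    extra = cfg.get("extra_special_tokens")
    if isinstance(extra, list):
        existing = cfg.get("additional_special_tokens") or []
        # Preserve order; skip codes already present.
        cfg["additional_special_tokens"] = existing + [t for t in extra if t not in existing]
    cfg["extra_special_tokens"] = {}
    return cfg

# Example use against a local copy of the file:
# with open("tokenizer_config.json") as f:
#     cfg = normalize_tokenizer_config(json.load(f))
# with open("tokenizer_config.json", "w") as f:
#     json.dump(cfg, f, indent=2)
```

Re-run the check above afterwards: `extra_special_tokens type` should print `dict`.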

**Prevention going forward.** When running `model_loader.py`, use the same `transformers` major version the endpoint runs:

```bash
pip install "transformers<5" "huggingface_hub" "torch"
python3 model_loader.py
```

Don't save tokenizers from `transformers` 5.x and load them in a 4.x container (or vice versa) unless you've confirmed the schema matches.

---
## 4. Endpoint boots but requests return garbage / wrong language

**Cause.** `src_lang` wasn't set on the tokenizer, or `forced_bos_token_id` wasn't passed at generation time. NLLB needs both.

**Check.** Look at the request body:

```json
{
  "inputs": "Hello, world!",
  "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
}
```

If you're hitting the endpoint without a `parameters` block, `handler.py` falls back to `eng_Latn → spa_Latn`.
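
That fallback behaves roughly like this hypothetical helper (a sketch mirroring the behavior described above, not the actual `handler.py` code):

```python
def resolve_langs(payload: dict) -> tuple:
    # No "parameters" block means the documented default: eng_Latn -> spa_Latn.
    # (Hypothetical helper, not the real handler implementation.)
    params = payload.get("parameters") or {}
    return params.get("src_lang", "eng_Latn"), params.get("tgt_lang", "spa_Latn")
```

So a bare `{"inputs": ...}` request silently translates to Spanish, which is easy to mistake for a broken model.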

**Fix.** Always pass `src_lang` and `tgt_lang` using [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).

---
## 5. Container Type is set to "Text Generation Inference (TGI)"

TGI only supports decoder-only causal LMs. NLLB is seq2seq, so TGI will refuse to load it and `handler.py` will be ignored.

**Fix.** In the endpoint's Advanced configuration, set **Container Type → Default** (the HF inference toolkit). That container picks up `handler.py` automatically.

---
## Checklist before clicking Deploy

- [ ] `HfApi().model_info(REPO).siblings` lists `config.json`, `model.safetensors`, `tokenizer*.json`, `handler.py`, `requirements.txt`, `README.md`.
- [ ] `tokenizer_config.json` has `extra_special_tokens: {}` (or absent) and `additional_special_tokens` populated.
- [ ] `requirements.txt` pins `transformers<5`.
- [ ] Local smoke test passes:

  ```python
  from handler import EndpointHandler

  h = EndpointHandler("ericaRC/example")
  print(h({"inputs": "Hello, world!", "parameters": {"src_lang": "eng_Latn", "tgt_lang": "fra_Latn"}}))
  ```

- [ ] Endpoint Container Type = **Default**, not TGI.