# Troubleshooting

Real failures we've hit deploying this repo to Hugging Face Inference Endpoints, and how to fix them. Read this first when the endpoint won't start.

---

## 1. `Unrecognized model ... Should have a model_type key in its config.json`

Endpoint logs end with a giant list of model types (`albert, align, ... m2m_100, ... zoedepth`) and `Application startup failed`.

**Cause.** The Hub repo doesn't actually contain model weights / `config.json`. Usually happens when `model_loader.py` was committed to git but never *executed* against the Hub (pushing the Python file ≠ running it).

**Check.** 

```bash
python3 -c "from huggingface_hub import HfApi; print([s.rfilename for s in HfApi().model_info('ericaRC/example').siblings])"
```

You should see `config.json`, `model.safetensors`, `tokenizer_config.json`, `tokenizer.json`, `handler.py`, `requirements.txt`, `README.md`. If it's only `.gitattributes` and scripts, the weights were never pushed.

**Fix.**

```bash
huggingface-cli login
python3 model_loader.py
```
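To make the expected-file check script-friendly (e.g. in CI), the comparison can be codified like this. A sketch: the required list is the one from this doc, with `README.md` treated as optional.

```python
REQUIRED = {
    "config.json", "model.safetensors",
    "tokenizer_config.json", "tokenizer.json",
    "handler.py", "requirements.txt",
}

def missing_files(siblings):
    """Given repo sibling filenames, return the required files that are absent."""
    return sorted(REQUIRED - set(siblings))

# Feed it the sibling list from the check above:
#   missing_files([s.rfilename for s in HfApi().model_info("ericaRC/example").siblings])
# An empty result means the weights and handler really are on the Hub.
```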

---

## 2. `403 Forbidden` on `.../info/lfs/objects/batch`

`push_to_hub` dies with `HfHubHTTPError: 403 Forbidden: Authorization error.`

**Cause.** Your HF token lacks write access to the target repo. Most commonly: a fine-grained token scoped to your user only, trying to push to an org namespace. Reading works (which is why `whoami` succeeds) but LFS writes are rejected.

**Check.**

```bash
python3 -c "
from huggingface_hub import HfApi
perms = HfApi().whoami()['auth']['accessToken'].get('fineGrained', {})
for s in perms.get('scoped', []):
    print(s['entity']['type'], s['entity']['name'], '->', s['permissions'])
"
```

You need an entry matching the target repo's namespace (user or org) that includes `repo.write`.
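Scripted, that requirement reads as follows. A sketch over the `scoped` structure the check above prints; it only covers fine-grained tokens (classic "Write" tokens have no scope list).

```python
def can_write(scoped, namespace):
    """True if any fine-grained scope grants repo.write on `namespace` (user or org)."""
    return any(
        entry["entity"]["name"] == namespace
        and "repo.write" in entry["permissions"]
        for entry in scoped
    )
```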

**Fix.** At https://huggingface.co/settings/tokens either:
- Edit the existing token and add the org with `repo.write` + `repo.content.read` + `repo.access.read`, **or**
- Create a new classic "Write" token and `huggingface-cli login` with it.

---

## 3. `AttributeError: 'list' object has no attribute 'keys'` in `_set_model_specific_special_tokens`

Endpoint logs show a traceback through `tokenization_nllb_fast.py` → `tokenization_utils_base.py`, crashing on:

```
self.SPECIAL_TOKENS_ATTRIBUTES + list(special_tokens.keys())
```

**Cause.** Transformers-version skew between save time and load time. `transformers` 5.x introduced an `extra_special_tokens` field (serialized as a list for NLLB's Flores-200 codes). The Inference Endpoints base image ships a `transformers` 4.x that expects `extra_special_tokens` to be a dict and calls `.keys()` on it.

**Check.**

```bash
python3 -c "
import json
from huggingface_hub import hf_hub_download
cfg = json.load(open(hf_hub_download('ericaRC/example', 'tokenizer_config.json')))
print('extra_special_tokens type:', type(cfg.get('extra_special_tokens')).__name__)
print('additional_special_tokens count:', len(cfg.get('additional_special_tokens') or []))
"
```

If `extra_special_tokens` is a non-empty `list` and `additional_special_tokens` is empty, you're hitting this.

**Fix (already applied to this repo).** `tokenizer_config.json` has been normalized:
- lang codes live in `additional_special_tokens` (list — old *and* new transformers accept this)
- `extra_special_tokens` is `{}` (empty dict — passes `.keys()` in old transformers, ignored in new)

And `requirements.txt` pins `transformers>=4.40.0,<5.0` to prevent the endpoint from auto-pulling a 5.x that re-introduces the mismatch.
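If you need to apply the same normalization to another checkpoint, the transformation is small. A sketch of what was done here, assuming the language codes are plain strings in the list:

```python
def normalize_tokenizer_config(cfg):
    """Make a 5.x-saved tokenizer_config.json loadable by transformers 4.x.

    A list-valued extra_special_tokens (how 5.x serialized NLLB's Flores-200
    codes) is merged into additional_special_tokens, and extra_special_tokens
    is reset to {} so 4.x's .keys() call succeeds.
    """
    extra = cfg.get("extra_special_tokens")
    if isinstance(extra, list):
        existing = cfg.get("additional_special_tokens") or []
        # De-duplicate while preserving order.
        cfg["additional_special_tokens"] = list(dict.fromkeys(existing + extra))
        cfg["extra_special_tokens"] = {}
    return cfg
```

Run it over the downloaded `tokenizer_config.json`, write the result back, and push.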

**Prevention going forward.** When running `model_loader.py`, use the same `transformers` major version the endpoint runs:

```bash
pip install "transformers<5" "huggingface_hub" "torch"
python3 model_loader.py
```

Don't save tokenizers from `transformers` 5.x and load them in a 4.x container (or vice versa) unless you've confirmed the schema matches.

---

## 4. Endpoint boots but requests return garbage / wrong language

**Cause.** `src_lang` wasn't set on the tokenizer, or `forced_bos_token_id` wasn't passed at generation time. NLLB needs both.

**Check.** Look at the request body:

```json
{
  "inputs": "Hello, world!",
  "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
}
```

If you're hitting the endpoint without a `parameters` block, `handler.py` falls back to `eng_Latn → spa_Latn`.

**Fix.** Always pass `src_lang` and `tgt_lang` using [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200).
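For reference, the two settings map onto the tokenizer and `model.generate` like this. A sketch, not the repo's actual `handler.py`; the fallback pair mirrors the one described above.

```python
def translate(model, tokenizer, text, src_lang="eng_Latn", tgt_lang="spa_Latn"):
    # src_lang controls how the *input* is tokenized (the tokenizer prepends
    # the source language code so the encoder knows what it's reading).
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")
    # forced_bos_token_id pins the first *generated* token to the target
    # language code; without it, NLLB decides the output language itself.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=256,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```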

---

## 5. Container Type is set to "Text Generation Inference (TGI)"

TGI only supports decoder-only causal LMs. NLLB is seq2seq, so TGI will refuse to load it and `handler.py` will be ignored.

**Fix.** In the endpoint's Advanced configuration, set **Container Type → Default** (the HF inference toolkit). That container picks up `handler.py` automatically.

---

## Checklist before clicking Deploy

- [ ] `HfApi().model_info(REPO).siblings` lists `config.json`, `model.safetensors`, `tokenizer*.json`, `handler.py`, `requirements.txt`, `README.md`.
- [ ] `tokenizer_config.json` has `extra_special_tokens: {}` (or absent) and `additional_special_tokens` populated.
- [ ] `requirements.txt` pins `transformers<5`.
- [ ] Local smoke test passes:
  ```python
  from handler import EndpointHandler
  h = EndpointHandler("ericaRC/example")
  print(h({"inputs": "Hello, world!", "parameters": {"src_lang": "eng_Latn", "tgt_lang": "fra_Latn"}}))
  ```
- [ ] Endpoint Container Type = **Default**, not TGI.