---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: translation
base_model: facebook/nllb-200-distilled-600M
tags:
  - translation
  - nllb
  - seq2seq
  - endpoints-template
inference: true
language:
  - multilingual
---

# baseline-nllb

A baseline clone of [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M), packaged for **Hugging Face Inference Endpoints** with a custom handler so callers can pass arbitrary NLLB Flores-200 language codes at request time.

## Deploying to Inference Endpoints

1. Open this repo on the Hub and click **Deploy → Inference Endpoints**.
2. Pick a GPU instance (the 600M model runs fine on a small GPU; a CPU instance also works but is slower).
3. Leave the container type as **Default** — the Endpoints runtime will auto-detect [`handler.py`](./handler.py) and install [`requirements.txt`](./requirements.txt) (see the sketch after this list).
4. Deploy.
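
The `requirements.txt` mentioned in step 3 only needs to list the extra runtime pieces the handler imports. As an illustration (the exact pins in this repo may differ), it can be as small as:

```text
transformers>=4.38
torch
sentencepiece
```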

## Request format

```json
{
  "inputs": "Hello, world!",
  "parameters": {
    "src_lang": "eng_Latn",
    "tgt_lang": "spa_Latn",
    "max_length": 256,
    "num_beams": 4
  }
}
```

`inputs` may be a single string or a list of strings. `src_lang` / `tgt_lang` use the [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200) (e.g. `eng_Latn`, `spa_Latn`, `fra_Latn`, `zho_Hans`, `arb_Arab`). If omitted, the handler defaults to `eng_Latn` → `spa_Latn`.

### Response

```json
[{ "translation_text": "¡Hola, mundo!" }]
```
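
Under the hood, [`handler.py`](./handler.py) follows the standard Inference Endpoints `EndpointHandler` convention (a class with `__init__` and `__call__`). The sketch below shows how such a handler can apply `src_lang` / `tgt_lang` at request time; it is illustrative, not a verbatim copy of the file in this repo, and its defaults simply mirror the ones documented above:

```python
from typing import Any

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class EndpointHandler:
    """Illustrative sketch of a custom handler for HF Inference Endpoints."""

    def __init__(self, path: str = ""):
        # `path` is the repo checkout inside the endpoint container.
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(path)

    def __call__(self, data: dict[str, Any]) -> list[dict[str, str]]:
        texts = data["inputs"]
        if isinstance(texts, str):
            texts = [texts]

        params = data.get("parameters") or {}
        src_lang = params.get("src_lang", "eng_Latn")
        tgt_lang = params.get("tgt_lang", "spa_Latn")

        # NLLB takes the source language on the tokenizer ...
        self.tokenizer.src_lang = src_lang
        batch = self.tokenizer(texts, return_tensors="pt", padding=True)

        # ... and forces the target language as the first generated token.
        generated = self.model.generate(
            **batch,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(tgt_lang),
            max_length=params.get("max_length", 256),
            num_beams=params.get("num_beams", 4),
        )
        decoded = self.tokenizer.batch_decode(generated, skip_special_tokens=True)
        return [{"translation_text": t} for t in decoded]
```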

## Example clients

### cURL

```bash
curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": "Hello, world!",
        "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
      }'
```

### Python

```python
import os

import requests

HF_TOKEN = os.environ["HF_TOKEN"]  # token with access to the endpoint

resp = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "inputs": ["Hello, world!", "How are you?"],
        "parameters": {"src_lang": "eng_Latn", "tgt_lang": "deu_Latn"},
    },
    timeout=30,
)
print(resp.json())
```

## Files in this repo

| File | Purpose |
| --- | --- |
| `handler.py` | Custom `EndpointHandler` used by HF Inference Endpoints. |
| `requirements.txt` | Extra Python deps installed into the endpoint container. |
| `model_loader.py` | One-off script that pushed the base NLLB weights to this repo. |
| `config.json`, `tokenizer*`, `*.safetensors` | Model + tokenizer artifacts (pushed by `model_loader.py`). |
| `TROUBLESHOOTING.md` | Real deploy failures we hit and how we fixed them — read this first if the endpoint won't start. |
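
As noted in the table, `model_loader.py` simply re-publishes the upstream checkpoint. A rough sketch of what such a script looks like (the target repo id below is a placeholder, and the actual script may differ):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

BASE = "facebook/nllb-200-distilled-600M"
TARGET = "<your-namespace>/baseline-nllb"  # placeholder repo id

# Pull the base NLLB weights and tokenizer, then push them to the target repo.
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSeq2SeqLM.from_pretrained(BASE)

tokenizer.push_to_hub(TARGET)
model.push_to_hub(TARGET)
```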

## License

Inherits `CC-BY-NC-4.0` from the upstream `facebook/nllb-200-distilled-600M` model — **non-commercial use only**.