---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: translation
base_model: facebook/nllb-200-distilled-600M
tags:
- translation
- nllb
- seq2seq
- endpoints-template
inference: true
language:
- multilingual
---

# baseline-nllb

A baseline clone of [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M), packaged for **Hugging Face Inference Endpoints** with a custom handler so callers can pass arbitrary NLLB Flores-200 language codes at request time.

## Deploying to Inference Endpoints

1. Open this repo on the Hub and click **Deploy → Inference Endpoints**.
2. Pick a GPU instance (the 600M model runs fine on a small GPU; a CPU instance also works but is slower).
3. Leave the container type as **Default** — the Endpoints runtime will auto-detect [`handler.py`](./handler.py) and install [`requirements.txt`](./requirements.txt).
4. Deploy. (A scripted alternative using `huggingface_hub` is sketched below.)

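If you would rather script the deployment than click through the UI, `huggingface_hub` provides a `create_inference_endpoint` helper. The sketch below is illustrative only: the endpoint name, namespace, vendor, region, and instance values are placeholders, so substitute whatever the Inference Endpoints catalog offers for your account.

```python
# Hedged sketch: create the endpoint for this repo programmatically.
# All names below (endpoint name, namespace, vendor, region, instance type/size)
# are placeholders; replace them with values valid for your account.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "baseline-nllb",                              # endpoint name (placeholder)
    repository="<your-namespace>/baseline-nllb",  # this repo
    framework="pytorch",
    task="translation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-t4",
)
endpoint.wait()      # block until the endpoint reports "running"
print(endpoint.url)  # base URL to send requests to
```
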
## Request format

```json
{
  "inputs": "Hello, world!",
  "parameters": {
    "src_lang": "eng_Latn",
    "tgt_lang": "spa_Latn",
    "max_length": 256,
    "num_beams": 4
  }
}
```

`inputs` may be a single string or a list of strings. `src_lang` / `tgt_lang` use the [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200) (e.g. `eng_Latn`, `spa_Latn`, `fra_Latn`, `zho_Hans`, `arb_Arab`). If omitted, the handler defaults to `eng_Latn` → `spa_Latn`.

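For reference, the handler contract is simple. Below is a minimal sketch of an `EndpointHandler` that implements the request/response format above; it is not a copy of this repo's `handler.py` (which remains the source of truth), but it shows the standard NLLB pattern of setting `src_lang` on the tokenizer and forcing the target language as the first generated token.

```python
# Minimal sketch of an EndpointHandler for NLLB; this repo's handler.py is the
# source of truth and may differ (device placement, validation, error handling).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(path)

    def __call__(self, data: dict) -> list:
        texts = data["inputs"]
        if isinstance(texts, str):
            texts = [texts]
        params = data.get("parameters") or {}
        src_lang = params.get("src_lang", "eng_Latn")
        tgt_lang = params.get("tgt_lang", "spa_Latn")

        # NLLB: the source language is set on the tokenizer, and the target
        # language is forced as the first token the decoder generates.
        self.tokenizer.src_lang = src_lang
        batch = self.tokenizer(texts, return_tensors="pt", padding=True)
        generated = self.model.generate(
            **batch,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(tgt_lang),
            max_length=params.get("max_length", 256),
            num_beams=params.get("num_beams", 4),
        )
        outputs = self.tokenizer.batch_decode(generated, skip_special_tokens=True)
        return [{"translation_text": t} for t in outputs]
```
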
### Response

```json
[{ "translation_text": "¡Hola, mundo!" }]
```

## Example clients

### cURL

```bash
curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Hello, world!",
    "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
  }'
```

### Python

```python
import os

import requests

HF_TOKEN = os.environ["HF_TOKEN"]  # a token that can reach the endpoint

resp = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "inputs": ["Hello, world!", "How are you?"],
        "parameters": {"src_lang": "eng_Latn", "tgt_lang": "deu_Latn"},
    },
    timeout=30,
)
print(resp.json())
```

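If you already use `huggingface_hub`, its `InferenceClient` can also call the endpoint directly. This assumes a recent `huggingface_hub` release whose `translation()` method accepts `src_lang` / `tgt_lang`; if yours does not, fall back to the `requests` example above.

```python
# Hedged sketch: call the deployed endpoint via huggingface_hub's InferenceClient.
# Assumes translation() accepts src_lang / tgt_lang (recent huggingface_hub versions).
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token=os.environ["HF_TOKEN"],
)
result = client.translation(
    "Hello, world!",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)
print(result.translation_text)
```
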
## Files in this repo

| | File | Purpose | |
| | --- | --- | |
| | `handler.py` | Custom `EndpointHandler` used by HF Inference Endpoints. | |
| | `requirements.txt` | Extra Python deps installed into the endpoint container. | |
| | `model_loader.py` | One-off script that pushed the base NLLB weights to this repo. | |
| | `config.json`, `tokenizer*`, `*.safetensors` | Model + tokenizer artifacts (pushed by `model_loader.py`). | |
| | `TROUBLESHOOTING.md` | Real deploy failures we hit and how we fixed them — read this first if the endpoint won't start. | |
## License

Inherits `CC-BY-NC-4.0` from the upstream `facebook/nllb-200-distilled-600M` model — **non-commercial use only**.