---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: translation
base_model: facebook/nllb-200-distilled-600M
tags:
  - translation
  - nllb
  - seq2seq
  - endpoints-template
inference: true
language:
  - multilingual
---

# baseline-nllb

A baseline clone of facebook/nllb-200-distilled-600M, packaged for Hugging Face Inference Endpoints with a custom handler so callers can pass arbitrary NLLB Flores-200 language codes at request time.
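
The interesting part is `handler.py`. Below is a minimal sketch of what such a handler can look like, assuming the standard Endpoints contract (`__init__(path)` / `__call__(data)`); the `handler.py` shipped in this repo is the authoritative version:

```python
# Hypothetical sketch only; see handler.py in this repo for the real code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the repo contents inside the endpoint container
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(path)

    def __call__(self, data: dict) -> list[dict]:
        params = data.get("parameters") or {}
        texts = data["inputs"]
        if isinstance(texts, str):
            texts = [texts]
        # NLLB takes the source language via the tokenizer...
        self.tokenizer.src_lang = params.get("src_lang", "eng_Latn")
        batch = self.tokenizer(texts, return_tensors="pt", padding=True)
        # ...and the target language via the forced BOS token
        tgt = params.get("tgt_lang", "spa_Latn")
        generated = self.model.generate(
            **batch,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(tgt),
            max_length=params.get("max_length", 256),
            num_beams=params.get("num_beams", 4),
        )
        decoded = self.tokenizer.batch_decode(generated, skip_special_tokens=True)
        return [{"translation_text": t} for t in decoded]
```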

## Deploying to Inference Endpoints

1. Open this repo on the Hub and click **Deploy → Inference Endpoints**.
2. Pick a GPU instance (the 600M model runs fine on a small GPU; a CPU instance also works, just more slowly).
3. Leave the container type as **Default**; the Endpoints runtime auto-detects `handler.py` and installs `requirements.txt`.
4. Deploy.
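
If you would rather script the deploy, `huggingface_hub` exposes the same flow via `create_inference_endpoint`. A sketch with illustrative values; adjust vendor, region, and instance to whatever the UI offers for your account (`your-org/baseline-nllb` is a placeholder):

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "baseline-nllb",
    repository="your-org/baseline-nllb",  # placeholder: this repo's id
    framework="pytorch",
    task="translation",
    accelerator="gpu",
    vendor="aws",               # illustrative; any supported cloud works
    region="us-east-1",         # illustrative
    instance_size="x1",         # illustrative
    instance_type="nvidia-t4",  # illustrative small GPU
    type="protected",
)
endpoint.wait()  # block until the endpoint reports "running"
print(endpoint.url)
```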

## Request format

```json
{
  "inputs": "Hello, world!",
  "parameters": {
    "src_lang": "eng_Latn",
    "tgt_lang": "spa_Latn",
    "max_length": 256,
    "num_beams": 4
  }
}
```

`inputs` may be a single string or a list of strings. `src_lang` / `tgt_lang` use the Flores-200 codes (e.g. `eng_Latn`, `spa_Latn`, `fra_Latn`, `zho_Hans`, `arb_Arab`). If omitted, the handler defaults to `eng_Latn` → `spa_Latn`.
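
To discover the full set of accepted codes, you can inspect the tokenizer locally. A quick check, assuming a recent `transformers` where the Flores-200 codes are registered as additional special tokens (older versions exposed a `lang_code_to_id` mapping instead):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
# Flores-200 codes look like "xxx_Script", e.g. "eng_Latn"
codes = [t for t in tok.additional_special_tokens if "_" in t]
print(len(codes))  # ~200 languages
print(codes[:3])   # e.g. ['ace_Arab', 'ace_Latn', 'acm_Arab']
```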

## Response

```json
[{ "translation_text": "¡Hola, mundo!" }]
```

## Example clients

### cURL

```bash
curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": "Hello, world!",
        "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
      }'
```

### Python

```python
import os

import requests

HF_TOKEN = os.environ["HF_TOKEN"]  # your Hugging Face access token

resp = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "inputs": ["Hello, world!", "How are you?"],
        "parameters": {"src_lang": "eng_Latn", "tgt_lang": "deu_Latn"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```
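
Alternatively, `huggingface_hub`'s `InferenceClient` can point straight at a dedicated endpoint URL. A sketch, assuming the client's `translation` helper (which accepts `src_lang` / `tgt_lang` in recent versions) lines up with the response shape above:

```python
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token=os.environ["HF_TOKEN"],
)
out = client.translation("Hello, world!", src_lang="eng_Latn", tgt_lang="fra_Latn")
print(out.translation_text)
```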

## Files in this repo

| File | Purpose |
| --- | --- |
| `handler.py` | Custom `EndpointHandler` used by HF Inference Endpoints. |
| `requirements.txt` | Extra Python deps installed into the endpoint container. |
| `model_loader.py` | One-off script that pushed the base NLLB weights to this repo. |
| `config.json`, `tokenizer*`, `*.safetensors` | Model + tokenizer artifacts (pushed by `model_loader.py`). |
| `TROUBLESHOOTING.md` | Real deploy failures we hit and how we fixed them; read this first if the endpoint won't start. |
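
For the curious, `model_loader.py` amounts to a pull-then-push. A hypothetical reconstruction (the script in this repo is authoritative; `your-org/baseline-nllb` is a placeholder repo id):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

SRC = "facebook/nllb-200-distilled-600M"
DST = "your-org/baseline-nllb"  # placeholder: replace with this repo's id

model = AutoModelForSeq2SeqLM.from_pretrained(SRC)
tokenizer = AutoTokenizer.from_pretrained(SRC)
model.push_to_hub(DST)      # uploads config.json + *.safetensors
tokenizer.push_to_hub(DST)  # uploads tokenizer files
```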

## License

Inherits **CC-BY-NC-4.0** from the upstream [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model: non-commercial use only.