---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: translation
base_model: facebook/nllb-200-distilled-600M
tags:
- translation
- nllb
- seq2seq
- endpoints-template
inference: true
language:
- multilingual
---

# baseline-nllb

A baseline clone of [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M), packaged for **Hugging Face Inference Endpoints** with a custom handler so callers can pass arbitrary NLLB Flores-200 language codes at request time.

## Deploying to Inference Endpoints

1. Open this repo on the Hub and click **Deploy → Inference Endpoints**.
2. Pick a GPU instance (the 600M model runs fine on a small GPU; a CPU instance also works but is slower).
3. Leave the container type as **Default** — the Endpoints runtime will auto-detect [`handler.py`](./handler.py) and install [`requirements.txt`](./requirements.txt).
4. Deploy. (A scripted alternative using `huggingface_hub` is sketched below.)

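If you would rather script the deployment than click through the UI, `huggingface_hub` provides a `create_inference_endpoint` helper. The sketch below is illustrative only: the endpoint name, namespace, vendor, region, and instance values are placeholders, so substitute whatever the Inference Endpoints catalog offers for your account.

```python
# Hedged sketch: create the endpoint for this repo programmatically.
# All names below (endpoint name, namespace, vendor, region, instance type/size)
# are placeholders; replace them with values valid for your account.
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "baseline-nllb",                              # endpoint name (placeholder)
    repository="<your-namespace>/baseline-nllb",  # this repo
    framework="pytorch",
    task="translation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-t4",
)
endpoint.wait()      # block until the endpoint reports "running"
print(endpoint.url)  # base URL to send requests to
```
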
## Request format

```json
{
  "inputs": "Hello, world!",
  "parameters": {
    "src_lang": "eng_Latn",
    "tgt_lang": "spa_Latn",
    "max_length": 256,
    "num_beams": 4
  }
}
```

`inputs` may be a single string or a list of strings. `src_lang` / `tgt_lang` use the [Flores-200 codes](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200) (e.g. `eng_Latn`, `spa_Latn`, `fra_Latn`, `zho_Hans`, `arb_Arab`). If omitted, the handler defaults to `eng_Latn` → `spa_Latn`.

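For reference, the handler contract is simple. Below is a minimal sketch of an `EndpointHandler` that implements the request/response format above; it is not a copy of this repo's `handler.py` (which remains the source of truth), but it shows the standard NLLB pattern of setting `src_lang` on the tokenizer and forcing the target language as the first generated token.

```python
# Minimal sketch of an EndpointHandler for NLLB; this repo's handler.py is the
# source of truth and may differ (device placement, validation, error handling).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(path)

    def __call__(self, data: dict) -> list:
        texts = data["inputs"]
        if isinstance(texts, str):
            texts = [texts]
        params = data.get("parameters") or {}
        src_lang = params.get("src_lang", "eng_Latn")
        tgt_lang = params.get("tgt_lang", "spa_Latn")

        # NLLB: the source language is set on the tokenizer, and the target
        # language is forced as the first token the decoder generates.
        self.tokenizer.src_lang = src_lang
        batch = self.tokenizer(texts, return_tensors="pt", padding=True)
        generated = self.model.generate(
            **batch,
            forced_bos_token_id=self.tokenizer.convert_tokens_to_ids(tgt_lang),
            max_length=params.get("max_length", 256),
            num_beams=params.get("num_beams", 4),
        )
        outputs = self.tokenizer.batch_decode(generated, skip_special_tokens=True)
        return [{"translation_text": t} for t in outputs]
```
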
### Response

```json
[{ "translation_text": "¡Hola, mundo!" }]
```

## Example clients

### cURL

```bash
curl https://<your-endpoint>.endpoints.huggingface.cloud \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "Hello, world!",
    "parameters": { "src_lang": "eng_Latn", "tgt_lang": "fra_Latn" }
  }'
```

### Python

```python
import os

import requests

HF_TOKEN = os.environ["HF_TOKEN"]  # a token that can reach the endpoint

resp = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "inputs": ["Hello, world!", "How are you?"],
        "parameters": {"src_lang": "eng_Latn", "tgt_lang": "deu_Latn"},
    },
    timeout=30,
)
print(resp.json())
```

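If you already use `huggingface_hub`, its `InferenceClient` can also call the endpoint directly. This assumes a recent `huggingface_hub` release whose `translation()` method accepts `src_lang` / `tgt_lang`; if yours does not, fall back to the `requests` example above.

```python
# Hedged sketch: call the deployed endpoint via huggingface_hub's InferenceClient.
# Assumes translation() accepts src_lang / tgt_lang (recent huggingface_hub versions).
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token=os.environ["HF_TOKEN"],
)
result = client.translation(
    "Hello, world!",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)
print(result.translation_text)
```
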
## Files in this repo

| | File | Purpose | |
| | --- | --- | |
| | `handler.py` | Custom `EndpointHandler` used by HF Inference Endpoints. | |
| | `requirements.txt` | Extra Python deps installed into the endpoint container. | |
| | `model_loader.py` | One-off script that pushed the base NLLB weights to this repo. | |
| | `config.json`, `tokenizer*`, `*.safetensors` | Model + tokenizer artifacts (pushed by `model_loader.py`). | |
| | `TROUBLESHOOTING.md` | Real deploy failures we hit and how we fixed them — read this first if the endpoint won't start. | |
## License

Inherits `CC-BY-NC-4.0` from the upstream `facebook/nllb-200-distilled-600M` model — **non-commercial use only**.