The model just prints <unk> tokens

by MrBananaHuman - opened Jul 14, 2023

Discussion

MrBananaHuman

Jul 14, 2023 (edited Jul 14, 2023)

I tried to generate a sentence using your sample code, but I got only unk tokens.

So I added 'bad_words_ids = [[tokenizer.unk_token_id]]', and the result is:

'Beijing is the capital of China. Translate this sentence from English to Chinese. [LEN0] [LEN1] [LEN2] [LEN3] [LEN4] [LEN5] [LEN6] [LEN7] [LEN8] [LEN9] [LEN10] [LEN11] [LEN12] [LEN13] [LEN14] [LEN15] [LEN16] [LEN17]'

What is wrong?
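For context on the workaround above: in Transformers, `bad_words_ids` passed to `generate()` prevents the listed token sequences from being produced. For single-token entries, this amounts to masking those tokens' scores before the next token is chosen. The pure-Python sketch below illustrates that idea only. The names `ban_tokens` and `greedy_pick`, the toy scores, and the token ids are hypothetical, and this is not the library's actual implementation.

```python
import math

def ban_tokens(scores, banned_ids):
    """Return a copy of scores with each banned token id set to -inf."""
    masked = list(scores)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked

def greedy_pick(scores):
    """Return the index of the highest-scoring token."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy vocabulary of 4 tokens; id 0 plays the role of <unk> (hypothetical).
scores = [9.0, 1.5, 1.4, 0.2]
unk_id = 0

print(greedy_pick(scores))                        # picks the <unk> stand-in
print(greedy_pick(ban_tokens(scores, [unk_id])))  # picks the runner-up instead
```

Masking only removes the banned token. If the underlying scores are already degenerate, the runner-up may be just as unhelpful, which would be consistent with the [LEN...] output reported above.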

pemywei

Machine Translation Team at Alibaba DAMO Academy org Jul 14, 2023

I was unable to replicate the problem.

However, I have optimized the sample code and you may try again.

MrBananaHuman

Jul 14, 2023

Here is my Colab code:

https://colab.research.google.com/drive/108YvdvdxzDN62TX9M0d6DsqztXSeLla4?usp=sharing

(I added the 'torch_dtype=torch.float16' option because of Colab's limited VRAM.)

pemywei

Machine Translation Team at Alibaba DAMO Academy org Jul 14, 2023

PolyLM uses the bfloat16 numerical format, so fp16 is likely to be problematic.
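To illustrate the point about numerical formats: fp16 has a 5-bit exponent and a largest finite value of 65504, whereas bfloat16 keeps float32's 8-bit exponent and so covers roughly the same range at lower precision. The stdlib-only sketch below shows a value that bfloat16 can hold but fp16 cannot. The helper names are hypothetical, and whether overflow is the actual cause of the <unk> output in this thread is an assumption, not something confirmed here.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def fits_in_fp16(x: float) -> bool:
    """True if x can be packed as a finite IEEE 754 half-precision float."""
    try:
        struct.pack(">e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False

def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 encoding.

    This truncates instead of rounding, which is enough to show the range.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

value = 70000.0  # hypothetical activation just above FP16_MAX
print(fits_in_fp16(value))  # False: out of range for fp16
print(to_bfloat16(value))   # finite, slightly below 70000 (reduced precision)
```

With Transformers, the corresponding change would be to pass `torch_dtype=torch.bfloat16` instead of `torch.float16` to `from_pretrained`, on hardware that supports bfloat16.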

MrBananaHuman

Jul 14, 2023

Oh, I see :) I will test without that option.
Thank you.

MrBananaHuman

Jul 15, 2023 (edited Jul 15, 2023)

This time I loaded the 1.7B model, but the result is as follows:

"Beijing is the capital of China.\nTranslate this sentence from English to Chinese.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n"

Please check the same Colab link.

jalves97

Jul 24, 2023

I am having the same problem with the 13B model. It only generates UNK tokens.
It does not happen with the 1.7B model. Could you help us, @pemywei?
Thanks!
