Instructions for using DAMO-NLP-MT/polylm-13b-fine-grained-shards with libraries and local apps.

Libraries

How to use DAMO-NLP-MT/polylm-13b-fine-grained-shards with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DAMO-NLP-MT/polylm-13b-fine-grained-shards", device_map="auto")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-MT/polylm-13b-fine-grained-shards")
model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-MT/polylm-13b-fine-grained-shards", device_map="auto")
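Because `device_map="auto"` spreads the weights over whatever devices are available, it helps to know roughly how much memory the weights alone need. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes the nominal 13B parameter count from the model name, and it ignores activations, the KV cache, and framework overhead. The function name is illustrative.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter count taken from the "13b" in the model name (an assumption).
NOMINAL_PARAMS = 13e9

# 2 bytes per parameter for float16/bfloat16, 4 bytes for float32.
for dtype_name, width in (("float16", 2), ("float32", 4)):
    size = estimate_weight_memory_gib(NOMINAL_PARAMS, width)
    print(f"{dtype_name}: about {size:.1f} GiB for weights only")
```

If the estimate exceeds the memory of a single GPU, `device_map="auto"` can place the remaining layers on other GPUs or on the CPU, at the cost of slower inference.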

Local Apps

vLLM

How to use DAMO-NLP-MT/polylm-13b-fine-grained-shards with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DAMO-NLP-MT/polylm-13b-fine-grained-shards"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DAMO-NLP-MT/polylm-13b-fine-grained-shards",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
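
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative, and the response is assumed to follow the OpenAI completions format (`choices[0].text`). Pass `base_url="http://localhost:30000"` to target an SGLang server started on port 30000 instead.

```python
import json
import urllib.request

MODEL_ID = "DAMO-NLP-MT/polylm-13b-fine-grained-shards"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI completions response shape.
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation as a string.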

Use Docker

docker model run hf.co/DAMO-NLP-MT/polylm-13b-fine-grained-shards

SGLang

How to use DAMO-NLP-MT/polylm-13b-fine-grained-shards with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DAMO-NLP-MT/polylm-13b-fine-grained-shards" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DAMO-NLP-MT/polylm-13b-fine-grained-shards",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DAMO-NLP-MT/polylm-13b-fine-grained-shards" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DAMO-NLP-MT/polylm-13b-fine-grained-shards",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use DAMO-NLP-MT/polylm-13b-fine-grained-shards with Docker Model Runner:

```
docker model run hf.co/DAMO-NLP-MT/polylm-13b-fine-grained-shards
```

Model Details

Abstract

Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English.
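
The curriculum described in the abstract can be pictured as a data-mixing schedule. The toy sketch below is not the authors' pipeline: only the 30% and 60% shares come from the abstract, while the per-document sampling procedure, the stage labels, and the function names are assumptions made for illustration.

```python
import random

# Non-English share per pre-training stage, as stated in the abstract.
NON_ENGLISH_SHARE = {"first": 0.30, "final": 0.60}


def sample_is_non_english(stage: str, rng: random.Random) -> bool:
    """Draw one training document; True means it comes from the non-English pool."""
    return rng.random() < NON_ENGLISH_SHARE[stage]


def empirical_share(stage: str, n_draws: int = 100_000, seed: int = 0) -> float:
    """Estimate the non-English share observed over many draws in a stage."""
    rng = random.Random(seed)
    hits = sum(sample_is_non_english(stage, rng) for _ in range(n_draws))
    return hits / n_draws


for stage in ("first", "final"):
    print(stage, round(empirical_share(stage), 3))
```

The point of the schedule is that the final stage sees twice the non-English share of the first stage, while English data is still present throughout.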

Model Description

The only difference between this repository and polylm-13B is that the checkpoint here is split into finer-grained shards.
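
Sharded Transformers checkpoints are normally accompanied by an index JSON file whose `weight_map` maps each tensor name to the shard file that stores it, so finer-grained shards mean more, smaller files listed in that map. The sketch below summarizes such an index. It assumes this repository follows that convention; the toy index, its tensor names, and its file names are hypothetical, not taken from the repository.

```python
from collections import Counter


def summarize_shards(index: dict) -> dict:
    """Count shard files and the number of tensors stored in each one."""
    tensors_per_shard = Counter(index["weight_map"].values())
    return {
        "num_shards": len(tensors_per_shard),
        "tensors_per_shard": dict(sorted(tensors_per_shard.items())),
        # total_size is optional metadata, so fall back to None if absent.
        "total_size_bytes": index.get("metadata", {}).get("total_size"),
    }


# Hypothetical toy index in the shape of a sharded-checkpoint index file.
toy_index = {
    "metadata": {"total_size": 48},
    "weight_map": {
        "layer0.weight": "shard-00001-of-00002.bin",
        "layer0.bias": "shard-00001-of-00002.bin",
        "layer1.weight": "shard-00002-of-00002.bin",
    },
}

summary = summarize_shards(toy_index)
print(summary)
```

Smaller shards lower the peak memory needed while loading, since each file is read and placed before the next one is opened.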

Citation

BibTeX:

@misc{wei2023polylm,
      title={PolyLM: An Open Source Polyglot Large Language Model}, 
      author={Xiangpeng Wei and Haoran Wei and Huan Lin and Tianhao Li and Pei Zhang and Xingzhang Ren and Mei Li and Yu Wan and Zhiwei Cao and Binbin Xie and Tianxiang Hu and Shangjie Li and Binyuan Hui and Bowen Yu and Dayiheng Liu and Baosong Yang and Fei Huang and Jun Xie},
      year={2023},
      eprint={2307.06018},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Paper for DAMO-NLP-MT/polylm-13b-fine-grained-shards

PolyLM: An Open Source Polyglot Large Language Model

arXiv 2307.06018, published Jul 12, 2023