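Introduction
The XuanYuan2-70B series builds on the XuanYuan-70B base model: it is further pre-trained and instruction-tuned on more high-quality data, then trained with reinforcement learning from human feedback. Compared with the first-generation XuanYuan-70B series, the second-generation models improve markedly in general capability, safety, and financial capability, and their outputs align better with human preferences. The second-generation models also support a context length of 16k, so they handle long text inputs better and suit a wider range of applications. For model details, see the document: Report
The XuanYuan2-70B series contains four models: the base model XuanYuan2-70B, the chat model XuanYuan2-70B-Chat, and two quantized versions of the chat model, XuanYuan2-70B-Chat-8bit and XuanYuan2-70B-Chat-4bit. The download links are: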
| Base model | Chat model | 8-bit quantized Chat model | 4-bit quantized Chat model |
|---|---|---|---|
| 🤗 XuanYuan2-70B | 🤗 XuanYuan2-70B-Chat | 🤗 XuanYuan2-70B-Chat-8bit | 🤗 XuanYuan2-70B-Chat-4bit |
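Key features:
- Continued pre-training and instruction tuning on more high-quality data, improving capabilities across the board
- Supported context length reaches 16k, widening the range of use
- Reinforcement training based on human feedback, further aligning the model with human preferences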
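Model training
Starting from the XuanYuan-70B base model, we continued training with higher-quality pre-training data. To balance training efficiency with long-text modeling, we proposed a dynamic pre-training method based on data bucketing. Using this bucketing approach, we trained the first-generation XuanYuan-70B base model on a large number of additional tokens to obtain the XuanYuan2-70B base model, which improves to varying degrees on evaluations of Chinese understanding, financial knowledge, and other metrics.
On top of the XuanYuan2-70B base model, we performed instruction alignment again with more high-quality instruction-tuning data. The main improvements are in the quality and diversity of general and financial instruction data.
For the instruction-tuned model, we built high-quality preference data and prompt data and carried out reinforcement learning with human feedback (RLHF). This further aligns the model with human preferences so that its behavior better matches human needs. The model's general capability, safety, and performance in the financial domain all improve noticeably.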
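The text does not spell out how the data bucketing works, so the following is only a generic illustration of length-based bucketing, not the authors' procedure. It assumes each sample goes into the smallest bucket that can hold it. The bucket limits and the name `bucket_by_length` are hypothetical; only the 16k upper limit echoes the context length mentioned above, taken here as 16384 tokens.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket limits in tokens; only the 16k ceiling comes from the text.
BUCKET_LIMITS = [2048, 4096, 8192, 16384]

def bucket_by_length(token_counts, limits=BUCKET_LIMITS):
    """Group sample indices by the smallest bucket limit that fits each sample."""
    buckets = defaultdict(list)
    for index, n_tokens in enumerate(token_counts):
        pos = bisect_left(limits, n_tokens)  # position of the first limit >= n_tokens
        if pos == len(limits):
            continue  # longer than the largest bucket; dropped in this sketch
        buckets[limits[pos]].append(index)
    return dict(buckets)

sample_lengths = [300, 2048, 5000, 16000, 20000]
bucketed = bucket_by_length(sample_lengths)
print(bucketed)
```

Batches drawn from a single bucket contain samples of similar length, so short samples need little padding while long samples still take part in training.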
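Performance evaluation
As with XuanYuan-70B, we evaluated XuanYuan2-70B on both general and financial benchmarks.
General evaluation
The goal of the general evaluation is to check whether, after continued pre-training on more high-quality data, XuanYuan2-70B retains its English capability and strengthens its Chinese capability. As before, we use MMLU to test general capability in English, and CEVAL and CMMLU to test capabilities in Chinese. The results are shown in the table below. Compared with XuanYuan-70B, XuanYuan2-70B's Chinese capability improves further while its English capability shows no obvious decline, so overall performance is in line with expectations. This confirms the effectiveness of our optimizations and shows the strong general capability of XuanYuan2-70B. Note that leaderboard results do not fully represent a model's real-world performance: although our scores exceed GPT4 on CEVAL and CMMLU, in practice there is still a clear gap between our model and GPT4. We will continue to optimize and improve the capabilities of the XuanYuan models.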
| Model | MMLU | CEVAL | CMMLU |
|---|---|---|---|
| LLaMA2-70B | 68.9 | 52.1 | 53.11 |
| XuanYuan-70B | 70.9 | 71.9 | 71.10 |
| XuanYuan2-70B | 70.8 | 72.7 | 72.7 |
| GPT4 | 83.93 | 68.4 | 70.95 |
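To make the comparison concrete, the per-benchmark differences can be computed directly from the table. This is a small sketch using only the scores listed above; the helper name `delta` is ours.

```python
# Scores copied from the general evaluation table above.
scores = {
    "LLaMA2-70B": {"MMLU": 68.9, "CEVAL": 52.1, "CMMLU": 53.11},
    "XuanYuan-70B": {"MMLU": 70.9, "CEVAL": 71.9, "CMMLU": 71.10},
    "XuanYuan2-70B": {"MMLU": 70.8, "CEVAL": 72.7, "CMMLU": 72.7},
    "GPT4": {"MMLU": 83.93, "CEVAL": 68.4, "CMMLU": 70.95},
}

def delta(model, baseline):
    """Per-benchmark score difference (model minus baseline), rounded to 2 decimals."""
    return {
        bench: round(scores[model][bench] - scores[baseline][bench], 2)
        for bench in scores[model]
    }

gen_delta = delta("XuanYuan2-70B", "XuanYuan-70B")
gpt4_delta = delta("XuanYuan2-70B", "GPT4")
print("vs XuanYuan-70B:", gen_delta)
print("vs GPT4:", gpt4_delta)
```

Against the first generation, the two Chinese benchmarks rise by about one point while MMLU moves by a tenth of a point. Against GPT4, the model leads on CEVAL and CMMLU but trails on MMLU by roughly 13 points.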
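Financial evaluation
We evaluated the model's financial capability on FinanceIQ. FinanceIQ is a professional evaluation set for the financial domain, covering 10 major financial categories and 36 subcategories with 7,173 single-choice questions in total, so it reflects a model's financial capability reasonably objectively. The results are shown in the table below. After continued optimization and training, XuanYuan2-70B's overall financial capability improves further, which again confirms the effectiveness of our optimizations. We also found that the model regresses to some degree in several subcategories, which indicates there is still room for optimization. We will continue to improve the financial capability of the XuanYuan models.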
| Model | Average | Certified Public Accountant | Banking Qualification | Securities Qualification | Fund Qualification | Insurance Qualification | Economist | Tax Advisor | Futures Qualification | Financial Planner | Actuary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| XuanYuan-70B | 67.56 | 69.49 | 76.40 | 69.56 | 74.89 | 67.82 | 84.81 | 58.4 | 71.59 | 65.15 | 37.50 |
| XuanYuan2-70B | 67.83 | 68.63 | 69.72 | 79.1 | 71.51 | 69.68 | 84.81 | 58.2 | 72.98 | 71.86 | 31.82 |
| GPT4 | 60.05 | 52.33 | 68.72 | 64.8 | 68.81 | 68.68 | 75.58 | 46.93 | 63.51 | 63.84 | 27.27 |
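The regressions mentioned above can be read off the table with a few lines of code. The sketch below uses only the table's numbers; the English category names are our translations. It also shows that the unweighted mean of the ten category scores reproduces the reported averages for both models.

```python
# Category scores copied from the FinanceIQ table above (same column order).
categories = [
    "Certified Public Accountant", "Banking Qualification", "Securities Qualification",
    "Fund Qualification", "Insurance Qualification", "Economist", "Tax Advisor",
    "Futures Qualification", "Financial Planner", "Actuary",
]
xuanyuan_70b = [69.49, 76.40, 69.56, 74.89, 67.82, 84.81, 58.4, 71.59, 65.15, 37.50]
xuanyuan2_70b = [68.63, 69.72, 79.1, 71.51, 69.68, 84.81, 58.2, 72.98, 71.86, 31.82]

def mean(values):
    """Unweighted mean, rounded to 2 decimals like the table."""
    return round(sum(values) / len(values), 2)

# Change per category: second generation minus first generation.
changes = {
    name: round(new - old, 2)
    for name, old, new in zip(categories, xuanyuan_70b, xuanyuan2_70b)
}
regressed = [name for name, change in changes.items() if change < 0]

print("Averages:", mean(xuanyuan_70b), mean(xuanyuan2_70b))
print("Regressed categories:", regressed)
```

The largest gains are in Securities Qualification and Financial Planner; the largest drops are in Banking Qualification and Actuary.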
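Quick start
The hardware requirements, software dependencies, and usage of the Base and Chat models for the XuanYuan2-70B series are the same as for the XuanYuan-70B series. Please refer to the introduction of the XuanYuan-70B series.
To lower the hardware requirements, we also provide 8bit and 4bit quantized versions of the XuanYuan2-70B-Chat model.
8bit model
For 8bit quantization we use the bitsandbytes library, which is widely used in the community. In our tests, 8bit quantization causes very little loss in model performance. The 8bit model is used as follows (note the prompt format: we set a system message during training):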
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
model_name_or_path = "/your/model/path"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = LlamaForCausalLM.from_pretrained(model_name_or_path,torch_dtype=torch.float16, device_map="auto")
system_message = "ไปฅไธๆฏ็จๆทๅไบบๅทฅๆบ่ฝๅฉๆไน้ด็ๅฏน่ฏใ็จๆทไปฅHumanๅผๅคด๏ผไบบๅทฅๆบ่ฝๅฉๆไปฅAssistantๅผๅคด๏ผไผๅฏนไบบ็ฑปๆๅบ็้ฎ้ข็ปๅบๆๅธฎๅฉใ้ซ่ดจ้ใ่ฏฆ็ปๅ็คผ่ฒ็ๅ็ญ๏ผๅนถไธๆปๆฏๆ็ปๅไธ ไธไธ้ๅพทใไธๅฎๅ
จใๆไบ่ฎฎใๆฟๆฒปๆๆ็ญ็ธๅ
ณ็่ฏ้ขใ้ฎ้ขๅๆ็คบใ\n"
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]
content = "ไป็ปไธไฝ ่ชๅทฑ"
prompt = system_message + seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
print(f"่พๅ
ฅ: {content}")
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1)
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(f"่พๅบ: {outputs}")
4bit model:
For 4bit quantization we use the auto-gptq tool. The 4bit model is used as follows; as above, the prompt must match our prompt format:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from auto_gptq import AutoGPTQForCausalLM
model_name_or_path = "/your/model/path"
tokenizer = LlamaTokenizer.from_pretrained(model_name_or_path, use_fast=False, legacy=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,torch_dtype=torch.float16, device_map="auto")
system_message = "ไปฅไธๆฏ็จๆทๅไบบๅทฅๆบ่ฝๅฉๆไน้ด็ๅฏน่ฏใ็จๆทไปฅHumanๅผๅคด๏ผไบบๅทฅๆบ่ฝๅฉๆไปฅAssistantๅผๅคด๏ผไผๅฏนไบบ็ฑปๆๅบ็้ฎ้ข็ปๅบๆๅธฎๅฉใ้ซ่ดจ้ใ่ฏฆ็ปๅ็คผ่ฒ็ๅ็ญ๏ผๅนถไธๆปๆฏๆ็ปๅไธ ไธไธ้ๅพทใไธๅฎๅ
จใๆไบ่ฎฎใๆฟๆฒปๆๆ็ญ็ธๅ
ณ็่ฏ้ขใ้ฎ้ขๅๆ็คบใ\n"
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]
content = "ไป็ปไธไฝ ่ชๅทฑ"
prompt = system_message + seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
print(f"่พๅ
ฅ: {content}")
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1)
outputs = tokenizer.decode(outputs.cpu()[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(f"่พๅบ: {outputs}")
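Using the 4bit model with vLLM:
Running the GPTQ-quantized 4bit model with a plain HuggingFace inference script is very slow and not practical. Recent versions of vLLM support loading several kinds of quantized models, including GPTQ. Thanks to quantized acceleration kernels, PagedAttention, continuous batching, and its scheduling mechanisms, vLLM can raise inference throughput by at least 10x.
You can install the latest version of vLLM and run our 4bit quantized model with the following script: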
from vllm import LLM, SamplingParams
sampling_params = SamplingParams(temperature=0.7, top_p=0.95,max_tokens=256)
llm = LLM(model="/your/model/path", quantization="gptq", dtype="float16")
system_message = "ไปฅไธๆฏ็จๆทๅไบบๅทฅๆบ่ฝๅฉๆไน้ด็ๅฏน่ฏใ็จๆทไปฅHumanๅผๅคด๏ผไบบๅทฅๆบ่ฝๅฉๆไปฅAssistantๅผๅคด๏ผไผๅฏนไบบ็ฑปๆๅบ็้ฎ้ข็ปๅบๆๅธฎๅฉใ้ซ่ดจ้ใ่ฏฆ็ปๅ็คผ่ฒ็ๅ็ญ๏ผๅนถไธๆปๆฏๆ็ปๅไธ ไธไธ้ๅพทใไธๅฎๅ
จใๆไบ่ฎฎใๆฟๆฒปๆๆ็ญ็ธๅ
ณ็่ฏ้ขใ้ฎ้ขๅๆ็คบใ\n"
seps = [" ", "</s>"]
roles = ["Human", "Assistant"]
content = "ไป็ปไธไฝ ่ชๅทฑ"
prompt = system_message + seps[0] + roles[0] + ": " + content + seps[0] + roles[1] + ":"
print(f"่พๅ
ฅ: {content}")
result = llm.generate(prompt, sampling_params)
result_output = [[output.outputs[0].text, output.outputs[0].token_ids] for output in result]
print(f"่พๅบ๏ผ{result_output[0]}")
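Generation speed evaluation
We measured the generation speed of different models (before and after quantization) under different inference methods (HuggingFace, vLLM). The results are:
- Full (unquantized) 70B model inference throughput: 8.26 token/s
- 4bit 70B model inference throughput: 0.70 token/s
- 8bit 70B model inference throughput: 3.05 token/s
- 4bit 70B model inference throughput with vLLM: 60.32 token/s
- Full (unquantized) 70B model inference throughput with vLLM: 41.80 token/s
All tests use batchsize=1. The first three items are results from the plain HuggingFace inference script, and they show that quantization does not speed up inference there. The last two items are vLLM results: compared with HuggingFace inference, vLLM is more practical, and generation speed improves significantly in both cases.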
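The size of the improvement can be worked out from the figures above. The following sketch uses only the listed throughputs; the helper names `speedup` and `seconds_for` are ours, and 256 tokens is chosen to match the `max_tokens` / `max_new_tokens` setting in the example scripts.

```python
# Throughput in token/s, copied from the list above (batch size 1).
throughput = {
    ("HuggingFace", "full"): 8.26,
    ("HuggingFace", "4bit"): 0.70,
    ("HuggingFace", "8bit"): 3.05,
    ("vLLM", "4bit"): 60.32,
    ("vLLM", "full"): 41.80,
}

def speedup(variant):
    """How many times faster vLLM is than HuggingFace for the same model variant."""
    return round(throughput[("vLLM", variant)] / throughput[("HuggingFace", variant)], 2)

def seconds_for(tokens, backend, variant):
    """Approximate time to generate `tokens` tokens at the measured throughput."""
    return round(tokens / throughput[(backend, variant)], 1)

print("4bit speedup:", speedup("4bit"))
print("full speedup:", speedup("full"))
print("256 tokens, HF 4bit (s):", seconds_for(256, "HuggingFace", "4bit"))
print("256 tokens, vLLM 4bit (s):", seconds_for(256, "vLLM", "4bit"))
```

By these numbers the gain is far larger for the 4bit model (about 86x) than for the full model (about 5x), because the 4bit model is the slowest configuration under the plain HuggingFace script.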