Instructions to use wenge-research/yayi-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use wenge-research/yayi-7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wenge-research/yayi-7b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wenge-research/yayi-7b")
model = AutoModelForCausalLM.from_pretrained("wenge-research/yayi-7b")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use wenge-research/yayi-7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "wenge-research/yayi-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "wenge-research/yayi-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/wenge-research/yayi-7b
```
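Because the vLLM server exposes an OpenAI-compatible API, the curl call above can also be made from Python. Below is a minimal sketch using only the standard library; the `build_completion_request` and `complete` helpers are ours, not part of vLLM, and the code assumes the server from the previous step is listening on localhost:8000.

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    """Build the URL and JSON payload for a /v1/completions call."""
    url = "http://localhost:8000/v1/completions"
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return url, payload

def complete(model, prompt, **kwargs):
    """POST the completion request and return the first generated text."""
    url, payload = build_completion_request(model, prompt, **kwargs)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]

# With a running server, a call would look like:
# print(complete("wenge-research/yayi-7b", "Once upon a time,"))
```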
- SGLang
How to use wenge-research/yayi-7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "wenge-research/yayi-7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "wenge-research/yayi-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "wenge-research/yayi-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "wenge-research/yayi-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use wenge-research/yayi-7b with Docker Model Runner:
```shell
docker model run hf.co/wenge-research/yayi-7b
```
YaYi
Introduction
YaYi was instruction-tuned on millions of manually curated, high-quality domain data points. The training data covers five key domains: media publicity, public opinion analysis, public safety, financial risk control, and urban governance, and spans over a hundred natural language instruction tasks. Throughout the iterative development of YaYi, from pre-training initialization weights to the domain-specific model, we have steadily enhanced its foundational Chinese language capabilities and domain analysis capabilities, and have also introduced multi-turn conversation enhancements and integrated various plug-in capabilities. Furthermore, through continuous manual feedback and optimization from hundreds of users during the internal testing phase, we have further refined the model's performance and safety.
By open-sourcing the YaYi model, we aim to contribute to the development of the Chinese pre-trained large language model open-source community. Through this open-source initiative, we look forward to building the YaYi model ecosystem together with every partner.
Run
Below is a simple example of invoking yayi-7b for downstream-task inference. It runs on a single GPU such as an A100, A800, or 3090, and uses approximately 20 GB of GPU memory when performing inference with FP16 precision. If you need the training data or want to fine-tune a model based on yayi-7b, please refer to our GitHub Repo.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

yayi_7b_path = "wenge-research/yayi-7b"
tokenizer = AutoTokenizer.from_pretrained(yayi_7b_path)
model = AutoModelForCausalLM.from_pretrained(yayi_7b_path, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "你好"
formatted_prompt = f"<|System|>:\nA chat between a human and an AI assistant named YaYi.\nYaYi is a helpful and harmless language model developed by Beijing Wenge Technology Co.,Ltd.\n\n<|Human|>:\n{prompt}\n\n<|YaYi|>:"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

eos_token_id = tokenizer("<|End|>").input_ids[0]
generation_config = GenerationConfig(
    eos_token_id=eos_token_id,
    pad_token_id=eos_token_id,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.3,
    repetition_penalty=1.1,
    no_repeat_ngram_size=0,
)
response = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(response[0]))
```
Please note that a special token <|End|> was added as an end-of-sequence marker during model training, which is why the GenerationConfig above sets eos_token_id to the token id corresponding to that marker.
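Since `tokenizer.decode(response[0])` returns the full sequence (system prompt, user turn, and reply), it can be convenient to cut out just the assistant's reply and trim the `<|End|>` marker. A small string-level sketch, assuming the prompt template and end token shown above; the `extract_reply` helper is ours, not part of the model repo:

```python
END_TOKEN = "<|End|>"

def extract_reply(decoded, end_token=END_TOKEN):
    """Return the text after the last <|YaYi|>: marker, with the
    end-of-sequence token and surrounding whitespace stripped."""
    marker = "<|YaYi|>:"
    # Keep only what follows the assistant marker, if present.
    if marker in decoded:
        decoded = decoded.rsplit(marker, 1)[1]
    # Drop the end-of-sequence token and anything after it.
    if end_token in decoded:
        decoded = decoded.split(end_token, 1)[0]
    return decoded.strip()
```

For example, `extract_reply("...<|Human|>:\n你好\n\n<|YaYi|>:\n你好！<|End|>")` returns just the reply text `"你好！"`.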
Related agreements
Limitations
The SFT model trained on the current data and base model still exhibits the following issues:
- It may generate factually incorrect responses for factual instructions.
- It struggles to effectively identify harmful instructions, potentially leading to harmful content generation.
- Its capabilities in scenarios involving logical reasoning, code generation, scientific computation, and similar tasks still require improvement.
Disclaimer
Due to the limitations of the model mentioned above, we request that developers use the code, data, models, and any derivatives generated from this project solely for research purposes and refrain from using them for commercial or any other potentially harmful purposes to society. Please exercise caution in evaluating and utilizing content generated by the YaYi model, and do not propagate harmful content on the internet. Any adverse consequences resulting from such actions are the responsibility of the disseminator.
This project is intended for research purposes only, and the project developers bear no responsibility for any harm or losses incurred due to the use of this project, including but not limited to data, models, code, etc. For more details, please refer to the Disclaimer.
License
The code in this project is open-source under the Apache-2.0 license, the data follows the CC BY-NC 4.0 license, and the usage of YaYi series model weights must adhere to the Model License.
Acknowledgements
- In this project, we used model weights from BigScience's bloomz-7b1-mt and Meta's Llama 2 series as initialization weights, along with vocabulary expansion.
- The training code in this project was inspired by Databricks' dolly project and Huggingface's transformers library.
- Distributed training in this project utilized Microsoft's DeepSpeed distributed training tool and configuration files from Huggingface transformers' ZeRO stage 2.