Baichuan-13B-Base
Introduction
Baichuan-13B-Base is the pre-training version in the Baichuan-13B series of models; the aligned model can be found at Baichuan-13B-Chat.
Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence, following Baichuan-7B. With 13 billion parameters, it achieves the best performance in standard Chinese and English benchmarks among models of its size. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has the following features:
- Larger size, more data: Baichuan-13B further expands the parameter count to 13 billion on the basis of Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
- Pre-training and alignment models released together: The pre-training model is a "base" suited to developers, while most users need an aligned model with dialogue capabilities. We therefore also released the aligned model, Baichuan-13B-Chat, which has strong dialogue capabilities, works out of the box, and can be deployed with just a few lines of code.
- More efficient inference: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions, which greatly reduce deployment resource requirements with almost no loss in quality compared to the non-quantized model, allowing deployment on consumer GPUs such as the NVIDIA 3090.
- Open-source, free, and commercially usable: Baichuan-13B is fully open to academic research, and developers may also use it commercially free of charge after applying by email and obtaining official commercial permission.
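To illustrate why INT8 quantization loses little quality, here is a minimal absmax round-trip sketch. This is not Baichuan's actual quantization code; the weight shape and tolerance are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-tensor absmax quantization: map float weights onto [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one layer's parameters.
rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)

q, scale = quantize_int8(w)
max_err = float(np.abs(dequantize(q, scale) - w).max())
# int8 storage is a quarter of float32, while the per-weight rounding
# error is bounded by scale / 2 -- small relative to typical weights.
```

The same idea, applied per-channel and combined with outlier handling, is what lets real INT8/INT4 deployments fit a 13B model onto a single consumer GPU.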
Model Details
Model Description
Developed by: Baichuan Intelligent Technology
Email: opensource@baichuan-inc.com
Language(s) (NLP): Chinese/English
License: Community License for Baichuan-13B Model (ZH | EN)
For commercial use: Contact us via the email above to apply for written authorization.
Model Architecture
The overall architecture is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than rotary embeddings and noticeably improve inference speed. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2,000 tokens is 31.6% higher:
| Model | tokens/s |
|---|---|
| LLaMA-13B | 19.4 |
| Baichuan-13B | 25.4 |
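Part of ALiBi's efficiency is that the bias is trivial to construct: instead of rotating query/key vectors as RoPE does, each head adds a fixed linear penalty to attention scores based on query-key distance, with no learned position embeddings. A minimal sketch, assuming a power-of-two number of heads (the slope scheme from the ALiBi paper; head counts like Baichuan-13B's 40 use a generalization of it):

```python
import numpy as np

def alibi_slopes(n_heads: int) -> list[float]:
    # Geometric sequence of per-head slopes: head i gets 2^(-8i / n_heads).
    start = 2 ** (-8.0 / n_heads)
    return [start ** i for i in range(1, n_heads + 1)]

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    # bias[h, i, j] = slope_h * (j - i): zero on the diagonal, increasingly
    # negative for keys farther behind the query, so distant tokens are
    # attended to less -- which is what gives ALiBi its length extrapolation.
    slopes = np.array(alibi_slopes(n_heads))              # (heads,)
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                     # j - i
    return slopes[:, None, None] * rel[None, :, :]        # (heads, q, k)
```

This matrix is simply added to the raw attention scores before the softmax (masked causally), replacing any positional embedding lookup.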
The specific parameters are as follows:
| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
|---|---|---|---|---|---|---|---|---|
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
Disclaimer
We hereby declare that our development team has not developed any applications based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activities that endanger national or social security or that are illegal. We also ask users not to use the Baichuan-13B model for internet services that have not undergone appropriate security review and regulatory filing. We hope that all users will adhere to these principles so that technological development proceeds in a regulated and lawful environment.
We have done our utmost to ensure the compliance of the data used in the model training process. However, despite our great efforts, due to the complexity of the model and data, there may still be some unforeseen issues. Therefore, we will not take any responsibility for any issues arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the model being misled, misused, disseminated, or improperly exploited.
Training Details
For specific training settings, please refer to Baichuan-13B.
Benchmark Results
C-Eval
| Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Baichuan-7B | 38.2 | 52.0 | 46.2 | 39.3 | 42.8 |
| Chinese-Alpaca-Plus-13B | 35.2 | 45.6 | 40.0 | 38.2 | 38.8 |
| Vicuna-13B | 30.5 | 38.2 | 32.5 | 32.5 | 32.8 |
| Chinese-LLaMA-Plus-13B | 30.3 | 38.0 | 32.9 | 29.1 | 32.1 |
| Ziya-LLaMA-13B-Pretrain | 27.6 | 34.4 | 32.0 | 28.6 | 30.0 |
| LLaMA-13B | 27.0 | 33.6 | 27.7 | 27.6 | 28.5 |
| moss-moon-003-base (16B) | 27.0 | 29.1 | 27.2 | 26.9 | 27.4 |
| Baichuan-13B-Base | 45.9 | 63.5 | 57.2 | 49.3 | 52.4 |
| Baichuan-13B-Chat | 43.7 | 64.6 | 56.2 | 49.2 | 51.5 |
MMLU
| Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Vicuna-13B | 40.4 | 60.5 | 49.5 | 58.4 | 52.0 |
| LLaMA-13B | 36.1 | 53.0 | 44.0 | 52.8 | 46.3 |
| Chinese-Alpaca-Plus-13B | 36.9 | 48.9 | 40.5 | 50.5 | 43.9 |
| Ziya-LLaMA-13B-Pretrain | 35.6 | 47.6 | 40.1 | 49.4 | 42.9 |
| Baichuan-7B | 35.6 | 48.9 | 38.4 | 48.1 | 42.3 |
| Chinese-LLaMA-Plus-13B | 33.1 | 42.8 | 37.0 | 44.6 | 39.2 |
| moss-moon-003-base (16B) | 22.4 | 22.8 | 24.2 | 24.4 | 23.6 |
| Baichuan-13B-Base | 41.6 | 60.9 | 47.4 | 58.5 | 51.6 |
| Baichuan-13B-Chat | 40.9 | 60.9 | 48.8 | 59.0 | 52.1 |
Note: We adopted the official MMLU evaluation scheme.
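For context on what "5-shot" means above: the evaluation prompt is assembled by prepending five answered dev-set questions to each test question, and the model's next-token prediction after the final "Answer:" is scored. A hedged sketch of this prompt construction (the exact formatting of the official harnesses may differ in detail):

```python
def build_few_shot_prompt(dev_examples, question, choices, subject):
    """Build a multiple-choice few-shot prompt.

    dev_examples: list of (question, choices, answer_letter) in-context shots.
    The subject header and 'Answer:' convention follow common MMLU-style
    harnesses; treat the wording as an illustrative assumption.
    """
    letters = "ABCD"
    parts = [
        f"The following are multiple choice questions (with answers) "
        f"about {subject}.\n\n"
    ]
    for q, ch, ans in dev_examples:
        parts.append(q + "\n")
        parts.extend(f"{letters[i]}. {c}\n" for i, c in enumerate(ch))
        parts.append(f"Answer: {ans}\n\n")
    # The test question ends at 'Answer:' so the model must produce the letter.
    parts.append(question + "\n")
    parts.extend(f"{letters[i]}. {c}\n" for i, c in enumerate(choices))
    parts.append("Answer:")
    return "".join(parts)
```

Scoring then compares the log-probabilities the model assigns to "A"–"D" at the final position, which is why few-shot results are sensitive to prompt formatting.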
CMMLU
| Model 5-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
|---|---|---|---|---|---|---|
| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
| Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese context. We adopted its official evaluation scheme.