Baichuan-13B-Base

Introduction

Baichuan-13B-Base is the pre-training version in the Baichuan-13B series of models; the aligned model can be found at Baichuan-13B-Chat.

Baichuan-13B is an open-source, commercially usable large-scale language model developed by Baichuan Intelligence as the successor to Baichuan-7B. With 13 billion parameters, it achieves the best results among models of its size on authoritative Chinese and English benchmarks. This release includes two versions: pre-training (Baichuan-13B-Base) and alignment (Baichuan-13B-Chat). Baichuan-13B has the following features:

  1. Larger size, more data: Baichuan-13B further expands the parameter count to 13 billion on the basis of Baichuan-7B and was trained on 1.4 trillion tokens of high-quality corpora, 40% more than LLaMA-13B, making it the open-source 13B model trained on the most data to date. It supports both Chinese and English, uses ALiBi position encoding, and has a context window of 4,096 tokens.
  2. Pre-training and alignment models open-sourced simultaneously: The pre-training model is a "base" suited to developers, while most ordinary users have a stronger need for an aligned model with dialogue capabilities. Therefore, in this release we also open-sourced the alignment model (Baichuan-13B-Chat), which has strong dialogue capabilities and works out of the box; it can be deployed with just a few lines of code (see the sketch after this list).
  3. More efficient inference: To support a wider range of users, we have also open-sourced INT8 and INT4 quantized versions. With almost no loss in performance relative to the non-quantized model, they greatly lower the hardware barrier to deployment, allowing the model to run on consumer GPUs such as the Nvidia 3090.
  4. Open-source, free, and commercially usable: Baichuan-13B is not only fully open to academic research; developers can also use it commercially for free after applying by email and obtaining official commercial permission.
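
As a minimal deployment sketch for this base checkpoint (assuming the transformers library; trust_remote_code is required because the checkpoint ships custom modeling code, and the quantize() helper shown in the comment comes from that remote code, not from transformers itself):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Baichuan-13B-Base in fp16; trust_remote_code pulls in the custom
# ALiBi attention implementation shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base", use_fast=False, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# The base model is a plain language model, so prompt it by completion.
inputs = tokenizer("็™ป้ฒ้›€ๆฅผ->็Ž‹ไน‹ๆถฃ\nๅคœ้›จๅฏ„ๅŒ—->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# INT8 on a single GPU (the quantize() method comes from the remote code):
# model = AutoModelForCausalLM.from_pretrained(
#     "baichuan-inc/Baichuan-13B-Base", torch_dtype=torch.float16,
#     trust_remote_code=True).quantize(8).cuda()
```

As rough arithmetic, 13B weights occupy about 26 GB in fp16, about 13 GB in INT8, and about 7 GB in INT4, which is why the quantized versions fit on a 24 GB consumer card like the 3090.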

Model Details

ๆจกๅž‹ๆ่ฟฐ

  • Developed by: Baichuan Intelligent Technology (็™พๅทๆ™บ่ƒฝ)

  • Email: opensource@baichuan-inc.com

  • Language(s) (NLP): Chinese/English

  • License: Community License for Baichuan-13B Model (ZH | EN)

    For commercial use: please contact us via the email above to apply for written authorization.

Model Architecture

The overall architecture is based on Baichuan-7B. To achieve better inference performance, Baichuan-13B uses ALiBi linear biases, which are cheaper to compute than Rotary Embedding and significantly improve inference speed. Compared with the standard LLaMA-13B, the average inference speed (tokens/s) when generating 2,000 tokens is 31.6% higher in our tests:

| Model | tokens/s |
|---|---|
| LLaMA-13B | 19.4 |
| Baichuan-13B | 25.4 |
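
For intuition, ALiBi skips rotary position math entirely: it adds a fixed, head-specific linear penalty to the attention scores based on how far apart the query and key are. Below is a minimal sketch using the ALiBi paper's power-of-two slope formula (Baichuan-13B has 40 heads, which is not a power of two, so its actual slopes follow the paper's slightly different recipe for that case), not Baichuan's actual implementation:

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    """One slope per head: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    (the ALiBi paper's formula for power-of-two head counts)."""
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Additive [n_heads, seq_len, seq_len] attention bias: zero on the
    diagonal, increasingly negative the further back a key is."""
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0)  # key_pos - query_pos
    return alibi_slopes(n_heads)[:, None, None] * distance

# Usage: add the bias to raw attention scores before the softmax.
scores = torch.randn(40, 16, 16)            # [heads, q_len, k_len]
probs = torch.softmax(scores + alibi_bias(40, 16), dim=-1)
```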

The specific parameters are as follows:

| Model Name | Hidden Size | Num Layers | Num Attention Heads | Vocab Size | Total Params | Training Data (tokens) | Position Embedding | Max Length |
|---|---|---|---|---|---|---|---|---|
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
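
As a sanity check, the 13,264,901,120 total in the table can be reproduced from the listed dimensions, assuming a LLaMA-style block (SwiGLU MLP, two RMSNorms per layer, untied input and output embeddings). The MLP intermediate size of 13,696 is not listed in the table and is assumed here from the released config:

```python
# Back-of-the-envelope check of Baichuan-13B's total parameter count.
# The intermediate size (13,696) is an assumption taken from the released
# config.json, not from the table above.
hidden, layers, vocab, inter = 5120, 40, 64000, 13696

attention = 4 * hidden * hidden          # W_q, W_k, W_v, W_o
mlp = 3 * hidden * inter                 # gate, up, down projections (SwiGLU)
norms = 2 * hidden                       # two RMSNorms per block
block = attention + mlp + norms

total = layers * block + 2 * vocab * hidden + hidden
#       transformer blocks  in/out embeddings   final norm
print(total)  # 13264901120, matching the table
```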

ๅ…่ดฃๅฃฐๆ˜Ž

ๆˆ‘ไปฌๅœจๆญคๅฃฐๆ˜Ž๏ผŒๆˆ‘ไปฌ็š„ๅผ€ๅ‘ๅ›ข้˜ŸๅนถๆœชๅŸบไบŽ Baichuan-13B ๆจกๅž‹ๅผ€ๅ‘ไปปไฝ•ๅบ”็”จ๏ผŒๆ— ่ฎบๆ˜ฏๅœจ iOSใ€Androidใ€็ฝ‘้กตๆˆ–ไปปไฝ•ๅ…ถไป–ๅนณๅฐใ€‚ๆˆ‘ไปฌๅผบ็ƒˆๅ‘ผๅๆ‰€ๆœ‰ไฝฟ็”จ่€…๏ผŒไธ่ฆๅˆฉ็”จ Baichuan-13B ๆจกๅž‹่ฟ›่กŒไปปไฝ•ๅฑๅฎณๅ›ฝๅฎถ็คพไผšๅฎ‰ๅ…จๆˆ–่ฟๆณ•็š„ๆดปๅŠจใ€‚ๅฆๅค–๏ผŒๆˆ‘ไปฌไนŸ่ฆๆฑ‚ไฝฟ็”จ่€…ไธ่ฆๅฐ† Baichuan-13B ๆจกๅž‹็”จไบŽๆœช็ป้€‚ๅฝ“ๅฎ‰ๅ…จๅฎกๆŸฅๅ’Œๅค‡ๆกˆ็š„ไบ’่”็ฝ‘ๆœๅŠกใ€‚ๆˆ‘ไปฌๅธŒๆœ›ๆ‰€ๆœ‰็š„ไฝฟ็”จ่€…้ƒฝ่ƒฝ้ตๅฎˆ่ฟ™ไธชๅŽŸๅˆ™๏ผŒ็กฎไฟ็ง‘ๆŠ€็š„ๅ‘ๅฑ•่ƒฝๅœจ่ง„่Œƒๅ’Œๅˆๆณ•็š„็Žฏๅขƒไธ‹่ฟ›่กŒใ€‚

ๆˆ‘ไปฌๅทฒ็ปๅฐฝๆˆ‘ไปฌๆ‰€่ƒฝ๏ผŒๆฅ็กฎไฟๆจกๅž‹่ฎญ็ปƒ่ฟ‡็จ‹ไธญไฝฟ็”จ็š„ๆ•ฐๆฎ็š„ๅˆ่ง„ๆ€งใ€‚็„ถ่€Œ๏ผŒๅฐฝ็ฎกๆˆ‘ไปฌๅทฒ็ปๅšๅ‡บไบ†ๅทจๅคง็š„ๅŠชๅŠ›๏ผŒไฝ†็”ฑไบŽๆจกๅž‹ๅ’Œๆ•ฐๆฎ็š„ๅคๆ‚ๆ€ง๏ผŒไปๆœ‰ๅฏ่ƒฝๅญ˜ๅœจไธ€ไบ›ๆ— ๆณ•้ข„่ง็š„้—ฎ้ข˜ใ€‚ๅ› ๆญค๏ผŒๅฆ‚ๆžœ็”ฑไบŽไฝฟ็”จ Baichuan-13B ๅผ€ๆบๆจกๅž‹่€Œๅฏผ่‡ด็š„ไปปไฝ•้—ฎ้ข˜๏ผŒๅŒ…ๆ‹ฌไฝ†ไธ้™ไบŽๆ•ฐๆฎๅฎ‰ๅ…จ้—ฎ้ข˜ใ€ๅ…ฌๅ…ฑ่ˆ†่ฎบ้ฃŽ้™ฉ๏ผŒๆˆ–ๆจกๅž‹่ขซ่ฏฏๅฏผใ€ๆปฅ็”จใ€ไผ ๆ’ญๆˆ–ไธๅฝ“ๅˆฉ็”จๆ‰€ๅธฆๆฅ็š„ไปปไฝ•้ฃŽ้™ฉๅ’Œ้—ฎ้ข˜๏ผŒๆˆ‘ไปฌๅฐ†ไธๆ‰ฟๆ‹…ไปปไฝ•่ดฃไปปใ€‚

We hereby declare that our development team has not developed any applications based on the Baichuan-13B model, whether on iOS, Android, the web, or any other platform. We strongly urge all users not to use the Baichuan-13B model for any activities that endanger national or social security or are illegal. We also ask users not to use the Baichuan-13B model for internet services that have not undergone appropriate security review and regulatory filing. We hope all users will adhere to these principles so that technological development proceeds in a regulated and lawful environment.

We have done our utmost to ensure the compliance of the data used in the model training process. However, despite these efforts, unforeseen issues may still arise given the complexity of the model and data. Therefore, we assume no liability for any problems arising from the use of the Baichuan-13B open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems caused by the model being misled, misused, disseminated, or improperly exploited.

Training Details

For specific training settings, please refer to Baichuan-13B.

Evaluation Results

C-Eval

| Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Baichuan-7B | 38.2 | 52.0 | 46.2 | 39.3 | 42.8 |
| Chinese-Alpaca-Plus-13B | 35.2 | 45.6 | 40.0 | 38.2 | 38.8 |
| Vicuna-13B | 30.5 | 38.2 | 32.5 | 32.5 | 32.8 |
| Chinese-LLaMA-Plus-13B | 30.3 | 38.0 | 32.9 | 29.1 | 32.1 |
| Ziya-LLaMA-13B-Pretrain | 27.6 | 34.4 | 32.0 | 28.6 | 30.0 |
| LLaMA-13B | 27.0 | 33.6 | 27.7 | 27.6 | 28.5 |
| moss-moon-003-base (16B) | 27.0 | 29.1 | 27.2 | 26.9 | 27.4 |
| Baichuan-13B-Base | 45.9 | 63.5 | 57.2 | 49.3 | 52.4 |
| Baichuan-13B-Chat | 43.7 | 64.6 | 56.2 | 49.2 | 51.5 |

MMLU

| Model 5-shot | STEM | Social Sciences | Humanities | Others | Average |
|---|---|---|---|---|---|
| Vicuna-13B | 40.4 | 60.5 | 49.5 | 58.4 | 52.0 |
| LLaMA-13B | 36.1 | 53.0 | 44.0 | 52.8 | 46.3 |
| Chinese-Alpaca-Plus-13B | 36.9 | 48.9 | 40.5 | 50.5 | 43.9 |
| Ziya-LLaMA-13B-Pretrain | 35.6 | 47.6 | 40.1 | 49.4 | 42.9 |
| Baichuan-7B | 35.6 | 48.9 | 38.4 | 48.1 | 42.3 |
| Chinese-LLaMA-Plus-13B | 33.1 | 42.8 | 37.0 | 44.6 | 39.2 |
| moss-moon-003-base (16B) | 22.4 | 22.8 | 24.2 | 24.4 | 23.6 |
| Baichuan-13B-Base | 41.6 | 60.9 | 47.4 | 58.5 | 51.6 |
| Baichuan-13B-Chat | 40.9 | 60.9 | 48.8 | 59.0 | 52.1 |

Note: We adopted the official MMLU evaluation scheme.
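
To illustrate what the 5-shot numbers above and below mean in practice, here is a generic sketch of likelihood-based multiple-choice scoring (a common protocol, not necessarily byte-for-byte the official harness; model and tokenizer are assumed loaded as in the deployment sketch earlier):

```python
import torch

def format_example(question, choices, answer=None):
    """Render one multiple-choice question; answer=None leaves it open."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip("ABCD", choices)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def five_shot_prompt(dev_examples, question, choices):
    """5 solved demonstrations followed by the unsolved test question."""
    shots = [format_example(ex["question"], ex["choices"], ex["answer"])
             for ex in dev_examples[:5]]
    return "\n\n".join(shots + [format_example(question, choices)])

@torch.no_grad()
def predict(model, tokenizer, prompt):
    """Pick the answer letter with the highest next-token logit."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    logits = model(ids).logits[0, -1]
    letter_ids = [tokenizer(l, add_special_tokens=False).input_ids[-1]
                  for l in "ABCD"]
    return "ABCD"[logits[letter_ids].argmax().item()]
```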

CMMLU

| Model 5-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
|---|---|---|---|---|---|---|
| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
| Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |

Note: CMMLU is a comprehensive Chinese evaluation benchmark specifically designed to assess the knowledge and reasoning abilities of language models in a Chinese context. We adopted its official evaluation scheme.

WeChat Group

[WeChat group QR code]
