---
library_name: transformers
license: mit
language:
- ja
- en
---
# Stockmark-2-100B-Instruct

## Model description
**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) on synthetic Japanese data to enhance its instruction-following ability. Compared with the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this release further improves instruction following and adds long-context support (32k tokens).

This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
### Features
- Model Type: Causal Language Model
- Number of Parameters: 96B
- Number of Layers: 86
- Number of Attention Heads (GQA): 72 for Q and 8 for KV
- Context Length: 32k
- Supported Languages: Japanese and English
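
The architecture details above can also be read programmatically from the published model configuration. The sketch below is a minimal example using `transformers.AutoConfig`; the attribute names (`num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `max_position_embeddings`) assume a Llama-style configuration and are not stated in this card, so treat them as an assumption.

```python
from transformers import AutoConfig

# Minimal sketch: inspect the architecture from the model config.
# Attribute names assume a Llama-style config (an assumption, not confirmed here).
config = AutoConfig.from_pretrained("stockmark/Stockmark-2-100B-Instruct")

print(getattr(config, "num_hidden_layers", None))        # expected: 86 layers
print(getattr(config, "num_attention_heads", None))      # expected: 72 query heads
print(getattr(config, "num_key_value_heads", None))      # expected: 8 KV heads (GQA)
print(getattr(config, "max_position_embeddings", None))  # expected: 32k context length
```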
## Model performance
### Japanese MT-bench
| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |
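
The Average column is consistent with the unweighted mean of the eight category scores; that aggregation rule is an assumption, but the quick check below reproduces the values in the table.

```python
# Sanity check (assumption: Average = unweighted mean of the eight categories).
scores = {
    "Stockmark-2-100B-Instruct":      [7.07, 8.35, 8.73, 7.57, 5.45, 8.65, 8.33, 8.83],
    "Stockmark-2-100B-Instruct-beta": [6.73, 8.23, 8.63, 7.01, 5.85, 8.54, 8.07, 8.61],
}
for name, s in scores.items():
    print(name, f"{sum(s) / len(s):.2f}")  # prints 7.87 and 7.71
```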
## How to use
### transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stockmark/Stockmark-2-100B-Instruct"

# Load the tokenizer and model; device_map="auto" shards the bfloat16 weights
# across the available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="bfloat16")

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the prompt with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
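
For multi-turn use, earlier turns can be passed through the same chat template. The following is a minimal sketch that reuses the `tokenizer` and `model` objects loaded above; the placeholder assistant reply and the follow-up question are hypothetical and only illustrate the message format.

```python
# Minimal multi-turn sketch, reusing `tokenizer` and `model` from the example above.
# The assistant turn would normally be the text generated in the previous step
# (the reply below is a hypothetical placeholder).
messages = [
    {"role": "user", "content": "自然言語処理とは?"},  # "What is NLP?"
    {"role": "assistant", "content": "自然言語処理とは、人間の言葉をコンピュータで扱う技術です。"},
    {"role": "user", "content": "代表的な応用例を3つ挙げてください。"},  # "Give three typical applications."
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

# Decode only the newly generated tokens (the reply to the last user turn).
print(tokenizer.decode(tokens[0][input_ids.shape[-1]:], skip_special_tokens=True))
```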
### vLLM
```python
from vllm import LLM, SamplingParams

# Load the model across 4 GPUs with tensor parallelism.
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=4,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512
)

conversation = [{"role": "user", "content": "自然言語処理とは?"}]  # "What is natural language processing?"

outputs = llm.chat(conversation, sampling_params=sampling_params)
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```
## Libraries used for training
- Pretraining: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
- Posttraining: [huggingface/trl](https://github.com/huggingface/trl)
## License
[MIT](https://opensource.org/licenses/MIT)
## Developed by
[Stockmark Inc.](https://stockmark.co.jp/)