---
library_name: transformers
license: mit
language:
- ja
- en
---

# Stockmark-2-100B-Instruct

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/607ef1c3e758c3c5a2959eab/AbyPvKu-FBY6RDYhGi1KX.jpeg)

## Model description

**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pre-training, the model underwent post-training (SFT and DPO) on synthetic Japanese data to enhance its instruction-following ability. Compared to the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this version improves instruction following and adds long-context support (32k tokens).

This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).

### Features

- Model Type: Causal Language Model
- Number of Parameters: 96B
- Number of Layers: 86
- Number of Attention Heads (GQA): 72 for Q and 8 for KV
- Context Length: 32k
- Supported Languages: Japanese and English
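
These hyperparameters can be checked against the model's `config.json` without downloading the weights. A minimal sketch, assuming standard `transformers` config attribute names for a GQA decoder-only architecture (attribute names may differ if the model uses a custom configuration class):

```python
from transformers import AutoConfig

# Load only the configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("stockmark/Stockmark-2-100B-Instruct")

print(config.num_hidden_layers)        # expected: 86
print(config.num_attention_heads)      # expected: 72 (query heads)
print(config.num_key_value_heads)      # expected: 8 (KV heads, GQA)
print(config.max_position_embeddings)  # expected: 32768 (32k context)
```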

## Model performance

### Japanese MT-bench
| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |


## How to use

### transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stockmark/Stockmark-2-100B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype=torch.bfloat16
)

# Format the prompt with the model's chat template.
instruction = "自然言語処理とは?"  # "What is natural language processing?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
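
For interactive use, the same call can stream tokens to stdout as they are generated. A minimal sketch using the built-in `TextStreamer` utility from `transformers`, reusing `model`, `tokenizer`, and `input_ids` from the snippet above:

```python
import torch
from transformers import TextStreamer

# Print tokens as they are generated; skip echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        streamer=streamer,
    )
```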

### vLLM
```python
from vllm import LLM, SamplingParams

# tensor_parallel_size shards the model across 4 GPUs; adjust to your hardware.
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=4,
    dtype="bfloat16",
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
)

conversation = [{"role": "user", "content": "自然言語処理とは?"}]  # "What is natural language processing?"

# llm.chat applies the model's chat template automatically.
outputs = llm.chat(conversation, sampling_params=sampling_params)

for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```
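
vLLM can also serve the model behind an OpenAI-compatible API (for example, `vllm serve stockmark/Stockmark-2-100B-Instruct --tensor-parallel-size 4`). A minimal client sketch assuming the server is running on the default port 8000; the `openai` package and the `/v1` endpoint are standard, but the port and server flags depend on your deployment:

```python
from openai import OpenAI

# Point the client at the local vLLM server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="stockmark/Stockmark-2-100B-Instruct",
    messages=[{"role": "user", "content": "自然言語処理とは?"}],  # "What is NLP?"
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
)
print(response.choices[0].message.content)
```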

## Libraries used for training
- Pre-training: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
- Post-training: [huggingface/trl](https://github.com/huggingface/trl)
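
As an illustration of the DPO stage of post-training (the actual training script has not been released), a minimal sketch with `trl`; the checkpoint, preference data, and hyperparameters below are placeholders:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# In practice this would be the SFT checkpoint, not the released model.
model_name = "stockmark/Stockmark-2-100B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder preference data: each example pairs a prompt with a
# preferred ("chosen") and a dispreferred ("rejected") response.
train_dataset = Dataset.from_dict({
    "prompt": ["自然言語処理とは?"],
    "chosen": ["自然言語処理とは、人間の言語をコンピュータで扱う技術です。"],
    "rejected": ["わかりません。"],
})

training_args = DPOConfig(output_dir="dpo-output", beta=0.1)  # placeholder values

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```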

## License

[MIT](https://opensource.org/licenses/MIT)

## Developed by

[Stockmark Inc.](https://stockmark.co.jp/)