---
library_name: transformers
license: mit
language:
- ja
- en
---

# Stockmark-2-100B-Instruct

## Model description

**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pre-training, the model underwent post-training (SFT and DPO) on synthetic Japanese data to strengthen its instruction-following ability. Compared to the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this release improves instruction following and adds long-context support (32k tokens).

This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).

### Features

- Model Type: Causal Language Model
- Number of Parameters: 96B
- Number of Layers: 86
- Number of Attention Heads (GQA): 72 for Q and 8 for KV
- Context Length: 32k
- Supported Languages: Japanese and English
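
The architecture details above can be checked directly from the model configuration, without downloading the full weights. A minimal sketch, assuming the standard Llama-style attribute names used by `transformers` (the exact field names are an assumption and may differ for this model):

```python
from transformers import AutoConfig

# Fetches only config.json, not the 100B-parameter weights.
config = AutoConfig.from_pretrained("stockmark/Stockmark-2-100B-Instruct")

# Attribute names below assume a Llama-style config.
print(config.num_hidden_layers)        # number of layers (86)
print(config.num_attention_heads)      # query heads (72)
print(config.num_key_value_heads)      # key/value heads for GQA (8)
print(config.max_position_embeddings)  # context length (32k)
```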

## Model performance

### Japanese MT-bench

| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |

## How to use

### transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stockmark/Stockmark-2-100B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="bfloat16")

# Example prompt in Japanese: "What is natural language processing?"
instruction = "自然言語処理とは?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
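
The decoded `output` above contains the prompt as well as the reply. If only the generated continuation is needed, the prompt tokens can be sliced off before decoding; a short follow-up that reuses the `tokens` and `input_ids` variables from the snippet above:

```python
# Drop the prompt portion and decode only the newly generated tokens.
reply = tokenizer.decode(tokens[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```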

### vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=4,
    dtype="bfloat16"
)

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512
)

# Example prompt in Japanese: "What is natural language processing?"
conversation = [{"role": "user", "content": "自然言語処理とは?"}]

outputs = llm.chat(conversation, sampling_params=sampling_params)

for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```
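
vLLM normally reads the maximum sequence length from the model configuration; to make the 32k context explicit, or to shrink it when the KV cache does not fit in GPU memory, the engine can be constructed with `max_model_len`. A minimal sketch under that assumption:

```python
from vllm import LLM

# Same engine setup as above, but with the sequence length stated explicitly.
# Lower max_model_len (e.g. 8192) if GPU memory is tight.
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=4,
    dtype="bfloat16",
    max_model_len=32768
)
```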

## Libraries used for training

- Pre-training: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
- Post-training: [huggingface/trl](https://github.com/huggingface/trl)

## License

[MIT](https://opensource.org/licenses/MIT)

## Developed by

[Stockmark Inc.](https://stockmark.co.jp/)