---
library_name: transformers
license: mit
language:
- ja
- en
---

# Stockmark-2-100B-Instruct-beta

## Model description

**Stockmark-2-100B-Instruct-beta** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 1.5 trillion tokens, consisting of 60% English, 30% Japanese, and 10% code. After pre-training, the model was post-trained on synthetic Japanese data to improve its instruction-following ability. This synthetic data was generated using Qwen2.5-32B-Instruct.

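To make the data mix concrete, the following sketch converts the stated percentages into approximate token counts. It assumes the 60/30/10 split is measured in tokens and treats the "approximately 1.5 trillion" figure as exact; the names `TOTAL_TOKENS`, `MIX_PERCENT`, and `tokens_per_source` are illustrative and do not come from the model's training code.

```python
# Rough token budget per data source, derived from the figures in the description.
# Assumptions: the percentages are shares of tokens, and 1.5 trillion is taken as exact.
TOTAL_TOKENS = 1_500_000_000_000  # ~1.5 trillion tokens
MIX_PERCENT = {"English": 60, "Japanese": 30, "code": 10}


def tokens_per_source(total, mix_percent):
    """Split a total token count according to integer percentage shares."""
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


budget = tokens_per_source(TOTAL_TOKENS, MIX_PERCENT)
for name, count in budget.items():
    print(f"{name}: ~{count / 1e9:.0f}B tokens")
```

Under these assumptions, the Japanese portion comes to roughly 450 billion tokens.
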
As a beta release, Stockmark-2-100B-Instruct-beta is still undergoing improvement and evaluation. Feedback and insights from users will help refine future versions.

See [our blog](https://stockmark-tech.hatenablog.com/entry/2025/03/06/114203) for details.

This project is supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model in bfloat16.
# device_map="auto" places the weights across the available devices (requires accelerate).
tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta", device_map="auto", torch_dtype=torch.bfloat16
)

# Build a single-turn chat prompt ("What is natural language processing?").
instruction = "自然言語処理とは?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a response of up to 512 new tokens.
with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

# Decode the full sequence (the prompt followed by the generated reply).
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```

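Because the example above loads the weights in `torch.bfloat16`, it can help to estimate the memory required before running it. The sketch below is back-of-the-envelope arithmetic, assuming exactly 100 billion parameters (the nominal size in the model name) and 2 bytes per bfloat16 value. It counts the weights only and ignores activations and the KV cache. `weight_memory_bytes` is a hypothetical helper, not part of `transformers`.

```python
# Back-of-the-envelope memory estimate for the model weights alone.
# Assumptions: exactly 100 billion parameters (nominal) and 2 bytes per bfloat16 value.
NUM_PARAMS = 100_000_000_000
BYTES_PER_BF16 = 2


def weight_memory_bytes(num_params, bytes_per_param):
    """Return the number of bytes needed to hold the weights."""
    return num_params * bytes_per_param


total_bytes = weight_memory_bytes(NUM_PARAMS, BYTES_PER_BF16)
total_gib = total_bytes / 2**30
print(f"Weights: {total_bytes / 1e9:.0f} GB ({total_gib:.0f} GiB)")
```

With these assumptions the weights alone take roughly 200 GB, so `device_map="auto"` will generally need to spread the model across several accelerators or offload part of it.
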
## License

[MIT](https://opensource.org/licenses/MIT)

## Developed by

[Stockmark Inc.](https://stockmark.co.jp/)

## Author

Takahiro Omi