---
library_name: transformers
license: mit
base_model:
- stockmark/Stockmark-2-100B-Instruct-beta
---

# Stockmark-2-100B-Instruct-beta-AWQ

This repo contains the AWQ-quantized 4-bit version of [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta).
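As a rough back-of-the-envelope estimate (not from the original card), ~100B parameters occupy on the order of 50 GB of weights at 4 bits, versus roughly 200 GB in float16; actual memory usage is higher once quantization scales and the KV cache are included.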

## Example

**Please use the float16 data type when loading the model; bfloat16 is not supported by this model.**

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the AWQ-quantized model.
# Use float16: bfloat16 is not supported by this model.
tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    device_map="auto",
    torch_dtype=torch.float16,
)

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the prompt with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
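
AWQ checkpoints can also be served with vLLM's AWQ support. The following is a minimal sketch, not part of the original card: it assumes a recent vLLM version, that vLLM supports this model's architecture, and enough GPU memory for the 4-bit weights (adjust `tensor_parallel_size` to your hardware).

```python
from vllm import LLM, SamplingParams

# Sketch only (assumption, not from the original card): serve the AWQ model with vLLM.
llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    quantization="awq",
    dtype="float16",         # match the float16 requirement above
    tensor_parallel_size=4,  # hypothetical value; set to your GPU count
)

params = SamplingParams(
    temperature=0.7,
    top_p=0.95,
    max_tokens=512,
    repetition_penalty=1.05,
)

# llm.chat() applies the model's chat template, mirroring the transformers example.
outputs = llm.chat(
    [{"role": "user", "content": "自然言語処理とは?"}],  # "What is natural language processing?"
    params,
)
print(outputs[0].outputs[0].text)
```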