---
library_name: transformers
license: mit
base_model:
- stockmark/Stockmark-2-100B-Instruct-beta
---

# Stockmark-2-100B-Instruct-beta-AWQ

This repo contains the 4-bit AWQ-quantized version of [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta).
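As a rough guide to what 4-bit quantization means for hardware requirements, the sketch below estimates the memory needed for the weights alone. It assumes a parameter count of 100 billion, taken from "100B" in the model name, and it ignores quantization scales and zero points, any layers left unquantized, activations, and the KV cache, so real usage will be higher. The helper `weight_memory_gib` is a hypothetical name used only for illustration.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: parameter count read off the "100B" in the model name.
NUM_PARAMS = 100e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # unquantized float16 weights
awq_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"float16 weights: ~{fp16_gib:.0f} GiB")
print(f"4-bit weights:   ~{awq_gib:.0f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the float16 footprint, which is the main reason to prefer this repo when GPU memory is limited.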

## Example

**Load the model with the float16 data type. This model does not support bfloat16.**

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ", device_map="auto", torch_dtype=torch.float16)

# Prompt in Japanese: "What is natural language processing?"
instruction = "自然言語処理とは？"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a response without tracking gradients.
with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

# tokens[0] holds the prompt followed by the generated continuation.
output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
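For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded output in the example includes the prompt text. If only the model's reply is wanted, the prompt positions can be sliced off before decoding. The sketch below shows the slicing on plain lists of stand-in token IDs; `new_tokens_only` is a hypothetical helper name, not part of `transformers`.

```python
def new_tokens_only(generated, prompt_length):
    """Drop the first `prompt_length` positions of a generated sequence."""
    return generated[prompt_length:]


# Stand-in token IDs for illustration only.
prompt_ids = [101, 7, 42]
generated_ids = prompt_ids + [9, 15, 2]

completion_ids = new_tokens_only(generated_ids, len(prompt_ids))
print(completion_ids)
```

With the example above, the same slicing would be `tokenizer.decode(new_tokens_only(tokens[0], input_ids.shape[-1]), skip_special_tokens=True)`, since tensors support the same slice syntax.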