---
library_name: transformers
license: mit
base_model:
- stockmark/Stockmark-2-100B-Instruct-beta
---

# Stockmark-2-100B-Instruct-beta-AWQ

This repo contains the AWQ-quantized 4-bit version of [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta).

## Example

**Please load the model with the float16 data type. This model does not support bfloat16.**

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stockmark/Stockmark-2-100B-Instruct-beta-AWQ")
model = AutoModelForCausalLM.from_pretrained(
    "stockmark/Stockmark-2-100B-Instruct-beta-AWQ",
    device_map="auto",
    torch_dtype=torch.float16,  # bfloat16 is not supported
)

instruction = "自然言語処理とは?"  # "What is natural language processing?"

# Build the chat-formatted prompt for a single user turn.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    tokens = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.05,
    )

output = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(output)
```
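As a rough guide to hardware requirements, the 4-bit AWQ weights can be sized with a back-of-the-envelope calculation. The figures below are assumptions for illustration only (a nominal 100B parameters at 4 bits per weight, ignoring quantization scales, the KV cache, and activation memory), so actual GPU memory usage will be higher:

```python
# Rough estimate of the quantized weight footprint (illustrative assumptions).
num_params = 100e9        # nominal parameter count from the model name
bits_per_weight = 4       # AWQ 4-bit quantization

weight_bytes = num_params * bits_per_weight / 8
print(f"~{weight_bytes / 1e9:.0f} GB of weights")  # ~50 GB
```

In practice, budget additional memory on top of this for quantization metadata, the KV cache (which grows with context length), and runtime overhead.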