Update inference examples to use the correct chat template

#2 · opened by mario-sanz
Files changed (1)
README.md (+4, -4)
@@ -46,14 +46,14 @@ You can use OLMo with the standard HuggingFace transformers library:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
 tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
-message = ["Language modeling is "]
-inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+message = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named Moo Moo?"}]
+inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, return_tensors='pt', return_dict=True)
 # optional verifying cuda
 # inputs = {k: v.to('cuda') for k,v in inputs.items()}
 # olmo = olmo.to('cuda')
 response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
-print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
->> 'Language modeling is a key component of any text-based application, but its effectiveness...'
+print(tokenizer.decode(response[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+>> 'This is a fun and imaginative question! Let’s break it down...'
 ```
 
 For faster performance, you can quantize the model using the following method:
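
For readers applying this change locally, the updated example reads end to end as below. This is a minimal sketch assembled from the diff above: the model ID, prompt, and sampling parameters come straight from the PR, and the optional CUDA lines are kept commented out as in the README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct")

# Chat models expect a list of {"role", "content"} messages rather than a raw string.
message = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named Moo Moo?"}]

# apply_chat_template formats the conversation with the model's special tokens;
# add_generation_prompt=True appends the assistant turn header so the model
# starts replying, and return_dict=True returns a dict usable as **inputs.
inputs = tokenizer.apply_chat_template(
    message, add_generation_prompt=True, return_tensors="pt", return_dict=True
)

# optional verifying cuda
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')

response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

# generate() returns prompt + completion, so slice off the prompt tokens
# before decoding, printing only the model's reply.
print(tokenizer.decode(response[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Decoding only the tokens past `inputs["input_ids"].shape[1]` is what lets the updated example print just the assistant's reply instead of echoing the chat-formatted prompt, which is why the PR also swaps `batch_decode(response)[0]` for the sliced `decode` call.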