Update inference examples to use the correct chat template

#2 · opened by mario-sanz
Files changed (1)
README.md (+4, -4)
@@ -46,14 +46,14 @@ You can use OLMo with the standard HuggingFace transformers library:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
 tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
-message = ["Language modeling is "]
-inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+message = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named Moo Moo?"}]
+inputs = tokenizer.apply_chat_template(message, add_generation_prompt=True, return_tensors='pt', return_dict=True)
 # optional verifying cuda
 # inputs = {k: v.to('cuda') for k,v in inputs.items()}
 # olmo = olmo.to('cuda')
 response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
-print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
->> 'Language modeling is a key component of any text-based application, but its effectiveness...'
+print(tokenizer.decode(response[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+>> 'This is a fun and imaginative question! Let’s break it down...'
 ```
 
 For faster performance, you can quantize the model using the following method:
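
For readers applying this change locally, the updated example reads end to end as below. This is a minimal sketch assembled from the diff above: the model ID, prompt, and sampling parameters come straight from the PR, and the optional CUDA lines are kept commented out as in the README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct")

# Chat models expect a list of {"role", "content"} messages rather than a raw string.
message = [{"role": "user", "content": "Who would win in a fight - a dinosaur or a cow named Moo Moo?"}]

# apply_chat_template formats the conversation with the model's special tokens;
# add_generation_prompt=True appends the assistant turn header so the model
# starts replying, and return_dict=True returns a dict usable as **inputs.
inputs = tokenizer.apply_chat_template(
    message, add_generation_prompt=True, return_tensors="pt", return_dict=True
)

# optional verifying cuda
# inputs = {k: v.to('cuda') for k, v in inputs.items()}
# olmo = olmo.to('cuda')

response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

# generate() returns prompt + completion, so slice off the prompt tokens
# before decoding, printing only the model's reply.
print(tokenizer.decode(response[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Decoding only the tokens past `inputs["input_ids"].shape[1]` is what lets the updated example print just the assistant's reply instead of echoing the chat-formatted prompt, which is why the PR also swaps `batch_decode(response)[0]` for the sliced `decode` call.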