Model doesn't stop generating.

by rb17 - opened Apr 29, 2024

Discussion

rb17

Apr 29, 2024

edited Apr 29, 2024

When setting max_tokens to 1024, the model doesn't stop generating.

It looks like the official Llama HF repo does pay attention to the stop tokens.

Taken from the official HF repo (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct):

"""
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
"""
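
To make it concrete what passing a list of terminators changes, here is a minimal, library-independent sketch of a generation loop that stops on any id in a stop set. Everything in it is an assumption for illustration: `generate_until_stop`, `toy_next_token`, and the ids `EOS_ID` and `EOT_ID` are made up and are not the transformers or MLX implementation.

```python
from typing import Callable, Iterable, List

# Made-up ids for illustration only; real ids come from the tokenizer.
EOS_ID = 1   # stands in for tokenizer.eos_token_id
EOT_ID = 2   # stands in for the id of "<|eot_id|>"


def generate_until_stop(
    next_token: Callable[[List[int]], int],
    prompt_ids: List[int],
    stop_ids: Iterable[int],
    max_new_tokens: int,
) -> List[int]:
    """Return newly generated ids, stopping at the first id in stop_ids
    or after max_new_tokens, whichever comes first."""
    stop = set(stop_ids)
    tokens = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok in stop:
            break  # the stop token itself is not returned
        tokens.append(tok)
        generated.append(tok)
    return generated


# Toy "model": replays a fixed script that emits EOT_ID and then keeps going,
# mimicking a chat model whose end-of-turn token is not treated as a stop.
prompt = [5, 6]
script = [10, 11, 12, EOT_ID, 13, 14]


def toy_next_token(tokens: List[int]) -> int:
    return script[len(tokens) - len(prompt)]


# Only EOS_ID in the stop set: generation runs through EOT_ID to the limit.
runaway = generate_until_stop(toy_next_token, prompt, [EOS_ID], 6)

# Both ids in the stop set: generation ends at the end-of-turn token.
stopped = generate_until_stop(toy_next_token, prompt, [EOS_ID, EOT_ID], 6)
```

In this toy setup `runaway` contains all six scripted ids while `stopped` ends just before `EOT_ID`, which matches the symptom described above: if only one end token is checked and the model emits the other one, the token limit is the only thing that stops it.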

How can we get the same behavior with MLX?

Here is a screenshot:

kechan

MLX Community org Apr 29, 2024

I had the same issue and posted it here: https://github.com/ml-explore/mlx-examples/issues/737

I wasn’t aware this was recently changed in the JSON: mlx-community/Meta-Llama-3-8B-Instruct-4bit (#1) - Fix eos_token in tokenizer_config.json

According to the chat template I mentioned, “<|eot_id|>” seems to be the right terminating token. I guess that instead of my hack, I can go back and change the config JSON (I hadn’t played with transformers and had forgotten all the config params).
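
For a local copy of the model that predates that fix, the config change could be sketched roughly as follows. This assumes `eos_token` is stored as a plain string in `tokenizer_config.json`; `set_eos_token` is a hypothetical helper written for this example, and the demo runs on a throwaway file with sample content rather than a real model directory.

```python
import json
import tempfile
from pathlib import Path


def set_eos_token(config_path, eos_token="<|eot_id|>"):
    """Rewrite the eos_token entry of a tokenizer_config.json file.

    Returns the previous value so the change can be reverted by hand.
    Assumes eos_token is a plain string entry, not a nested object.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("eos_token")
    config["eos_token"] = eos_token
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous


# Demo on a temporary file with sample content (not a real model config).
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "tokenizer_config.json"
    cfg.write_text(json.dumps({"eos_token": "<|end_of_text|>"}), encoding="utf-8")
    old_value = set_eos_token(cfg)
    new_value = json.loads(cfg.read_text(encoding="utf-8"))["eos_token"]
```

Re-downloading the repo after the upstream fix should make this unnecessary; the sketch only applies to a cached or hand-converted copy.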
