zero embedding for token_id == `1` (<BOS>)

by poedator - opened Apr 21, 2024

WARNING: This model and its 160M sibling have an ALL-ZEROS EMBEDDING for token_id == 1 (<BOS>). This causes confusion when measuring the StaticCache sequence length. The transformers maintainers chose to detect occupied cache positions by checking for non-zero cache values, so the all-zeros embedding distorts the result of get_seq_length(). No blame here, just a combination of design decisions with unpredictable results.
See the relevant transformers line here https://github.com/huggingface/transformers/blob/8c12690cecbb97e187861e386f7a0ac790e4236c/src/transformers/cache_utils.py#L414
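
To make the failure mode concrete, here is a minimal pure-Python sketch of the idea, not the actual transformers code. The helper name nonzero_seq_length, the toy cache layout, and the key values are all made up for illustration. It also assumes that the zero <BOS> embedding ends up as an all-zero key vector in the cache, which I believe holds for the first layer when the key projection has no bias, but treat that as an assumption.

```python
def nonzero_seq_length(key_cache):
    """Hypothetical helper: count cached positions by treating any row
    that holds at least one non-zero value as occupied."""
    return sum(1 for row in key_cache if any(v != 0.0 for v in row))


max_len, head_dim = 8, 4

# Pre-allocated, zero-filled cache for a single head (toy layout).
cache = [[0.0] * head_dim for _ in range(max_len)]

# Keys for a 3-token prompt. Position 0 is <BOS>; its key is assumed
# to be all zeros because its embedding is all zeros.
prompt_keys = [
    [0.0, 0.0, 0.0, 0.0],   # <BOS>
    [0.3, -0.1, 0.2, 0.5],  # made-up values
    [0.7, 0.1, -0.4, 0.2],  # made-up values
]
for pos, key in enumerate(prompt_keys):
    cache[pos] = list(key)

true_length = len(prompt_keys)
detected_length = nonzero_seq_length(cache)

print("tokens written:", true_length)
print("length detected from non-zero rows:", detected_length)
```

In this toy setup the <BOS> row is indistinguishable from an unused slot, so the detected length comes out one short of the number of tokens actually written.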
