V4.22.0 model update

#1
by lewtun HF Staff - opened
Hugging Face Internal Testing Organization org
edited Sep 8, 2022

This PR:

  • updates the CLIP model to be compatible with transformers v4.22. The previous version throws an error when loading the tokenizer (it requires from_slow=True)
  • sets the vocab size to the default value associated with the checkpoint this model was derived from (https://huggingface.co/openai/clip-vit-base-patch32/blob/main/config.json#L79). With this change, the model can actually run inference without hitting index-out-of-range errors
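The index-out-of-range failure can be illustrated without transformers at all: an embedding lookup is just a row index into a (vocab_size, hidden_size) matrix, so any token ID at or above vocab_size fails the bounds check. A minimal sketch (the `embed` helper and its sizes are illustrative, not the actual testing-repo code):

```python
# Toy embedding lookup mirroring nn.Embedding's bounds behaviour.
def embed(token_ids, vocab_size, hidden_size=4):
    """Look up each token ID in a (vocab_size, hidden_size) matrix;
    raises IndexError for any ID >= vocab_size."""
    matrix = [[0.0] * hidden_size for _ in range(vocab_size)]
    return [matrix[i] for i in token_ids]

# A tokenizer with 1000 tokens can emit ID 999; a model built with the
# tester config (vocab_size=99) cannot embed it.
embed([5, 42], vocab_size=99)        # fine
try:
    embed([999], vocab_size=99)
except IndexError:
    print("index out of range")       # prints "index out of range"
```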

cc @ydshieh

Hugging Face Internal Testing Organization org

Hi @lewtun I understand that the fix avoids the index-out-of-range error, but it also makes the model not so tiny, since the embedding matrix becomes somewhat larger.
The issue comes from the fact that the tokenizer created here has 1000 tokens, while the tiny model was created with the model tester config (where the vocab size is 99).

I believe you can change 49408 to 1000, and that will fix the index error. Let me know if you still encounter issues.
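The suggested change amounts to editing vocab_size in the repo's config.json. A hedged sketch, assuming a CLIP-style layout where vocab_size lives under a text_config key (the path and key names follow openai/clip-vit-base-patch32's config and may differ for other models):

```python
import json

def set_vocab_size(config_path, new_size):
    """Rewrite vocab_size in a CLIP-style config.json.
    The "text_config" nesting is an assumption based on the layout of
    openai/clip-vit-base-patch32; adjust the key path for other models."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["text_config"]["vocab_size"] = new_size  # e.g. 49408 -> 1000
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```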

The tiny model creation task needs to be improved - I am working on it.

lewtun changed pull request title from V4.22.0 update to V4.22.0 model update
Hugging Face Internal Testing Organization org

As discussed offline, resizing the vocab size in the model config isn't enough - the tokenizer length must also match to ensure the correct input IDs are sent to the model.

One alternative is to:

  • Train a new tokenizer from scratch on a tiny corpus of vocab size ~100 tokens
  • Use that new vocab size in the model
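The alternative above can be sketched with the tokenizers library: train a small BPE tokenizer on a toy corpus with a ~100-token budget, then reuse its actual vocab size in the model config. The corpus, trainer settings, and special tokens here are illustrative assumptions, not the testing org's actual setup:

```python
# Hedged sketch: train a tiny BPE tokenizer from scratch so that the
# tokenizer length and the model's vocab_size agree by construction.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=100, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(
    ["a tiny corpus of text", "for a tiny tokenizer"],  # toy corpus
    trainer,
)

# Use the *trained* size (it can come out below the 100 budget on a
# small corpus) as vocab_size in the tiny model's config.
vocab_size = tokenizer.get_vocab_size()
```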

In the interest of being pragmatic, we will address the resizing issue in separate PRs so this one can focus on speeding up the ONNX test suite (which is the motivation for this PR).

Hugging Face Internal Testing Organization org

OK, thanks!

ydshieh changed pull request status to merged
