Add TF weights

by joaogante - opened Jun 6, 2022

base: refs/heads/main ← from: refs/pr/1

joaogante

Jun 6, 2022

Validated by the pt_to_tf CLI. Max crossload hidden state difference=1.121e-05; Max converted hidden state difference=1.121e-05.
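
For readers unfamiliar with the metric, here is a minimal sketch of a "max hidden state difference" check. It assumes the figure is the largest elementwise absolute difference between the PT and TF outputs. The function name `max_abs_difference` and the sample values are hypothetical and made up for illustration; this is not the CLI's actual code.

```python
def max_abs_difference(a, b):
    """Largest elementwise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up values standing in for flattened PT and TF hidden states.
pt_hidden = [0.5, -1.25, 2.0]
tf_hidden = [0.5, -1.25, 2.00001]

diff = max_abs_difference(pt_hidden, tf_hidden)
print(f"Max hidden state difference={diff:.3e}")
```

A conversion would then be accepted when this value stays under some chosen tolerance.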

Add TF weights 291b8b62

joaogante

Jun 6, 2022 • edited Jun 8, 2022

The weights look good according to our conversion tool, but they take 2x the storage. Are the PT weights stored in a 16-bit format? ( @valhalla )

EDIT -- after checking with stricter tests, there are further differences between PT and TF. The original question is still relevant, but do not merge these weights.

patrickvonplaten

Jun 10, 2022

Sounds good! @valhalla do you know?

patrickvonplaten

Jun 10, 2022

Just checked: the PT weights are indeed in float16. BTW, an easy rule of thumb is "size of model checkpoint in bytes" / 4 = number of model parameters if the weights are in float32. Here a 1GB file would mean 250M parameters, but we have 564M, so it's most likely fp16.
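
The rule of thumb can be sketched in a few lines of Python. This is only an illustration: the helper names `estimate_param_count` and `guess_dtype` are hypothetical, 1GB is taken as 10^9 bytes, the "564" figure is read as 564M parameters, and file metadata overhead is ignored.

```python
# Bytes used per parameter for two common floating-point storage formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def estimate_param_count(checkpoint_bytes: int, dtype: str = "float32") -> int:
    """Estimate the parameter count from file size, ignoring metadata overhead."""
    return checkpoint_bytes // BYTES_PER_PARAM[dtype]


def guess_dtype(checkpoint_bytes: int, known_param_count: int) -> str:
    """Pick the dtype whose implied file size is closest to the actual size."""
    return min(
        BYTES_PER_PARAM,
        key=lambda d: abs(known_param_count * BYTES_PER_PARAM[d] - checkpoint_bytes),
    )


one_gb = 1_000_000_000  # assumption: 1GB = 10^9 bytes
print(estimate_param_count(one_gb, "float32"))  # 250000000
print(guess_dtype(one_gb, 564_000_000))  # float16
```

With 564M parameters, float32 would need roughly 2.26GB and float16 roughly 1.13GB, so a file of about 1GB points to float16.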

joaogante

Jun 13, 2022

That makes sense. There is a slightly higher PT-to-TF error than usual (~1e-4) in the internal layers, but the weights being float16 probably explains the difference 👍
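
As a rough sanity check on why float16 could account for errors of that size, the sketch below rounds a few values to half precision using only the standard library. The helper names and sample values are hypothetical, and this only illustrates float16 rounding, not the actual PT-to-TF comparison.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest float16 value and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_relative_rounding_error(values):
    """Largest relative error introduced by storing each value as float16."""
    return max(abs(to_float16(v) - v) / abs(v) for v in values if v != 0.0)


# Arbitrary sample values, chosen only for illustration.
samples = [0.1, 0.3, 0.7, 1.1, 2.5, 3.3]
err = max_relative_rounding_error(samples)
print(f"max relative rounding error: {err:.2e}")
```

float16 keeps 10 explicit mantissa bits, so a single rounding of a normal value can introduce a relative error of up to 2^-11 (about 4.9e-4), which is the same order of magnitude as the ~1e-4 difference mentioned above.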

joaogante

Jun 13, 2022

Merging!

joaogante changed pull request status to merged Jun 13, 2022
