From Megatron GPT-2 or GPT-3?

#75

by jmassot - opened Aug 10, 2022

Aug 10, 2022

Hi all,
I have a question regarding the architecture. In the HF documentation, it is mentioned that Bloom is similar to GPT-3 but the model card indicates that the architecture is derived from Megatron-GPT2.
GPT-3 has a parameter count similar to Bloom's and also fixes some issues found in GPT-2.
What is the best way to describe Bloom's architecture: GPT-2-like or GPT-3-like?
I am a little lost when it comes to choosing the best ancestor for Bloom :-)
Thanks
Best regards
Jerome
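As a rough check on the parameter-count comparison above, here is a minimal sketch using the common approximation for a GPT-style decoder-only transformer: about 12·d² parameters per layer (4·d² for the attention projections plus 8·d² for an MLP with 4x expansion), plus one vocabulary-by-d embedding matrix. The layer counts, hidden sizes and vocabulary sizes are assumptions taken from the published GPT-3 175B and BLOOM-176B configurations. Biases, LayerNorm weights and position embeddings are ignored, and `approx_decoder_params` is a hypothetical helper, not a library function.

```python
def approx_decoder_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder-only transformer.

    Per layer: ~4*d^2 for attention (Q, K, V and output projections)
    plus ~8*d^2 for an MLP with a 4x hidden expansion, i.e. 12*d^2.
    Adds one vocab_size x d_model token-embedding matrix (assumed tied
    with the output layer). Biases, LayerNorm weights and learned
    position embeddings are ignored.
    """
    per_layer = 12 * d_model ** 2
    embedding = vocab_size * d_model
    return n_layers * per_layer + embedding


# Assumed hyperparameters (from the published model configurations).
gpt3 = approx_decoder_params(n_layers=96, d_model=12288, vocab_size=50257)
bloom = approx_decoder_params(n_layers=70, d_model=14336, vocab_size=250880)

print(f"GPT-3 (175B): ~{gpt3 / 1e9:.1f}B parameters")
print(f"BLOOM (176B): ~{bloom / 1e9:.1f}B parameters")
print(f"ratio BLOOM / GPT-3: {bloom / gpt3:.3f}")
```

This gives roughly 174.6B for GPT-3 and 176.2B for BLOOM, about 1% apart, so the sizes are indeed comparable. Note that BLOOM reaches its count with fewer but wider layers and a much larger vocabulary, and that similar size alone does not say which codebase a model was derived from.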

R1MN

Aug 14, 2022

The authors state under "Technical specifications" that BLOOM's code is a modified version of Megatron-LM GPT2.

jmassot

Aug 16, 2022

Thanks R1MN. The technical specifications say it is modified from Megatron-LM GPT2, but the HF documentation then describes Bloom as a GPT-3-like model...
