---
license: apache-2.0
datasets:
- EleutherAI/pile
language:
- en
base_model:
- EleutherAI/gpt-j-6b
tags:
- gptj
- causal-lm
---

This is a conversion of EleutherAI's GPT-J-6B into a more modern architecture that it still maps closely onto (in this case, the Phi 1/1.5/2 architecture). This primarily enables RoPE scaling, and also makes it possible to create GGUFs (GGUF tooling does not currently support GPT-J's original architecture). See convert.py for the script used to convert the weights.
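
Roughly speaking, a conversion like this is mostly a renaming of state-dict keys, plus fixups where the two architectures disagree. The sketch below is my own illustrative reconstruction, not the contents of convert.py: the key map follows the module names Transformers uses for GPTJForCausalLM and PhiForCausalLM, the zero biases are there because Phi's attention projections expect biases that GPT-J never had, and the permutation helper is one standard way to reconcile GPT-J's interleaved rotary pairs with Phi's half-split convention (it assumes the rotary widths match, which, per the note on partial_rotary_factor below, is only approximately true here).

```python
import torch
from transformers import AutoModelForCausalLM

def interleaved_to_half_split(w: torch.Tensor, n_head: int, rotary_dim: int) -> torch.Tensor:
    # GPT-J rotates interleaved dim pairs (0,1), (2,3), ...; Phi rotates
    # half-split pairs (i, i + rotary_dim//2). Reordering each head's rotary
    # rows lets Phi-style rotation reproduce GPT-J-style rotation.
    head_dim = w.shape[0] // n_head
    w = w.reshape(n_head, head_dim, -1)
    rot, rest = w[:, :rotary_dim], w[:, rotary_dim:]
    rot = torch.cat([rot[:, 0::2], rot[:, 1::2]], dim=1)
    return torch.cat([rot, rest], dim=1).reshape(n_head * head_dim, -1)

gptj = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b", torch_dtype=torch.float16)
cfg, sd = gptj.config, gptj.state_dict()

new_sd = {
    "model.embed_tokens.weight": sd["transformer.wte.weight"],
    "model.final_layernorm.weight": sd["transformer.ln_f.weight"],
    "model.final_layernorm.bias": sd["transformer.ln_f.bias"],
    "lm_head.weight": sd["lm_head.weight"],
    "lm_head.bias": sd["lm_head.bias"],
}
for i in range(cfg.n_layer):
    g, p = f"transformer.h.{i}", f"model.layers.{i}"
    new_sd[f"{p}.input_layernorm.weight"] = sd[f"{g}.ln_1.weight"]
    new_sd[f"{p}.input_layernorm.bias"] = sd[f"{g}.ln_1.bias"]
    # q/k need the rotary-row permutation; v and the output projection do not.
    for name in ("q_proj", "k_proj"):
        new_sd[f"{p}.self_attn.{name}.weight"] = interleaved_to_half_split(
            sd[f"{g}.attn.{name}.weight"], cfg.n_head, cfg.rotary_dim)
    new_sd[f"{p}.self_attn.v_proj.weight"] = sd[f"{g}.attn.v_proj.weight"]
    new_sd[f"{p}.self_attn.dense.weight"] = sd[f"{g}.attn.out_proj.weight"]
    # GPT-J's attention projections have no biases, but Phi's expect them.
    for name in ("q_proj", "k_proj", "v_proj", "dense"):
        new_sd[f"{p}.self_attn.{name}.bias"] = torch.zeros(
            new_sd[f"{p}.self_attn.{name}.weight"].shape[0], dtype=torch.float16)
    new_sd[f"{p}.mlp.fc1.weight"] = sd[f"{g}.mlp.fc_in.weight"]
    new_sd[f"{p}.mlp.fc1.bias"] = sd[f"{g}.mlp.fc_in.bias"]
    new_sd[f"{p}.mlp.fc2.weight"] = sd[f"{g}.mlp.fc_out.weight"]
    new_sd[f"{p}.mlp.fc2.bias"] = sd[f"{g}.mlp.fc_out.bias"]
# new_sd can then be saved alongside a matching PhiConfig (sketched below)
# and loaded with PhiForCausalLM.
```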
Note that I originally planned to use the GPT-NeoX architecture because it felt more fitting, but there appears to be a bug in the most recent versions of Transformers, so Phi it is!
Also, partial_rotary_factor is set to 0.5 here even though that makes no logical sense: it should be 0.25 (rotary_dim / head_dim = 64 / 256 = 0.25), but at 0.25 the model is completely, babblingly incoherent, while at 0.5 the output is basically the same as the original's. Whatever.
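
For concreteness, here is a hedged sketch of the relevant config, using GPT-J-6B's published shape values (vocab 50400, hidden size 4096, 28 layers, 16 heads, 16384 intermediate, rotary_dim 64); the commented-out rope_scaling line illustrates the knob this conversion unlocks rather than a setting shipped in this repo.

```python
from transformers import PhiConfig

cfg = PhiConfig(
    vocab_size=50400,            # GPT-J-6B's vocabulary
    hidden_size=4096,
    num_hidden_layers=28,
    num_attention_heads=16,
    intermediate_size=16384,
    partial_rotary_factor=0.5,   # "should" be 64 / 256 = 0.25, per the note above
    # rope_scaling={"type": "linear", "factor": 2.0},  # example of the RoPE scaling this enables
)

head_dim = cfg.hidden_size // cfg.num_attention_heads
print(head_dim)                                   # 256
print(int(head_dim * cfg.partial_rotary_factor))  # 128 rotary dims, vs. GPT-J's 64
```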