---
license: apache-2.0
datasets:
- EleutherAI/pile
language:
- en
base_model:
- EleutherAI/gpt-j-6b
tags:
- gptj
- causal-lm
---

This is a conversion of EleutherAI's GPT-J-6B into a more modern architecture that it still maps closely onto (in this case, the Phi 1/1.5/2 architecture). This primarily enables RoPE scaling, and also makes it possible to create GGUFs (GGUF tooling does not currently support GPT-J's original architecture). See convert.py for the script used to convert the weights.
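
Roughly speaking, a conversion like this is mostly a renaming of state-dict keys, plus fixups where the two architectures disagree. The sketch below is my own illustrative reconstruction, not the contents of convert.py: the key map follows the module names Transformers uses for GPTJForCausalLM and PhiForCausalLM, the zero biases are there because Phi's attention projections expect biases that GPT-J never had, and the permutation helper is one standard way to reconcile GPT-J's interleaved rotary pairs with Phi's half-split convention (it assumes the rotary widths match, which, per the note on partial_rotary_factor below, is only approximately true here).

```python
import torch
from transformers import AutoModelForCausalLM

def interleaved_to_half_split(w: torch.Tensor, n_head: int, rotary_dim: int) -> torch.Tensor:
    # GPT-J rotates interleaved dim pairs (0,1), (2,3), ...; Phi rotates
    # half-split pairs (i, i + rotary_dim//2). Reordering each head's rotary
    # rows lets Phi-style rotation reproduce GPT-J-style rotation.
    head_dim = w.shape[0] // n_head
    w = w.reshape(n_head, head_dim, -1)
    rot, rest = w[:, :rotary_dim], w[:, rotary_dim:]
    rot = torch.cat([rot[:, 0::2], rot[:, 1::2]], dim=1)
    return torch.cat([rot, rest], dim=1).reshape(n_head * head_dim, -1)

gptj = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b", torch_dtype=torch.float16)
cfg, sd = gptj.config, gptj.state_dict()

new_sd = {
    "model.embed_tokens.weight": sd["transformer.wte.weight"],
    "model.final_layernorm.weight": sd["transformer.ln_f.weight"],
    "model.final_layernorm.bias": sd["transformer.ln_f.bias"],
    "lm_head.weight": sd["lm_head.weight"],
    "lm_head.bias": sd["lm_head.bias"],
}
for i in range(cfg.n_layer):
    g, p = f"transformer.h.{i}", f"model.layers.{i}"
    new_sd[f"{p}.input_layernorm.weight"] = sd[f"{g}.ln_1.weight"]
    new_sd[f"{p}.input_layernorm.bias"] = sd[f"{g}.ln_1.bias"]
    # q/k need the rotary-row permutation; v and the output projection do not.
    for name in ("q_proj", "k_proj"):
        new_sd[f"{p}.self_attn.{name}.weight"] = interleaved_to_half_split(
            sd[f"{g}.attn.{name}.weight"], cfg.n_head, cfg.rotary_dim)
    new_sd[f"{p}.self_attn.v_proj.weight"] = sd[f"{g}.attn.v_proj.weight"]
    new_sd[f"{p}.self_attn.dense.weight"] = sd[f"{g}.attn.out_proj.weight"]
    # GPT-J's attention projections have no biases, but Phi's expect them.
    for name in ("q_proj", "k_proj", "v_proj", "dense"):
        new_sd[f"{p}.self_attn.{name}.bias"] = torch.zeros(
            new_sd[f"{p}.self_attn.{name}.weight"].shape[0], dtype=torch.float16)
    new_sd[f"{p}.mlp.fc1.weight"] = sd[f"{g}.mlp.fc_in.weight"]
    new_sd[f"{p}.mlp.fc1.bias"] = sd[f"{g}.mlp.fc_in.bias"]
    new_sd[f"{p}.mlp.fc2.weight"] = sd[f"{g}.mlp.fc_out.weight"]
    new_sd[f"{p}.mlp.fc2.bias"] = sd[f"{g}.mlp.fc_out.bias"]
# new_sd can then be saved alongside a matching PhiConfig (sketched below)
# and loaded with PhiForCausalLM.
```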
Note that I originally planned to use the GPT-NeoX architecture because it felt more fitting, but there appears to be a bug in the most recent versions of Transformers, so Phi it is!
Also, partial_rotary_factor is set to 0.5 here even though that makes no logical sense: it should be 0.25 (rotary_dim / head_dim = 64 / 256 = 0.25), but at 0.25 the model is completely, babblingly incoherent, while at 0.5 the output is basically the same as the original's. Whatever.
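
For concreteness, here is a hedged sketch of the relevant config, using GPT-J-6B's published shape values (vocab 50400, hidden size 4096, 28 layers, 16 heads, 16384 intermediate, rotary_dim 64); the commented-out rope_scaling line illustrates the knob this conversion unlocks rather than a setting shipped in this repo.

```python
from transformers import PhiConfig

cfg = PhiConfig(
    vocab_size=50400,            # GPT-J-6B's vocabulary
    hidden_size=4096,
    num_hidden_layers=28,
    num_attention_heads=16,
    intermediate_size=16384,
    partial_rotary_factor=0.5,   # "should" be 64 / 256 = 0.25, per the note above
    # rope_scaling={"type": "linear", "factor": 2.0},  # example of the RoPE scaling this enables
)

head_dim = cfg.hidden_size // cfg.num_attention_heads
print(head_dim)                                   # 256
print(int(head_dim * cfg.partial_rotary_factor))  # 128 rotary dims, vs. GPT-J's 64
```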