| 160 intermediate checkpoints from the tr1-13B training | |
| These checkpoints contain a bug. Until it is fixed, please run any checkpoint you want to use through this script first (it rewrites the file in place): | |
| ``` | |
| python -c ' | |
| import sys, torch | |
| f=sys.argv[1] | |
| sd=torch.load(f) | |
| d=2048 | |
| # Reset every ".attn.bias" buffer to a lower-triangular matrix of ones, shape (1, 1, d, d) | |
| for k in sd.keys(): | |
|     if k.endswith(".attn.bias"): | |
|         sd[k] = torch.tril(torch.ones((d, d), dtype=torch.float16)).view(1, 1, d, d) | |
| torch.save(sd, f) | |
| ' global_step594/pytorch_model.bin | |
| ``` |
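
To make clearer what the script writes into each `.attn.bias` entry: `torch.tril(torch.ones((d, d)))` produces a lower-triangular matrix of ones, which appears to be the causal attention mask (position `i` may attend to positions `j <= i`). The following is a small pure-Python sketch of that pattern only, not part of the original fix. `causal_mask` is a hypothetical helper name, and the size 4 is chosen purely for display, whereas the script uses `d=2048`.

```python
def causal_mask(d):
    """Return a d x d lower-triangular matrix of ones as nested lists.

    Mirrors the pattern of torch.tril(torch.ones((d, d))): entry [i][j]
    is 1 when j <= i, and 0 otherwise.
    """
    return [[1 if j <= i else 0 for j in range(d)] for i in range(d)]


# Hypothetical small size for display; the script above uses d=2048.
mask = causal_mask(4)
for row in mask:
    print(row)
```

Each printed row has one more leading 1 than the row before it. The script additionally reshapes this matrix to `(1, 1, d, d)` and stores it as `float16`.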