keyfan/grok-1-hf

Is RoPE scaling correct?

#2

by Noeda - opened Mar 20, 2024

Rope theta is 100k here: https://huggingface.co/keyfan/grok-1-hf/blob/main/config.json#L30 (unless I missed it being overridden anywhere in code).

It's 10k here: https://github.com/xai-org/grok-1/blob/main/model.py#L801
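
For context, here is a minimal sketch of why the base value matters. It assumes the standard RoPE formulation, in which the inverse frequency for pair index i is theta ** (-2i / head_dim), and it uses a head dimension of 128 purely for illustration. The helper name `rope_inv_freq` is hypothetical and does not come from either repository.

```python
def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim), i in [0, head_dim / 2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # assumed head dimension, for illustration only

freq_10k = rope_inv_freq(HEAD_DIM, 10_000.0)    # base used in the linked model.py
freq_100k = rope_inv_freq(HEAD_DIM, 100_000.0)  # base found in the linked config.json

# The first pair rotates at the same rate for both bases (theta ** 0 == 1),
# but the slowest pair differs by roughly the ratio of the two bases.
ratio_last = freq_10k[-1] / freq_100k[-1]
print(f"first pair: {freq_10k[0]} vs {freq_100k[0]}")
print(f"slowest pair ratio (10k / 100k): {ratio_last:.3f}")
```

Under these assumptions, a mismatched base rotates the low-frequency dimensions at a different rate than the one the weights were trained with, so the positional signal degrades as positions grow.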

Owner Mar 20, 2024

You're right, I forgot to correct that. Thank you for pointing this out.

Thanks :) Also thanks for the HF version. It's much easier to follow than the original JAX implementation.

Noeda changed discussion status to closed Mar 20, 2024
