No default RoPE scaling for long context?

#4
by TomLucidor - opened

Sorry for asking, but the config seems to only support a 32K context: https://huggingface.co/inclusionAI/Ring-mini-sparse-2.0-exp/blob/main/config.json

inclusionAI org

For sequence lengths beyond 32K, the model needs YaRN enabled. You can refer to SGLang's YaRN configuration and add the following to config.json:

"rope_scaling": {
  "factor": 4.0,
  "original_max_position_embeddings": 32768,
  "rope_type": "yarn"
}

Any setup recommendations for vLLM instead of SGLang?
