Which is the context length of GLM-4.7: 202752 or 128000?

#33
by tuo02 - opened

In the config.json file, max_position_embeddings=202752, while in the tokenizer_config.json file, model_max_length=128000. Which is the context length of GLM-4.7? Also, sglang outputs this message: "Token indices sequence length is longer than the specified maximum sequence length for this model (192619 > 128000). Running this sequence through the model will result in indexing errors"

The short answer is that max_position_embeddings=202752 represents the actual context length.
The model_max_length=128000 in the tokenizer configuration is a soft default that causes your sglang warning. It’s not the actual architectural limit.


Here’s what’s happening:

max_position_embeddings (config.json) defines the maximum number of positions that the rotary positional embeddings (RoPE) can encode. This is the hard architectural ceiling.
The model was trained to handle up to 202,752 tokens. Beyond this, you’d encounter actual indexing errors because there are no learned or interpolated position encodings.
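The failure mode past the ceiling is plain out-of-bounds indexing into the precomputed position table. A toy illustration (this is not the actual RoPE implementation, and the tiny head dimension is purely for demonstration):

```python
import numpy as np

MAX_POSITIONS = 202752  # max_position_embeddings from config.json
head_dim = 2            # toy size; the real model uses a much larger head dimension

# Precomputed rotary angle table, one row per position.
inv_freq = 1.0 / (10000 ** (np.arange(0, head_dim, 2) / head_dim))
angles = np.outer(np.arange(MAX_POSITIONS), inv_freq)

def rope_angles(position):
    """Look up the rotary angles for one token position."""
    return angles[position]

rope_angles(202751)    # fine: the last valid position
# rope_angles(202752)  # IndexError: no position encoding exists past the ceiling
```

Anything below the ceiling resolves to a real row of the table; anything at or above it has nothing to index into, which is exactly the "indexing errors" the warning refers to.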

model_max_length (tokenizer_config.json) is a tokenizer-level hint that frameworks use as a default truncation or warning threshold.
It’s often set conservatively (here, 128K corresponds to the maximum output length) and doesn’t reflect what the model can actually accept as input. This is the value sglang reads when it emits that warning.
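You can confirm the mismatch yourself by reading both files straight from the model directory (the path below is illustrative):

```python
import json
from pathlib import Path

def context_limits(model_dir):
    """Return (max_position_embeddings, model_max_length) from a model directory."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    tok_config = json.loads((Path(model_dir) / "tokenizer_config.json").read_text())
    return config["max_position_embeddings"], tok_config["model_max_length"]

# hard, soft = context_limits("/path/to/glm-4.7")
# `hard` (202752) is the architectural ceiling; `soft` (128000) is only the
# tokenizer-level default that triggers the warning.
```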

To fix the sglang warning, you can override the tokenizer’s limit when launching the model:

# If using sglang's server:
python -m sglang.launch_server \
  --model-path /path/to/glm-4.7 \
  --context-length 202752 \
  ...

Alternatively, if you’re loading the model via the Python API, set context_length=202752 explicitly. The warning is solely due to the mismatch between the tokenizer’s model_max_length and the model’s actual input capacity. As long as you stay under 202,752 tokens, you won’t encounter real indexing errors. It’s safe to ignore the warning or suppress it by overriding the tokenizer’s limit.

It’s worth noting that practical usable context may still be closer to ~200K tokens once you account for system prompt overhead and special tokens. However, the 128K figure is definitely incorrect for input. It’s the output cap.
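The budget arithmetic behind that estimate is simple; the overhead numbers below are assumptions for illustration, not GLM-4.7 specifics:

```python
HARD_LIMIT = 202752          # max_position_embeddings from config.json
system_prompt_tokens = 1500  # assumed system prompt + special-token overhead
reserved_output = 4096       # assumed generation budget

usable_input = HARD_LIMIT - system_prompt_tokens - reserved_output
print(usable_input)  # 197156
```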

Thank you

202752 is the max