Clean up rope params; ensure transformers 4.55/5.0 compatibility

#1
by abhgarg - opened
NVIDIA org
  • Remove duplicate top-level rope_scaling block and stray rope_theta from config.json
  • Remove duplicate 'type' key from rope_parameters
  • For 3B-Base/8B-Base: set max_position_embeddings=4096 and factor=0.25 to match training
  • Mirror rope_theta and rope_scaling from rope_parameters in MinistralDLMConfig so YaRN rope scaling still works on transformers v4.55
  • Drop unused sdpa_mask_older_torch import (removed in transformers v5.0)
  • Bump transformers_version to 5.0.0
  • In linear_spec_generate_mp, guard direct past_kv.key_cache / value_cache access behind a hasattr(past_kv, 'layers') check so v5.0's DynamicCache API works too
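
The two compatibility changes in the list above (mirroring the rope fields and guarding cache access) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `mirror_rope_fields` and `get_layer_kv` are hypothetical helper names, the config is treated as a plain dict, the caches are stand-in objects rather than real `DynamicCache` instances, and the sketch assumes the layered cache layout exposes per-layer `keys` and `values` under `layers`. The `rope_theta` value is made up for the example; only `factor=0.25` and `max_position_embeddings=4096` come from the description.

```python
from types import SimpleNamespace


def mirror_rope_fields(config: dict) -> dict:
    """Hypothetical helper: expose rope settings under the older field names.

    Copies `rope_theta` and the remaining `rope_parameters` entries to
    top-level `rope_theta` / `rope_scaling` without modifying the input.
    """
    out = dict(config)
    rope = out.get("rope_parameters") or {}
    if "rope_theta" in rope:
        out["rope_theta"] = rope["rope_theta"]
    # Everything except rope_theta is treated as scaling configuration (assumption).
    scaling = {k: v for k, v in rope.items() if k != "rope_theta"}
    if scaling:
        out["rope_scaling"] = scaling
    return out


def get_layer_kv(past_kv, layer_idx: int):
    """Hypothetical helper: read one layer's key/value entries from either layout."""
    if hasattr(past_kv, "layers"):
        # Layered layout (assumed: each layer object carries `keys` and `values`).
        layer = past_kv.layers[layer_idx]
        return layer.keys, layer.values
    # Older layout: parallel per-layer lists on the cache object itself.
    return past_kv.key_cache[layer_idx], past_kv.value_cache[layer_idx]


# Illustrative config: rope_theta is a made-up value, not taken from the model.
config = {
    "max_position_embeddings": 4096,
    "rope_parameters": {"rope_type": "yarn", "rope_theta": 10000.0, "factor": 0.25},
}
mirrored = mirror_rope_fields(config)

# Stand-in caches; strings take the place of tensors to keep the sketch dependency-free.
legacy_cache = SimpleNamespace(key_cache=["k0", "k1"], value_cache=["v0", "v1"])
layered_cache = SimpleNamespace(
    layers=[
        SimpleNamespace(keys="k0", values="v0"),
        SimpleNamespace(keys="k1", values="v1"),
    ]
)
```

Checking for `layers` first means a cache object that happens to expose both layouts is read through the newer one, and the older attributes are only touched as a fallback.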
YongganFu changed pull request status to merged
