Can this be loaded in transformers? FP8Linear: size mismatch for weight_scale_inv

#21
by mratsim - opened

On Transformers v4.57.3, I get the following error trying to load the model:

RuntimeError: Error(s) in loading state_dict for FP8Linear:
        size mismatch for weight_scale_inv: copying a param with shape torch.Size([8, 32]) from checkpoint, the shape in current model is torch.Size([6, 32]).
Full stacktrace

```
Traceback (most recent call last):
  File "[...]/main_mimo_v2_flash-nvfp4.py", line 108, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 597, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5048, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 5468, in _load_pretrained_model
    _error_msgs, disk_offload_index = load_shard_file(args)
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 843, in load_shard_file
    disk_offload_index = _load_state_dict_into_meta_model(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 770, in _load_state_dict_into_meta_model
    _load_parameter_into_model(model, param_name, param.to(param_device))
  File "[...]/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 667, in _load_parameter_into_model
    module.load_state_dict({param_type: tensor}, strict=False, assign=True)
  File "[...]/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2629, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for FP8Linear:
        size mismatch for weight_scale_inv: copying a param with shape torch.Size([8, 32]) from checkpoint, the shape in current model is torch.Size([6, 32]).
```

Called with

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, offload_folder = "./offload/")
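
To help narrow down which layer is affected, here is a rough sketch of how the two shapes in the error could be interpreted. It assumes block-wise FP8 quantization with 128x128 blocks, where `weight_scale_inv` holds one scale per block; the error message does not confirm that block size. The helper names `scale_inv_shape` and `out_features_range` are hypothetical and are not part of Transformers.

```python
import math

# Assumption: one scale per (block_out x block_in) tile of the weight matrix,
# with 128x128 blocks. Neither is confirmed by the error message.
def scale_inv_shape(out_features, in_features, block_size=(128, 128)):
    """Shape of a block-wise scale tensor for a weight of the given size."""
    block_out, block_in = block_size
    return (math.ceil(out_features / block_out), math.ceil(in_features / block_in))

def out_features_range(scale_rows, block_out=128):
    """Inclusive range of out_features that produces `scale_rows` rows of scales."""
    return ((scale_rows - 1) * block_out + 1, scale_rows * block_out)

# Row counts taken from the error: [8, 32] in the checkpoint, [6, 32] in the model.
print(out_features_range(8))  # (897, 1024)
print(out_features_range(6))  # (641, 768)
```

Under that assumption, the checkpoint weight would have between 897 and 1024 output rows while the instantiated layer expects between 641 and 768, which suggests the layer is being built with a different output dimension than the one the checkpoint was quantized with.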
