model.generate() fails with AttributeError, TypeError, and KeyError due to missing None checks on cached key-value pairs
The model's custom modeling code has several bugs that prevent model.generate() from completing. All of them trace back to inadequate None checking when handling cached key-value pairs during generation.
Steps to Reproduce:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lelapa/InkubaLM-0.4B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("lelapa/InkubaLM-0.4B", trust_remote_code=True)
model.to('cuda')

text = "Today I planned to"
inputs = tokenizer(text, return_tensors="pt").to('cuda')

# Fails with the errors below unless use_cache=False is passed (see workaround)
outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=20,
    pad_token_id=tokenizer.eos_token_id,
)
Errors Encountered:
- Line 578 in VulavulaLlamaModel.forward():
  AttributeError: 'NoneType' object has no attribute 'shape'
  Code: past_key_values_length = past_key_values[0][0].shape[2]
- Line 232 in VulavulaLlamaAttention.forward():
  AttributeError: 'NoneType' object has no attribute 'shape'
  Code: kv_seq_len += past_key_value[0].shape[-2]
- Lines 149-150 in VulavulaLlamaAttention.forward():
  TypeError: expected Tensor as element 0 in argument 0, but got NoneType
  Code: key_states = torch.cat([past_key_value[0], key_states], dim=2)
- Cache access issues:
  KeyError: 'Cache only has 1 layers, attempted to access layer with index 1'
Pattern:
All of these errors stem from the code assuming that if past_key_values or past_key_value is not None, then it contains valid tensor data. During generation, however, these structures can contain None elements, and the layer-index KeyError suggests the code is also indexing a Cache object as if it were the legacy tuple-of-tuples format.
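For illustration only, here is a minimal sketch of the kind of None checks that would avoid the first three failures. Variable names are copied from the failing lines above (the value_states line is assumed to mirror the key_states line); this is not a tested patch against the model's custom code:

# Hypothetical guards: each treats a missing or empty cache entry
# as "no tokens cached yet" instead of assuming valid tensors.
past_key_values_length = 0
if past_key_values is not None and past_key_values[0] is not None:
    past_key_values_length = past_key_values[0][0].shape[2]

if past_key_value is not None and past_key_value[0] is not None:
    kv_seq_len += past_key_value[0].shape[-2]

if past_key_value is not None and past_key_value[0] is not None:
    key_states = torch.cat([past_key_value[0], key_states], dim=2)
    value_states = torch.cat([past_key_value[1], value_states], dim=2)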
Current Workaround:
Passing use_cache=False works, but it recomputes all keys and values at every decoding step, which significantly slows generation.
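For reference, the workaround applied to the reproduction above:

outputs = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=20,
    pad_token_id=tokenizer.eos_token_id,
    use_cache=False,  # bypasses the broken cache path at the cost of speed
)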
Environment:
- transformers version: 4.55.4
- torch version: 2.7.1+cu128
- Python version: 3.12.11