Transformers backend incompatible: conflicting imports between SlidingWindowCache and use_kernel_forward_from_hub

#88
by aahringer - opened

Environment

  • transformers: tested with 4.47.1, 5.2.0
  • Python: 3.11
  • CUDA: 12.4
  • Model: PaddlePaddle/PaddleOCR-VL (revision: main / 2b77538)

Description

The PaddleOCR-VL model cannot be loaded through Hugging Face Transformers because modeling_paddleocr_vl.py mixes imports from old and new transformers APIs, and no single release provides both:

Line 27:

from transformers.cache_utils import (
    Cache,
    DynamicCache,
    SlidingWindowCache,  # Removed in transformers 4.48.0
    StaticCache,
)

Line 34:

from transformers.integrations import use_kernel_forward_from_hub  # Added in transformers 5.x

The Problem

  • SlidingWindowCache was removed from transformers.cache_utils in version 4.48.0
  • use_kernel_forward_from_hub was added to transformers.integrations in version 5.x

No version of transformers supports both imports, making direct Transformers inference impossible.
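To confirm the conflict in a given environment before attempting to load the model, a small probe can check each import independently (a minimal sketch; the helper name `probe` is just for illustration):

```python
# Probe which of the two conflicting names the installed transformers provides.
import importlib


def probe(module_name: str, attr: str) -> bool:
    """Return True if `attr` can be imported from `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # module (or transformers itself) not installed
    return hasattr(module, attr)


if __name__ == "__main__":
    has_cache = probe("transformers.cache_utils", "SlidingWindowCache")
    has_kernel = probe("transformers.integrations", "use_kernel_forward_from_hub")
    print(f"SlidingWindowCache available:          {has_cache}")
    print(f"use_kernel_forward_from_hub available: {has_kernel}")
    if not (has_cache and has_kernel):
        print("-> modeling_paddleocr_vl.py cannot import both; loading will fail.")
```

On any transformers release, at most one of the two flags comes back True, which is exactly the incompatibility described above.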

Reproduction Steps

from transformers import AutoModel, AutoTokenizer

model_name = "PaddlePaddle/PaddleOCR-VL"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)  # Fails here

Error Messages

With transformers < 4.48.0 (e.g., 4.47.1):

ImportError: cannot import name 'use_kernel_forward_from_hub' from 'transformers.integrations'

With transformers >= 4.48.0 (e.g., 5.2.0):

ImportError: cannot import name 'SlidingWindowCache' from 'transformers.cache_utils'

Suggested Fix

Option 1: Use conditional imports based on transformers version

try:
    from transformers.cache_utils import SlidingWindowCache
except ImportError:
    SlidingWindowCache = None  # Or implement a fallback

try:
    from transformers.integrations import use_kernel_forward_from_hub
except ImportError:
    def use_kernel_forward_from_hub(*args, **kwargs):
        # No-op fallback. The real decorator is called with a kernel name
        # (e.g. @use_kernel_forward_from_hub("RMSNorm")) and returns a class
        # decorator, so the stand-in must return one too.
        return lambda cls: cls

Option 2: Pin to a specific transformers version range and update imports accordingly

Workaround

Currently, the only working option is to use vLLM (which has native PaddleOCR-VL support and doesn't execute the model's custom Python code). However, vLLM requires CUDA 12.8+, limiting compatibility with older GPU drivers.

Additional Context

  • The vLLM backend works because it uses its own native PaddleOCRVLForConditionalGeneration implementation rather than the repository's custom modeling code
  • This issue blocks users who need CUDA 12.4 compatibility (vLLM requires 12.8+)
  • Related: Similar issue reported for Phi-4-multimodal-instruct (discussion #71)
