Transformers backend incompatible: conflicting imports between SlidingWindowCache and use_kernel_forward_from_hub
## Environment
- transformers: tested with 4.47.1, 5.2.0
- Python: 3.11
- CUDA: 12.4
- Model: PaddlePaddle/PaddleOCR-VL (revision: main / 2b77538)
## Description
The PaddleOCR-VL model cannot be loaded via HuggingFace Transformers due to conflicting imports in modeling_paddleocr_vl.py. The model code imports from both old and new transformers APIs that are mutually exclusive:
Line 27:

```python
from transformers.cache_utils import (
    Cache,
    DynamicCache,
    SlidingWindowCache,  # Removed in transformers 4.48.0
    StaticCache,
)
```
Line 34:

```python
from transformers.integrations import use_kernel_forward_from_hub  # Added in transformers 5.x
```
## The Problem
- `SlidingWindowCache` was removed from `transformers.cache_utils` in version 4.48.0
- `use_kernel_forward_from_hub` was added to `transformers.integrations` in version 5.x
No version of transformers supports both imports, making direct Transformers inference impossible.
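To confirm which side of the conflict a given installation falls on, a quick attribute probe works. The sketch below is illustrative and demonstrated against stdlib modules so it runs without transformers installed; in practice you would substitute `transformers.cache_utils` / `transformers.integrations` and the two names in question:

```python
import importlib


def has_name(module_path: str, name: str) -> bool:
    """Return True if `name` can be imported from `module_path`."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return False
    return hasattr(module, name)


# In practice, probe:
#   has_name("transformers.cache_utils", "SlidingWindowCache")
#   has_name("transformers.integrations", "use_kernel_forward_from_hub")
# Demonstrated here on a stdlib module:
print(has_name("collections", "OrderedDict"))  # → True
print(has_name("collections", "NoSuchThing"))  # → False
```

If both probes return `False` for your install, neither pinning older nor newer transformers will help, which is exactly the conflict described above.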
## Reproduction Steps
```python
from transformers import AutoModel, AutoTokenizer

model_name = "PaddlePaddle/PaddleOCR-VL"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)  # Fails here
```
## Error Messages
With transformers < 4.48.0 (e.g., 4.47.1):

```
ImportError: cannot import name 'use_kernel_forward_from_hub' from 'transformers.integrations'
```

With transformers >= 4.48.0 (e.g., 5.2.0):

```
ImportError: cannot import name 'SlidingWindowCache' from 'transformers.cache_utils'
```
## Suggested Fix
**Option 1:** Use conditional imports based on the transformers version

```python
try:
    from transformers.cache_utils import SlidingWindowCache
except ImportError:
    SlidingWindowCache = None  # Or implement a fallback

try:
    from transformers.integrations import use_kernel_forward_from_hub
except ImportError:
    # No-op fallback. Note: the real symbol is a decorator *factory*
    # (used as @use_kernel_forward_from_hub("LayerName")), so the
    # fallback must return a decorator, not the class itself.
    def use_kernel_forward_from_hub(*args, **kwargs):
        def decorator(cls):
            return cls
        return decorator
```
**Option 2:** Pin to a specific transformers version range and update the imports accordingly
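If Option 2 is taken, the model code could also fail fast with an actionable message instead of an opaque `ImportError`. A minimal sketch of such a version gate (the `4.40.0`–`4.48.0` bounds are placeholders, not the model's actual supported range; in the real file the version string would come from `transformers.__version__`):

```python
def parse_version(version: str) -> tuple:
    """Parse a dotted version string into a tuple of ints,
    ignoring any non-numeric suffix ("4.48.0.dev0" -> (4, 48, 0))."""
    parts = []
    for piece in version.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def check_transformers_version(version: str,
                               minimum: str = "4.40.0",
                               below: str = "4.48.0") -> None:
    """Raise a clear error if `version` falls outside [minimum, below)."""
    if not (parse_version(minimum) <= parse_version(version) < parse_version(below)):
        raise RuntimeError(
            f"PaddleOCR-VL remote code requires transformers>={minimum},<{below}; "
            f"found {version}. Please install a compatible version."
        )


check_transformers_version("4.47.1")  # inside the pinned range: no error
# check_transformers_version("5.2.0")  # outside the range: raises RuntimeError
```

Tuple comparison is used here to avoid adding a dependency on `packaging`; either works.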
## Workaround
Currently, the only working option is to use vLLM (which has native PaddleOCR-VL support and doesn't execute the model's custom Python code). However, vLLM requires CUDA 12.8+, limiting compatibility with older GPU drivers.
## Additional Context
- The vLLM backend works fine because it uses a native `PaddleOCRVLForConditionalGeneration` implementation
- This issue blocks users who need CUDA 12.4 compatibility (vLLM requires 12.8+)
- Related: Similar issue reported for Phi-4-multimodal-instruct (discussion #71)