---
license: apache-2.0
library_name: PaddleOCR
tags:
- PaddleOCR
- PaddleOCR-VL
- vision-encoder
- multimodal
- document-parsing
---

# PaddleOCR-VL Split Vision Encoder

This repository contains the extracted PaddleOCR-VL split visual artifacts uploaded separately from the full VLM.

## Contents

- `vision_tower_config.json`
- `vision_tower.safetensors`
- `projector_config.json`
- `projector.safetensors`

## Architecture

- Vision tower hidden size: `1152`
- Projector output hidden size: `1024`
- Target repo: `acsfid/PaddleOCR-VL-VisionEncoder`

## Usage

```python
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)
```

The intended split flow is:

```text
image_processor -> vision_tower -> projector -> decoder-ready image embeddings
```

## Included Python Source

This repo also includes the Python source files needed to load and use the split artifacts:

- `model/__init__.py`
- `model/configuration_paddleocr_vl.py`
- `model/image_processing_paddleocr_vl.py`
- `model/modeling_paddleocr_vl.py`
- `model/extracted_vision_encoder.py`
- `requirements.txt`

That means after cloning or downloading this repo, you can directly import the split classes for inference or later training work.
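
Before calling `from_pretrained`, it can help to confirm that a cloned or downloaded copy actually contains the four artifact files listed under Contents. The following is a minimal sketch using only the standard library. The helper name `missing_artifacts` and the constant `EXPECTED_ARTIFACTS` are hypothetical and are not part of this repo's `model` package.

```python
from pathlib import Path

# File names taken from the Contents list of this README.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir="."):
    """Return the expected artifact file names that are absent from artifact_dir.

    Hypothetical helper for illustration; it is not part of this repo.
    """
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All split artifacts are present.")
```

An empty list means all four files are present in the directory. The check only looks at file names and does not validate the contents of the files.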