---
license: apache-2.0
library_name: PaddleOCR
tags:
- PaddleOCR
- PaddleOCR-VL
- vision-encoder
- multimodal
- document-parsing
---

# PaddleOCR-VL Split Vision Encoder

This repository contains the vision tower and projector extracted from PaddleOCR-VL, uploaded separately from the full vision-language model (VLM).

## Contents

- `vision_tower_config.json`
- `vision_tower.safetensors`
- `projector_config.json`
- `projector.safetensors`

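A quick way to confirm that a download is complete is to check for the four files above and peek at a weight file's header. The sketch below uses only the standard library and relies on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header). `missing_artifacts` and `read_safetensors_header` are illustrative helper names, not part of this repo.

```python
import json
import struct
from pathlib import Path

# File names taken from the Contents list above.
EXPECTED_FILES = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir):
    """Return the expected file names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} parsed from a safetensors header.

    The file starts with an 8-byte little-endian unsigned integer giving the
    length of a JSON header, followed by the header itself.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    }
```

Calling `missing_artifacts(".")` from the repository root should return an empty list when all four files are present, and `read_safetensors_header("projector.safetensors")` lists the stored tensors without loading any weights.
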
## Architecture

- Vision tower hidden size: `1152`
- Projector output hidden size: `1024`
- Target repo: `acsfid/PaddleOCR-VL-VisionEncoder`

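To see where these sizes appear in the shipped configs without guessing key names, one option is to search each JSON file for the documented value. This is a minimal sketch: `keys_with_value` and `report` are hypothetical helpers, and nothing here assumes a particular config schema.

```python
import json
from pathlib import Path


def keys_with_value(config, value, prefix=""):
    """Yield dotted key paths in a nested config whose value equals `value`."""
    if isinstance(config, dict):
        for key, item in config.items():
            yield from keys_with_value(item, value, f"{prefix}{key}.")
    elif isinstance(config, list):
        for index, item in enumerate(config):
            yield from keys_with_value(item, value, f"{prefix}{index}.")
    elif not isinstance(config, bool) and config == value:
        yield prefix.rstrip(".")


def report(artifact_dir="."):
    """Print which keys in each config file hold the documented size."""
    root = Path(artifact_dir)
    documented = (
        ("vision_tower_config.json", 1152),
        ("projector_config.json", 1024),
    )
    for filename, size in documented:
        config = json.loads((root / filename).read_text())
        print(filename, size, list(keys_with_value(config, size)))
```

Running `report(".")` from the repository root prints the matching key paths for each file. An empty list would suggest the config expresses that size differently from what is listed here.
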
## Usage

```python
# Assumes the working directory is the repository root, so `model` is importable.
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

# Directory holding the config and safetensors files listed under Contents.
artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)
```

The intended split flow is:

```text
image_processor -> vision_tower -> projector -> decoder-ready image embeddings
```

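The flow above can be sketched as a simple chain of calls. This is only an illustration: it assumes each stage is a plain callable that accepts the previous stage's output, which may not match the real call signatures in `model/` (the actual modules may need extra arguments). The `toy_*` functions are stand-ins that track feature widths only, and the real projector may also change the number of tokens.

```python
def encode_image(image, image_processor, vision_tower, projector):
    """Chain the three stages; each is assumed to be a single-argument callable."""
    pixel_inputs = image_processor(image)
    vision_features = vision_tower(pixel_inputs)
    return projector(vision_features)


# Toy stand-ins (not the real classes) that only report shapes.
def toy_processor(image):
    return {"num_patches": len(image)}


def toy_vision_tower(inputs):
    # Vision tower hidden size from the Architecture section.
    return (inputs["num_patches"], 1152)


def toy_projector(features):
    num_tokens, _ = features
    # Projector output hidden size from the Architecture section.
    return (num_tokens, 1024)


shape = encode_image([0] * 16, toy_processor, toy_vision_tower, toy_projector)
print(shape)  # (16, 1024)
```

Swapping the stand-ins for the loaded `vision_tower` and `projector` objects is the intended next step, once their expected inputs have been checked against the source files.
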
## Included Python Source

This repo also includes the Python source files needed to load and use the split artifacts:

- `model/__init__.py`
- `model/configuration_paddleocr_vl.py`
- `model/image_processing_paddleocr_vl.py`
- `model/modeling_paddleocr_vl.py`
- `model/extracted_vision_encoder.py`
- `requirements.txt`

After cloning or downloading this repo and installing the dependencies listed in `requirements.txt`, you can import the split classes directly for inference or further training.

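If your script lives outside the cloned directory, the `model` package has to be made importable first. One way to do that, shown here as a sketch, is to add the repo root to `sys.path`. `add_repo_to_path` is a hypothetical helper and the path in the usage comment is a placeholder.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put a local copy of this repo on sys.path so `import model` resolves to it."""
    root = Path(repo_dir).resolve()
    module_file = root / "model" / "extracted_vision_encoder.py"
    if not module_file.is_file():
        raise FileNotFoundError(f"{module_file} not found; is {root} the repo root?")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Usage (placeholder path):
# root = add_repo_to_path("/path/to/PaddleOCR-VL-VisionEncoder")
# from model.extracted_vision_encoder import PaddleOCRVLVisionTower
# vision_tower = PaddleOCRVLVisionTower.from_pretrained(str(root))
```

Putting the repo root at the front of `sys.path` means it takes priority over any other installed package named `model`, which is worth keeping in mind in larger projects.
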