---
license: apache-2.0
library_name: PaddleOCR
tags:
  - PaddleOCR
  - PaddleOCR-VL
  - vision-encoder
  - multimodal
  - document-parsing
---

PaddleOCR-VL Split Vision Encoder

This repository contains the vision-side components extracted from PaddleOCR-VL, uploaded separately from the full vision-language model (VLM):

- vision_tower_config.json
- vision_tower.safetensors
- projector_config.json
- projector.safetensors
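Before loading, it can help to confirm that a local copy of the repo actually contains all four artifact files. A minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical and not part of this repo's source:

```python
from pathlib import Path

# The four split artifacts this repository ships.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir: str = ".") -> list[str]:
    """Return the names of expected artifact files absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

Calling `missing_artifacts(".")` from the repo root should return an empty list when the download is complete.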

Architecture

- Vision tower hidden size: 1152
- Projector output hidden size: 1024
- Target repo: acsfid/PaddleOCR-VL-VisionEncoder
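The two hidden sizes above define a width contract: the projector consumes vision-tower features of width 1152 and emits embeddings of width 1024 for the decoder. The sketch below illustrates only that shape contract with a plain linear map on random toy weights. It is an assumption for illustration, not the projector implemented in `model/extracted_vision_encoder.py`, which may add normalization, a nonlinearity, or token merging; `make_linear` and `project` are hypothetical names.

```python
import random

VISION_HIDDEN = 1152  # vision tower hidden size (from this card)
PROJECTOR_OUT = 1024  # projector output hidden size (from this card)


def make_linear(in_dim: int, out_dim: int, seed: int = 0):
    """Create toy weights: out_dim rows of length in_dim, plus a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens, weight, bias):
    """Map each token vector of width in_dim to a vector of width out_dim."""
    in_dim = len(weight[0])
    out = []
    for tok in tokens:
        if len(tok) != in_dim:
            raise ValueError(f"expected width {in_dim}, got {len(tok)}")
        # One dot product per output dimension, plus its bias term.
        out.append([sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)])
    return out
```

Feeding `project` a vector of the wrong width raises an error, which is the kind of mismatch to watch for when pairing these artifacts with a different decoder.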

Usage

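```python
# Run from the root of a local clone/download of this repo so that the
# bundled `model` package is importable.
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

# Directory holding the *_config.json and *.safetensors artifacts (the repo root).
artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)
```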

The intended split flow is:

image_processor -> vision_tower -> projector -> decoder-ready image embeddings
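The split flow can be read as plain function composition, where each stage's output feeds the next. A minimal sketch with hypothetical stub stages (`fake_image_processor`, `fake_vision_tower`, `fake_projector`) standing in for the real classes, whose call signatures are not documented here. The stubs track only the feature widths from the Architecture section and an assumed patch count of 16; the real projector may also change the number of tokens.

```python
from typing import Callable, List

Matrix = List[List[float]]

VISION_HIDDEN = 1152  # vision tower output width (from this card)
PROJECTOR_OUT = 1024  # projector output width (from this card)


def fake_image_processor(image: object) -> int:
    # Hypothetical: pretend preprocessing yields a fixed number of patches.
    return 16


def fake_vision_tower(num_patches: int) -> Matrix:
    # Hypothetical: one VISION_HIDDEN-wide feature vector per patch.
    return [[0.0] * VISION_HIDDEN for _ in range(num_patches)]


def fake_projector(features: Matrix) -> Matrix:
    # Hypothetical: one PROJECTOR_OUT-wide embedding per input feature vector.
    return [[0.0] * PROJECTOR_OUT for _ in features]


def run_split_flow(image: object, processor: Callable, tower: Callable, proj: Callable) -> Matrix:
    """image_processor -> vision_tower -> projector -> decoder-ready embeddings."""
    pixel_inputs = processor(image)
    vision_features = tower(pixel_inputs)
    return proj(vision_features)
```

Keeping the stages as separate callables mirrors the split upload: any one stage could be swapped or cached without touching the others.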

Included Python Source

This repo also includes the Python source files needed to load and use the split artifacts:

- model/__init__.py
- model/configuration_paddleocr_vl.py
- model/image_processing_paddleocr_vl.py
- model/modeling_paddleocr_vl.py
- model/extracted_vision_encoder.py
- requirements.txt

As a result, after cloning or downloading this repo you can import the split classes directly, for inference or for later training work.
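To import these classes from a script that lives outside the repo directory, one option is to put the repo root on `sys.path` before importing the `model` package. A minimal sketch; `add_repo_to_path` is a hypothetical helper, not part of the included source, and the clone location in the comment is an assumed example path:

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> Path:
    """Prepend repo_dir to sys.path so its `model` package becomes importable."""
    root = Path(repo_dir).resolve()
    if not (root / "model" / "__init__.py").is_file():
        raise FileNotFoundError(f"no model/__init__.py under {root}")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# Example (assumed clone location):
#   add_repo_to_path("./PaddleOCR-VL-VisionEncoder")
#   from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector
```

Installing the packages in `requirements.txt` is still needed before the `model` imports will succeed.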
