	---
	license: apache-2.0
	library_name: PaddleOCR
	tags:
	- PaddleOCR
	- PaddleOCR-VL
	- vision-encoder
	- multimodal
	- document-parsing
	---

	# PaddleOCR-VL Split Vision Encoder

	This repository contains the vision tower and projector extracted from PaddleOCR-VL, uploaded as standalone artifacts separately from the full vision-language model (VLM).

	## Contents

	- `vision_tower_config.json`
	- `vision_tower.safetensors`
	- `projector_config.json`
	- `projector.safetensors`
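
A quick way to confirm that a download is complete is to check for the four files listed above. The following is a minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# File names taken from the Contents list above.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir):
    """Return the expected artifact file names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


# Hypothetical usage: "." assumes the current directory is the repo root.
print("missing:", missing_artifacts("."))
```

An empty list means all four artifacts are present; it says nothing about whether their contents are valid.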


	## Architecture

	- Vision tower hidden size: `1152`
	- Projector output hidden size: `1024`
	- Target repo: `acsfid/PaddleOCR-VL-VisionEncoder`
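
The sizes above can be checked against the weight files without loading any tensors, because a `.safetensors` file begins with a JSON header that records every tensor's shape. The sketch below parses only that header with the standard library; the function name `read_safetensors_shapes` is hypothetical, and the tensor names inside these particular files are not documented here, so the output should be inspected rather than indexed by an assumed key.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Map tensor name -> shape by parsing only the safetensors header."""
    with open(path, "rb") as f:
        # The file starts with an unsigned little-endian 64-bit header length,
        (header_len,) = struct.unpack("<Q", f.read(8))
        # followed by that many bytes of UTF-8 JSON describing each tensor.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: info["shape"]
        for name, info in header.items()
        if name != "__metadata__"
    }
```

For example, if the sizes above hold, `read_safetensors_shapes("projector.safetensors")` would be expected to include at least one shape with a dimension of `1024`.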

	## Usage

	```python
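	# The repo root must be on the import path so that the `model` package resolves.
	from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

	# Directory holding the *_config.json and *.safetensors files ("." = repo root).
	artifact_dir = "."
	vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
	projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)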
	```

	The intended split flow is:

	```text
	image_processor -> vision_tower -> projector -> decoder-ready image embeddings
	```
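
The flow above can be sketched as a plain composition of three stages. This is an illustration only: it assumes each stage is a callable that takes the previous stage's output as its single argument, which this README does not confirm. The real classes in `model/` may require additional inputs, so check their `forward` signatures before relying on it. The function name `encode_image` is hypothetical.

```python
def encode_image(image, image_processor, vision_tower, projector):
    """Run the three stages in order and return the projector output.

    Assumption: every stage accepts exactly one positional argument,
    namely the output of the stage before it.
    """
    # Raw image -> model-ready inputs.
    pixel_values = image_processor(image)
    # Model inputs -> vision features (hidden size 1152 per the Architecture section).
    vision_features = vision_tower(pixel_values)
    # Vision features -> decoder-ready embeddings (hidden size 1024 per the Architecture section).
    return projector(vision_features)
```

Keeping the stages as separate arguments mirrors the split layout of this repo: the vision tower and projector are loaded independently and can be swapped or frozen individually.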

	## Included Python Source

	This repo also includes the Python source files and the dependency list needed to load and use the split artifacts:

	- `model/__init__.py`
	- `model/configuration_paddleocr_vl.py`
	- `model/image_processing_paddleocr_vl.py`
	- `model/modeling_paddleocr_vl.py`
	- `model/extracted_vision_encoder.py`
	- `requirements.txt`

	After cloning or downloading this repo and installing the packages in `requirements.txt`, you can import the split classes directly for inference or later training work.
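
The import in the Usage section only resolves when the directory containing `model/` is on Python's import path. When running from another working directory, one option is to add the clone location explicitly. The helper below is a hypothetical sketch using only the standard library, not something this repo ships.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put repo_dir first on sys.path so packages inside it can be imported."""
    root = str(Path(repo_dir).resolve())
    # Avoid adding the same directory twice on repeated calls.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

After calling `add_repo_to_path("/path/to/clone")` (a placeholder path), `from model.extracted_vision_encoder import ...` should work from any working directory, provided no other package named `model` has already been imported.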