---
license: apache-2.0
library_name: PaddleOCR
tags:
- PaddleOCR
- PaddleOCR-VL
- vision-encoder
- multimodal
- document-parsing
---
# PaddleOCR-VL Split Vision Encoder
This repository contains the vision tower and projector extracted from PaddleOCR-VL, uploaded separately from the full VLM.
## Contents
- `vision_tower_config.json`
- `vision_tower.safetensors`
- `projector_config.json`
- `projector.safetensors`
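A quick way to confirm that a download is complete is to check for the four files listed above. The following is a minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# File names copied from the Contents list above.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir="."):
    """Return the expected artifact file names that are absent from artifact_dir.

    Hypothetical helper for illustration; not shipped with this repo.
    """
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    print("all artifacts present" if not missing else f"missing: {missing}")
```

An empty list means all four files were found in the given directory.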
## Architecture
- Vision tower hidden size: `1152`
- Projector output hidden size: `1024`
- Target repo: `acsfid/PaddleOCR-VL-VisionEncoder`
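To compare the sizes above against what is actually stored in the two config files, the JSON can be loaded and inspected directly. This is a small sketch under the assumption that both files are plain JSON objects; the key names inside them are not documented here, so the helper (hypothetical, named `load_split_configs`) returns the raw dictionaries rather than reading any specific key.

```python
import json
from pathlib import Path


def load_split_configs(artifact_dir="."):
    """Load both config files and return them as plain dicts, keyed by file name.

    Hypothetical helper for illustration; it makes no assumption about
    which keys the config files contain.
    """
    root = Path(artifact_dir)
    configs = {}
    for name in ("vision_tower_config.json", "projector_config.json"):
        with open(root / name, encoding="utf-8") as handle:
            configs[name] = json.load(handle)
    return configs
```

Calling `load_split_configs(".")` from the repo root and printing the result shows every stored field, which can then be checked against the `1152` and `1024` values listed above.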
## Usage
```python
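# Import the split classes from the Python source bundled with this repo.
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

# Directory containing the config and safetensors files; "." assumes you run from the repo root.
artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)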
```
The intended split flow is:
```text
image_processor -> vision_tower -> projector -> decoder-ready image embeddings
```
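The flow above amounts to calling the three stages in order. The sketch below only illustrates that ordering: `embed_image` is a hypothetical helper, and it assumes each stage can be wrapped as a single-argument callable. The real call signatures of the image processor, `PaddleOCRVLVisionTower`, and `PaddleOCRVLProjector` are not documented here and may need extra arguments, so check the bundled source before adapting it.

```python
def embed_image(image, image_processor, vision_tower, projector):
    """Run the three stages in order and return the projector output.

    Each stage is assumed to be a single-argument callable (an assumption
    made for this sketch, not a documented interface of this repo).
    """
    pixel_inputs = image_processor(image)         # raw image -> model-ready inputs
    vision_features = vision_tower(pixel_inputs)  # inputs -> vision tower features
    return projector(vision_features)             # features -> decoder-ready image embeddings


if __name__ == "__main__":
    # Stand-in stages that only record the order in which they ran.
    trace = embed_image(
        ["image"],
        lambda x: x + ["image_processor"],
        lambda x: x + ["vision_tower"],
        lambda x: x + ["projector"],
    )
    print(" -> ".join(trace))
```

Passing the stages in as arguments keeps the helper independent of how the real objects are constructed, so thin wrappers around the loaded classes can be swapped in once their signatures are known.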
## Included Python Source
This repo also includes the Python source files needed to load and use the split artifacts:
- `model/__init__.py`
- `model/configuration_paddleocr_vl.py`
- `model/image_processing_paddleocr_vl.py`
- `model/modeling_paddleocr_vl.py`
- `model/extracted_vision_encoder.py`
- `requirements.txt`
After cloning or downloading this repo, you can import the split classes directly for inference or further training.
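If the script that imports the classes does not run from the repo root, Python will not find the `model` package unless the repo directory is on `sys.path`. The following is a minimal sketch of one way to handle that; `add_repo_to_path` is a hypothetical helper, and the clone location you pass to it is up to you.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put the repo root on sys.path so that `import model` resolves to it.

    Hypothetical helper for illustration; not shipped with this repo.
    """
    root = Path(repo_dir).resolve()
    # Fail early if the directory does not look like a checkout of this repo.
    if not (root / "model" / "__init__.py").is_file():
        raise FileNotFoundError(f"no model/ package found under {root}")
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

After calling it with the path to your local copy, the import shown in the Usage section should resolve, provided the packages in `requirements.txt` are installed.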