---
license: apache-2.0
library_name: PaddleOCR
tags:
- PaddleOCR
- PaddleOCR-VL
- vision-encoder
- multimodal
- document-parsing
---

# PaddleOCR-VL Split Vision Encoder

This repository contains the vision tower and projector extracted from PaddleOCR-VL, published as split artifacts separately from the full VLM.

## Contents

- `vision_tower_config.json`
- `vision_tower.safetensors`
- `projector_config.json`
- `projector.safetensors`
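
Before loading anything, it can help to confirm that all four files listed above are present in the download. The following is a minimal sketch using only the standard library; the helper name `missing_artifacts` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# File names taken from the Contents list above.
EXPECTED_ARTIFACTS = (
    "vision_tower_config.json",
    "vision_tower.safetensors",
    "projector_config.json",
    "projector.safetensors",
)


def missing_artifacts(artifact_dir="."):
    """Return the expected artifact file names that are absent from artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

Calling `missing_artifacts(".")` from the repo root should return an empty list when the download is complete.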


## Architecture

- Vision tower hidden size: `1152`
- Projector output hidden size: `1024`
- Target repo: `acsfid/PaddleOCR-VL-VisionEncoder`
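
One way to check the hidden sizes above without loading any weights is to read the header of a `.safetensors` file, which begins with an 8-byte little-endian length followed by a JSON description of every tensor. The sketch below uses only the standard library; the helper names are hypothetical, and the tensor names stored in this repo's files are not assumed here.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data.

    The file starts with an 8-byte little-endian unsigned integer giving the
    header length, followed by that many bytes of UTF-8 JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def tensor_shapes(path):
    """Map each tensor name to its shape list."""
    return {name: info["shape"] for name, info in read_safetensors_header(path).items()}
```

For example, `tensor_shapes("projector.safetensors")` lists every projector tensor with its shape, where dimensions of `1152` and `1024` would be expected to appear.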

## Usage

```python
from model.extracted_vision_encoder import PaddleOCRVLVisionTower, PaddleOCRVLProjector

artifact_dir = "."
vision_tower = PaddleOCRVLVisionTower.from_pretrained(artifact_dir)
projector = PaddleOCRVLProjector.from_pretrained(artifact_dir)
```

The intended split flow is:

```text
image_processor -> vision_tower -> projector -> decoder-ready image embeddings
```
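
The flow above can be sketched as a plain function composition. This is an illustration only: it assumes each stage is a callable that takes the previous stage's output as its single argument, which is a simplification. The actual forward signatures in `model/extracted_vision_encoder.py` may require additional inputs, and `embed_image` is a hypothetical name.

```python
def embed_image(image, image_processor, vision_tower, projector):
    """Chain the three stages: image_processor -> vision_tower -> projector."""
    # Raw image -> model-ready inputs.
    pixel_inputs = image_processor(image)
    # Model-ready inputs -> visual features (hidden size 1152 per the Architecture section).
    visual_features = vision_tower(pixel_inputs)
    # Visual features -> decoder-ready embeddings (hidden size 1024).
    return projector(visual_features)
```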

## Included Python Source

This repo also includes the Python source files and dependency list needed to load and use the split artifacts:

- `model/__init__.py`
- `model/configuration_paddleocr_vl.py`
- `model/image_processing_paddleocr_vl.py`
- `model/modeling_paddleocr_vl.py`
- `model/extracted_vision_encoder.py`
- `requirements.txt`

After cloning or downloading this repo, you can import the split classes directly for inference or further training.
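
If your script runs from outside the repo root, the `model` package needs to be importable first. A minimal sketch, assuming a local clone and using only the standard library (the helper name `add_repo_to_path` is hypothetical):

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Put a local clone of this repo on sys.path so `import model...` resolves."""
    root = str(Path(repo_dir).resolve())
    if root not in sys.path:
        # Insert at the front so this clone wins over any other `model` package.
        sys.path.insert(0, root)
    return root
```

After calling `add_repo_to_path` with the path to your clone, the import shown in the Usage section should resolve, provided the packages in `requirements.txt` are installed.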