lkeab committed on
Commit 537c6f2 · verified · 1 Parent(s): dda5e62

Update README.md

Files changed (1)
  1. README.md +22 -8
README.md CHANGED
@@ -1,7 +1,20 @@
  <p align="center">
- <img src="assets/logo.png" width="160" />
  </p>
 
  <h2 align="center">Vision Encoder of PenguinVL</h2>
  <h4 align="center">
  Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
@@ -18,9 +31,9 @@ Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
 
  ## 🌟 Model Overview
 
- PenguinVL is a compact Vision-Language Model, designed to explore the efficiency limits of small-scale VLMs.
 
- Unlike most existing VLMs that rely on contrastive-pretrained vision encoders (e.g., CLIP/SigLIP), PG-VL initializes its vision encoder directly from a **text-only LLM**. This design avoids the objective mismatch between contrastive learning and autoregressive language modeling, enabling tighter alignment between visual representations and the language backbone.
 
  ### Key Characteristics
 
@@ -49,8 +62,8 @@ import torch
  from transformers import AutoModel, AutoImageProcessor
  from transformers.image_utils import load_image
 
- model_name = "pg-team/pg-vision-encoder"
- image_path = "xxx"
  images = load_image(image_path)
 
  model = AutoModel.from_pretrained(
@@ -72,9 +85,10 @@ image_features = model(**inputs)
  ## 🌎 Model Zoo
  | Model | Base Model | HF Link |
  | -------------------- | ------------ | ------------------------------------------------------------ |
- | PenguinVL-8B | Qwen3-8B | [pg-team/pg-vl-8b-hf](https://huggingface.co/pg-team/pg-vl-8b-hf) |
- | PenguinVL-2B | Qwen3-1.7B | [pg-team/pg-vl-2b-hf](https://huggingface.co/pg-team/pg-vl-2b-hf) |
- | PenguinVL-Encoder | Qwen3-0.6B | [pg-team/pg-vision-encoder](https://huggingface.co/pg-team/pg-vision-encoder) |
 
  ## 🚀 Main Results
  xxx
 
+ ---
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ tags:
+ - vision-language-model
+ - multimodal
+ - custom_code
+ library_name: transformers
+ ---
+
  <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6258a6455ea3a0a9b6de3f22/mIMYeUFquGSbm89lT61TG.png" width="160" />
  </p>
 
+
  <h2 align="center">Vision Encoder of PenguinVL</h2>
  <h4 align="center">
  Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
 
 
  ## 🌟 Model Overview
 
+ PenguinVL is a compact Vision-Language Model designed to explore the efficiency limits of small-scale VLMs. Rather than being only an instruction-tuned model, PenguinVL is built from the ground up through **LLM-based vision encoder construction, multimodal pretraining, and subsequent instruction tuning**.
 
+ Unlike most existing VLMs that rely on contrastive-pretrained vision encoders (e.g., CLIP/SigLIP), PenguinVL initializes its vision encoder directly from a **text-only LLM**. This design avoids the objective mismatch between contrastive learning and autoregressive language modeling, enabling tighter alignment between visual representations and the language backbone.
 
  ### Key Characteristics
 
 
  from transformers import AutoModel, AutoImageProcessor
  from transformers.image_utils import load_image
 
+ model_name = "tencent/Penguin-Encoder"
+ image_path = "assets/xxxx.jpg"
  images = load_image(image_path)
 
  model = AutoModel.from_pretrained(
 
  ## 🌎 Model Zoo
  | Model | Base Model | HF Link |
  | -------------------- | ------------ | ------------------------------------------------------------ |
+ | PenguinVL-8B | Qwen3-8B | [tencent/Penguin-VL-8B](https://huggingface.co/tencent/Penguin-VL-8B) |
+ | PenguinVL-2B | Qwen3-1.7B | [tencent/Penguin-VL-2B](https://huggingface.co/tencent/Penguin-VL-2B) |
+ | PenguinVL-Encoder | Qwen3-0.6B | [tencent/Penguin-Encoder](https://huggingface.co/tencent/Penguin-Encoder) |
+
 
  ## 🚀 Main Results
  xxx
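
The repository renames in this commit (old `pg-team/...` IDs replaced by `tencent/...` IDs in the usage snippet and the Model Zoo table) can be summarized as a lookup table. The following is a minimal sketch and not part of the README: `RENAMED_REPOS`, `resolve_repo_id`, `hf_url`, and `load_encoder` are hypothetical helper names, the old-to-new pairing is inferred from matching Model Zoo rows, and passing `trust_remote_code=True` is an assumption based on the `custom_code` tag added to the front matter (the README's own `from_pretrained` arguments are not visible in this diff).

```python
# Old -> new Hugging Face repo IDs, paired by the Model Zoo rows changed in this commit.
RENAMED_REPOS = {
    "pg-team/pg-vl-8b-hf": "tencent/Penguin-VL-8B",
    "pg-team/pg-vl-2b-hf": "tencent/Penguin-VL-2B",
    "pg-team/pg-vision-encoder": "tencent/Penguin-Encoder",
}


def resolve_repo_id(repo_id: str) -> str:
    """Return the post-commit repo ID; IDs not in the table pass through unchanged."""
    return RENAMED_REPOS.get(repo_id, repo_id)


def hf_url(repo_id: str) -> str:
    """Build the Hub URL in the same form the Model Zoo table uses."""
    return f"https://huggingface.co/{resolve_repo_id(repo_id)}"


def load_encoder(repo_id: str = "tencent/Penguin-Encoder"):
    """Load the encoder and its image processor (downloads weights from the Hub).

    Assumption: trust_remote_code=True is required because the model card is
    tagged `custom_code`; this is not confirmed by the visible diff.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoImageProcessor, AutoModel

    name = resolve_repo_id(repo_id)
    model = AutoModel.from_pretrained(name, trust_remote_code=True)
    processor = AutoImageProcessor.from_pretrained(name, trust_remote_code=True)
    return model, processor


if __name__ == "__main__":
    # Show where an old encoder ID now points.
    print(hf_url("pg-team/pg-vision-encoder"))
```

Scripts that still reference an old ID could pass it through `resolve_repo_id` before calling `from_pretrained`; whether the old `pg-team` repositories remain reachable is not stated in this commit.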