Update README.md
README.md (CHANGED)

---
license: cc-by-nc-4.0
tags:
- vision
- image-classification
- vit
- ViTP
- InternVL
- domain-adaptation
- general
language:
- en
library_name: transformers
pipeline_tag: image-feature-extraction
base_model:
- GreatBird/ViTP
---

# ViTP-InternVL-1B-General

ViTP (Visual Instruction Pretraining) vision backbone: the **InternVL 1B** variant, pretrained on **general**-domain visual instruction data. Compatible with `InternVisionModel` from [InternVL](https://github.com/OpenGVLab/InternVL).

## Model Details

- **Architecture**: InternVisionModel (24 layers, hidden size 1024, 16 attention heads)
- **Image size**: 448×448
- **Patch size**: 14 (so a 448×448 input yields a 32×32 patch grid; see the sketch below)
- **Domain**: General
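
These numbers pin down the token geometry: (448 / 14)² = 1024 patch tokens plus one CLS token, each of width 1024. A minimal sketch of that arithmetic (the helper below is illustrative, not part of the repo):

```python
def vit_token_count(image_size: int = 448, patch_size: int = 14) -> int:
    """Tokens an InternViT-style backbone emits: patch grid plus one CLS token."""
    grid = image_size // patch_size      # 448 // 14 = 32 patches per side
    return grid * grid + 1               # 1024 patch tokens + 1 CLS = 1025

assert vit_token_count() == 1025         # matches last_hidden_state shape (1, 1025, 1024)
```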

## Usage

This repo includes the modeling code, so the model loads directly with `transformers` via `trust_remote_code=True`; the ViTP repo itself is not needed:

```python
from transformers import AutoModel, AutoImageProcessor
from PIL import Image
import torch

device = "cuda"
model = AutoModel.from_pretrained(
    "BiliSakura/ViTP-InternVL-1B-General",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map=device,
).eval()

processor = AutoImageProcessor.from_pretrained("BiliSakura/ViTP-InternVL-1B-General")
image = Image.open("image.jpg").convert("RGB")  # the processor expects a PIL image, not a path
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device, model.dtype)

with torch.no_grad():
    outputs = model(pixel_values=pixel_values)

features = outputs.pooler_output  # pooled CLS token: (1, 1024)
# Full token sequence: outputs.last_hidden_state
```
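
For dense features rather than a single pooled vector, the patch tokens in `last_hidden_state` can be folded back onto the 32×32 grid. A sketch, assuming the CLS token sits at index 0 (as in `InternVisionModel`):

```python
# Continuing from the snippet above: drop the CLS token and restore the spatial grid.
hidden = outputs.last_hidden_state          # (1, 1025, 1024): CLS + 32*32 patch tokens
patch_tokens = hidden[:, 1:, :]             # (1, 1024, 1024): patch tokens only
feature_map = patch_tokens.reshape(1, 32, 32, 1024).permute(0, 3, 1, 2)  # (1, 1024, 32, 32) NCHW
```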

## Citation

```bibtex
@article{Li_2025_ViTP,
  title={Visual Instruction Pretraining for Domain-Specific Foundation Models},
  author={Li, Yuxuan and Zhang, Yicheng and Tang, Wenhao and Dai, Yimian and Cheng, Ming-Ming and Li, Xiang and Yang, Jian},
  journal={arXiv},
  year={2025}
}
```