lmms-lab-encoder/onevision-encoder-large

xiangan committed on Jan 1

Commit

e53d7f0

1 Parent(s): 7dd6f8f

Upload folder using huggingface_hub

Files changed (1)

README.md CHANGED

@@ -44,6 +44,13 @@ license: apache-2.0
 ### Quick Start
+> [!IMPORTANT]
+> **Transformers Version Compatibility:**
+> - ✅ **`transformers==4.53.1`** (Recommended): Works with `AutoModel.from_pretrained()`
+> - ⚠️ **`transformers>=5.0.0`**: Use source code installation (see [Loading from Source Code](#loading-from-source-code))
 > **Note:** This model supports native resolution input. For optimal performance:
 > - **Image**: 448×448 resolution (pre-trained)
 > - **Video**: 224×224 resolution with 256 tokens per frame (pre-trained)
@@ -96,6 +103,27 @@ with torch.no_grad():
     outputs = model(video, visible_indices=visible_indices)
 ```
+### Loading from Source Code
+```bash
+git clone https://github.com/EvolvingLMMs-Lab/OneVision-Encoder.git
+cd OneVision-Encoder
+pip install -e .
+```
+```python
+from onevision_encoder import OneVisionEncoderModel, OneVisionEncoderConfig
+from transformers import AutoImageProcessor
+model = OneVisionEncoderModel.from_pretrained(
+    "lmms-lab-encoder/onevision-encoder-large",
+    trust_remote_code=True,
+    attn_implementation="flash_attention_2"
+).to("cuda").eval()
+preprocessor = AutoImageProcessor.from_pretrained(
+    "lmms-lab-encoder/onevision-encoder-large",
+    trust_remote_code=True
+)
+```
 ### LMM Probe Results
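
The compatibility note added in this commit names two cases: `transformers==4.53.1` works with `AutoModel.from_pretrained()`, and `transformers>=5.0.0` needs the source installation. A minimal sketch of how a script might check which case applies is shown here. The helper name `loading_path` and its return labels are hypothetical and not part of the repository, and the `"untested"` label reflects only that the README makes no statement about other versions.

```python
from importlib import metadata

# Version the README marks as recommended.
RECOMMENDED = "4.53.1"


def loading_path(version: str) -> str:
    """Map a transformers version string to the route the README describes.

    Hypothetical helper for illustration; the labels are not an official API.
    """
    major = int(version.split(".")[0])
    if major >= 5:
        return "source"      # README: install the repository from source
    if version == RECOMMENDED:
        return "auto_model"  # README: AutoModel.from_pretrained() works
    return "untested"        # README makes no claim about this version


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        print(installed, "->", loading_path(installed))
```

A caller could branch on the returned label to decide whether to import `AutoModel` or the `onevision_encoder` package installed from source.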