Update README.md
README.md
CHANGED
|
@@ -37,7 +37,7 @@ The model only takes images as document-side inputs and produce vectors represen
|
|
| 37 |
- x86 CPU with 32GB memory.
|
| 38 |
- x86 CPU with 32GB memory + Nvidia GPU with 16GB memory.
|
| 39 |
|
| 40 |
-
1. Pip install all dependencies:
|
| 41 |
|
| 42 |
```
|
| 43 |
Pillow==10.1.0
|
|
@@ -65,13 +65,22 @@ pip install huggingface-hub
|
|
| 65 |
huggingface-cli download --resume-download RhapsodyAI/minicpm-visual-embedding-v0 --local-dir minicpm-visual-embedding-v0 --local-dir-use-symlinks False
|
| 66 |
```
|
| 67 |
|
| 68 |
-
3. To deploy a local demo, first check `pipeline_gradio.py`, change `model_path` to your local path and change `device` to your device
|
| 69 |
|
| 70 |
```bash
|
| 71 |
pip install gradio
|
| 72 |
-
python pipeline_gradio.py
|
| 73 |
```
|
| 74 |
|
| 75 |
# For research purpose
|
| 76 |
|
| 77 |
To run the model for research purposes, please refer to the following code:
|
|
@@ -116,11 +125,11 @@ print(scores)
|
|
| 116 |
|
| 117 |
# Todos
|
| 118 |
|
| 119 |
-
[x] Release huggingface space demo.
|
| 120 |
|
| 121 |
-
[] Release the evaluation results.
|
| 122 |
|
| 123 |
-
[] Release technical report.
|
| 124 |
|
| 125 |
# Limitations
|
| 126 |
|
|
@@ -130,6 +139,8 @@ print(scores)
|
|
| 130 |
|
| 131 |
- Inference is slow because the vision encoder uses `timm`, which does not yet support `flash-attn`.
|
| 132 |
|
| 133 |
# Citation
|
| 134 |
|
| 135 |
If you find our work useful, please consider citing us:
|
|
|
|
| 37 |
- x86 CPU with 32GB memory.
|
| 38 |
- x86 CPU with 32GB memory + Nvidia GPU with 16GB memory.
|
| 39 |
|
| 40 |
+
1. Pip install all dependencies (for all platforms):
|
| 41 |
|
| 42 |
```
|
| 43 |
Pillow==10.1.0
|
|
|
|
| 65 |
huggingface-cli download --resume-download RhapsodyAI/minicpm-visual-embedding-v0 --local-dir minicpm-visual-embedding-v0 --local-dir-use-symlinks False
|
| 66 |
```
|
| 67 |
|
| 68 |
+
3. To deploy a local demo, first check `pipeline_gradio.py`, change `model_path` to your local path, change `device` to your device, and then launch the demo:
|
| 69 |
+
|
| 70 |
+
Install `gradio` first.
|
| 71 |
|
| 72 |
```bash
|
| 73 |
pip install gradio
|
|
|
|
| 74 |
```
|
| 75 |
|
| 76 |
+
Adapt the code in `pipeline_gradio.py` according to your device.
|
| 77 |
+
|
| 78 |
+
- For M1/M2/M3 users, please make sure the script sets `model = model.to(device='mps', dtype=torch.float16)`, then run `PYTORCH_ENABLE_MPS_FALLBACK=1 python pipeline_gradio.py`.
|
| 79 |
+
- For x86 CPU users, please remove `model = model.to(device)`, then run `python pipeline_gradio.py`.
|
| 80 |
+
- For x86 CPU + Nvidia GPU users, please make sure the script sets `model = model.to('cuda')`, then run `python pipeline_gradio.py`.
|
| 81 |
+
- If you encounter an error, please open an issue [here](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0/discussions); we will respond soon.
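The per-device bullets above could be folded into one helper instead of editing the script by hand. The following is only a sketch: `choose_placement` and `place_model` are hypothetical names that do not appear in `pipeline_gradio.py`, and preferring CUDA over MPS when both are reported is an assumption.

```python
from typing import Optional, Tuple


def choose_placement(has_cuda: bool, has_mps: bool) -> Tuple[Optional[str], Optional[str]]:
    """Map available hardware to (device, dtype name); None means "leave as loaded"."""
    if has_cuda:
        return "cuda", None       # x86 CPU + Nvidia GPU: model.to('cuda')
    if has_mps:
        return "mps", "float16"   # M1/M2/M3: model.to(device='mps', dtype=torch.float16)
    return None, None             # x86 CPU only: skip the model.to(...) call


def place_model(model):
    """Detect hardware with PyTorch and move `model` as the bullets describe."""
    import torch  # imported lazily so choose_placement has no dependencies

    has_mps = hasattr(torch.backends, "mps") and torch.backends.mps.is_available()
    device, dtype_name = choose_placement(torch.cuda.is_available(), has_mps)
    if device is None:
        return model
    if dtype_name is None:
        return model.to(device)
    return model.to(device=device, dtype=getattr(torch, dtype_name))
```

On Apple silicon the script would still need to be launched with `PYTORCH_ENABLE_MPS_FALLBACK=1`, as noted above; the helper only covers the `model.to(...)` line.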
|
| 82 |
+
|
| 83 |
+
|
| 84 |
# For research purpose
|
| 85 |
|
| 86 |
To run the model for research purposes, please refer to the following code:
|
|
|
|
| 125 |
|
| 126 |
# Todos
|
| 127 |
|
| 128 |
+
- [x] Release huggingface space demo.
|
| 129 |
|
| 130 |
+
- [ ] Release the evaluation results.
|
| 131 |
|
| 132 |
+
- [ ] Release technical report.
|
| 133 |
|
| 134 |
# Limitations
|
| 135 |
|
|
|
|
| 139 |
|
| 140 |
- Inference is slow because the vision encoder uses `timm`, which does not yet support `flash-attn`.
|
| 141 |
|
| 142 |
+
- The model does not perform well on Chinese and other non-English information retrieval tasks.
|
| 143 |
+
|
| 144 |
# Citation
|
| 145 |
|
| 146 |
If you find our work useful, please consider citing us:
|