RhapsodyAI/MiniCPM-V-Embedding-preview


bokesyo committed on Jul 14, 2024

Commit 587e267 · verified · 1 Parent(s): 703e627

Update README.md

Files changed (1)

README.md +17 -6

@@ -37,7 +37,7 @@ The model only takes images as document-side inputs and produce vectors represen
 - x86 CPU with 32GB memory.
 - x86 CPU with 32GB memory + Nvidia GPU with 16GB memory.
-1. Pip install all dependencies:
 ```
 Pillow==10.1.0
@@ -65,13 +65,22 @@ pip install huggingface-hub
 huggingface-cli download --resume-download RhapsodyAI/minicpm-visual-embedding-v0 --local-dir minicpm-visual-embedding-v0 --local-dir-use-symlinks False
 ```
-3. To deploy a local demo, first check `pipeline_gradio.py`, change `model_path` to your local path and change `device` to your device (for users with Nvidia card, use `cuda`, for users with apple silicon, use `mps`, for users with only x86 cpu, please use `cpu`). then launch the demo:
 ```bash
 pip install gradio
-python pipeline_gradio.py
 ```
 # For research purpose
 To run the model for research purpose, please refer the following code:
@@ -116,11 +125,11 @@ print(scores)
 # Todos
-[x] Release huggingface space demo.
-[] Release the evaluation results.
-[] Release technical report.
 # Limitations
@@ -130,6 +139,8 @@ print(scores)
 - The inference speed is low, because vision encoder uses `timm`, which does not yet support `flash-attn`.
 # Citation
 If you find our work useful, please consider cite us:

 - x86 CPU with 32GB memory.
 - x86 CPU with 32GB memory + Nvidia GPU with 16GB memory.
+1. Pip install all dependencies (for all platforms):
 ```
 Pillow==10.1.0
 huggingface-cli download --resume-download RhapsodyAI/minicpm-visual-embedding-v0 --local-dir minicpm-visual-embedding-v0 --local-dir-use-symlinks False
 ```
+3. To deploy a local demo, first check `pipeline_gradio.py`, change `model_path` to your local path and change `device` to your device and launch demo:
+Install `gradio` first.
 ```bash
 pip install gradio
 ```
+Adapt the code in `pipeline_gradio.py` according to your device.
+- For M1/M2/M3 users, please make sure `model = model.to(device='mps', dtype=torch.float16)` then run `PYTORCH_ENABLE_MPS_FALLBACK=1 python pipeline_gradio.py`.
+- For x86 CPU users, please remove `model = model.to(device)` then run `python pipeline_gradio.py`.
+- For x86 CPU + Nvidia GPU users, please make sure `model = model.to('cuda')` then run `python pipeline_gradio.py`.
+- If you encountered an error, please open an issue [here](https://huggingface.co/RhapsodyAI/minicpm-visual-embedding-v0/discussions), we will respond soon.
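
The per-device steps above could be folded into a single helper. The following is a minimal sketch, not code taken from `pipeline_gradio.py`: `pick_device` and `detect_device` are hypothetical names, and the preference order (CUDA first, then MPS, then CPU) is an assumption rather than something the README states.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return 'cuda', 'mps', or 'cpu' (assumed preference: Nvidia GPU, then Apple silicon)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; report 'cpu' if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


# How the result maps onto the README's instructions (`model` is assumed
# to be loaded already, as in pipeline_gradio.py):
#   "mps"  -> model = model.to(device="mps", dtype=torch.float16)
#   "cuda" -> model = model.to("cuda")
#   "cpu"  -> leave the model where it is (no .to(...) call)

if __name__ == "__main__":
    print(detect_device())
```

On Apple silicon the README additionally asks for `PYTORCH_ENABLE_MPS_FALLBACK=1` when launching the demo. Setting it on the command line, as shown in the bullet above, is the safest way to make sure it is in place before PyTorch loads.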
 # For research purpose
 To run the model for research purpose, please refer the following code:
 # Todos
+- [x] Release huggingface space demo.
+- [] Release the evaluation results.
+- [] Release technical report.
 # Limitations
 - The inference speed is low, because vision encoder uses `timm`, which does not yet support `flash-attn`.
+- The model performs not well on Chinese and other non-English information retrieval tasks.
 # Citation
 If you find our work useful, please consider cite us: