---
license: mit
---

<h1 align="center"> Moxin 7B VLM </h1>

<p align="center"> <a href="https://github.com/moxin-org/Moxin-VLM">Home Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://arxiv.org/abs/2412.06845">Technical Report</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-LLM">Base Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Chat">Chat Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Instruct">Instruct Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Reasoning">Reasoning Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-VLM">VLM Model</a> </p>

---

## Installation

```bash
git clone https://github.com/moxin-org/Moxin-VLM.git
cd Moxin-VLM

conda create -n moxin-vlm python=3.10 -y
conda activate moxin-vlm

pip install torch==2.4.1 torchvision==0.19.1
pip install transformers==4.46.0 peft==0.15.2
pip install -e .

# Install Flash Attention 2
# =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install flash-attn==2.6.3 --no-build-isolation
```
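If the exact pins above matter for reproducibility, a small stdlib-only check (a sketch, not part of the repo) can confirm what actually got installed:

```python
# Sketch: verify that installed package versions match the README's pins.
# Uses only the standard library; the PINS mapping mirrors the pip commands above.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "transformers": "4.46.0",
    "peft": "0.15.2",
    "flash-attn": "2.6.3",
}

def check_pins(pins):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches

if __name__ == "__main__":
    for name, (got, want) in check_pins(PINS).items():
        print(f"{name}: installed {got}, expected {want}")
```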
## Pretrained Models

Please find our pretrained models on our Hugging Face page: [moxin-org/Moxin-7B-VLM](https://huggingface.co/moxin-org/Moxin-7B-VLM).

We also provide a Hugging Face-converted version, [Moxin-7B-VLM-hf](https://huggingface.co/bobchenyx/Moxin-7B-VLM-hf), based on [openvla](https://github.com/openvla/openvla).

To download and run the model locally, use the attached scripts:

```bash
python scripts/snapshot_download.py
```
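If you prefer not to use the bundled script, the same download can be done directly with `huggingface_hub` (installed alongside `transformers`). This is a minimal sketch; the `local_dir` path is an arbitrary example, not a location the repo scripts depend on:

```python
# Sketch: fetch all files of the model repo with huggingface_hub's
# snapshot_download instead of scripts/snapshot_download.py.
from huggingface_hub import snapshot_download

def download_moxin_vlm(local_dir="checkpoints/Moxin-7B-VLM"):
    """Download the full model repo into local_dir and return the local path."""
    return snapshot_download(repo_id="moxin-org/Moxin-7B-VLM", local_dir=local_dir)

if __name__ == "__main__":
    print(download_moxin_vlm())
```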
## Usage

For a complete terminal-based CLI for interacting with our VLMs:

```bash
python scripts/generate.py --model_path moxin-org/Moxin-7B-VLM
```

For faster loading, inference, and a demo:

```bash
python scripts/fast_inference.py
```
---

## Acknowledgments

This project is based on [Prismatic VLMs](https://github.com/TRI-ML/prismatic-vlms) by [TRI-ML](https://github.com/TRI-ML).
Special thanks to the original contributors for their excellent work.

## Citation

If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/abs/2412.06845v5):

```bibtex
@article{zhao2024fully,
  title={Fully Open Source Moxin-7B Technical Report},
  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
  journal={arXiv preprint arXiv:2412.06845},
  year={2024}
}
```