Commit c67d07b · Parent(s): 77de19a
Update README.md
README.md CHANGED
@@ -19,7 +19,7 @@ The model takes language instructions and camera images as input and generates r
 
 All OpenVLA checkpoints, as well as our [training codebase](https://github.com/openvla/openvla) are released under an MIT License.
 
-For full details, please read [our paper](https://
+For full details, please read [our paper](https://arxiv.org/abs/2406.09246) and see [our project page](https://openvla.github.io/).
 
 ## Model Summary
 
@@ -32,7 +32,7 @@ For full details, please read [our paper](https://openvla.github.io/) and see [o
 + **Language Model**: Llama-2
 - **Pretraining Dataset:** [Open X-Embodiment](https://robotics-transformer-x.github.io/) -- specific component datasets can be found [here](https://github.com/openvla/openvla).
 - **Repository:** [https://github.com/openvla/openvla](https://github.com/openvla/openvla)
-- **Paper:** [OpenVLA: An Open-Source Vision-Language-Action Model](https://
+- **Paper:** [OpenVLA: An Open-Source Vision-Language-Action Model](https://arxiv.org/abs/2406.09246)
 - **Project Page & Videos:** [https://openvla.github.io/](https://openvla.github.io/)
 
 ## Uses
@@ -93,7 +93,7 @@ For more examples, including scripts for fine-tuning OpenVLA models on your own
 @article{kim24openvla,
 title={OpenVLA: An Open-Source Vision-Language-Action Model},
 author={{Moo Jin} Kim and Karl Pertsch and Siddharth Karamcheti and Ted Xiao and Ashwin Balakrishna and Suraj Nair and Rafael Rafailov and Ethan Foster and Grace Lam and Pannag Sanketi and Quan Vuong and Thomas Kollar and Benjamin Burchfiel and Russ Tedrake and Dorsa Sadigh and Sergey Levine and Percy Liang and Chelsea Finn},
-journal = {arXiv preprint},
+journal = {arXiv preprint arXiv:2406.09246},
 year={2024}
 }
 ```