update model card

README.md

---
license: mit
datasets:
- IPEC-COMMUNITY/OpenFly
language:
- en
metrics:
- accuracy
base_model:
- openvla/openvla-7b-prismatic
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- UAV
- Navigation
- VLN
- visual-language-navigation
---

# OpenFly

OpenFly is a platform comprising a versatile toolchain and a large-scale benchmark for aerial vision-language navigation (VLN). The code is purely Hugging Face-based, concise, and efficient.

For full details, please read [our paper](https://arxiv.org/abs/2502.18041) and see [our project page](https://shailab-ipec.github.io/openfly/).

## Model Details

### Model Description

- **Developed by:** The OpenFly team, consisting of researchers from Shanghai AI Laboratory.
- **Model type:** Vision-language navigation (language, images => UAV actions)
- **Language(s) (NLP):** en
- **License:** MIT
- **Pretraining Dataset:** [OpenFly](https://huggingface.co/datasets/IPEC-COMMUNITY/OpenFly)
- **Repository:** [https://github.com/SHAILAB-IPEC/OpenFly-Platform](https://github.com/SHAILAB-IPEC/OpenFly-Platform)
- **Paper:** [OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation](https://arxiv.org/abs/2502.18041)
- **Project Page & Videos:** [https://shailab-ipec.github.io/openfly/](https://shailab-ipec.github.io/openfly/)

## Uses

OpenFly relies solely on Hugging Face Transformers 🤗, which makes deployment straightforward. If your environment provides `transformers >= 4.47.0`, you can use the code below directly to load the model and run inference.
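
If any dependencies are missing, you can install them first. A minimal sketch (the exact package set and versions depend on your environment and are not pinned by this card):

```bash
pip install "transformers>=4.47.0" torch pillow opencv-python
# Optional: only needed if you pass attn_implementation="flash_attention_2"
pip install flash-attn
```

The `model.*` and `extern.hf.*` modules imported in the snippet below are not pip packages; they are part of the [OpenFly-Platform repository](https://github.com/SHAILAB-IPEC/OpenFly-Platform), so clone it and make sure those packages are importable (for example by adding the repository to your `PYTHONPATH`).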

### Direct Use

```python
import cv2
import torch
from PIL import Image
from transformers import AutoConfig, AutoImageProcessor, AutoModelForVision2Seq, AutoProcessor

# Custom classes shipped with the OpenFly-Platform repository.
from model.prismatic import PrismaticVLM
from model.overwatch import initialize_overwatch
from model.action_tokenizer import ActionTokenizer
from model.vision_backbone import DinoSigLIPViTBackbone, DinoSigLIPImageTransform
from model.llm_backbone import LLaMa2LLMBackbone
from extern.hf.configuration_prismatic import OpenFlyConfig
from extern.hf.modeling_prismatic import OpenVLAForActionPrediction
from extern.hf.processing_prismatic import PrismaticImageProcessor, PrismaticProcessor

# Register the OpenFly classes so the transformers Auto* factories can resolve them.
AutoConfig.register("openvla", OpenFlyConfig)
AutoImageProcessor.register(OpenFlyConfig, PrismaticImageProcessor)
AutoProcessor.register(OpenFlyConfig, PrismaticProcessor)
AutoModelForVision2Seq.register(OpenFlyConfig, OpenVLAForActionPrediction)

# Load the processor and the model checkpoint.
model_name_or_path = "IPEC-COMMUNITY/openfly-agent-7b"
processor = AutoProcessor.from_pretrained(model_name_or_path)
model = AutoModelForVision2Seq.from_pretrained(
    model_name_or_path,
    attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# OpenCV loads images as BGR; convert to RGB before wrapping in a PIL Image.
image = Image.fromarray(cv2.cvtColor(cv2.imread("example.png"), cv2.COLOR_BGR2RGB))
prompt = "Take off, go straight pass the river"
inputs = processor(prompt, [image, image, image]).to("cuda:0", dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key="vln_norm", do_sample=False)
print(action)
```
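
Note that the three images passed to the processor above are the same example frame repeated purely for illustration; in an actual rollout you would presumably feed the model its most recent onboard observations instead. Following the OpenVLA convention this model inherits, `unnorm_key="vln_norm"` is expected to select the normalization statistics used to map the prediction back to real-valued UAV actions, which your controller or simulator is then responsible for executing.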