Tavish9 committed
Commit 1541a95 · verified · 1 Parent(s): cbf5e79

update model card

Files changed (1):
  1. README.md +86 -3
README.md CHANGED
@@ -1,3 +1,86 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - IPEC-COMMUNITY/OpenFly
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - openvla/openvla-7b-prismatic
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ tags:
+ - UAV
+ - Navigation
+ - VLN
+ - visual-language-navigation
+ ---
+
+ # OpenFly
+
+ OpenFly is a platform comprising a versatile toolchain and a large-scale benchmark for aerial vision-language navigation (VLN). The code is purely Hugging Face-based and concise, with efficient performance.
+
+ For full details, please read [our paper](https://arxiv.org/abs/2502.18041) and see [our project page](https://shailab-ipec.github.io/openfly/).
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** The OpenFly team, consisting of researchers from Shanghai AI Laboratory.
+ - **Model type:** vision-language-navigation (language, image => UAV actions)
+ - **Language(s) (NLP):** en
+ - **License:** MIT
+ - **Pretraining Dataset:** [OpenFly](https://huggingface.co/datasets/IPEC-COMMUNITY/OpenFly)
+ - **Repository:** [https://github.com/SHAILAB-IPEC/OpenFly-Platform](https://github.com/SHAILAB-IPEC/OpenFly-Platform)
+ - **Paper:** [OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation](https://arxiv.org/abs/2502.18041)
+ - **Project Page & Videos:** [https://shailab-ipec.github.io/openfly/](https://shailab-ipec.github.io/openfly/)
+
+ ## Uses
+
+ OpenFly relies solely on Hugging Face Transformers 🤗, making deployment straightforward. If your environment provides `transformers >= 4.47.0`, you can load the model and run inference with the following code.
+
+ ### Direct Use
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoConfig, AutoImageProcessor, AutoModelForVision2Seq, AutoProcessor
+
+ # Local modules from the OpenFly-Platform repository (some are imported for
+ # completeness and may not be referenced directly during pure inference)
+ from model.prismatic import PrismaticVLM
+ from model.action_tokenizer import ActionTokenizer
+ from model.vision_backbone import DinoSigLIPViTBackbone, DinoSigLIPImageTransform
+ from model.llm_backbone import LLaMa2LLMBackbone
+ from extern.hf.configuration_prismatic import OpenFlyConfig
+ from extern.hf.modeling_prismatic import OpenVLAForActionPrediction
+ from extern.hf.processing_prismatic import PrismaticImageProcessor, PrismaticProcessor
+
+ # Register the custom OpenFly classes with the Auto* factories
+ AutoConfig.register("openvla", OpenFlyConfig)
+ AutoImageProcessor.register(OpenFlyConfig, PrismaticImageProcessor)
+ AutoProcessor.register(OpenFlyConfig, PrismaticProcessor)
+ AutoModelForVision2Seq.register(OpenFlyConfig, OpenVLAForActionPrediction)
+
+ model_name_or_path = "IPEC-COMMUNITY/openfly-agent-7b"
+ processor = AutoProcessor.from_pretrained(model_name_or_path)
+ model = AutoModelForVision2Seq.from_pretrained(
+     model_name_or_path,
+     attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
+     torch_dtype=torch.bfloat16,
+     low_cpu_mem_usage=True,
+     trust_remote_code=True,
+ ).to("cuda:0")
+
+ # Load the observation as an RGB image
+ image = Image.open("example.png").convert("RGB")
+ prompt = "Take off, go straight pass the river"
+ inputs = processor(prompt, [image, image, image]).to("cuda:0", dtype=torch.bfloat16)
+ action = model.predict_action(**inputs, unnorm_key="vln_norm", do_sample=False)
+ print(action)
+ ```
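
In the snippet above, `unnorm_key="vln_norm"` tells `predict_action` which dataset statistics to use when mapping the model's normalized outputs back to physical action values. As background, here is a minimal sketch of this kind of quantile-based unnormalization; the statistics below are made up for illustration, while the real `vln_norm` values ship with the checkpoint.

```python
import numpy as np

# Hypothetical per-dimension 1st/99th-percentile statistics, for illustration
# only -- the actual "vln_norm" statistics are stored with the model.
stats = {"q01": np.array([-1.0, -0.5, -0.5]),
         "q99": np.array([1.0, 0.5, 0.5])}

def unnormalize(action: np.ndarray, stats: dict) -> np.ndarray:
    """Map a normalized action in [-1, 1] back to the [q01, q99] range."""
    low, high = stats["q01"], stats["q99"]
    return 0.5 * (action + 1.0) * (high - low) + low

print(unnormalize(np.array([0.0, 1.0, -1.0]), stats))  # [ 0.   0.5 -0.5]
```

A normalized 0.0 lands at the midpoint of each dimension's range, while ±1.0 map to the 99th/1st percentiles, which is why the returned action is already in physical units.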