readme + demo

- README.md +153 -3
- VisionAtomicFlow.py +116 -2
- VisionAtomicFlow.yaml +11 -7
- __init__.py +1 -1
- demo.yaml +20 -0
- run.py +91 -0
README.md
CHANGED
@@ -1,3 +1,153 @@
-
-
-
# Table of Contents

* [VisionAtomicFlow](#VisionAtomicFlow)
  * [VisionAtomicFlow](#VisionAtomicFlow.VisionAtomicFlow)
    * [get\_image](#VisionAtomicFlow.VisionAtomicFlow.get_image)
    * [get\_video](#VisionAtomicFlow.VisionAtomicFlow.get_video)
    * [get\_user\_message](#VisionAtomicFlow.VisionAtomicFlow.get_user_message)

<a id="VisionAtomicFlow"></a>

# VisionAtomicFlow

<a id="VisionAtomicFlow.VisionAtomicFlow"></a>

## VisionAtomicFlow Objects

```python
class VisionAtomicFlow(OpenAIChatAtomicFlow)
```

This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.

*Configuration Parameters*:

- `name` (str): The name of the flow. Default: "VisionAtomicFlow"
- `description` (str): A description of the flow, used to generate its help message.
  Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
- `enable_cache` (bool): If True, the flow will use the cache. Default: True
- `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
- `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
- `system_name` (str): The name of the system. Default: "system"
- `user_name` (str): The name of the user. Default: "user"
- `assistant_name` (str): The name of the assistant. Default: "assistant"
- `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the
  default parameters of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule), except for the following parameters,
  whose default values are overwritten:
  - `api_infos` (List[Dict[str, Any]]): The list of API infos. No default value; this parameter is required.
  - `model_name` (Union[Dict[str,str],str]): The name of the model to use.
    When using multiple API providers, the model_name can be a dictionary of the form
    {"provider_name": "model_name"}.
    Default: "gpt-4-vision-preview" (the name needs to follow the naming of the model in litellm, https://docs.litellm.ai/docs/providers).
  - `n` (int): The number of answers to generate. Default: 1
  - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
  - `temperature` (float): The temperature to use. Default: 0.3
  - `top_p` (float): An alternative to sampling with temperature, where the model only considers
    the tokens with top_p probability mass. Default: 0.2
  - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
  - `presence_penalty` (float): The higher this value, the more likely the model is to talk about a new topic. Default: 0.0
- `system_message_prompt_template` (Dict[str,Any]): The template of the system message, used to generate the system message.
  By default it is of type flows.prompt_template.JinjaPrompt.
  None of the parameters of the prompt are defined by default and therefore need to be defined if one wants to use the system prompt.
  Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
- `init_human_message_prompt_template` (Dict[str,Any]): The prompt template of the human/user message used to initialize the conversation
  (the first time in). It is used to generate the human message and is passed as the user message to the LLM.
  By default it is of type flows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default and therefore need to be defined if one
  wants to use the init_human_message_prompt_template. Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
- `previous_messages` (Dict[str,Any]): Defines which previous messages to include in the input of the LLM. Note that if `first_k` and `last_k` are both None,
  all the messages of the flow's history are added to the input of the LLM. Default:
  - `first_k` (int): If defined, adds the first_k earliest messages of the flow's chat history to the input of the LLM. Default: None
  - `last_k` (int): If defined, adds the last_k latest messages of the flow's chat history to the input of the LLM. Default: None
- Other parameters are inherited from the default configuration of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule).
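For illustration, the multi-provider `model_name` form described above can be sketched as a plain override dictionary. This is only a sketch of the parameter shapes: the Azure model name is hypothetical, and `api_infos` has no default and must be injected at run time (as the demo in this commit does).

```python
# Sketch of a backend override using the multi-provider model_name form.
# "azure/gpt-4-vision" is a hypothetical deployment name, not from this module.
backend_config = {
    "api_infos": None,  # required, no default; injected at run time (see run.py)
    "model_name": {
        "openai": "gpt-4-vision-preview",
        "azure": "azure/gpt-4-vision",  # hypothetical
    },
    "n": 1,
    "max_tokens": 2000,
    "temperature": 0.3,
    "top_p": 0.2,
}
```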
*Input Interface Initialized (expected input the first time in the flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
    - `type` (str): The type of the image. It can be "local_path" or "url".
    - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The resizing to apply to the frames of the video.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).

*Input Interface (expected input after the first time in the flow)*:

- `query` (str): The textual query to run the model on.
- `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
  - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
    - `type` (str): The type of the image. It can be "local_path" or "url".
    - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
  - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
    - `video_path` (str): The path to the video.
    - `resize` (int): The resizing to apply to the frames of the video.
    - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
    - `start_frame` (int): The first frame of the video (to send to the model).
    - `end_frame` (int): The last frame of the video (to send to the model).
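Concretely, a single-query input matching this interface can be built as a plain dictionary. The image URL below is the one used in this commit's demo; the video path is a placeholder.

```python
# Input for a single query over one URL image, following the interface above.
url_image = {
    "type": "url",
    "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
}

image_input = {
    "query": "What's in this image?",
    "data": {"images": [url_image]},
}

# A video input uses the "video" key instead:
video_input = {
    "query": "Describe these frames.",
    "data": {
        "video": {
            "video_path": "PATH TO YOUR LOCAL VIDEO",  # placeholder
            "resize": 768,
            "frame_step_size": 10,
            "start_frame": 0,
            "end_frame": None,
        }
    },
}
```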
*Output Interface*:

- `api_output` (str): The API output of the flow for the given query and data.

<a id="VisionAtomicFlow.VisionAtomicFlow.get_image"></a>

#### get\_image

```python
@staticmethod
def get_image(image)
```

This method returns an image in the appropriate format for the API.

**Arguments**:

- `image` (`Dict[str, Any]`): The image dictionary.

**Returns**:

`Dict[str, Any]`: The image URL.
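The idea behind `get_image` can be sketched independently of the flows helpers: URL images pass through unchanged, while local files are base64-encoded into a data URL (the implementation in this commit maps the "jpg" extension to "jpeg" the same way). The helper names below are hypothetical and do not reproduce the exact return shape of `get_image`.

```python
import base64

# Mirrors the extension mapping in get_image (jpg is reported as image/jpeg).
_EXTENSION_TO_MIME = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png"}

def image_bytes_to_data_url(raw_bytes, filename):
    """Sketch: base64-encode raw image bytes into a data URL."""
    extension = filename.rsplit(".", 1)[-1].lower()
    mime = _EXTENSION_TO_MIME.get(extension, extension)
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:image/{mime};base64,{encoded}"

def image_to_url(image):
    """Sketch of get_image's idea: URL images pass through unchanged,
    local files are embedded as base64 data URLs."""
    if image["type"] == "url":
        return image["image"]
    with open(image["image"], "rb") as f:  # "local_path" case
        return image_bytes_to_data_url(f.read(), image["image"])
```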
<a id="VisionAtomicFlow.VisionAtomicFlow.get_video"></a>

#### get\_video

```python
@staticmethod
def get_video(video)
```

This method returns the video in the appropriate format for the API.

**Arguments**:

- `video` (`Dict[str, Any]`): The video dictionary.

**Returns**:

`Dict[str, Any]`: The video URL.
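The interaction of `start_frame`, `end_frame`, and `frame_step_size` can be sketched as plain index arithmetic. This is a hypothetical helper, not the module's implementation (which reads the video with cv2); it only shows which frames would be kept.

```python
def select_frame_indices(total_frames, start_frame=0, end_frame=None, frame_step_size=10):
    """Sketch: indices of the frames that would be sent to the model."""
    if end_frame is None:
        end_frame = total_frames  # default: go to the end of the clip
    return list(range(start_frame, min(end_frame, total_frames), frame_step_size))
```

With the defaults, a 100-frame clip contributes frames 0, 10, ..., 90.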
<a id="VisionAtomicFlow.VisionAtomicFlow.get_user_message"></a>

#### get\_user\_message

```python
@staticmethod
def get_user_message(prompt_template, input_data: Dict[str, Any])
```

This method constructs the user message to be passed to the API.

**Arguments**:

- `prompt_template` (`PromptTemplate`): The prompt template to use.
- `input_data` (`Dict[str, Any]`): The input data.

**Returns**:

`Dict[str, Any]`: The constructed user message (images, videos, and text).
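The shape of the resulting message can be sketched as an OpenAI-style multimodal content list. This is a simplified stand-in: the real method delegates to `_get_message`, `get_image`, and `get_video`, and the function below is hypothetical.

```python
def build_user_message(text, image_urls):
    """Sketch: combine the rendered prompt text and image URLs into a
    single user message made of typed content parts."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content
```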
VisionAtomicFlow.py
CHANGED
@@ -1,14 +1,96 @@
 
 from typing import Dict, Any
-from flow_modules.aiflows.
+from flow_modules.aiflows.ChatFlowModule import ChatAtomicFlow
 from flows.utils.general_helpers import encode_image, encode_from_buffer
 import cv2
 
 
-class VisionAtomicFlow(OpenAIChatAtomicFlow):
+class VisionAtomicFlow(ChatAtomicFlow):
+    """This class implements the atomic flow for the VisionFlowModule. It is a flow that, given a textual input and a set of images and/or videos, generates a textual output.
+    It uses the litellm library as a backend. See https://docs.litellm.ai/docs/providers for supported models and APIs.
+
+    *Configuration Parameters*:
+
+    - `name` (str): The name of the flow. Default: "VisionAtomicFlow"
+    - `description` (str): A description of the flow, used to generate its help message.
+      Default: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+    - `enable_cache` (bool): If True, the flow will use the cache. Default: True
+    - `n_api_retries` (int): The number of times to retry the API call in case of failure. Default: 6
+    - `wait_time_between_api_retries` (int): The time to wait between API retries, in seconds. Default: 20
+    - `system_name` (str): The name of the system. Default: "system"
+    - `user_name` (str): The name of the user. Default: "user"
+    - `assistant_name` (str): The name of the assistant. Default: "assistant"
+    - `backend` (Dict[str, Any]): The configuration of the backend, which is used to fetch API keys. Default: LiteLLMBackend with the
+      default parameters of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule), except for the following parameters,
+      whose default values are overwritten:
+        - `api_infos` (List[Dict[str, Any]]): The list of API infos. No default value; this parameter is required.
+        - `model_name` (Union[Dict[str,str],str]): The name of the model to use.
+          When using multiple API providers, the model_name can be a dictionary of the form
+          {"provider_name": "model_name"}.
+          Default: "gpt-4-vision-preview" (the name needs to follow the naming of the model in litellm, https://docs.litellm.ai/docs/providers).
+        - `n` (int): The number of answers to generate. Default: 1
+        - `max_tokens` (int): The maximum number of tokens to generate. Default: 2000
+        - `temperature` (float): The temperature to use. Default: 0.3
+        - `top_p` (float): An alternative to sampling with temperature, where the model only considers
+          the tokens with top_p probability mass. Default: 0.2
+        - `frequency_penalty` (float): The higher this value, the less likely the model is to repeat itself. Default: 0.0
+        - `presence_penalty` (float): The higher this value, the more likely the model is to talk about a new topic. Default: 0.0
+    - `system_message_prompt_template` (Dict[str,Any]): The template of the system message, used to generate the system message.
+      By default it is of type flows.prompt_template.JinjaPrompt.
+      None of the parameters of the prompt are defined by default and therefore need to be defined if one wants to use the system prompt.
+      Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+    - `init_human_message_prompt_template` (Dict[str,Any]): The prompt template of the human/user message used to initialize the conversation
+      (the first time in). It is used to generate the human message and is passed as the user message to the LLM.
+      By default it is of type flows.prompt_template.JinjaPrompt. None of the parameters of the prompt are defined by default and therefore need to be defined if one
+      wants to use the init_human_message_prompt_template. Default parameters are defined in flows.prompt_template.jinja2_prompts.JinjaPrompt.
+    - `previous_messages` (Dict[str,Any]): Defines which previous messages to include in the input of the LLM. Note that if `first_k` and `last_k` are both None,
+      all the messages of the flow's history are added to the input of the LLM. Default:
+        - `first_k` (int): If defined, adds the first_k earliest messages of the flow's chat history to the input of the LLM. Default: None
+        - `last_k` (int): If defined, adds the last_k latest messages of the flow's chat history to the input of the LLM. Default: None
+    - Other parameters are inherited from the default configuration of ChatAtomicFlow (see the flow card of ChatAtomicFlowModule).
+
+    *Input Interface Initialized (expected input the first time in the flow)*:
+
+    - `query` (str): The textual query to run the model on.
+    - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+        - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+            - `type` (str): The type of the image. It can be "local_path" or "url".
+            - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+        - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+            - `video_path` (str): The path to the video.
+            - `resize` (int): The resizing to apply to the frames of the video.
+            - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+            - `start_frame` (int): The first frame of the video (to send to the model).
+            - `end_frame` (int): The last frame of the video (to send to the model).
+
+    *Input Interface (expected input after the first time in the flow)*:
+
+    - `query` (str): The textual query to run the model on.
+    - `data` (Dict[str, Any]): The data (images or video) to run the model on. It can contain the following keys:
+        - `images` (List[Dict[str, Any]]): A list of images to run the model on. Each image is a dictionary with the following keys:
+            - `type` (str): The type of the image. It can be "local_path" or "url".
+            - `image` (str): The image. If type is "local_path", it is a local path to the image; if type is "url", it is a URL to the image.
+        - `video` (Dict[str, Any]): A video to run the model on. It is a dictionary with the following keys:
+            - `video_path` (str): The path to the video.
+            - `resize` (int): The resizing to apply to the frames of the video.
+            - `frame_step_size` (int): The step size between the frames of the video (to send to the model).
+            - `start_frame` (int): The first frame of the video (to send to the model).
+            - `end_frame` (int): The last frame of the video (to send to the model).
+
+    *Output Interface*:
+
+    - `api_output` (str): The API output of the flow for the given query and data.
+    """
     @staticmethod
     def get_image(image):
+        """This method returns an image in the appropriate format for the API.
+
+        :param image: The image dictionary.
+        :type image: Dict[str, Any]
+        :return: The image URL.
+        :rtype: Dict[str, Any]
+        """
         extension_dict = {
             "jpg": "jpeg",
             "jpeg": "jpeg",
@@ -34,6 +116,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def get_video(video):
+        """This method returns the video in the appropriate format for the API.
+
+        :param video: The video dictionary.
+        :type video: Dict[str, Any]
+        :return: The video URL.
+        :rtype: Dict[str, Any]
+        """
         video_path = video["video_path"]
         resize = video.get("resize", 768)
         frame_step_size = video.get("frame_step_size", 10)
@@ -52,6 +141,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def get_user_message(prompt_template, input_data: Dict[str, Any]):
+        """This method constructs the user message to be passed to the API.
+
+        :param prompt_template: The prompt template to use.
+        :type prompt_template: PromptTemplate
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The constructed user message (images, videos, and text).
+        :rtype: Dict[str, Any]
+        """
         content = VisionAtomicFlow._get_message(prompt_template=prompt_template, input_data=input_data)
         media_data = input_data["data"]
         if "video" in media_data:
@@ -63,6 +161,15 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
 
     @staticmethod
     def _get_message(prompt_template, input_data: Dict[str, Any]):
+        """This method constructs the textual message to be passed to the API.
+
+        :param prompt_template: The prompt template to use.
+        :type prompt_template: PromptTemplate
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The constructed textual message.
+        :rtype: Dict[str, Any]
+        """
         template_kwargs = {}
         for input_variable in prompt_template.input_variables:
             template_kwargs[input_variable] = input_data[input_variable]
@@ -70,6 +177,13 @@ class VisionAtomicFlow(OpenAIChatAtomicFlow):
         return [{"type": "text", "text": msg_content}]
 
     def _process_input(self, input_data: Dict[str, Any]):
+        """This method processes the input data (prepares the messages to send to the API).
+
+        :param input_data: The input data.
+        :type input_data: Dict[str, Any]
+        :return: The processed input data.
+        :rtype: Dict[str, Any]
+        """
         if self._is_conversation_initialized():
             # Construct the message using the human message prompt template
             user_message_content = self.get_user_message(self.human_message_prompt_template, input_data)
VisionAtomicFlow.yaml
CHANGED
@@ -1,4 +1,6 @@
-
+name: "VisionAtomicFlow"
+description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+
 enable_cache: True
 
 n_api_retries: 6
@@ -30,20 +32,22 @@ human_message_prompt_template:
   template: "{{query}}"
   input_variables:
     - "query"
+
 input_interface_initialized:
   - "query"
   - "data"
 
-query_message_prompt_template:
-  _target_: flows.prompt_template.JinjaPrompt
-
-
 previous_messages:
   first_k: null # Note that the first message is the system prompt
   last_k: null
 
-
-
+input_interface:
+  - "query"
+  - "data"
+
+input_interface_non_initialized:
+  - "question"
+  - "data"
 
 output_interface:
   - "api_output"
__init__.py
CHANGED
@@ -1,6 +1,6 @@
 # ~~~ Specify the dependencies ~~
 dependencies = [
-    {"url": "aiflows/
+    {"url": "aiflows/ChatFlowModule", "revision": "a749ad10ed39776ba6721c37d0dc22af49ca0f17"}
 ]
 from flows import flow_verse
 flow_verse.sync_dependencies(dependencies)
demo.yaml
ADDED
@@ -0,0 +1,20 @@
+flow:
+  _target_: aiflows.VisionFlowModule.VisionAtomicFlow.instantiate_from_default_config
+  name: "Demo Vision Flow"
+  description: "A flow that, given a textual input, and a set of images and/or videos, generates a textual output."
+  backend:
+    api_infos: ???
+
+  system_message_prompt_template:
+    template: |2-
+      You are a helpful chatbot that truthfully answers questions.
+    input_variables: []
+    partial_variables: {}
+
+  init_human_message_prompt_template:
+    template: |2-
+      {{query}}
+    input_variables: ["query"]
+    partial_variables: {}
run.py
ADDED
@@ -0,0 +1,91 @@
+import os
+
+import hydra
+
+from flows.flow_launchers import FlowLauncher
+from flows.backends.api_info import ApiInfo
+from flows.utils.general_helpers import read_yaml_file
+
+from flows import logging
+from flows.flow_cache import CACHING_PARAMETERS, clear_cache
+
+CACHING_PARAMETERS.do_caching = False  # Set to True to enable caching
+# clear_cache()  # Uncomment this line to clear the cache
+
+logging.set_verbosity_debug()  # Comment out this line to silence verbose logs
+
+from flows import flow_verse
+
+dependencies = [
+    {"url": "aiflows/VisionFlowModule", "revision": os.getcwd()},
+]
+flow_verse.sync_dependencies(dependencies)
+
+if __name__ == "__main__":
+    # ~~~ Set the API information ~~~
+    # OpenAI backend
+    api_information = [ApiInfo(backend_used="openai",
+                               api_key=os.getenv("OPENAI_API_KEY"))]
+
+    # # Azure backend
+    # api_information = ApiInfo(backend_used="azure",
+    #                           api_base=os.getenv("AZURE_API_BASE"),
+    #                           api_key=os.getenv("AZURE_OPENAI_KEY"),
+    #                           api_version=os.getenv("AZURE_API_VERSION"))
+
+    root_dir = "."
+    cfg_path = os.path.join(root_dir, "demo.yaml")
+    cfg = read_yaml_file(cfg_path)
+
+    cfg["flow"]["backend"]["api_infos"] = api_information
+
+    # ~~~ Instantiate the Flow ~~~
+    flow_with_interfaces = {
+        "flow": hydra.utils.instantiate(cfg["flow"], _recursive_=False, _convert_="partial"),
+        "input_interface": (
+            None
+            if cfg.get("input_interface", None) is None
+            else hydra.utils.instantiate(cfg["input_interface"], _recursive_=False)
+        ),
+        "output_interface": (
+            None
+            if cfg.get("output_interface", None) is None
+            else hydra.utils.instantiate(cfg["output_interface"], _recursive_=False)
+        ),
+    }
+
+    url_image = {"type": "url",
+                 "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}
+
+    local_image = {"type": "local_path", "image": "PATH TO YOUR LOCAL IMAGE"}
+
+    video = {"video_path": "PATH TO YOUR LOCAL VIDEO", "resize": 768, "frame_step_size": 30, "start_frame": 0, "end_frame": None}
+
+    # ~~~ Get the data ~~~
+
+    ## FOR A SINGLE IMAGE
+    data = {"id": 0, "query": "What's in this image?", "data": {"images": [url_image]}}  # This can be a list of samples
+
+    ## FOR MULTIPLE IMAGES
+    # data = {"id": 0, "question": "What are in these images? Is there any difference between them?", "data": {"images": [url_image, local_image]}}  # This can be a list of samples
+
+    ## FOR VIDEO
+    # data = {"id": 0,
+    #         "question": "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
+    #         "data": {"video": video}}  # This can be a list of samples
+
+    # ~~~ Run inference ~~~
+    path_to_output_file = None
+    # path_to_output_file = "output.jsonl"  # Uncomment this line to save the output to disk
+
+    _, outputs = FlowLauncher.launch(
+        flow_with_interfaces=flow_with_interfaces,
+        data=data,
+        path_to_output_file=path_to_output_file,
+    )
+
+    # ~~~ Print the output ~~~
+    flow_output_data = outputs[0]
+    print(flow_output_data)