nzhenev commited on
Commit
3d64abf
·
verified ·
1 Parent(s): e936626

Update app.py from anycoder

Browse files
Files changed (1) hide show
  1. app.py +120 -166
app.py CHANGED
@@ -1,167 +1,121 @@
1
- *Thinking...*
 
 
 
 
2
 
3
- > **Refining the Uploads**
4
- >
5
- > I'm now focusing on how to modify the Gradio application to handle multiple image or video uploads. My primary concern is ensuring the output is persistent and updates dynamically as new media is uploaded, rather than requiring a full re-process. I'm taking a look at `src/sam3d_body/gradio_ui/sam3d_body_ui.py` to see how the single upload is managed currently.
6
- >
7
- >
8
- > **Analyzing Input Methods**
9
- >
10
- > I'm now weighing the best approach for the Gradio input. Initially, I'll switch from the single-image input in `sam3d_body_ui.py` to using `gr.File(file_count="multiple")` to handle multiple uploads, and passing the associated filepaths to the processing function. Considering the need for persistent output, a streaming or accumulative process is very suitable. This suggests that the current `sam3d_prediction_fn` needs to be adapted or wrapped to accept a sequence of files, making sure that it process each file one by one in one go.
11
- >
12
- >
13
- > **Formulating a Plan**
14
- >
15
- > The immediate focus is updating the processing logic, especially `sam3d_prediction_fn`, to accommodate multiple file uploads via `gr.File(file_count="multiple")`. This necessitates iterating through the file list, differentiating between images and videos. The "persistent updates" requirement leads me to consider frame-by-frame processing for videos and the use of Rerun's timeline features and streaming capabilities for a dynamic output. I need to make the prediction function able to yield the outputs so it can stream to the Gradio interface.
16
- >
17
- >
18
- > **Adapting Processing Steps**
19
- >
20
- > The key is now to adapt the `sam3d_prediction_fn` to iterate and process the incoming file list. For videos, this means frame-by-frame handling; for images, single-frame processing. I'm focusing on ensuring the Rerun timeline remains continuous across multiple file uploads, avoiding any reset behaviors. The streaming capabilities of the `gradio-rerun` library are the central part of the design, especially through `@rr.thread_local_stream`, and the use of the `yield` statement within the prediction function is critical.
21
- >
22
- >
23
- > **Evaluating the Implementation**
24
- >
25
- > My thought process is now on coding the changes. I'll begin by modifying `sam3d_body_ui.py`, first changing the input from `gr.Image` to `gr.File(file_count="multiple")`. This will require adapting `sam3d_prediction_fn` to iterate through the file list. I'll need to use `av` to handle videos and `cv2` can be removed. The core challenge is making sure the Rerun timeline correctly sequences frames from different uploads seamlessly, especially when using `yield`. I'll test it step-by-step.
26
- >
27
- >
28
- > **Revising the Implementation**
29
- >
30
- > The core of my thought process is now on coding. I'll change the Gradio UI input from `gr.Image` to `gr.File(file_count="multiple", label="Input Images/Videos")`. Then I'll adapt `sam3d_prediction_fn` to iterate through the provided files. I'll need to use the `av` library for video frame extraction and consider a global frame counter for consistent Rerun timeline indexing. The main challenge will be maintaining continuous Rerun timelines across file processing and handling the streamed outputs.
31
- >
32
- >
33
- > **Revising the Processing Logic**
34
- >
35
- > I'm now focusing on implementing the proposed changes. I'll modify `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, replacing the single-image input with `gr.File(file_count="multiple", label="Input Images/Videos")`. I'll adapt `sam3d_prediction_fn` to process each uploaded file, handling image and video types. This includes using `cv2` or `av` for frame extraction in videos and creating a global counter for the Rerun timeline. I'll make sure the `@rr.thread_local_stream` streams everything.
36
- >
37
- >
38
- > **Adapting to Multiple Inputs**
39
- >
40
- > I'm now refactoring the `sam3d_prediction_fn` function. The goal is to move from single image input to handling a list of file paths uploaded via `gr.File(file_count="multiple")`. This change entails processing each file sequentially and differentiating between image and video types. The core will involve integrating `cv2` or `av` for video frame extraction and ensuring consistent Rerun timeline indexing across multiple uploads, using the stream.
41
- >
42
- >
43
- > **Adapting to File Inputs**
44
- >
45
- > I'm now restructuring `sam3d_prediction_fn` to handle a list of file paths uploaded by the user, replacing the single-image input. This refactoring involves iterating through each file, distinguishing between images and videos to load the frames accordingly using `cv2`. I'm focusing on incorporating a global frame counter for consistent Rerun timeline indexing across all uploads, with a focus on smooth, dynamic updates.
46
- >
47
- >
48
- > **Re-Evaluating Function Inputs**
49
- >
50
- > I'm now changing the function signature for `sam3d_prediction_fn` to accept `file_paths: list[str]` instead of the current input. I'm focusing on creating an efficient processing loop, distinguishing between images and videos and using frame indexing for Rerun's timeline to ensure continuity across uploads. I will use `cv2` for frame extraction.
51
- >
52
- >
53
- > **Re-Evaluating Functionality**
54
- >
55
- > I'm now changing the function signature for `sam3d_prediction_fn`, making it accept a list of file paths. The current focus is on creating an effective processing loop that distinguishes between images and videos and also uses a frame index for the Rerun timeline. My current plan involves leveraging `cv2` for frame extraction. I'm taking the steps necessary to make my function more versatile.
56
- >
57
- >
58
- > **Adjusting the Processing Loop**
59
- >
60
- > I'm now changing the function signature for `sam3d_prediction_fn`, making it accept a list of file paths. I'm focusing on creating an effective processing loop that distinguishes between images and videos and also uses a frame index for the Rerun timeline. I've chosen `cv2` for frame extraction. I'm taking the steps necessary to make my function more versatile.
61
- >
62
- >
63
- > **Adjusting the Frame Counter**
64
- >
65
- > I'm now in the process of finalizing the processing loop. Specifically, I'm integrating the global frame counter more comprehensively to ensure accurate Rerun timeline indexing across all uploaded files. The core focus is on preventing any reset behavior. I'm also preparing to test the new function.
66
- >
67
- >
68
- > **Adjusting File Handling**
69
- >
70
- > I'm now adapting the processing logic within `sam3d_prediction_fn` to correctly use the file list provided. This involves iterating over each file path, determining its type (image or video), and loading the media accordingly, using `cv2` for frame extraction. I'm focusing on the integration of the frame index within the Rerun stream.
71
- >
72
- >
73
- > **Adapting File Processing**
74
- >
75
- > I'm now adapting the core function, `sam3d_prediction_fn`, to handle the list of file paths delivered by the `gr.File` input. The initial processing will need to determine the file type (image or video) and then call appropriate extraction methods to ensure the Rerun timeline remains continuous across multiple uploads. I'm focusing on the global frame counter's correct incrementing and the smooth display of updates in the Gradio interface.
76
- >
77
- >
78
- > **Adjusting File Paths**
79
- >
80
- > I'm now adapting the processing logic within `sam3d_prediction_fn` to correctly use the file list provided. This involves iterating over each file path, determining its type (image or video), and loading the media accordingly, using `cv2` for frame extraction. I'm focusing on the integration of the frame index within the Rerun stream.
81
- >
82
- >
83
- > **Adjusting the Main Function**
84
- >
85
- > I'm now restructuring `main` to integrate the changes to `sam3d_prediction_fn` and the input components. The main adjustment is updating the Gradio UI to use `gr.File(file_count="multiple")` for handling multiple image and video uploads, alongside corresponding modifications to example inputs. I'm focusing on a seamless user experience.
86
- >
87
- >
88
- > **Modifying UI Components**
89
- >
90
- > I'm now changing the input component from `gr.Image` to `gr.File(file_count="multiple", label="Input Images/Videos")` and modifying the examples to handle file paths. This addresses the core requirement of supporting multiple image and video uploads. I'm taking the necessary steps to make my function more versatile.
91
- >
92
- >
93
- > **Refactoring UI for Uploads**
94
- >
95
- > I'm now changing the input component from `gr.Image` to `gr.File(file_count="multiple", label="Input Images/Videos")` and modifying example inputs. This satisfies the requirement of supporting many image and video uploads, and I'll focus on testing the updated UI interaction.
96
- >
97
- >
98
- > **Adjusting for Code Implementation**
99
- >
100
- > I'm now integrating the changes into `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, including modifications to the function signatures, input components, and example data. I've focused on ensuring the correct use of `gr.File` for uploading multiple image or video files and adapting the processing logic to manage them effectively.
101
- >
102
- >
103
- > **Modifying the Main Script**
104
- >
105
- > I'm now integrating the changes into `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, including modifications to the function signatures, input components, and example data. I'm focused on ensuring the correct use of `gr.File` for uploading multiple image or video files and adapting the processing logic to manage them effectively.
106
- >
107
- >
108
- > **Adapting the Main Code**
109
- >
110
- > I've now incorporated the changes into the main UI file, `src/sam3d_body/gradio_ui/sam3d_body_ui.py`. I've updated the file input to `gr.File` for handling multiple files and adjusted the example inputs accordingly. The focus is to make the application fully compatible with multiple image and video uploads, and ensuring it performs as expected.
111
- >
112
- >
113
- > **Adjusting for Code Updates**
114
- >
115
- > I've integrated the changes into `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, updating the input with `gr.File(file_count="multiple")` and example data. The focus is to make the application handle multiple images and videos. The next step is to test it thoroughly.
116
- >
117
- >
118
- > **Adjusting UI and Logic**
119
- >
120
- > I've now incorporated the changes into `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, updating the file input to accept multiple uploads. I've adjusted the code to load and handle images and video files. The main task will be testing to ensure that the multiple uploads are processed correctly, and that the Rerun streaming occurs as designed, showing frames smoothly.
121
- >
122
- >
123
- > **Adjusting the Main UI**
124
- >
125
- > I've now incorporated the changes into `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, updating the file input to support multiple uploads and revising the example inputs, including the addition of `cv2`. I am making sure that the application now smoothly handles multiple images and videos. The main task is verifying the Rerun streaming and its dynamic behavior.
126
- >
127
- >
128
- > **Refactoring for File Handling**
129
- >
130
- > I'm now changing the function signature for `sam3d_prediction_fn`, making it accept a list of file paths. I'm focusing on creating an effective processing loop that distinguishes between images and videos and also uses a frame index for the Rerun timeline. I've chosen `cv2` for frame extraction. I'm taking the steps necessary to make my function more versatile.
131
- >
132
- >
133
- > **Finalizing Code and Testing**
134
- >
135
- > I've implemented the changes in `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, updating the UI and processing logic to accommodate multiple file uploads and stream the Rerun output. The focus now is on thorough testing to ensure proper handling of various image and video formats, continuous timeline indexing, and correct GLB export, using `cv2` within the revised functions.
136
- >
137
- >
138
- > **Adjusting for Testing**
139
- >
140
- > I'm now adapting the code to accommodate the changes, after which I'll focus on testing. I'm focusing on ensuring the Rerun timeline remains continuous across multiple file uploads and that videos are handled correctly with `cv2`. The core thing is getting to testing now.
141
- >
142
- >
143
- > **Finalizing the Integration**
144
- >
145
- > I'm now delivering the completed code for `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, fully updated to handle multiple file uploads and stream the Rerun output. The changes now allow for the correct processing of a user's image and video uploads, with the `cv2` library and the Rerun stream working in tandem.
146
- >
147
- >
148
- > **Finalizing the Adaptation**
149
- >
150
- > I'm now delivering the completed code for `src/sam3d_body/gradio_ui/sam3d_body_ui.py`, fully updated to handle multiple file uploads and stream the Rerun output. The changes now allow for the correct processing of a user's image and video uploads, with the `cv2` library and the Rerun stream working in tandem, ready for final tests.
151
- >
152
- >
153
- > **Modifying the UI Elements**
154
- >
155
- > I'm now restructuring `src/sam3d_body/gradio_ui/sam3d_body_ui.py` to change the input component and related examples to support multiple files, using `gr.File(file_count="multiple", label="Input Images/Videos")`. I'm also preparing for the required tests.
156
- >
157
- >
158
- > **Adapting for Gradio Updates**
159
- >
160
- > I'm now fully integrating the file upload functionality and Rerun streaming. This encompasses modifying the UI components and processing steps to accommodate `gr.File` for uploading and iterating through multiple files. The emphasis is on seamless operation and correct behavior when the user uploads multiple media, and ensuring updates show smoothly in the Gradio interface.
161
- >
162
- >
163
- > **Integrating New Uploads**
164
- >
165
- > I'm now changing the input to `gr.File(file_count="multiple")` in `src/sam3d_body/gradio_ui/sam3d_body_ui.py` to allow multi-file uploads. The `sam3d_prediction_fn` function signature is also updated to accommodate a list of file paths to process. I am also adjusting the UI examples for file input.
166
- >
167
- >
 
1
+ import gradio as gr
2
+ import cv2
3
+ import os
4
+ import time
5
+ from utils import is_video, process_frame_mock
6
 
7
+ def sam3d_prediction_fn(file_paths, progress=gr.Progress()):
8
+ """
9
+ Processes a list of file paths (images or videos).
10
+ Simulates the SAM3D/Rerun logic by iterating through files
11
+ and yielding results for a streaming UI.
12
+ """
13
+ if not file_paths:
14
+ return [], "No files uploaded."
15
+
16
+ processed_results = []
17
+ total_files = len(file_paths)
18
+ status_log = ""
19
+
20
+ # Global frame counter simulation (as mentioned in your thought trace)
21
+ global_frame_idx = 0
22
+
23
+ for i, file_path in enumerate(file_paths):
24
+ file_name = os.path.basename(file_path)
25
+ status_msg = f"Processing file {i+1}/{total_files}: {file_name}..."
26
+ status_log += status_msg + "\n"
27
+
28
+ # Yield status update immediately
29
+ yield processed_results, status_log
30
+
31
+ # Determine if file is video or image
32
+ if is_video(file_path):
33
+ cap = cv2.VideoCapture(file_path)
34
+ frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
35
+
36
+ # Process video frames (simulating frame-by-frame logic)
37
+ # We'll just process a few frames to keep the demo quick
38
+ max_frames_to_preview = 5
39
+ current_frame = 0
40
+
41
+ while cap.isOpened() and current_frame < max_frames_to_preview:
42
+ ret, frame = cap.read()
43
+ if not ret:
44
+ break
45
+
46
+ # Simulate SAM3D processing on the frame
47
+ processed_frame = process_frame_mock(frame, f"Video Frame {global_frame_idx}")
48
+ processed_results.append((processed_frame, f"{file_name} - Frame {current_frame}"))
49
+
50
+ global_frame_idx += 1
51
+ current_frame += 1
52
+
53
+ # Stream updates every frame
54
+ yield processed_results, status_log
55
+ time.sleep(0.1) # Simulate processing time
56
+
57
+ cap.release()
58
+ status_log += f"Finished video: {file_name}\n"
59
+
60
+ else:
61
+ # Process single image
62
+ image = cv2.imread(file_path)
63
+ if image is not None:
64
+ processed_image = process_frame_mock(image, f"Image {global_frame_idx}")
65
+ processed_results.append((processed_image, file_name))
66
+ global_frame_idx += 1
67
+
68
+ # Stream update
69
+ yield processed_results, status_log
70
+ time.sleep(0.5) # Simulate processing time
71
+
72
+ status_log += f"Finished image: {file_name}\n"
73
+
74
+ status_log += "All processing complete."
75
+ yield processed_results, status_log
76
+
77
+ # --- Gradio 6 Application Structure ---
78
+
79
+ with gr.Blocks() as demo:
80
+ # Header with required link
81
+ gr.Markdown("# SAM3D Multi-File Processor")
82
+ gr.Markdown("[Built with anycoder](https://huggingface.co/spaces/akhaliq/anycoder)")
83
+
84
+ with gr.Row():
85
+ with gr.Column(scale=1):
86
+ # UPDATED: Using gr.File with file_count="multiple" as requested
87
+ input_files = gr.File(
88
+ file_count="multiple",
89
+ label="Input Images/Videos",
90
+ file_types=["image", "video"]
91
+ )
92
+ process_btn = gr.Button("Start Processing", variant="primary")
93
+
94
+ # Log output to see the sequential processing logic
95
+ log_output = gr.Textbox(label="Processing Log", lines=10, interactive=False)
96
+
97
+ with gr.Column(scale=2):
98
+ # Gallery to display the streaming results
99
+ output_gallery = gr.Gallery(
100
+ label="Processed Stream",
101
+ columns=3,
102
+ height=600,
103
+ object_fit="contain"
104
+ )
105
+
106
+ # Event Listener
107
+ process_btn.click(
108
+ fn=sam3d_prediction_fn,
109
+ inputs=[input_files],
110
+ outputs=[output_gallery, log_output],
111
+ api_visibility="public"
112
+ )
113
+
114
+ # Launch with Gradio 6 parameters
115
+ if __name__ == "__main__":
116
+ demo.launch(
117
+ theme=gr.themes.Soft(),
118
+ footer_links=[
119
+ {"label": "Built with anycoder", "url": "https://huggingface.co/spaces/akhaliq/anycoder"}
120
+ ]
121
+ )