MultiPerson

C4G-HKUST committed 4 days ago

Commit

8216d98

1 Parent(s): 32942ab

Update quality mode duration calculation: add 60s preprocessing time, change formula to 60s + video_seconds × steps × 2.5s
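The formula in the commit message can be checked by hand. The following is a minimal sketch that mirrors the `get_duration` function changed in app.py. The constant names are illustrative assumptions and do not appear in the repository; the values come from the commit message.

```python
# Illustrative constants: the names are assumptions, the values come from the commit message.
PREPROCESSING_SECONDS = 60       # fixed overhead added by this commit
GPU_SECONDS_PER_UNIT = 2.5       # GPU seconds per (video second x denoising step)


def get_duration(video_seconds: float, steps: int) -> int:
    """Estimate the quality-mode GPU budget in whole seconds (truncated, not rounded)."""
    return int(PREPROCESSING_SECONDS + video_seconds * steps * GPU_SECONDS_PER_UNIT)


# 5 s of video at 25 steps: 60 + 5 * 25 * 2.5 = 372.5, truncated to 372
print(get_duration(5, 25))
```

Because `int()` truncates, the budget is always rounded down to a whole second.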

Files changed (2)

README.md +9 -12
app.py +4 -4

README.md CHANGED

@@ -212,21 +212,18 @@ python app.py
 #### Generation Modes
 The Gradio demo provides two generation modes:
-- **Fast Mode (up to 210s GPU budget)**:
-  - Fixed 10 denoising steps for quick generation
-  - Suitable for single-person videos or quick previews
-  - Lower GPU usage quota consumption
-  - The 210s is the maximum GPU allocation time (budget), not the actual generation time
-  - **For free-tier users: Fast mode can generate approximately 6 seconds of two-person video at most; longer videos may timeout.**
-- **Quality Mode (up to 720s GPU budget)**:
-  - Custom denoising steps (adjustable via "Diffusion steps" slider)
   - Recommended for multi-person videos that require higher quality
   - Longer generation time but better quality output
-  - The 720s is the maximum GPU allocation time (budget), not the actual generation time
-  - With 40 denoising steps, approximately 10 seconds of video can be generated
-**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode provides sufficient Usage Quota (up to 720 seconds) to accommodate these requirements, while the Fast Mode offers a quick preview option with fixed 10 steps for faster iteration. Note that the GPU duration values (210s/720s) represent the maximum budget allocated, not the actual generation time.

 #### Generation Modes
 The Gradio demo provides two generation modes:
+- **Fast Mode (120s GPU budget, suitable for any type of users)**:
+  - Fixed 8 denoising steps for quick generation
+  - Maximum video duration: 4 seconds
+  - Audio inputs longer than 4 seconds will be automatically trimmed to 4 seconds
+- **Quality Mode (Dynamic GPU budget)**:
+  - Custom denoising steps (adjustable via "Diffusion steps" slider, default: 25 steps)
   - Recommended for multi-person videos that require higher quality
+  - GPU duration is dynamically calculated as: **60s (preprocessing) + video_seconds × steps × 2.5s**
   - Longer generation time but better quality output
+**Design Rationale**: Multi-person videos generally have longer duration and require more computational resources. To achieve better quality, especially for complex multi-person interactions, more denoising steps and longer GPU allocation time are needed. The Quality Mode dynamically allocates GPU time based on video length and denoising steps, while the Fast Mode offers a quick preview option with fixed 8 steps and a 4-second maximum duration for faster iteration.
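The two modes described in the updated README can be summarized as a small planning function. This is only a sketch under stated assumptions: `GenerationPlan`, `plan_generation`, and the constant names are hypothetical and not part of app.py, and it assumes the quality-mode video length equals the audio length, which the README does not state explicitly.

```python
from dataclasses import dataclass

# Values taken from the README text; the names are hypothetical.
FAST_STEPS = 8            # fixed denoising steps in Fast Mode
FAST_MAX_SECONDS = 4.0    # longer audio is trimmed to this length
FAST_GPU_BUDGET = 120     # fixed GPU budget in seconds


@dataclass
class GenerationPlan:
    video_seconds: float
    steps: int
    gpu_budget_seconds: int


def plan_generation(mode: str, audio_seconds: float, steps: int = 25) -> GenerationPlan:
    """Return the video length, step count, and GPU budget for a request."""
    if mode == "fast":
        # Fast Mode ignores the requested steps and caps the duration at 4 s.
        return GenerationPlan(min(audio_seconds, FAST_MAX_SECONDS), FAST_STEPS, FAST_GPU_BUDGET)
    if mode == "quality":
        # Assumption: the generated video is as long as the input audio.
        budget = int(60 + audio_seconds * steps * 2.5)
        return GenerationPlan(audio_seconds, steps, budget)
    raise ValueError(f"unknown mode: {mode!r}")


print(plan_generation("fast", 10.0))
print(plan_generation("quality", 6.0))
```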

app.py CHANGED

@@ -648,9 +648,9 @@ def run_graio_demo(args):
     def get_duration(video_seconds, steps):
         """
         Compute the GPU duration required for quality mode
-        duration = video seconds * steps * 3.5 seconds
         """
-        return int(video_seconds * steps * 3.5)
     # Create the dynamic duration calculation function for quality mode
     def calculate_quality_duration(*args, **kwargs):
@@ -932,7 +932,7 @@ def run_graio_demo(args):
                 gr.Markdown("""
                 **Generation Modes:**
                 - **Fast Mode (120s GPU budget, suitable for any type of users)**: Fixed 8 denoising steps for quick generation. Maximum video duration: 4 seconds.
-                - **Quality Mode (Dynamic GPU budget)**: Custom denoising steps (adjustable via "Diffusion steps" slider, default: 25 steps). GPU duration is dynamically calculated as: video_seconds × steps × 3.5 s.
                 *Note: Fast mode has a fixed 120s GPU budget. Quality mode dynamically allocates GPU time based on video length and denoising steps. Multi-person videos generally require longer duration and more Usage Quota for better quality.*
                 """)
@@ -1058,7 +1058,7 @@ def run_graio_demo(args):
                 # Compute the duration actually used
                 actual_duration = get_duration(actual_video_seconds, actual_steps)
                 # Notify the user via gr.Info
-                info_msg = f"Video generation completed! Duration used: {actual_duration}s (estimated: {actual_video_seconds:.2f}s video × {actual_steps} steps × 2s)"
                 gr.Info(info_msg)
                 return output_file
             else:

     def get_duration(video_seconds, steps):
         """
         Compute the GPU duration required for quality mode
+        duration = 60s (preprocessing time) + video seconds * steps * 2.5 seconds
         """
+        return int(60 + video_seconds * steps * 2.5)
     # Create the dynamic duration calculation function for quality mode
     def calculate_quality_duration(*args, **kwargs):
                 gr.Markdown("""
                 **Generation Modes:**
                 - **Fast Mode (120s GPU budget, suitable for any type of users)**: Fixed 8 denoising steps for quick generation. Maximum video duration: 4 seconds.
+                - **Quality Mode (Dynamic GPU budget)**: Custom denoising steps (adjustable via "Diffusion steps" slider, default: 25 steps). GPU duration is dynamically calculated as: 60s (preprocessing) + video_seconds × steps × 2.5 s.
                 *Note: Fast mode has a fixed 120s GPU budget. Quality mode dynamically allocates GPU time based on video length and denoising steps. Multi-person videos generally require longer duration and more Usage Quota for better quality.*
                 """)
                 # Compute the duration actually used
                 actual_duration = get_duration(actual_video_seconds, actual_steps)
                 # Notify the user via gr.Info
+                info_msg = f"Video generation completed! Duration used: {actual_duration}s (60s preprocessing + {actual_video_seconds:.2f}s video × {actual_steps} steps × 2.5s)"
                 gr.Info(info_msg)
                 return output_file
             else:
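As a rough comparison of the two formulas in this diff, the sketch below evaluates the removed and the added versions of `get_duration` side by side. The names `old_duration` and `new_duration` are hypothetical; only the arithmetic is taken from the diff.

```python
def old_duration(video_seconds: float, steps: int) -> int:
    """Formula removed by this commit: video_seconds * steps * 3.5."""
    return int(video_seconds * steps * 3.5)


def new_duration(video_seconds: float, steps: int) -> int:
    """Formula added by this commit: 60 + video_seconds * steps * 2.5."""
    return int(60 + video_seconds * steps * 2.5)


# The two formulas agree when video_seconds * steps == 60
# (3.5x = 60 + 2.5x  =>  x = 60).
for video_seconds, steps in [(2, 25), (4, 15), (10, 40)]:
    print(
        f"{video_seconds}s x {steps} steps: "
        f"old={old_duration(video_seconds, steps)}s, "
        f"new={new_duration(video_seconds, steps)}s"
    )
```

Under these formulas, short or low-step requests (where `video_seconds × steps` is below 60) receive a slightly larger budget than before, while longer requests receive a smaller one.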