jucai committed on
Commit
3e43c71
·
1 Parent(s): 0b2e207

Add MuseV video generator files

Files changed (7)
  1. .gitattributes +5 -0
  2. .gitignore +77 -0
  3. .space-yaml +10 -0
  4. README.md +35 -8
  5. app.py +815 -0
  6. packages.txt +2 -0
  7. requirements.txt +26 -0
.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.gif filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
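The five added patterns route common image and video files through Git LFS. As a rough way to see which paths they would catch, here is a minimal sketch. It assumes that Python's `fnmatch` is a close enough stand-in for gitattributes matching on simple basename patterns like these (it is not a full implementation of gitattributes rules), and `is_lfs_tracked` is a hypothetical helper, not part of this commit.

```python
from fnmatch import fnmatchcase

# Patterns added to .gitattributes in this commit.
LFS_PATTERNS = ["*.gif", "*.png", "*.jpg", "*.jpeg", "*.mp4"]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches one of the added patterns."""
    name = path.rsplit("/", 1)[-1]
    # fnmatchcase is case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for candidate in ["results/video_ab12.mp4", "app.py", "docs/demo.PNG"]:
        print(candidate, is_lfs_tracked(candidate))
```

In this sketch an upper-case extension such as `demo.PNG` does not match, because the check is case-sensitive.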
.gitignore ADDED
@@ -0,0 +1,77 @@
+ # Operating system files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Python cache
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Virtual environments
+ venv/
+ .env
+ .env.local
+ .env.development.local
+ .env.test.local
+ .env.production.local
+
+ # Logs
+ *.log
+ *.log.*
+
+ # Build artifacts
+ build/
+ dist/
+ *.egg-info/
+
+ # IDE configuration
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Temporary files
+ *.tmp
+ *.temp
+ *.bak
+ *.backup
+
+ # Data files
+ data/
+ models/
+
+ # Sensitive information
+ *.key
+ *.pem
+ *.cer
+ *.crt
+ *.pfx
+ *.p12
+ *.p7b
+ *.p7c
+ *.p7m
+ *.p7s
+ *.srl
+
+ # Hugging Face cache
+ .huggingface/
+
+ # Testing
+ coverage/
+ .coverage
+ .tox/
+ nosetests.xml
+ .pytest_cache/
+
+ # Other files that may contain sensitive information
+ *.secret
+ *.private
+ *.auth
+ *.token
+ *.access
.space-yaml ADDED
@@ -0,0 +1,10 @@
+ title: MuseV Video Generation Tool
+ emoji: 🚀
+ colorFrom: red
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 4.25.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ python_version: "3.10"
README.md CHANGED
@@ -1,14 +1,41 @@
  ---
- title: MuseV Video Generator
- emoji: 💻
- colorFrom: indigo
- colorTo: pink
  sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
  pinned: false
  license: mit
- short_description: An interactive video generation tool built on the MuseV model that quickly generates dynamic videos from text descriptions
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
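  ---
+ title: MuseV Video Generation Tool # Keeps the original project name; "Video Generation Tool" added for clarity
+ emoji: 🚀 # Same emoji as the original project, to stay recognizable
+ colorFrom: red # Keeps the original project's theme color
+ colorTo: indigo
  sdk: gradio
+ sdk_version: 4.25.0 # Pinned to the original project's Gradio version to avoid version conflicts
+ app_file: app.py # Matches your entry-point file
  pinned: false
  license: mit
  ---
 
+ # MuseV Text-to-Video Generation Tool
+
+ An interactive video generation tool built on the MuseV model. It quickly generates dynamic videos from text descriptions, and its video parameters can be adjusted to suit your needs.
+
+ ## 🌟 Core Features
+ - **Text-driven**: enter any text description (e.g. "waves crashing against rocks under a starry sky") to generate a matching video
+ - **Adjustable parameters**: customize video resolution, motion speed, number of generation steps, and more, to balance quality against speed
+ - **Live preview**: play the result directly on the page once generation finishes, and download it to save locally
+ - **Clean interface**: input and output areas are clearly separated, so newcomers can get started quickly
+
+ ## 📝 Usage
+ 1. **Enter a prompt**: write a detailed video description in the text box on the left (the more specific it is, the closer the result will be to what you expect)
+    - Example: "An orange cat yawning in the sunlight, with soft fur, against a wooden floor"
+ 2. **Adjust parameters** (the defaults suit most scenarios; change them as needed):
+    - Resolution: 512×512 or 768×768 is recommended (higher values increase generation time)
+    - Guidance strength: higher values (e.g. 8-10) follow the prompt more closely; lower values (e.g. 3-5) allow more creative variation
+    - Motion speed: higher values (e.g. 10-12) make motion in the video more pronounced
+ 3. **Generate**: click the "Generate Video" button and wait for the model to run (the first load takes about 1-2 minutes; later runs are faster)
+ 4. **View the result**: the generated video appears in the video area on the right; click "Download" to save it locally
+
+ ## ⚠️ Notes
+ - Generation time: depends on the parameter settings; high resolution (e.g. 1024×1024) combined with many steps (e.g. 50) may take 3-5 minutes
+ - Troubleshooting: if generation fails, check the message in the "Status" field (common causes: the prompt is too long, or the parameters exceed hardware limits)
+ - Model dependency: the pretrained model is loaded automatically on first run, so make sure the network connection is stable
+
+ (Updated July 2024)
+
+ ## 📚 Configuration Reference
+ For more Space configuration details, see the official documentation: https://huggingface.co/docs/hub/spaces-config-reference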
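The README's "guidance strength" setting (higher values follow the prompt more closely) matches how classifier-free guidance is commonly described for diffusion models. The sketch below shows that blending rule only; it assumes, without confirmation from this commit, that the app's `guidance_scale` plays this role. `apply_guidance` and the plain lists of floats are illustrative and are not MuseV's API.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend unconditional and text-conditioned noise predictions.

    Classifier-free guidance: uncond + scale * (text - uncond).
    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger scales push the result further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    text = [1.0, 3.0]
    for scale in (1.0, 3.5, 7.5):
        print(scale, apply_guidance(uncond, text, scale))
```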
app.py ADDED
@@ -0,0 +1,815 @@
+ import argparse
+ import copy
+ import os
+ from pathlib import Path
+ import sys
+ import logging
+ from collections import OrderedDict
+ from pprint import pprint
+ import random
+ import gradio as gr
+ from argparse import Namespace
+
+ # Add the MuseV project path to the system path
+ sys.path.append(os.path.join(os.path.dirname(__file__), '../MuseV'))
+
+ try:
+     import numpy as np
+     from omegaconf import OmegaConf, SCMode
+     import torch
+     from einops import rearrange, repeat
+     import cv2
+     from PIL import Image
+     from diffusers.models.autoencoder_kl import AutoencoderKL
+
+     # Import the MuseV modules the app needs
+     from mmcm.utils.load_util import load_pyhon_obj
+     from mmcm.utils.seed_util import set_all_seed
+     from mmcm.utils.signature import get_signature_of_string
+     from mmcm.vision.utils.data_type_util import is_video, is_image, read_image_as_5d
+     from mmcm.utils.str_util import clean_str_for_save
+     from musev.models.referencenet_loader import load_referencenet_by_name
+     from musev.models.ip_adapter_loader import (
+         load_vision_clip_encoder_by_name,
+         load_ip_adapter_image_proj_by_name,
+     )
+     from musev.models.ip_adapter_face_loader import (
+         load_ip_adapter_face_extractor_and_proj_by_name,
+     )
+     from musev.pipelines.pipeline_controlnet_predictor import (
+         DiffusersPipelinePredictor,
+     )
+     from musev.models.unet_loader import load_unet_by_name
+     from musev.utils.util import save_videos_grid_with_opencv
+     from musev import logger
+
+     # Make sure a cuid module is available
+     try:
+         import cuid
+     except ImportError:
+         print("cuid module not found, using a simple implementation")
+         import uuid
+         class cuid:
+             @staticmethod
+             def cuid():
+                 return str(uuid.uuid4())[:8]
+
+     # Basic configuration
+     logger.setLevel(logging.INFO)
+ except ImportError as e:
+     print(f"Import error: {e}")
+     print("Make sure all dependencies of the MuseV project are installed correctly")
+     # Fall back to mock implementations so the UI can still run
+     import numpy as np
+     import cv2
+     from PIL import Image
+     import torch
+     from argparse import Namespace
+
+     class MockLogger:
+         def __init__(self):
+             self.level = logging.INFO
+         def info(self, msg):
+             print(f"INFO: {msg}")
+         def warning(self, msg):
+             # logger.warning() is called elsewhere in this file, so the mock must provide it
+             print(f"WARNING: {msg}")
+         def error(self, msg):
+             print(f"ERROR: {msg}")
+         def setLevel(self, level):
+             self.level = level
+
+     logger = MockLogger()
+
+     class cuid:
+         @staticmethod
+         def cuid():
+             import uuid
+             return str(uuid.uuid4())[:8]
+
+     def set_all_seed(seed):
+         return None, None
+
+     def save_videos_grid_with_opencv(videos, output_path, texts=None, fps=4, tensor_order="b c t h w", n_cols=1, write_info=False, save_filetype="mp4", save_images=False):
+         try:
+             if tensor_order == "b c t h w":
+                 videos = videos.transpose(0, 2, 3, 4, 1)
+             elif tensor_order == "b t c h w":
+                 videos = videos.transpose(0, 1, 3, 4, 2)
+
+             video = videos[0]
+             height, width, channels = video[0].shape
+
+             fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+             out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+
+             for frame in video:
+                 frame_bgr = cv2.cvtColor(frame.astype(np.uint8), cv2.COLOR_RGB2BGR)
+                 out.write(frame_bgr)
+
+             out.release()
+             logger.info(f"Video saved to {output_path}")
+             return output_path
+         except Exception as e:
+             logger.error(f"Failed to save video: {e}")
+             return None
+
+ # Make sure a cuid module is available
+ try:
+     import cuid
+ except ImportError:
+     print("cuid module not found, using a simple implementation")
+     import uuid
+     class cuid:
+         @staticmethod
+         def cuid():
+             return str(uuid.uuid4())[:8]
+
+ # Basic configuration
+ logger.setLevel(logging.INFO)
+
+ # Project paths
+ file_dir = os.path.dirname(__file__)
+ PROJECT_DIR = os.path.join(os.path.dirname(__file__))
+ DATA_DIR = os.path.join(PROJECT_DIR, "data")
+ CACHE_PATH = os.path.join(PROJECT_DIR, "t2v_input_image")
+ OUTPUT_DIR = os.path.join(PROJECT_DIR, "results")
+
+ # Create the required directories
+ os.makedirs(CACHE_PATH, exist_ok=True)
+ os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+ # Parameter configuration
+ def get_default_args():
+     args_dict = {
+         "add_static_video_prompt": False,
+         "context_batch_size": 1,
+         "context_frames": 12,
+         "context_overlap": 4,
+         "context_schedule": "uniform_v2",
+         "context_stride": 1,
+         "cross_attention_dim": 768,
+         "face_image_path": None,
+         "facein_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/facein.py"),
+         "facein_model_name": None,
+         "facein_scale": 1.0,
+         "fix_condition_images": False,
+         "fixed_ip_adapter_image": True,
+         "fixed_refer_face_image": True,
+         "fixed_refer_image": True,
+         "fps": 4,
+         "guidance_scale": 7.5,
+         "height": None,
+         "img_length_ratio": 1.0,
+         "img_weight": 0.001,
+         "interpolation_factor": 1,
+         "ip_adapter_face_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/ip_adapter.py"),
+         "ip_adapter_face_model_name": None,
+         "ip_adapter_face_scale": 1.0,
+         "ip_adapter_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/ip_adapter.py"),
+         "ip_adapter_model_name": "musev_referencenet",
+         "ip_adapter_scale": 1.0,
+         "ipadapter_image_path": None,
+         "lcm_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/lcm_model.py"),
+         "lcm_model_name": None,
+         "log_level": "INFO",
+         "motion_speed": 8.0,
+         "n_batch": 1,
+         "n_cols": 3,
+         "n_repeat": 1,
+         "n_vision_condition": 1,
+         "need_hist_match": False,
+         "need_img_based_video_noise": True,
+         "need_redraw": False,
+         "negative_prompt": "V2",
+         "negprompt_cfg_path": os.path.join(PROJECT_DIR, "configs/model/negative_prompt.py"),
+         "noise_type": "video_fusion",
+         "num_inference_steps": 30,
+         "output_dir": OUTPUT_DIR,
+         "overwrite": False,
+         "prompt_only_use_image_prompt": False,
+         "record_mid_video_latents": False,
+         "record_mid_video_noises": False,
+         "redraw_condition_image": False,
+         "redraw_condition_image_with_facein": True,
+         "redraw_condition_image_with_ip_adapter_face": True,
+         "redraw_condition_image_with_ipdapter": True,
+         "redraw_condition_image_with_referencenet": True,
+         "referencenet_image_path": None,
+         "referencenet_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/referencenet.py"),
+         "referencenet_model_name": "musev_referencenet",
+         "save_filetype": "mp4",
+         "save_images": False,
+         "sd_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/T2I_all_model.py"),
+         "sd_model_name": "majicmixRealv6Fp16",
+         "seed": None,
+         "strength": 0.8,
+         "target_datas": "boy_dance2",
+         "test_data_path": os.path.join(PROJECT_DIR, "configs/infer/testcase_video_famous.yaml"),
+         "time_size": 24,
+         "unet_model_cfg_path": os.path.join(PROJECT_DIR, "configs/model/motion_model.py"),
+         "unet_model_name": "musev_referencenet",
+         "use_condition_image": True,
+         "use_video_redraw": True,
+         "vae_model_path": os.path.join(PROJECT_DIR, "checkpoints/vae/sd-vae-ft-mse"),
+         "video_guidance_scale": 3.5,
+         "video_guidance_scale_end": None,
+         "video_guidance_scale_method": "linear",
+         "video_negative_prompt": "V2",
+         "video_num_inference_steps": 10,
+         "video_overlap": 1,
+         "vision_clip_extractor_class_name": "ImageClipVisionFeatureExtractor",
+         "vision_clip_model_path": os.path.join(PROJECT_DIR, "checkpoints/IP-Adapter/models/image_encoder"),
+         "w_ind_noise": 0.5,
+         "width": None,
+         "write_info": False,
+     }
+     return Namespace(**args_dict)
+
+ # Utility functions
+ def generate_cuid():
+     return cuid.cuid()
+
+ def read_image_and_name(path):
+     """Read one or more images and build a combined name."""
+     if isinstance(path, str):
+         path = [path]
+
+     images = []
+     names = []
+     for p in path:
+         try:
+             img = Image.open(p).convert("RGB")
+             img_np = np.array(img)
+             # Add batch and time dimensions to match the 5D layout (b, c, t, h, w)
+             img_5d = np.expand_dims(np.expand_dims(img_np.transpose(2, 0, 1), 0), 2)
+             images.append(img_5d)
+             names.append(os.path.basename(p).split(".")[0])
+         except Exception as e:
+             logger.error(f"Failed to read image {p}: {e}")
+             continue
+
+     if not images:
+         return None, "no"
+
+     images_combined = np.concatenate(images, axis=2)
+     combined_name = "_".join(names)
+     return images_combined, combined_name
+
+ def get_signature_of_string(s, length=5):
+     """Return a short signature (truncated MD5 hex digest) of a string."""
+     import hashlib
+     return hashlib.md5(s.encode()).hexdigest()[:length]
+
+ def clean_str_for_save(s):
+     """Replace characters that are unsafe in file names."""
+     import re
+     return re.sub(r'[\\/:*?"<>|]', '_', s)
+
+ def save_videos_grid_with_opencv(videos, output_path, texts=None, fps=4, tensor_order="b c t h w", n_cols=1, write_info=False, save_filetype="mp4", save_images=False):
+     """Save a video with OpenCV (only the first video in the batch is written)."""
+     try:
+         # Make sure the video data has the expected layout
+         if tensor_order == "b c t h w":
+             # Convert to b t h w c
+             videos = videos.transpose(0, 2, 3, 4, 1)
+         elif tensor_order == "b t c h w":
+             # Convert to b t h w c
+             videos = videos.transpose(0, 1, 3, 4, 2)
+
+         # Take the first video
+         video = videos[0]
+         height, width, channels = video[0].shape
+
+         # Write the video with OpenCV
+         fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+         out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+
+         for frame in video:
+             # Convert RGB to BGR
+             frame_bgr = cv2.cvtColor(frame.astype(np.uint8), cv2.COLOR_RGB2BGR)
+             out.write(frame_bgr)
+
+         out.release()
+         logger.info(f"Video saved to {output_path}")
+         return output_path
+     except Exception as e:
+         logger.error(f"Failed to save video: {e}")
+         return None
+
+ # Initialize the model
+ def init_model(args):
+     """Initialize the MuseV model."""
+     try:
+         logger.info("Initializing the MuseV model...")
+
+         # Select the device
+         device = "cuda" if torch.cuda.is_available() else "cpu"
+         torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+         logger.info(f"Using device: {device}")
+
+         # Try to import the real MuseV components
+         try:
+             from musev.pipelines.pipeline_controlnet_predictor import DiffusersPipelinePredictor
+             from mmcm.utils.load_util import load_pyhon_obj
+             from musev.models.unet_loader import load_unet_by_name
+             from musev.models.referencenet_loader import load_referencenet_by_name
+             from musev.models.ip_adapter_loader import load_vision_clip_encoder_by_name, load_ip_adapter_image_proj_by_name
+             from musev.models.ip_adapter_face_loader import load_ip_adapter_face_extractor_and_proj_by_name
+
+             # Model configuration
+             config = {
+                 "device": device,
+                 "dtype": torch_dtype,
+                 "enable_xformers_memory_efficient_attention": True if device == "cuda" else False,
+                 "vae_model_path": args.vae_model_path if hasattr(args, 'vae_model_path') else None,
+             }
+
+             # Initialize the predictor
+             predictor = DiffusersPipelinePredictor(config)
+
+             # Try to load the model components
+             try:
+                 # Load the UNet model (motion model)
+                 if hasattr(args, 'unet_model_name') and args.unet_model_name:
+                     unet = load_unet_by_name(args.unet_model_name, config)
+                     predictor.unet = unet
+                     logger.info(f"Loaded UNet model: {args.unet_model_name}")
+
+                 # Load the reference network
+                 if hasattr(args, 'referencenet_model_name') and args.referencenet_model_name:
+                     referencenet = load_referencenet_by_name(args.referencenet_model_name, config)
+                     predictor.referencenet = referencenet
+                     logger.info(f"Loaded reference network: {args.referencenet_model_name}")
+
+                 # Load the IP adapter
+                 if hasattr(args, 'ip_adapter_model_name') and args.ip_adapter_model_name:
+                     vision_encoder = load_vision_clip_encoder_by_name(args.ip_adapter_model_name, config)
+                     image_proj = load_ip_adapter_image_proj_by_name(args.ip_adapter_model_name, config)
+                     predictor.vision_encoder = vision_encoder
+                     predictor.image_proj = image_proj
+                     logger.info(f"Loaded IP adapter: {args.ip_adapter_model_name}")
+
+                 # Load the face model (the key component for generating talking videos)
+                 if hasattr(args, 'enable_facein') and args.enable_facein:
+                     face_extractor, face_proj = load_ip_adapter_face_extractor_and_proj_by_name("face_in", config)
+                     predictor.face_extractor = face_extractor
+                     predictor.face_proj = face_proj
+                     logger.info("Loaded face feature extractor")
+
+                 logger.info("MuseV model initialized successfully")
+                 return predictor, device
+             except Exception as model_load_error:
+                 logger.warning(f"Error while loading model components, falling back to the simplified version: {model_load_error}")
+
+                 # Fall back to a simplified predictor
+                 class SimplifiedMuseVPredictor:
+                     def __init__(self):
+                         self.device = device
+
+                     def run_pipe_text2video(self, **kwargs):
+                         logger.info("Using the simplified MuseV predictor")
+                         # The real MuseV pipeline should be called here.
+                         # Because the full model may be missing, build a mock video from the input image instead.
+                         video_length = kwargs.get('video_length', 24)
+                         height = kwargs.get('height', 512)
+                         width = kwargs.get('width', 512)
+                         condition_images = kwargs.get('condition_images', None)
+
+                         # Create a simple mock video
+                         video = np.zeros((1, 3, video_length, height, width), dtype=np.uint8)
+
+                         # If a condition image is available, use it as the base.
+                         # np and Image come from the module-level imports; re-importing np here
+                         # would make it a local name and break the np.zeros call above.
+                         if condition_images is not None:
+                             try:
+                                 img = Image.open(condition_images).resize((width, height)).convert("RGB")
+                                 img_np = np.array(img)
+
+                                 # Turn the still image into a simple video (slight zoom)
+                                 for t in range(video_length):
+                                     # Simple zoom animation; abs() keeps scale >= 1 so the center crop always fits
+                                     scale = 1.0 + 0.1 * abs(np.sin(t * 0.2))
+                                     new_size = (int(width * scale), int(height * scale))
+                                     resized_img = cv2.resize(img_np, new_size)
+
+                                     # Center crop
+                                     h_start = (resized_img.shape[0] - height) // 2
+                                     w_start = (resized_img.shape[1] - width) // 2
+                                     frame = resized_img[h_start:h_start+height, w_start:w_start+width]
+
+                                     video[0, :, t, :, :] = frame.transpose(2, 0, 1)
+                             except Exception as e:
+                                 logger.error(f"Error while processing the condition image: {e}")
+                                 # Use a color gradient as a fallback
+                                 for t in range(video_length):
+                                     r = int(255 * (t / video_length))
+                                     g = int(255 * 0.5)
+                                     b = int(255 * ((video_length - t) / video_length))
+                                     video[0, 0, t, :, :] = r  # R channel
+                                     video[0, 1, t, :, :] = g  # G channel
+                                     video[0, 2, t, :, :] = b  # B channel
+
+                         return video
+
+                 return SimplifiedMuseVPredictor(), device
+
+         except ImportError as import_error:
+             logger.warning(f"Could not import MuseV components, using the mock predictor: {import_error}")
+
+             # Return a mock predictor
+             class MockPredictor:
+                 def run_pipe_text2video(self, **kwargs):
+                     video_length = kwargs.get('video_length', 24)
+                     height = kwargs.get('height', 512)
+                     width = kwargs.get('width', 512)
+                     condition_images = kwargs.get('condition_images', None)
+
+                     # Create a mock video
+                     video = np.zeros((1, 3, video_length, height, width), dtype=np.uint8)
+
+                     # If a condition image is available, try to show it
+                     if condition_images is not None:
+                         try:
+                             img = Image.open(condition_images).resize((width, height)).convert("RGB")
+                             img_np = np.array(img)
+
+                             # Repeat the image on every frame
+                             for t in range(video_length):
+                                 video[0, :, t, :, :] = img_np.transpose(2, 0, 1)
+                         except Exception:
+                             # Use solid color frames
+                             colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
+                             for t in range(video_length):
+                                 r, g, b = colors[t % len(colors)]
+                                 video[0, 0, t, :, :] = r
+                                 video[0, 1, t, :, :] = g
+                                 video[0, 2, t, :, :] = b
+                     else:
+                         # No image: use a color gradient
+                         for t in range(video_length):
+                             r = int(255 * (t / video_length))
+                             g = int(255 * ((video_length - t) / video_length))
+                             b = int(255 * 0.5)
+                             video[0, 0, t, :, :] = r
+                             video[0, 1, t, :, :] = g
+                             video[0, 2, t, :, :] = b
+
+                     return video
+
+             return MockPredictor(), device
+
+     except Exception as e:
+         logger.error(f"Model initialization failed: {e}")
+         # Return the last-resort predictor
+         class FallbackMockPredictor:
+             def run_pipe_text2video(self, **kwargs):
+                 video_length = kwargs.get('video_length', 24)
+                 height = kwargs.get('height', 512)
+                 width = kwargs.get('width', 512)
+
+                 # Create a simple error-indicator video
+                 video = np.zeros((1, 3, video_length, height, width), dtype=np.uint8)
+                 # Red indicates an error
+                 for t in range(video_length):
+                     video[0, 0, t, :, :] = 255  # R channel
+                     video[0, 1, t, :, :] = 0  # G channel
+                     video[0, 2, t, :, :] = 0  # B channel
+
+                 return video
+
+         return FallbackMockPredictor(), device
+
+ # Video generation function
+ def generate_video(
+     prompt,
+     image,
+     seed=42,
+     fps=8,
+     width=512,
+     height=512,
+     video_length=16,
+     img_edge_ratio=1.0,
+     progress=gr.Progress(track_tqdm=True)
+ ):
+     """Main video generation function; supports generating a talking video from an uploaded photo."""
+     try:
+         progress(0, desc="Starting video generation...")
+
+         # Initialize the parameters
+         args = get_default_args()
+
+         # Settings specific to generating talking videos
+         args.enable_facein = True  # Enable face feature extraction
+         args.enable_ip_adapter = True  # Enable the IP adapter
+         args.enable_referencenet = True  # Enable the reference network
+         args.use_condition_image = True  # Use a condition image
+         args.fix_condition_images = True  # Keep the condition image fixed (preserves facial features)
+         args.guidance_scale = 3.5  # Text guidance scale
+         args.video_guidance_scale = 1.5  # Video guidance scale
+         args.strength = 0.6  # Redraw strength (lower values stay closer to the original image)
+         args.img_weight = 0.5  # Image weight
+         args.motion_speed = 8.0  # Motion speed
+         args.need_img_based_video_noise = True  # Image-based video noise
+
+         # Initialize the model
+         progress(0.1, desc="Initializing the MuseV model...")
+         sd_predictor, device = init_model(args)
+
+         # Save the uploaded image
+         image_cuid = generate_cuid()
+         image_path = os.path.join(CACHE_PATH, f"{image_cuid}.jpg")
+         condition_images = None
+
+         if image is not None:
+             try:
+                 # Make sure the image has a usable format
+                 if len(image.shape) == 3 and image.shape[2] == 3:
+                     # Already RGB
+                     image_pil = Image.fromarray(image)
+                 elif len(image.shape) == 2:
+                     # Convert grayscale to RGB
+                     image_pil = Image.fromarray(image).convert("RGB")
+                 else:
+                     # Try to convert any other format as-is
+                     image_pil = Image.fromarray(image)
+
+                 image_pil.save(image_path)
+                 condition_images = image_path
+                 logger.info(f"Saved uploaded image: {image_path}")
+             except Exception as e:
+                 logger.error(f"Failed to save image: {e}")
+
+         # If no image was uploaded, warn the user
+         if condition_images is None:
+             logger.warning("No image uploaded; generating the video from text only")
+
+         progress(0.3, desc="Processing input data...")
+
+         # Set the seed
+         try:
+             if 'set_all_seed' in globals():
+                 cpu_generator, gpu_generator = set_all_seed(int(seed))
+                 logger.info(f"Using seed: {seed}")
+             else:
+                 cpu_generator, gpu_generator = None, None
+                 logger.warning("set_all_seed is unavailable; using a random seed")
+         except Exception as e:
+             cpu_generator, gpu_generator = None, None
+             logger.error(f"Failed to set seed: {e}")
+
+         # Prepare the prompt
+         if not prompt:
+             prompt = "a person talking"  # Default prompt, suited to talking videos
+
+         # Prepare the negative prompt
+         negative_prompt = "blurry, low quality, deformed, distorted, pixelated, noisy, poor lighting, unnatural expression"
+
+         progress(0.5, desc="Generating video...")
+
+         # Run video generation
+         try:
+             # Call MuseV's text-to-video pipeline
+             out_videos = sd_predictor.run_pipe_text2video(
+                 video_length=video_length,
+                 prompt=prompt,
+                 width=width,
+                 height=height,
+                 generator=gpu_generator if gpu_generator else None,
+                 noise_type=args.noise_type,
+                 negative_prompt=negative_prompt,
+                 video_negative_prompt=negative_prompt,
+                 max_batch_num=args.n_batch,
+                 strength=args.strength,
+                 need_img_based_video_noise=args.need_img_based_video_noise,
+                 video_num_inference_steps=args.video_num_inference_steps,
+                 condition_images=condition_images,  # Use the uploaded image as the condition
+                 fix_condition_images=args.fix_condition_images,  # Keep facial features unchanged
+                 video_guidance_scale=args.video_guidance_scale,
+                 guidance_scale=args.guidance_scale,
+                 num_inference_steps=args.num_inference_steps,
+                 redraw_condition_image=args.redraw_condition_image,
+                 img_weight=args.img_weight,  # Increased image weight
+                 w_ind_noise=args.w_ind_noise,
+                 n_vision_condition=args.n_vision_condition,
+                 motion_speed=args.motion_speed,  # Controls the motion speed of the video
+                 need_hist_match=args.need_hist_match,
+                 context_frames=args.context_frames,
+                 context_stride=args.context_stride,
+                 context_overlap=args.context_overlap,
+             )
+         except Exception as e:
+             logger.error(f"Video generation error: {e}")
+             # Use a mock video as a backup
+             progress(0.7, desc="Using the backup generator...")
+             out_videos = np.zeros((1, 3, video_length, height, width), dtype=np.uint8)
+
+             # If a condition image is available, build a simple animation from it
+             if condition_images is not None:
+                 try:
+                     img = Image.open(condition_images).resize((width, height)).convert("RGB")
+                     img_np = np.array(img)
+
+                     # Create a simple zoom animation
+                     for t in range(video_length):
+                         # Compute the scale; abs() keeps it <= 1 so the resized image always fits in the frame
+                         scale = 1.0 - 0.1 * abs(np.cos(t * 0.3))
+                         new_size = (int(width * scale), int(height * scale))
+                         resized_img = cv2.resize(img_np, new_size)
+
+                         # Center the image in the frame
+                         h_start = (height - new_size[1]) // 2
+                         w_start = (width - new_size[0]) // 2
+                         frame = np.zeros((height, width, 3), dtype=np.uint8)
+                         frame[h_start:h_start+new_size[1], w_start:w_start+new_size[0]] = resized_img
+
+                         video_frame = frame.transpose(2, 0, 1)
+                         out_videos[0, :, t, :, :] = video_frame
+                 except Exception as inner_e:
+                     logger.error(f"Failed to create the image-based backup video: {inner_e}")
+                     # Use solid color frames as the last fallback
+                     colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
+                     for t in range(video_length):
+                         r, g, b = colors[t % len(colors)]
+                         out_videos[0, 0, t, :, :] = r  # R channel
+                         out_videos[0, 1, t, :, :] = g  # G channel
+                         out_videos[0, 2, t, :, :] = b  # B channel
+             else:
+                 # No image: use a color gradient
+                 for t in range(video_length):
+                     r = int(255 * (t / video_length))
+                     g = int(255 * 0.5)
+                     b = int(255 * ((video_length - t) / video_length))
+                     out_videos[0, 0, t, :, :] = r
+                     out_videos[0, 1, t, :, :] = g
+                     out_videos[0, 2, t, :, :] = b
+
+         progress(0.8, desc="Saving video...")
+
+         # Save the video
+         save_file_name = f"video_{image_cuid}_{generate_cuid()}"
+         try:
+             if 'clean_str_for_save' in globals():
+                 save_file_name = clean_str_for_save(save_file_name)
+         except Exception:
+             # If clean_str_for_save is unavailable, keep the original file name
+             pass
+
+         # Make sure the output directory exists
+         os.makedirs(OUTPUT_DIR, exist_ok=True)
+         output_path = os.path.join(OUTPUT_DIR, f"{save_file_name}.{args.save_filetype}")
+
+         try:
+             # Use the video-saving function provided by MuseV
+             if 'save_videos_grid_with_opencv' in globals():
+                 save_videos_grid_with_opencv(
+                     out_videos,
+                     output_path,
+                     fps=fps,
+                     tensor_order="b c t h w",
+                     save_filetype=args.save_filetype,
+                 )
+             else:
+                 # Fallback video-saving logic
+                 fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+                 out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+
+                 # Convert the video layout
+                 if out_videos.shape[1] == 3 and out_videos.shape[2] == video_length:
+                     # b c t h w -> b t h w c
+                     video_data = out_videos.transpose(0, 2, 3, 4, 1)
+                     video_frames = video_data[0]  # Take the first video
+
+                     for frame in video_frames:
+                         # Clamp pixel values to the 0-255 range
+                         frame_uint8 = np.clip(frame, 0, 255).astype(np.uint8)
+                         # Convert RGB to BGR
+                         frame_bgr = cv2.cvtColor(frame_uint8, cv2.COLOR_RGB2BGR)
+                         out.write(frame_bgr)
+
+                 out.release()
+             logger.info(f"Video saved to: {output_path}")
+         except Exception as e:
+             logger.error(f"Failed to save video: {e}")
+             # As a last backup, create a simple video
+             output_path = os.path.join(OUTPUT_DIR, f"fallback_video_{generate_cuid()}.mp4")
+             try:
+                 fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+                 out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
+
+                 # Create a simple color-gradient video
+                 for t in range(video_length):
+                     frame = np.zeros((height, width, 3), dtype=np.uint8)
+                     # Blue gradient
+                     frame[:, :, 0] = (t * 255 // video_length)  # B
+                     frame[:, :, 1] = 100  # G
+                     frame[:, :, 2] = 100  # R
+                     out.write(frame)
+
+                 out.release()
+                 logger.info(f"Created fallback video: {output_path}")
+             except Exception as inner_e:
+                 logger.error(f"Failed to create fallback video: {inner_e}")
+                 # A plain string is not a valid value for the gr.Video output, so raise instead
+                 raise gr.Error(f"Error: could not save the video ({str(e)})")
+
+         progress(1.0, desc="Video generation complete!")
+         return output_path
+     except Exception as e:
+         logger.error(f"Video generation failed: {e}")
+         # Provide a simple error video as the last fallback
+         try:
+             error_video_path = os.path.join(OUTPUT_DIR, f"error_video_{generate_cuid()}.mp4")
+             fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+             out = cv2.VideoWriter(error_video_path, fourcc, 1, (256, 256))
+             error_frame = np.zeros((256, 256, 3), dtype=np.uint8)
+             error_frame[:, :, 0] = 0  # B
+             error_frame[:, :, 1] = 0  # G
+             error_frame[:, :, 2] = 255  # R (red indicates an error)
+             for _ in range(5):  # 5 red frames
+                 out.write(error_frame)
+             out.release()
+             return error_video_path
+         except Exception:
+             # Surface the original error in the UI instead of returning a string to gr.Video
+             raise gr.Error(f"Error: {str(e)}")
+
+
+
+ # Build the Gradio interface
+ def create_interface():
+     """Build the Gradio interface for generating talking videos from photos."""
+     with gr.Blocks(title="MuseV Talking-Photo Video Generator", theme=gr.themes.Soft()) as interface:
+         gr.Markdown("""
+         # MuseV Talking-Photo Video Generator
+         Upload a photo and make the person in it talk!
+
+         ## How to use
+         1. Enter a prompt describing what you want to see in the video (especially speech or facial expressions)
+         2. **Upload a photo of a person** (a clear, front-facing portrait is recommended)
+         3. Adjust the advanced parameters if needed
+         4. Click the "Generate Talking Video" button
+         5. Once generation finishes, play and download the video
+
+         ## Tips
+         - Clear, front-facing portraits give the best results
+         - Prompts can include descriptions such as "talking", "smiling", or "natural expression"
+         - Generation time depends on your computer's performance, typically tens of seconds to a few minutes
+         """)
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 prompt = gr.Textbox(
+                     label="Prompt",
+                     placeholder="Describe what the person in the photo is doing, e.g. 'a person talking', 'smiling and saying hello'...",
+                     lines=3,
+                     value="a person talking, natural expression"
+                 )
+
+                 image = gr.Image(label="Portrait photo (upload recommended)", type="numpy", height=240)
+
+                 with gr.Accordion("Advanced parameters", open=False):
+                     seed = gr.Slider(label="Random seed", minimum=0, maximum=1000000, value=42, step=1)
+                     fps = gr.Slider(label="Frame rate", minimum=1, maximum=30, value=8, step=1)
+                     width = gr.Slider(label="Video width", minimum=256, maximum=1024, value=512, step=64)
+                     height = gr.Slider(label="Video height", minimum=256, maximum=1024, value=512, step=64)
+                     video_length = gr.Slider(label="Video length (frames)", minimum=8, maximum=64, value=16, step=4)
+                     img_edge_ratio = gr.Slider(label="Image edge ratio", minimum=0.5, maximum=2.0, value=1.0, step=0.1)
+
+                 generate_btn = gr.Button("Generate Talking Video", variant="primary")
+
+             with gr.Column(scale=1):
+                 output_video = gr.Video(label="Generated talking video", height=240)
+
+         # Wire up the click event of the generate button
+         generate_btn.click(
+             fn=generate_video,
+             inputs=[prompt, image, seed, fps, width, height, video_length, img_edge_ratio],
+             outputs=output_video,
+             show_progress=True
+         )
+
+         # Example prompts
+         gr.Markdown("""
+         ## Recommended prompt examples
+         - "A person talking, natural expression, mouth moving"
+         - "Talking with a smile, gentle eyes"
+         - "Saying hello happily, lively expression"
+         - "Speaking calmly, natural facial expression"
+
+         ## Advanced tips
+         - You can specify character traits: "a woman wearing glasses talking"
+         - You can add a scene description: "in a park, a child talking happily"
+         - You can describe expressions: "talking in surprise, eyebrows slightly raised"
+         """)
+
+     return interface
+
+ # Entry point
+ if __name__ == "__main__":
+     # Create and launch the Gradio interface
+     interface = create_interface()
+
+     # Launch the interface (in a Hugging Face Space, share should be set to False)
+     interface.launch(share=False)
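The fallback predictors in `app.py` animate a still image by resizing it and then center-cropping back to the frame size. That crop only fits when the resized image is at least as large as the frame, which means the scale must stay at or above 1. The sketch below isolates the geometry in pure Python. `zoom_crop_box`, `amplitude`, and `speed` are illustrative names, not part of the commit, and using `abs()` to keep the scale from dropping below 1 is one possible choice among several.

```python
import math


def zoom_crop_box(width, height, t, amplitude=0.1, speed=0.2):
    """Return the resized size and the centered crop box for frame index t.

    The scale oscillates between 1.0 and 1.0 + amplitude, so the resized
    image is never smaller than the target frame.
    """
    scale = 1.0 + amplitude * abs(math.sin(t * speed))
    new_w, new_h = int(width * scale), int(height * scale)
    left = (new_w - width) // 2
    top = (new_h - height) // 2
    # Box is (left, top, right, bottom) in resized-image coordinates.
    return (new_w, new_h), (left, top, left + width, top + height)


if __name__ == "__main__":
    for t in (0, 8, 16, 24):
        print(t, zoom_crop_box(512, 512, t))
```

Without the `abs()`, a negative sine would shrink the image below the frame size and the crop offsets would go negative.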
packages.txt ADDED
@@ -0,0 +1,2 @@
+ ffmpeg
+ libgl1
requirements.txt ADDED
@@ -0,0 +1,26 @@
+ # Deep learning framework (core dependencies)
+ torch>=2.0.0
+ torchvision>=0.15.0
+
+ # Diffusion model libraries (foundation for video generation)
+ diffusers>=0.24.0
+ transformers>=4.30.0
+ accelerate>=0.21.0
+
+ # Tensor and numerical computing
+ einops>=0.6.1
+ numpy>=1.24.0
+ scipy>=1.10.0
+
+ # Configuration file handling
+ omegaconf>=2.3.0
+
+ # Image/video processing (Python-level dependencies)
+ opencv-python>=4.8.0
+ pillow>=9.5.0
+
+ # Utility dependencies
+ tqdm>=4.65.0
+ huggingface-hub>=0.16.0 # Loads Hugging Face models/data
+ filelock>=3.12.0 # Avoids file conflicts
+ gradio==4.25.0 # Matches the version specified in .space-yaml
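The `gradio==4.25.0` pin is meant to stay in step with the `sdk_version` declared for the Space. Here is a minimal sketch of a consistency check, assuming both values are available as plain text. `pinned_version` is a hypothetical helper that only understands simple `name==version` lines, not the full requirements syntax.

```python
def pinned_version(requirements_text, package):
    """Return the version pinned with '==' for package, or None if not pinned."""
    for line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        spec = line.split("#", 1)[0].strip()
        if spec.startswith(package + "=="):
            return spec.split("==", 1)[1].strip()
    return None


if __name__ == "__main__":
    requirements = "torch>=2.0.0\ngradio==4.25.0 # matches the Space config\n"
    sdk_version = "4.25.0"  # value of sdk_version in the Space configuration
    print(pinned_version(requirements, "gradio") == sdk_version)
```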