linhaotong

update

e59f7b7 about 1 month ago


# 🎯 Spaces GPU Best Practices Guide

## 📚 How spaces.GPU Works

### Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│ Main process                                            │
│ - CPU environment                                       │
│ - ❌ Must not initialize CUDA                           │
│ - ✅ Can build the Gradio UI                            │
│ - ✅ Can create ModelInference (no model loaded)        │
└─────────────────────────────────────────────────────────┘
                            │
                            │ call a function decorated with @spaces.GPU
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ GPU worker process (child process)                      │
│ - GPU environment                                       │
│ - ✅ Can initialize CUDA                                │
│ - ✅ Can load the model onto the GPU                    │
│ - ✅ Runs inference                                     │
│ - ✅ Global-variable cache (one per child process)      │
└─────────────────────────────────────────────────────────┘
                            │
                            │ return value is serialized with pickle
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│ Main process receives the return value                  │
│ - ✅ Must be CPU data (NumPy arrays, basic types)       │
│ - ❌ Must not contain CUDA tensors                      │
└─────────────────────────────────────────────────────────┘
```
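
Because the return value crosses the process boundary through pickle, one cheap way to catch problems early is to round-trip a result through `pickle` before returning it. The sketch below uses only the standard library; `assert_pickle_safe` is a hypothetical helper, not part of `spaces` or of this repository, and it only checks that an object serializes, not that it is free of CUDA tensors.

```python
import pickle


def assert_pickle_safe(value):
    """Round-trip `value` through pickle and return the reconstructed copy.

    Hypothetical debugging helper: raises TypeError if serialization fails.
    """
    try:
        payload = pickle.dumps(value)
    except Exception as exc:  # pickling can raise several exception types
        raise TypeError(f"return value is not picklable: {exc!r}") from exc
    return pickle.loads(payload)


# Plain containers of basic types survive the round trip unchanged.
result = {"depth_min": 0.1, "depth_max": 9.5, "shape": (480, 640)}
assert assert_pickle_safe(result) == result

# An object holding a lambda cannot be pickled and is rejected.
try:
    assert_pickle_safe({"callback": lambda x: x})
except TypeError as err:
    print("rejected:", type(err).__name__)
```

Running a check like this during development, on the exact object a decorated function is about to return, surfaces serialization problems in the worker rather than in the main process.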

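## ✅ Best Practice: Model Loading Strategy

### ❌ Wrong Approach 1: Loading the Model in the Main Process

```python
# ❌ Wrong: the model is loaded in the main process
class EventHandlers:
    def __init__(self):
        self.model_inference = ModelInference()
        # ❌ Calling this in the main process triggers a CUDA initialization error
        self.model_inference.initialize_model("cuda")  # 💥
```

Why is this wrong?
- The main process must not initialize CUDA
- It fails immediately with: `CUDA must not be initialized in the main process`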

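### ❌ Wrong Approach 2: Storing the Model in an Instance Variable

```python
# ❌ Wrong: the model is stored in an instance variable
class ModelInference:
    def __init__(self):
        self.model = None  # ❌ instance variable

    def initialize_model(self, device):
        if self.model is None:
            self.model = load_model()  # ❌ saved on the instance
        return self.model
```

Why is this wrong?
- The instance is created in the main process
- Model state can become inconsistent across processes
- The state seen on the second call is unpredictable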

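### ✅ Correct Approach: Global-Variable Cache in the Child Process

```python
import os

# ✅ Correct: cache the model in a global variable inside the child process
_MODEL_CACHE = None  # global variable, separate in each child process

class ModelInference:
    def __init__(self):
        # ✅ Store no state
        pass

    def initialize_model(self, device: str = "cuda"):
        global _MODEL_CACHE

        if _MODEL_CACHE is None:
            # ✅ Load in the child process (on the first call)
            print("Loading model in GPU subprocess...")
            model_dir = os.environ.get("DA3_MODEL_DIR", "...")
            _MODEL_CACHE = DepthAnything3.from_pretrained(model_dir)
            _MODEL_CACHE = _MODEL_CACHE.to(device)  # ✅ moved to the device in the child process
            _MODEL_CACHE.eval()
        else:
            # ✅ Reuse the cached model
            print("Using cached model")

        return _MODEL_CACHE  # ✅ return the model, do not store it on self
```

Why is this correct?
- ✅ The model is loaded only in the child process (GPU environment)
- ✅ The global variable is safe inside the child process (each child has its own copy)
- ✅ The main process is left untouched
- ✅ The model is cached and reused (no repeated loading)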
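
The load-once behaviour of this pattern can be observed without a GPU. The sketch below swaps the real model for a stand-in object and counts how often the loader runs; `_fake_load_model`, `get_model`, and `LOAD_CALLS` are hypothetical names used only for illustration.

```python
_MODEL_CACHE = None  # module-level cache, one copy per Python process
LOAD_CALLS = 0       # counts how often the (expensive) loader actually runs


def _fake_load_model():
    """Stand-in for an expensive model load (hypothetical)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return object()


def get_model():
    """Return the cached model, loading it on the first call only."""
    global _MODEL_CACHE
    if _MODEL_CACHE is None:
        _MODEL_CACHE = _fake_load_model()
    return _MODEL_CACHE


first = get_model()
second = get_model()
assert first is second   # the same object is reused
assert LOAD_CALLS == 1   # the loader ran exactly once
```

The cache lives in the module's globals, so every process that imports the module gets its own independent copy.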

## 🎯 Complete Implementation Example

### File Structure

```
app.py                          # entry point, configures @spaces.GPU
depth_anything_3/app/modules/
├── model_inference.py          # model inference (uses the global variable)
└── event_handlers.py           # event handling (main process, loads no model)
```

### 1. app.py - Decorator Configuration

```python
import spaces
from depth_anything_3.app.modules.model_inference import ModelInference

# ✅ Decorate the run_inference method
original_run_inference = ModelInference.run_inference

@spaces.GPU(duration=120)
def gpu_run_inference(self, *args, **kwargs):
    """
    Run inference in the GPU child process.

    This function executes in a separate GPU child process,
    where it is safe to initialize CUDA and load the model.
    """
    return original_run_inference(self, *args, **kwargs)

# Replace the original method
ModelInference.run_inference = gpu_run_inference

# ✅ Main process: only create the app, do not load the model
if __name__ == "__main__":
    app = DepthAnything3App(...)
    app.launch(host="0.0.0.0", port=7860)
```
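
The wrap-and-replace pattern above can be tried locally without the `spaces` package. In this sketch `fake_gpu` is a hypothetical stand-in decorator that simply calls the function in-process, and `Worker` is a toy class. It illustrates two details: the original method has to be captured before it is replaced (otherwise the wrapper would call itself), and arguments are forwarded with `*args, **kwargs`.

```python
import functools


def fake_gpu(duration=60):
    """Hypothetical stand-in for a GPU decorator; runs the function in-process."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper
    return decorator


class Worker:
    def run_inference(self, x, scale=1):
        return x * scale


# Capture the original method BEFORE replacing it.
original_run_inference = Worker.run_inference


@fake_gpu(duration=120)
def gpu_run_inference(self, *args, **kwargs):
    # Forward positional and keyword arguments unchanged.
    return original_run_inference(self, *args, **kwargs)


Worker.run_inference = gpu_run_inference

print(Worker().run_inference(3, scale=2))  # prints 6
```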

### 2. model_inference.py - Model Management

```python
import os

import torch
from depth_anything_3.api import DepthAnything3

# ========================================
# ✅ Global-variable cache (safe in the child process)
# ========================================
_MODEL_CACHE = None

class ModelInference:
    def __init__(self):
        """
        Initialize without storing any state.

        Note: this instance is created in the main process,
        but the model is loaded in the child process.
        """
        pass  # ✅ no instance variables

    def initialize_model(self, device: str = "cuda"):
        """
        Load the model in the child process.

        A global-variable cache is used because:
        1. @spaces.GPU runs in a child process
        2. Each child process has its own global namespace
        3. Caching is safe and avoids repeated loading
        """
        global _MODEL_CACHE

        if _MODEL_CACHE is None:
            # First call: load the model
            model_dir = os.environ.get("DA3_MODEL_DIR", "...")
            print(f"🔄 Loading model in GPU subprocess from {model_dir}")

            _MODEL_CACHE = DepthAnything3.from_pretrained(model_dir)
            _MODEL_CACHE = _MODEL_CACHE.to(device)  # ✅ moved to the device in the child process
            _MODEL_CACHE.eval()

            print(f"✅ Model loaded on {device}")
        else:
            # Later calls: reuse the cache
            print("✅ Using cached model")
            # Make sure it is on the right device (defensive programming)
            _MODEL_CACHE = _MODEL_CACHE.to(device)

        return _MODEL_CACHE

    def run_inference(self, target_dir, ...):
        """
        Run inference in the GPU child process.

        This function is decorated with @spaces.GPU and runs in the child process.
        """
        # ✅ Get the model in the child process (local variable)
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = self.initialize_model(device)  # ✅ returned, not stored on self

        # ✅ Run inference
        with torch.no_grad():
            prediction = model.inference(...)

        # ✅ Process the results
        # ...

        # ✅ Key step: move every CUDA tensor to the CPU before returning
        prediction = self._move_to_cpu(prediction)

        return prediction, processed_data

    def _move_to_cpu(self, prediction):
        """Move all CUDA tensors to the CPU so the result is safe to pickle."""
        # ... (implementation not shown)
        return prediction
```
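
The body of `_move_to_cpu` is not given in this guide. One possible shape for it is a recursive walk over the result, sketched below under several assumptions: the prediction is built from dicts, lists, and tuples; any object exposing `detach()`, `cpu()`, and `numpy()` is treated as a tensor (which `torch.Tensor` does); and converting tensors to NumPy arrays is acceptable. The names `move_to_cpu`, `_is_tensor_like`, and the `FakeTensor` test double are hypothetical.

```python
def _is_tensor_like(obj):
    """Duck-typed check so the sketch runs without torch installed."""
    return all(callable(getattr(obj, name, None)) for name in ("detach", "cpu", "numpy"))


def move_to_cpu(obj):
    """Recursively convert tensor-like leaves to CPU-side arrays."""
    if _is_tensor_like(obj):
        return obj.detach().cpu().numpy()
    if isinstance(obj, dict):
        return {key: move_to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_cpu(item) for item in obj)
    return obj  # basic types (int, float, str, None, ...) pass through


class FakeTensor:
    """Minimal test double that mimics the three tensor methods used above."""

    def __init__(self, data, device="cuda"):
        self.data, self.device = data, device

    def detach(self):
        return self

    def cpu(self):
        return FakeTensor(self.data, device="cpu")

    def numpy(self):
        assert self.device == "cpu", "must be on CPU before conversion"
        return list(self.data)  # a real tensor would return a numpy.ndarray


out = move_to_cpu({"depth": FakeTensor([1.0, 2.0]), "meta": ("ok", [FakeTensor([3])])})
print(out)  # {'depth': [1.0, 2.0], 'meta': ('ok', [[3]])}
```

A prediction object that keeps tensors in attributes rather than in containers would need an extra branch that visits those attributes; that case is left out here because the structure of the real prediction type is not described in this guide.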

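### 3. event_handlers.py - Main-Process Code

```python
class EventHandlers:
    def __init__(self):
        """
        Main-process initialization: no model is loaded.

        Note: creating a ModelInference instance here is safe because
        it does not load the model right away. The model is loaded in the child process.
        """
        # ✅ Creating the instance is fine (no model is loaded)
        self.model_inference = ModelInference()

        # ❌ Do not call initialize_model() here
        # ❌ Do not load the model here

    def gradio_demo(self, ...):
        """
        Gradio callback, invoked in the main process.

        This function calls self.model_inference.run_inference, which is
        decorated with @spaces.GPU and therefore runs in the child process.
        """
        # ✅ Call the decorated method (runs in the child process automatically)
        result = self.model_inference.run_inference(...)
        return result
```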

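## 🔑 Key Principles

### ✅ DO

1. Main process: only create the instance, do not load the model
   ```python
   # ✅ Main process
   model_inference = ModelInference()  # safe
   # Do not call initialize_model()
   ```

2. Child process: cache the model in a global variable
   ```python
   # ✅ Child process (inside a function decorated with @spaces.GPU)
   _MODEL_CACHE = None  # global variable
   model = initialize_model()  # loaded in the child process
   ```

3. Before returning: move all tensors to the CPU
   ```python
   # ✅ Before returning
   prediction = move_all_tensors_to_cpu(prediction)
   return prediction
   ```

4. Free GPU memory
   ```python
   # ✅ After inference
   torch.cuda.empty_cache()
   ```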
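
For item 4, one way to make sure the cache is released even when inference raises is a `try`/`finally` wrapper. The sketch below is an assumption about how this could be organized, not code from this repository: `run_with_cleanup` is a hypothetical helper, and the `torch` import is guarded so the example also runs on a machine without PyTorch. It is meant to be called inside the GPU child process only.

```python
def run_with_cleanup(fn, *args, **kwargs):
    """Call `fn` and release cached GPU memory afterwards, even on error."""
    try:
        return fn(*args, **kwargs)
    finally:
        try:
            import torch  # imported lazily; only the GPU worker should reach this
        except ImportError:
            torch = None
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()


def fake_inference(x):
    """Stand-in for a real inference call (hypothetical)."""
    return x + 1


print(run_with_cleanup(fake_inference, 41))  # prints 42
```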

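### ❌ DON'T

1. Main process: do not initialize CUDA
   ```python
   # ❌ Main process
   model.to("cuda")  # 💥 error
   torch.cuda.is_available()  # 💥 may trigger initialization
   ```

2. Do not store the model in an instance variable
   ```python
   # ❌
   self.model = load_model()  # inconsistent state
   ```

3. Do not return CUDA tensors
   ```python
   # ❌
   return prediction  # raises an error if it contains CUDA tensors
   ```

4. Do not load the model in `__init__`
   ```python
   # ❌
   def __init__(self):
       self.model = load_model()  # runs in the main process and fails
   ```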

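## 📊 Execution Flow Comparison

### ❌ Wrong Flow

```
Main process starts
↓
Create ModelInference() instance
↓
In __init__: self.model = None                           # ✅ safe
↓
First call to run_inference
↓
@spaces.GPU creates the child process
↓
Child process: self.model = load_model()                 # ✅ in the child process
↓
Return prediction (contains CUDA tensors)                # ❌ wrong
↓
pickle tries to rebuild CUDA tensors in the main process # 💥 error
```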

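### ✅ Correct Flow

```
Main process starts
↓
Create ModelInference() instance (stateless)        # ✅
↓
First call to run_inference
↓
@spaces.GPU creates the child process
↓
Child process: _MODEL_CACHE = load_model()          # ✅ global variable
↓
Child process: model = _MODEL_CACHE                 # ✅ local variable
↓
Child process: prediction = model.inference(...)
↓
Child process: prediction = move_to_cpu(prediction) # ✅
↓
Return prediction (all tensors on the CPU)          # ✅
↓
Main process: safely receives CPU data              # ✅
```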

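## 🧪 Verification Checklist

### Main-Process Checks

```python
import torch

# ✅ Should pass
def test_main_process():
    # The instance can be created
    model_inference = ModelInference()

    # It should not hold a model
    assert not hasattr(model_inference, 'model') or model_inference.model is None

    # CUDA should not have been initialized
    # (this test must run in the main process)
    assert not torch.cuda.is_initialized()
```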

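### Child-Process Checks

```python
import spaces

# ✅ Should pass
@spaces.GPU
def test_gpu_subprocess():
    model_inference = ModelInference()

    # The model can be loaded
    model = model_inference.initialize_model("cuda")
    assert model is not None

    # The model should be on the GPU
    # (checks a parameter's device; assumes the model is a torch.nn.Module)
    assert next(model.parameters()).device.type == "cuda"

    # Inference can run
    # ...

    # Results should be moved to the CPU before returning
    # ...
```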

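## 🎓 FAQ

### Q1: Why not use an instance variable?

A: Because the instance is created in the main process. If it stores model state, that state can become inconsistent across processes: an attribute set inside the child process is not reflected on the main-process instance.

```python
# ❌ Problem
self.model = load_model()  # state may become inconsistent

# ✅ Solution
_MODEL_CACHE = load_model()  # separate in each child process
```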

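### Q2: Are global variables safe?

A: Yes, because:
- Each child process has its own global namespace
- The main process never accesses a child process's global variables
- Nothing leaks across processes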

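### Q3: Will the model be loaded repeatedly?

A: No, because:
- The global variable caches the model inside the child process
- Repeated calls in the same child process reuse it
- Different child processes each keep their own cache (when needed)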

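### Q4: How do I clean up the model?

A: Manual cleanup is usually unnecessary, because:
- Everything is released automatically when the child process exits
- If needed, you can do the following in the child process:
```python
global _MODEL_CACHE
_MODEL_CACHE = None       # drop the cached reference
del model                 # drop the local reference as well
torch.cuda.empty_cache()  # release cached GPU memory
```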

## 📝 Complete Code Template

```python
# ========================================
# model_inference.py
# ========================================
_MODEL_CACHE = None  # global cache

class ModelInference:
    def __init__(self):
        pass  # stateless

    def initialize_model(self, device="cuda"):
        global _MODEL_CACHE
        if _MODEL_CACHE is None:
            _MODEL_CACHE = load_model().to(device)
        return _MODEL_CACHE

    def run_inference(self, ...):
        model = self.initialize_model("cuda")
        prediction = model.inference(...)
        prediction = self._move_to_cpu(prediction)
        return prediction

# ========================================
# app.py
# ========================================
# Keep a reference to the original method before patching;
# otherwise the wrapper would call itself recursively.
original_run_inference = ModelInference.run_inference

@spaces.GPU(duration=120)
def gpu_run_inference(self, *args, **kwargs):
    return original_run_inference(self, *args, **kwargs)

ModelInference.run_inference = gpu_run_inference
```

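## 🎯 Summary

Core principles:

1. ✅ Main process = CPU environment: do not load the model, do not initialize CUDA
2. ✅ Child process = GPU environment: load the model and run inference
3. ✅ Cache in a global variable, separate in each child process
4. ✅ Return CPU data so the result is safe to pickle

Follow these principles and your Spaces GPU app will run reliably! 🚀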