aha2023 committed on
Commit b4a3c28 · 1 Parent(s): e08d0c9

Add SDMatte model files and scripts
README.md ADDED
@@ -0,0 +1,65 @@
+ # SDMatte_plus-fp16-and-bf16
+
+ This repository provides optimized, inference-only versions of the original [SDMatte model by LongfeiHuang](https://huggingface.co/LongfeiHuang/SDMatte).
+
+ The models here have been processed to be lightweight and efficient for deployment in applications such as ComfyUI, without compromising the quality of the matting results.
+
+ ### What is this?
+
+ This repository contains inference-only weights for the **SDMatte** model. The original checkpoint file (`.pth`) was a full training checkpoint, which included not only the model weights but also ~6.5 GB of trainer state (such as optimizer state). That trainer state is essential for resuming training but unnecessary for inference (i.e., actually using the model for matting).
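To see why the trainer state dominates the file size, the checkpoint's top-level entries can be tallied directly. A minimal sketch, using a tiny synthetic checkpoint as a stand-in for the real ~6.5 GB file (the `checkpoint_summary` helper and the fake layout are illustrative, not part of this repo):

```python
import torch

def checkpoint_summary(ckpt: dict) -> dict:
    """Rough per-key byte counts for a checkpoint's top-level entries."""
    def nbytes(obj) -> int:
        if isinstance(obj, torch.Tensor):
            return obj.numel() * obj.element_size()
        if isinstance(obj, dict):
            return sum(nbytes(v) for v in obj.values())
        if isinstance(obj, (list, tuple)):
            return sum(nbytes(v) for v in obj)
        return 0  # scalars, strings, etc. are negligible here
    return {key: nbytes(value) for key, value in ckpt.items()}

# Synthetic stand-in; the real file would be loaded with
# torch.load("SDMatte_plus.pth", map_location="cpu").
fake_ckpt = {
    "model":   {"w": torch.zeros(1000, 1000)},            # weights: ~4 MB
    "trainer": {"optim": {"m": torch.zeros(1000, 1000),   # optimizer moments:
                          "v": torch.zeros(1000, 1000)}}, # ~8 MB of dead weight
}
print(checkpoint_summary(fake_ckpt))
```

As in the real checkpoint, the `trainer` entry outweighs the `model` entry, which is why stripping it shrinks the file so dramatically.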
+
+ ### Optimizations Performed
+
+ 1. **Removal of trainer state**: The largest saving came from stripping the `trainer` key from the original checkpoint. This removes all data related to the training process, significantly reducing the file size without affecting the model's output.
+
+ 2. **16-bit precision conversion**: The model weights have been converted from their original 32-bit floating-point precision (FP32) to 16-bit precision, in two popular formats:
+ * **FP16 (half-precision)**: Offers a good balance of speed, reduced memory usage, and quality. It is supported by most modern NVIDIA GPUs (10-series and newer).
+ * **BF16 (bfloat16)**: Offers a dynamic range matching FP32, making it more resilient to overflow/underflow. It performs best on newer NVIDIA GPUs (RTX 30-series and newer).
+
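The practical difference between the two formats is easy to demonstrate with plain PyTorch (nothing here depends on SDMatte itself): FP16's largest finite value is 65504, so larger magnitudes overflow to infinity, while BF16 keeps FP32's exponent range at the cost of precision.

```python
import torch

# A value well beyond FP16's finite range but trivial for BF16.
x = torch.tensor(100_000.0)
print(x.to(torch.float16))   # overflows to inf
print(x.to(torch.bfloat16))  # stays finite, ~1e5 (coarsely rounded)

print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38, close to FP32's ~3.40e38
```

This is why BF16 is the safer default when activations or weights can take large values, while FP16 may need loss scaling or clamping.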
+ ### DIY Quantization with `convert_precision.py`
+
+ This repository also includes `convert_precision.py`, the Python script used to create the fp16 and bf16 models. You can use it to convert the original FP32 checkpoint yourself:
+
+ 1. Place the original FP32 `SDMatte_plus.pth` file in the same folder as the script.
+ 2. Open `convert_precision.py` in a text editor.
+ 3. Set the `TARGET_PRECISION` variable at the top of the file to either `'fp16'` or `'bf16'`.
+ 4. Run the script from your terminal: `python convert_precision.py`.
+
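The conversion step the script performs boils down to casting every floating-point tensor while leaving integer buffers untouched. A minimal sketch against a synthetic state_dict (`to_inference_dtype` is an illustrative helper, not a function from this repo):

```python
import torch

def to_inference_dtype(state_dict: dict, target_dtype: torch.dtype) -> dict:
    """Cast floating-point tensors to target_dtype; leave everything else alone."""
    return {
        key: value.to(target_dtype)
        if isinstance(value, torch.Tensor) and value.is_floating_point()
        else value
        for key, value in state_dict.items()
    }

# Synthetic stand-in for the real SDMatte state_dict.
demo = {
    "conv.weight": torch.randn(8, 3, 3, 3),  # float32 -> cast to fp16
    "step":        torch.tensor(1000),       # integer  -> left untouched
}
converted = to_inference_dtype(demo, torch.float16)
print(converted["conv.weight"].dtype)  # torch.float16
print(converted["step"].dtype)         # torch.int64
```

Skipping non-floating tensors matters: integer buffers such as step counters or index tables have no 16-bit float equivalent and must pass through unchanged.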
+ ### Acknowledgements
+
+ Huge thanks to **LongfeiHuang** for creating and open-sourcing the original SDMatte model. This repository is merely an optimized repackaging of their incredible work. Please visit the original repository for more details on the model's architecture and training.
+
+ * **Original Model**: [https://huggingface.co/LongfeiHuang/SDMatte](https://huggingface.co/LongfeiHuang/SDMatte)
+
+ ---
+
+ ## Chinese Version
+
+ ### What is this?
+
+ This repository provides optimized, inference-only versions of the original [SDMatte model by LongfeiHuang](https://huggingface.co/LongfeiHuang/SDMatte).
+
+ The models here have been specially processed to be lightweight and efficient for deployment in applications such as ComfyUI, without affecting the quality of the matting results.
+
+ ### Optimizations Performed
+
+ 1. **Removal of trainer state**: The largest optimization was stripping the `trainer` key from the original checkpoint. This removes all non-essential data tied to the training process (such as optimizer state), greatly reducing file size without affecting output quality. Roughly 6.5 GB of the original file was such data.
+
+ 2. **16-bit precision conversion**: The model weights have been converted from the original 32-bit floating-point precision (FP32) to 16-bit precision, in two mainstream formats:
+ * **FP16 (half-precision)**: Strikes an excellent balance between speed, VRAM usage, and quality; supported by most modern NVIDIA GPUs (10-series and newer).
+ * **BF16 (bfloat16)**: Has the same dynamic range as FP32, making it more resilient to overflow/underflow; delivers the best performance on newer NVIDIA GPUs (RTX 30-series and newer).
+
+ ### DIY Quantization with `convert_precision.py`
+
+ This repository also includes `convert_precision.py`, the Python script used to create the fp16 and bf16 models. You can use it to convert the original FP32 checkpoint yourself:
+
+ 1. Place the original FP32 `SDMatte_plus.pth` file in the same folder as the script.
+ 2. Open `convert_precision.py` in a text editor.
+ 3. Set the `TARGET_PRECISION` variable at the top of the file to `'fp16'` or `'bf16'`.
+ 4. Run the script from your terminal: `python convert_precision.py`.
+
+ ### Acknowledgements
+
+ Many thanks to **LongfeiHuang** for creating and open-sourcing the outstanding SDMatte model. This repository is merely an optimized repackaging of their excellent work. For more details on the model's architecture and training, please visit the original repository.
+
+ * **Original Model Repository**: [https://huggingface.co/LongfeiHuang/SDMatte](https://huggingface.co/LongfeiHuang/SDMatte)
SDMatte_plus_bf16_inference.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f04e5cb6a50af6516cea8908ed2de77d5ae2753b33b6e4ea0cbb61146a27f55
+ size 2594696326
SDMatte_plus_fp16_inference.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f407176bc081bd58636c472fa150891559d9bb24ae00399e9d86254c4624572
+ size 2594696326
convert_precision.py ADDED
@@ -0,0 +1,56 @@
+ import torch
+ import os
+
+ # --- Configuration ---
+ # 1. Set the target precision: 'fp16' or 'bf16'
+ TARGET_PRECISION = 'bf16'
+
+ # 2. Path to the original 32-bit training checkpoint
+ fp32_checkpoint_path = r"E:\comfyui\ComfyUI-aki-v1.3\models\SDMatte\SDMatte_plus.pth"
+
+ # --------------------------------------------------------------------
+
+ # Derive the output file name automatically
+ output_filename = fp32_checkpoint_path.replace('.pth', f'_{TARGET_PRECISION}_inference.pth')
+
+ if not os.path.exists(fp32_checkpoint_path):
+     print(f"[Error] File not found: {fp32_checkpoint_path}")
+ else:
+     try:
+         print(f"--- Processing training checkpoint: {fp32_checkpoint_path} ---")
+         full_checkpoint = torch.load(fp32_checkpoint_path, map_location="cpu", weights_only=False)
+
+         # Check for the 'model' key, which holds the weights we need
+         if 'model' in full_checkpoint:
+             # Explicitly extract the model's state_dict
+             state_dict = full_checkpoint['model']
+             print("Extracted the weight dict from the 'model' key.")
+         else:
+             # No 'model' key at the top level: assume the whole file is the state_dict
+             print("[Warning] No top-level 'model' key found; converting the entire file.")
+             state_dict = full_checkpoint
+
+         print(f"Converting weights to {TARGET_PRECISION} ...")
+
+         target_dtype = torch.float16 if TARGET_PRECISION == 'fp16' else torch.bfloat16
+
+         # Walk the state_dict and convert every floating-point tensor
+         for key in state_dict:
+             if isinstance(state_dict[key], torch.Tensor) and state_dict[key].is_floating_point():
+                 state_dict[key] = state_dict[key].to(target_dtype)
+
+         print(f"Saving the inference-only model to: {output_filename} ...")
+         # Save the processed state_dict on its own, with no training metadata attached
+         torch.save(state_dict, output_filename)
+
+         # Report the size difference
+         original_size_gb = os.path.getsize(fp32_checkpoint_path) / (1024**3)
+         final_size_gb = os.path.getsize(output_filename) / (1024**3)
+
+         print("\n--- Conversion succeeded ---")
+         print(f"Original training checkpoint size: {original_size_gb:.2f} GB")
+         print(f"Inference-only model size ({TARGET_PRECISION.upper()}): {final_size_gb:.2f} GB")
+         print("Note: the new file contains only the core model weights needed for inference; training optimizer state has been removed.")
+
+     except Exception as e:
+         print(f"\n[Error] An error occurred during processing: {e}")
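As a sanity check on the script's output, the converted file can be reloaded to confirm the target dtype and the roughly halved file size. A sketch using a small synthetic state_dict in a temporary directory rather than the real checkpoint (file names merely mimic the script's naming scheme):

```python
import os
import tempfile

import torch

with tempfile.TemporaryDirectory() as tmp:
    fp32_path = os.path.join(tmp, "model.pth")
    bf16_path = os.path.join(tmp, "model_bf16_inference.pth")

    # Save an fp32 state_dict, then a bf16 copy (mirroring the script's cast).
    state = {"w": torch.randn(256, 256), "b": torch.randn(256)}
    torch.save(state, fp32_path)
    torch.save({k: v.to(torch.bfloat16) for k, v in state.items()}, bf16_path)

    # Reload and verify every tensor carries the target dtype.
    loaded = torch.load(bf16_path, map_location="cpu")
    assert all(t.dtype == torch.bfloat16 for t in loaded.values())

    # 16-bit storage should be roughly half the 32-bit file.
    ratio = os.path.getsize(bf16_path) / os.path.getsize(fp32_path)
    print(f"size ratio: {ratio:.2f}")
```

The same check against the real files should show ~2.4 GiB for the 16-bit outputs versus the original FP32 checkpoint, matching the sizes of the `.pth` files in this repo.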