---
license: mit
language:
- en
base_model:
- sd2-community/stable-diffusion-2-1
pipeline_tag: image-to-image
tags:
- material-decomposition
- diffusion
---

# Model Overview

This repository hosts the pretrained parameters for the SuperMat project, as described in [*"SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates"* (ICCV 2025)](https://arxiv.org/abs/2411.17515).

| Model | Description |
| --- | --- |
| supermat.pth | Base SuperMat model for material decomposition |
| supermat_mv.pth | Multi-view version of SuperMat that processes six orthogonal views |
| uv_refine_bc.pth | UV refinement network for albedo materials |
| uv_refine_rm.pth | UV refinement network for roughness & metallic materials |

All models are built upon the base model `stabilityai/stable-diffusion-2-1`.

Note: The official `stabilityai/stable-diffusion-2-1` model has been removed, so you may need to obtain the base model parameters from an alternative source such as `sd2-community/stable-diffusion-2-1`.

# Model Details
#### SuperMat (supermat.pth)
The core model for material decomposition. It takes RGBA images as input and decomposes materials from the target object.
#### SuperMat Multi-View (supermat_mv.pth)
An extended version that processes six orthogonal views simultaneously. This model leverages multi-view consistency for improved material estimation. For each view, the camera-to-world (c2w) matrix is provided as camera embeddings.
#### UV Refinement Networks
Two specialized networks for refining UV maps:
- uv_refine_bc.pth: Refines the UV map for albedo materials
- uv_refine_rm.pth: Refines the UV map for roughness & metallic materials

# Download & Usage
Download the desired model(s) from this repository and place them in the checkpoints folder:

```
checkpoints/
├── supermat.pth
├── supermat_mv.pth
├── uv_refine_bc.pth
└── uv_refine_rm.pth
```
The models are independent of each other, so you only need to download those required for your specific inference task.
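
Because only some checkpoints are needed for a given task, a quick pre-flight check can save a failed run. The following is a minimal sketch, not part of the SuperMat code base: the task labels (`single`, `multi_view`, `uv_albedo`, `uv_rm`) and the helper name `missing_checkpoints` are hypothetical, while the file names come from the table above.

```python
from pathlib import Path

# File names are taken from the model table; the task labels are illustrative only.
CHECKPOINTS = {
    "single": "supermat.pth",
    "multi_view": "supermat_mv.pth",
    "uv_albedo": "uv_refine_bc.pth",
    "uv_rm": "uv_refine_rm.pth",
}


def missing_checkpoints(tasks, root="checkpoints"):
    """Return the checkpoint file names that the given tasks need but `root` lacks."""
    root = Path(root)
    needed = [CHECKPOINTS[task] for task in tasks]
    return [name for name in needed if not (root / name).is_file()]


if __name__ == "__main__":
    # Lists whatever is absent from ./checkpoints for these two tasks.
    print(missing_checkpoints(["single", "uv_albedo"]))
```

An empty list means every checkpoint required for the selected tasks is in place.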

# Input Requirements
#### Image Format
- SuperMat models expect RGBA images where only the target object appears as foreground, with alpha values set to `0` for all other regions
- During inference, the input image is alpha-composited with a gray background `(0.5, 0.5, 0.5)`
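
The compositing step can be illustrated per pixel. This is a sketch under stated assumptions rather than the code used by the inference scripts: it assumes straight (non-premultiplied) alpha and color values normalized to `[0, 1]`, and the function name `composite_on_gray` is hypothetical.

```python
def composite_on_gray(rgb, alpha, gray=0.5):
    """Blend one RGB pixel over a flat gray background.

    Assumes straight (non-premultiplied) alpha and channel values in [0, 1].
    """
    return tuple(c * alpha + gray * (1.0 - alpha) for c in rgb)


if __name__ == "__main__":
    # Fully transparent pixel: only the background remains.
    print(composite_on_gray((1.0, 0.0, 0.0), alpha=0.0))
    # Fully opaque pixel: the foreground color is unchanged.
    print(composite_on_gray((1.0, 0.0, 0.0), alpha=1.0))
```

Under these assumptions, any region with alpha `0` becomes uniform gray, which is why the background content of the input image does not matter.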
#### Resolution Preferences
- SuperMat models: `512×512` resolution (recommended)
- UV refinement networks: `1024×1024` resolution (recommended)
#### Multi-View Specific Requirements
For the multi-view model:
- All inputs for a single case should be organized in one folder
- Input images must follow the naming convention shown in `examples/bag_rendered_6views`
- Camera information is stored in `meta.json` (refer to the example for the required format with c2w matrices)
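
Before running the multi-view model, it can help to sanity-check each camera matrix. The sketch below assumes a c2w matrix is a 4×4 homogeneous rigid transform given as nested lists; the layout of `meta.json` itself is not described here, so consult the example folder for how the matrices are stored. The helper name `is_valid_c2w` is hypothetical.

```python
def is_valid_c2w(m, tol=1e-4):
    """Check that `m` looks like a 4x4 rigid camera-to-world matrix.

    Assumed convention: nested lists, rotation in the top-left 3x3 block,
    translation in the last column, bottom row [0, 0, 0, 1].
    Note: this does not test the determinant, so a reflection would also pass.
    """
    if len(m) != 4 or any(len(row) != 4 for row in m):
        return False
    # Bottom row of a homogeneous rigid transform.
    if any(abs(a - b) > tol for a, b in zip(m[3], (0.0, 0.0, 0.0, 1.0))):
        return False
    # Rows of the rotation block must be orthonormal: R @ R^T == I.
    r = [row[:3] for row in m[:3]]
    for i in range(3):
        for j in range(3):
            dot = sum(r[i][k] * r[j][k] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(is_valid_c2w(identity))
```

A check like this catches common export mistakes such as a scaled rotation block or a 3×4 matrix, but it does not verify that the six views are actually orthogonal to one another.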

# Quick Inference Examples
#### SuperMat Single-Image
```
python inference_supermat.py \
  --input examples/ring_rendered_2views \
  --output-dir outputs \
  --checkpoint checkpoints/supermat.pth \
  --base-model sd2-community/stable-diffusion-2-1 \
  --device cuda:0 \
  --image-size 512
```

#### SuperMat Multi-View
```
python inference_supermat_mv.py \
  --input examples/bag_rendered_6views \
  --output-dir outputs_mv \
  --checkpoint checkpoints/supermat_mv.pth \
  --base-model sd2-community/stable-diffusion-2-1 \
  --device cuda:0 \
  --image-size 512 \
  --num_views 6 \
  --use-camera-embeds
```

#### UV Refinement (Albedo)
```
python inference_uv_refine.py \
  --input-uv examples/axe_uv/uv_bc.png \
  --input-uv-position examples/axe_uv/uv_position.png \
  --input-uv-mask examples/axe_uv/uv_mask.png \
  --output-dir outputs_uv_bc \
  --checkpoint checkpoints/uv_refine_bc.pth \
  --base-model sd2-community/stable-diffusion-2-1 \
  --device cuda:0 \
  --image-size 1024
```

For complete usage instructions, please refer to the [main repository](https://github.com/hyj542682306/SuperMat).

# Citation
If you find these models useful in your research, please cite:
```
@inproceedings{hong2025supermat,
  title={{SuperMat}: Physically Consistent {PBR} Material Estimation at Interactive Rates},
  author={Hong, Yijia and Guo, Yuan-Chen and Yi, Ran and Chen, Yulong and Cao, Yan-Pei and Ma, Lizhuang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={25083--25093},
  year={2025}
}
```