Update model card with paper, project, and code links

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +44 -15
README.md CHANGED
@@ -1,34 +1,43 @@
 ---
+library_name: diffusers
 license: cc-by-nc-4.0
+pipeline_tag: image-to-image
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
-library_name: diffusers
-pipeline_tag: image-to-image
 ---
 
-# TransNormal
+# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
+
+This is the official repository for the paper [TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation](https://huggingface.co/papers/2602.00839).
+
+[**Project Page**](https://longxiang-ai.github.io/TransNormal) | [**GitHub**](https://github.com/longxiang-ai/TransNormal)
+
+**Authors**: Mingwei Li, Hehe Fan, Yi Yang
 
-Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.
+TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression for transparent objects. It addresses challenges like complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
 
 ## Usage
 
+To use this model, you need to set up the DINOv3 encoder separately (as it requires access approval from Meta AI).
+
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
 
-# Load DINO encoder (download separately)
+# Create DINO encoder
+# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
-    weights_path="path/to/dinov3_vith16plus",
-    projector_path="path/to/cross_attention_projector.pt",
+    weights_path="path/to/dinov3_vith16plus", # Path to approved DINOv3 weights
+    projector_path="./weights/transnormal/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
 
-# Load pipeline
+# Load TransNormal pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
@@ -36,20 +45,40 @@ pipe = TransNormalPipeline.from_pretrained(
 )
 pipe = pipe.to("cuda")
 
-# Inference
-normal_map = pipe("image.jpg", output_type="pil")
+# Run inference
+normal_map = pipe(
+    image="path/to/image.jpg",
+    output_type="pil", # Choose from "np", "pil", or "pt"
+)
+
+# Save the result
+from transnormal import save_normal_map
+save_normal_map(normal_map, "output_normal.png")
 ```
 
 ## Citation
 
+If you find our work useful, please consider citing:
+
 ```bibtex
-@article{transnormal2025,
-  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-  author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-  year={2025}
+@misc{li2026transnormal,
+  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+  author={Mingwei Li and Hehe Fan and Yi Yang},
+  year={2026},
+  eprint={2602.00839},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2602.00839},
 }
 ```
 
 ## License
 
-CC BY-NC 4.0
+This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+
+## Acknowledgements
+
+This work builds upon:
+- [Lotus](https://github.com/EnVision-Research/Lotus) - Diffusion-based depth and normal estimation
+- [DINOv3](https://github.com/facebookresearch/dinov3) - Self-supervised vision transformer from Meta AI
+- [Stable Diffusion 2](https://www.modelscope.cn/AI-ModelScope/stable-diffusion-2-base) - Base diffusion model
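
Reviewer note on the updated usage snippet: it lets `output_type` be "np", "pil", or "pt" and saves the result as a PNG, but the README does not say how normals are encoded as colors. The sketch below shows the convention commonly used for normal-map images, where each component in [-1, 1] is mapped linearly to [0, 255]. This is an assumption for illustration only: the PR does not confirm that `save_normal_map` uses this mapping, and the helper names `encode_normals_to_rgb` and `decode_rgb_to_normals` are hypothetical, not part of the `transnormal` package.

```python
import numpy as np


def encode_normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Map normals of shape (H, W, 3) with components in [-1, 1] to uint8 RGB."""
    # Normalize defensively so every pixel holds a unit-length vector.
    norm = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.clip(norm, 1e-8, None)
    # Shift [-1, 1] to [0, 1], then scale to the 8-bit range.
    return np.round((unit + 1.0) * 0.5 * 255.0).astype(np.uint8)


def decode_rgb_to_normals(rgb: np.ndarray) -> np.ndarray:
    """Invert the encoding and re-normalize to unit length."""
    vec = rgb.astype(np.float32) / 255.0 * 2.0 - 1.0
    norm = np.linalg.norm(vec, axis=-1, keepdims=True)
    return vec / np.clip(norm, 1e-8, None)


# A 1x2 "image": one normal pointing along +z, one pointing along +x.
normals = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]], dtype=np.float32)
rgb = encode_normals_to_rgb(normals)
recovered = decode_rgb_to_normals(rgb)
```

Because of 8-bit quantization the round trip is only approximate, so a downstream consumer that needs exact geometry would likely prefer `output_type="np"` or `"pt"` over a saved PNG.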