nielsr HF Staff committed on
Commit
0a77db3
·
verified ·
1 Parent(s): 59075bc

Update model card with paper, project, and code links


Hi! I'm Niels from the Hugging Face community team.

This PR improves the model card for TransNormal by:
- Linking it to the corresponding [Hugging Face paper page](https://huggingface.co/papers/2602.00839).
- Adding direct links to the [official project page](https://longxiang-ai.github.io/TransNormal) and [GitHub repository](https://github.com/longxiang-ai/TransNormal).
- Including the authors for proper attribution.
- Providing a brief summary of the method based on the paper abstract.
- Updating the sample usage code to use the recommended `bfloat16` precision and to show how to save the output normal map.
- Updating the citation information to the latest BibTeX entry.
- Adding a link to the license and an acknowledgements section.
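
For context on the normal-map saving step, here is a minimal sketch of the encoding commonly used to store a normal map as an 8-bit image: each component of a unit normal is mapped linearly from [-1, 1] to [0, 255]. The helpers `normal_to_rgb` and `rgb_to_normal` are hypothetical names for illustration and are not part of the `transnormal` package; its `save_normal_map` may use a different axis convention or rounding.

```python
import math


def normal_to_rgb(normal):
    """Encode one surface normal (x, y, z) as an 8-bit (r, g, b) triple."""
    x, y, z = normal
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0.0:
        raise ValueError("zero-length normal has no direction")
    # Normalize to unit length, then map each component from [-1, 1]
    # to [0, 255], rounding to the nearest integer and clamping.
    return tuple(
        min(255, max(0, int((c / length + 1.0) * 0.5 * 255.0 + 0.5)))
        for c in (x, y, z)
    )


def rgb_to_normal(rgb):
    """Approximately invert normal_to_rgb (8-bit quantization is lossy)."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in rgb)


# A normal pointing straight along +z maps to the familiar light blue.
facing_camera = normal_to_rgb((0.0, 0.0, 1.0))
print(facing_camera)  # (128, 128, 255)
```

Because the encoding quantizes to 8 bits, decoding recovers each component only to within about 1/255, so a saved PNG is suitable for visualization but loses some precision compared with the `"np"` or `"pt"` outputs.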

This makes the model more discoverable and easier to use for the community. Please review and merge if it looks good!

Files changed (1)
  1. README.md +44 -15
README.md CHANGED
@@ -1,34 +1,43 @@
 ---
+library_name: diffusers
 license: cc-by-nc-4.0
+pipeline_tag: image-to-image
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
-library_name: diffusers
-pipeline_tag: image-to-image
 ---
 
-# TransNormal
+# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
+
+This is the official repository for the paper [TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation](https://huggingface.co/papers/2602.00839).
+
+[**Project Page**](https://longxiang-ai.github.io/TransNormal) | [**GitHub**](https://github.com/longxiang-ai/TransNormal)
+
+**Authors**: Mingwei Li, Hehe Fan, Yi Yang
 
-Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.
+TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression for transparent objects. It addresses challenges like complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
 
 ## Usage
 
+To use this model, you need to set up the DINOv3 encoder separately (as it requires access approval from Meta AI).
+
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
 
-# Load DINO encoder (download separately)
+# Create DINO encoder
+# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
-    weights_path="path/to/dinov3_vith16plus",
-    projector_path="path/to/cross_attention_projector.pt",
+    weights_path="path/to/dinov3_vith16plus",  # Path to approved DINOv3 weights
+    projector_path="./weights/transnormal/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
 
-# Load pipeline
+# Load TransNormal pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
@@ -36,20 +45,40 @@ pipe = TransNormalPipeline.from_pretrained(
 )
 pipe = pipe.to("cuda")
 
-# Inference
-normal_map = pipe("image.jpg", output_type="pil")
+# Run inference
+normal_map = pipe(
+    image="path/to/image.jpg",
+    output_type="pil",  # Choose from "np", "pil", or "pt"
+)
+
+# Save the result
+from transnormal import save_normal_map
+save_normal_map(normal_map, "output_normal.png")
 ```
 
 ## Citation
 
+If you find our work useful, please consider citing:
+
 ```bibtex
-@article{transnormal2025,
-  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-  author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-  year={2025}
+@misc{li2026transnormal,
+  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+  author={Mingwei Li and Hehe Fan and Yi Yang},
+  year={2026},
+  eprint={2602.00839},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2602.00839},
 }
 ```
 
 ## License
 
-CC BY-NC 4.0
+This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+
+## Acknowledgements
+
+This work builds upon:
+- [Lotus](https://github.com/EnVision-Research/Lotus) - Diffusion-based depth and normal estimation
+- [DINOv3](https://github.com/facebookresearch/dinov3) - Self-supervised vision transformer from Meta AI
+- [Stable Diffusion 2](https://www.modelscope.cn/AI-ModelScope/stable-diffusion-2-base) - Base diffusion model