Longxiang-ai/TransNormal

@@ -1,34 +1,43 @@
 ---
 license: cc-by-nc-4.0
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
-library_name: diffusers
-pipeline_tag: image-to-image
 ---
-# TransNormal
-Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.
 ## Usage
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
-# Load DINO encoder (download separately)
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
-    weights_path="path/to/dinov3_vith16plus",
-    projector_path="path/to/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
-# Load pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
@@ -36,20 +45,40 @@ pipe = TransNormalPipeline.from_pretrained(
 )
 pipe = pipe.to("cuda")
-# Inference
-normal_map = pipe("image.jpg", output_type="pil")
 ```
 ## Citation
 ```bibtex
-@article{transnormal2025,
-  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-  author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-  year={2025}
 }
 ```
 ## License
-CC BY-NC 4.0

 ---
+library_name: diffusers
 license: cc-by-nc-4.0
+pipeline_tag: image-to-image
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
 ---
+# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
+This is the official repository for the paper [TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation](https://huggingface.co/papers/2602.00839).
+[**Project Page**](https://longxiang-ai.github.io/TransNormal) | [**GitHub**](https://github.com/longxiang-ai/TransNormal)
+**Authors**: Mingwei Li, Hehe Fan, Yi Yang
+TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression for transparent objects. It addresses challenges like complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
 ## Usage
+To use this model, you need to set up the DINOv3 encoder separately (as it requires access approval from Meta AI).
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
+# Create DINO encoder
+# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
+    weights_path="path/to/dinov3_vith16plus", # Path to approved DINOv3 weights
+    projector_path="./weights/transnormal/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
+# Load TransNormal pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
 )
 pipe = pipe.to("cuda")
+# Run inference
+normal_map = pipe(
+    image="path/to/image.jpg",
+    output_type="pil",  # Choose from "np", "pil", or "pt"
+)
+# Save the result
+from transnormal import save_normal_map
+save_normal_map(normal_map, "output_normal.png")
 ```
 ## Citation
+If you find our work useful, please consider citing:
 ```bibtex
+@misc{li2026transnormal,
+      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+      author={Mingwei Li and Hehe Fan and Yi Yang},
+      year={2026},
+      eprint={2602.00839},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2602.00839},
 }
 ```
 ## License
+This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+## Acknowledgements
+This work builds upon:
+- [Lotus](https://github.com/EnVision-Research/Lotus) - Diffusion-based depth and normal estimation
+- [DINOv3](https://github.com/facebookresearch/dinov3) - Self-supervised vision transformer from Meta AI
+- [Stable Diffusion 2](https://www.modelscope.cn/AI-ModelScope/stable-diffusion-2-base) - Base diffusion model
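
The updated usage example lets the caller choose `output_type` from `"np"`, `"pil"`, or `"pt"`, but the diff does not say how raw normals relate to the colors in a saved image. A minimal sketch of the widely used visualization convention follows, where each component of a unit normal in [-1, 1] is mapped linearly to [0, 255]. This is an assumption for illustration only: `normal_to_rgb` and `encode_normal_map` are hypothetical helpers, not part of the `transnormal` package, and the axis convention and value range that `TransNormalPipeline` or `save_normal_map` actually use may differ.

```python
import math


def normal_to_rgb(normal):
    """Map one surface normal (nx, ny, nz) to an 8-bit RGB triple.

    Assumes the common convention that each component in [-1, 1] maps
    linearly to [0, 255]. TransNormal's own convention may differ.
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        # A zero vector has no direction; encode it as mid-gray.
        return (128, 128, 128)
    # Normalize to unit length, shift [-1, 1] to [0, 1], then scale to 8 bits.
    return tuple(
        int(round((c / length * 0.5 + 0.5) * 255.0)) for c in (nx, ny, nz)
    )


def encode_normal_map(normals):
    """Encode an H x W grid (nested lists) of normals as RGB triples."""
    return [[normal_to_rgb(n) for n in row] for row in normals]
```

Under this convention, a normal pointing straight along +z gets a blue channel of 255, which is why normal maps of mostly camera-facing surfaces tend to look blue-violet.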