Update model card with paper, project, and code links

Hi! I'm Niels from the Hugging Face community team.

This PR improves the model card for TransNormal by:
- Linking it to the corresponding [Hugging Face paper page](https://huggingface.co/papers/2602.00839).
- Adding direct links to the [official project page](https://longxiang-ai.github.io/TransNormal) and [GitHub repository](https://github.com/longxiang-ai/TransNormal).
- Including the authors for proper attribution.
- Providing a brief summary of the method based on the paper abstract.
- Updating the sample usage code to use the recommended `bfloat16` precision and to show how to save the output normal map.
- Updating the citation information to the latest BibTeX entry.
- Adding a link to the license and an acknowledgements section.

This makes the model more discoverable and easier to use for the community. Please review and merge if it looks good!
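
On the `bfloat16` recommendation in the updated usage code: the PR does not say which issues `float16` causes with DINOv3, so the following is only a general illustration, not a diagnosis. One well-known difference between the two formats is dynamic range: `float16` overflows above 65504, while `bfloat16` keeps the `float32` exponent range at the cost of mantissa precision. The helper names below (`fits_float16`, `to_bfloat16`) are hypothetical and use only the Python standard library, not the TransNormal or PyTorch APIs.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_float16(x: float) -> bool:
    """Return True if x can be packed as float16 without overflowing."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    This truncates instead of rounding to nearest, which is enough to
    illustrate the range/precision trade-off.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for value in (1000.0, FLOAT16_MAX, 70000.0):
        print(value, "fits float16:", fits_float16(value),
              "| as bfloat16:", to_bfloat16(value))
```

A value such as 70000.0 cannot be stored in `float16` at all, whereas the emulated `bfloat16` keeps it at reduced precision. Whether this is the failure mode the model card note refers to is an assumption.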

Files changed (1)

README.md +44 -15

@@ -1,34 +1,43 @@
 ---
 license: cc-by-nc-4.0
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
-library_name: diffusers
-pipeline_tag: image-to-image
 ---
-# TransNormal
-Surface normal estimation for transparent objects using diffusion models with DINOv3 semantic guidance.
 ## Usage
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
-# Load DINO encoder (download separately)
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
-    weights_path="path/to/dinov3_vith16plus",
-    projector_path="path/to/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
-# Load pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
@@ -36,20 +45,40 @@ pipe = TransNormalPipeline.from_pretrained(
 )
 pipe = pipe.to("cuda")
-# Inference
-normal_map = pipe("image.jpg", output_type="pil")
 ```
 ## Citation
 ```bibtex
-@article{transnormal2025,
-  title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
-  author={Li, Mingwei and Fan, Hehe and Yang, Yi},
-  year={2025}
 }
 ```
 ## License
-CC BY-NC 4.0

 ---
+library_name: diffusers
 license: cc-by-nc-4.0
+pipeline_tag: image-to-image
 tags:
 - normal-estimation
 - depth-estimation
 - diffusion
 - transparent-objects
 ---
+# TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation
+This is the official repository for the paper [TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation](https://huggingface.co/papers/2602.00839).
+[**Project Page**](https://longxiang-ai.github.io/TransNormal) | [**GitHub**](https://github.com/longxiang-ai/TransNormal)
+**Authors**: Mingwei Li, Hehe Fan, Yi Yang
+TransNormal is a novel framework that adapts pre-trained diffusion priors for single-step normal regression for transparent objects. It addresses challenges like complex light refraction and reflection by integrating dense visual semantics from DINOv3 via a cross-attention mechanism, providing strong geometric cues for textureless transparent surfaces. The framework also employs a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details.
 ## Usage
+To use this model, you need to set up the DINOv3 encoder separately (as it requires access approval from Meta AI).
 ```python
 from transnormal import TransNormalPipeline, create_dino_encoder
 import torch
+# Create DINO encoder
+# Note: Use bfloat16 instead of float16 to avoid potential issues with DINOv3
 dino_encoder = create_dino_encoder(
     model_name="dinov3_vith16plus",
+    weights_path="path/to/dinov3_vith16plus", # Path to approved DINOv3 weights
+    projector_path="./weights/transnormal/cross_attention_projector.pt",
     device="cuda",
     dtype=torch.bfloat16,
 )
+# Load TransNormal pipeline
 pipe = TransNormalPipeline.from_pretrained(
     "longxiang-ai/transnormal-v1",
     dino_encoder=dino_encoder,
 )
 pipe = pipe.to("cuda")
+# Run inference
+normal_map = pipe(
+    image="path/to/image.jpg",
+    output_type="pil",  # Choose from "np", "pil", or "pt"
+)
+# Save the result
+from transnormal import save_normal_map
+save_normal_map(normal_map, "output_normal.png")
 ```
 ## Citation
+If you find our work useful, please consider citing:
 ```bibtex
+@misc{li2026transnormal,
+      title={TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation},
+      author={Mingwei Li and Hehe Fan and Yi Yang},
+      year={2026},
+      eprint={2602.00839},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2602.00839},
 }
 ```
 ## License
+This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+## Acknowledgements
+This work builds upon:
+- [Lotus](https://github.com/EnVision-Research/Lotus) - Diffusion-based depth and normal estimation
+- [DINOv3](https://github.com/facebookresearch/dinov3) - Self-supervised vision transformer from Meta AI
+- [Stable Diffusion 2](https://www.modelscope.cn/AI-ModelScope/stable-diffusion-2-base) - Base diffusion model