---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- inverse-rendering
- image-decomposition
- basecolor
- normal-map
- pbr
- material-estimation
- shadenet
pipeline_tag: image-to-image
datasets:
- flickr8k
---
# ShadeNet 28M
A lightweight inverse rendering model that decomposes RGB images into PBR material maps (basecolor, normal, roughness/metallic/depth) and reconstructs them back to RGB.
## Examples
<a href="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/input_result.png" target="_blank"><img src="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/input_result.png" alt="Sample 1"></a>
<a href="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/152029243_b3582c36fa_result.png" target="_blank"><img src="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/152029243_b3582c36fa_result.png" alt="Sample 2"></a>
<a href="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/160585932_fa6339f248_result.png" target="_blank"><img src="https://huggingface.co/singam96/ShadeNet/resolve/main/assets/160585932_fa6339f248_result.png" alt="Sample 3"></a>
*Each result shows: Input (blue) → Basecolor, Normal, Depth, Roughness, Metallic (green) → Recon RGB (orange). Click an image to view full size.*
## Architecture
The model is a **MobileNetUNet (27.9M params)** with:
- **MobileNetV2 backbone** (frozen except last 8 layers) for feature extraction
- **Parallel Encoder** for additional learned features
- **UNet-style decoder** with skip connections, channel attention, and spatial attention
- **Dual mode** forward pass:
  - **Mode 0**: RGB → Inverse Maps (basecolor, normal, roughness/metallic/depth)
  - **Mode 1**: Inverse Maps → RGB reconstruction
## Output Maps
| Map | Channels | Description |
|-----|----------|-------------|
| **Basecolor** | 3 | Albedo / diffuse color |
| **Normal** | 3 | Surface normals (tangent space) |
| **Roughness** | 1 | R channel of RMD - surface roughness |
| **Metallic** | 1 | G channel of RMD - metalness |
| **Depth** | 1 | B channel of RMD - relative depth |
| **RGB** | 3 | Reconstructed RGB from inverse maps |
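
The roughness, metallic, and depth maps share one packed 3-channel RMD output, so downstream code usually needs to split them apart. The sketch below follows the channel order in the table above; the `(3, H, W)` array layout and the helper name `split_rmd` are assumptions made for illustration, not part of this repo.

```python
import numpy as np


def split_rmd(rmd: np.ndarray) -> dict:
    """Split a packed RMD map of shape (3, H, W) into named single-channel maps.

    Channel order follows the Output Maps table:
    R = roughness, G = metallic, B = depth.
    """
    if rmd.ndim != 3 or rmd.shape[0] != 3:
        raise ValueError(f"expected shape (3, H, W), got {rmd.shape}")
    roughness, metallic, depth = rmd  # unpack along the channel axis
    return {"roughness": roughness, "metallic": metallic, "depth": depth}


# Synthetic example: a rough, non-metallic surface with a depth ramp.
rmd = np.stack([
    np.full((4, 4), 0.8),                 # R: roughness
    np.zeros((4, 4)),                     # G: metallic
    np.linspace(0, 1, 16).reshape(4, 4),  # B: relative depth
]).astype(np.float32)
maps = split_rmd(rmd)
```

Each entry in `maps` is a 2D array that can be saved or visualized as a grayscale image.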
## Files
```
shadenet/
├── app.py                          # Gradio Space app
├── inference.py                    # Standalone inference script (CLI)
├── inference_utils.py              # Inference utilities (tiling, compositing)
├── model.py                        # Model architecture
├── layers.py                       # Layer components
├── config.py                       # Configuration
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── checkpoints/
│   └── last.ckpt                   # PyTorch Lightning checkpoint (model weights)
└── onnx/
    ├── model_mode0.onnx            # Mode 0 ONNX (RGB → inverse maps)
    ├── model_mode0_quantized.onnx  # Quantized mode 0
    ├── model_mode1.onnx            # Mode 1 ONNX (inverse maps → RGB)
    └── model_mode1_quantized.onnx  # Quantized mode 1
```
## Usage
### Gradio Space
A **Hugging Face Space** hosts this model as an interactive web app: upload an image in your browser and view the results, with no installation needed.
The `app.py` in this repo is the Space entrypoint. To create one:
1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. Select the **Gradio SDK** and choose this repo as the source
3. The Space launches automatically with the Gradio interface
### CLI Inference
```bash
pip install -r requirements.txt
python inference.py input.jpg --output_dir ./output
```
### ONNX Inference
The `onnx/` folder contains exported ONNX models for deployment without PyTorch:
- `model_mode0.onnx` / `model_mode0_quantized.onnx`: RGB → basecolor, normal, RMD
- `model_mode1.onnx` / `model_mode1_quantized.onnx`: Inverse maps → RGB
Input shape: `[1, 3, 512, 512]`, values in `[-1, 1]`
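
A minimal sketch of preparing an input that matches the shape and value range stated above, followed by a generic ONNX Runtime call. It assumes the `[-1, 1]` range comes from linear scaling of 8-bit pixel values (no mean/std normalization) and that the image has already been resized to 512×512. The helper names `preprocess` and `run_mode0` are hypothetical. The model's output names and order are not documented here, so the raw output list is returned as-is.

```python
import numpy as np


def preprocess(image_u8: np.ndarray) -> np.ndarray:
    """Convert an HWC uint8 RGB image (512x512) to float32 [1, 3, 512, 512] in [-1, 1]."""
    if image_u8.dtype != np.uint8 or image_u8.shape != (512, 512, 3):
        raise ValueError("expected a uint8 array of shape (512, 512, 3)")
    x = image_u8.astype(np.float32) / 127.5 - 1.0    # assumed scaling: [0, 255] -> [-1, 1]
    x = np.transpose(x, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW with batch size 1
    return np.ascontiguousarray(x)


def run_mode0(onnx_path: str, x: np.ndarray) -> list:
    """Run a mode 0 ONNX model and return its raw outputs (order not documented here)."""
    import onnxruntime as ort  # imported lazily so preprocess() works without it

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # query the name instead of guessing it
    return session.run(None, {input_name: x})


# Synthetic pure-red image, used only to check shape and value range.
dummy = np.zeros((512, 512, 3), dtype=np.uint8)
dummy[..., 0] = 255
batch = preprocess(dummy)
```

With a real image, `run_mode0("onnx/model_mode0.onnx", batch)` would return the model's outputs as a list of arrays; inspect `session.get_outputs()` to see which array corresponds to which map.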
## Training
Trained on Flickr8k with paired inverse-rendered data. The model learns both forward (RGB→inverse) and reverse (inverse→RGB) mappings simultaneously using a combined L1 + MSE loss per output map.
- Optimizer: AdamW / Prodigy
- Image size: 512×512
- Precision: 16-mixed
- Loss weights: basecolor=1.0, normal=1.5, RMD=1.0, RGB=1.0
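
The per-map loss and weights above can be illustrated with a small NumPy sketch. It assumes the L1 and MSE terms are summed with equal weight for each map and that the weighted per-map losses are then added together; the actual training code may combine them differently. The function names are hypothetical.

```python
import numpy as np

# Per-map weights as listed above.
LOSS_WEIGHTS = {"basecolor": 1.0, "normal": 1.5, "rmd": 1.0, "rgb": 1.0}


def l1_plus_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error plus mean squared error for one output map."""
    diff = pred - target
    return float(np.mean(np.abs(diff)) + np.mean(diff ** 2))


def total_loss(preds: dict, targets: dict) -> float:
    """Weighted sum of the per-map L1 + MSE losses (assumed combination)."""
    return sum(
        weight * l1_plus_mse(preds[name], targets[name])
        for name, weight in LOSS_WEIGHTS.items()
    )


# Synthetic example: every prediction is exact except the normal map,
# which is off by 0.5 everywhere.
shape = (3, 8, 8)
preds = {name: np.zeros(shape) for name in LOSS_WEIGHTS}
targets = {name: np.zeros(shape) for name in LOSS_WEIGHTS}
targets["normal"] = np.full(shape, 0.5)
loss = total_loss(preds, targets)
```

In this example only the normal map contributes, so its 1.5 weight scales the whole result; this shows how the weighting penalizes normal errors more heavily than errors in the other maps.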
## Citation
```
@software{shadenet,
author = {Sachin},
title = {ShadeNet},
year = {2026},
}
```