---
license: cc-by-4.0
tags:
- pytorch
- computer-vision
- remote-sensing
- mars
- dem-prediction
- u-net
- multi-task-learning
datasets:
- ESA-Datalabs/MCTED
---

# MarsDEMNet

MarsDEMNet is a comparative deep learning study for single-image Digital Elevation Model (DEM) prediction from Mars CTX satellite imagery. Four approaches are evaluated: a classical Random Forest baseline, a single-output U-Net, a multi-output U-Net with multi-task learning, and an encoder depth ablation. All are trained on the MCTED dataset of 80,898 paired CTX orthoimage and DEM patches.

## Model Details

### Model Description

MarsDEMNet addresses a fundamental coverage asymmetry on Mars: while the CTX instrument has photographed ~99.5% of the Martian surface at 5–6 m/pixel, high-resolution stereo DEMs exist for only ~0.5–1% of that coverage. Models trained on MCTED learn to predict dense elevation maps from single optical images, which could in principle extend effective DEM coverage to nearly the entire planet.

- **Model type:** Convolutional encoder-decoder (U-Net)
- **License:** CC-BY 4.0
- **Finetuned from:** Trained from scratch — no pretrained weights

### Model Sources

- **Repository:** https://github.com/harshithkethavath/MarsDEMNet
- **Dataset:** https://huggingface.co/datasets/ESA-Datalabs/MCTED

## Checkpoints

Four model checkpoints are provided:

| File | Architecture | Val RMSE (m) | Val MAE (m) | Delta-1 |
|---|---|---|---|---|
| `marsdemnet-unet-elevation-4block.pt` | Single-output U-Net, 4-block encoder, 7.8M params | 74.38 | 52.86 | 0.418 |
| `marsdemnet-unet-multitask-4block.pt` | Multi-output U-Net, 4-block encoder, 7.8M params | 74.29 | 52.68 | 0.422 |
| `marsdemnet-unet-multitask-3block.pt` | Multi-output U-Net, 3-block encoder, 1.9M params | 82.80 | 58.29 | 0.440 |
| `marsdemnet-unet-multitask-5block.pt` | Multi-output U-Net, 5-block encoder, 31.4M params | 59.88 | 42.67 | 0.409 |

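The 5-block multi-output model achieves the lowest validation RMSE and MAE, with RMSE about 19% lower than either 4-block model (59.88 m vs. 74.29–74.38 m) and no overfitting observed. Its Delta-1 (0.409) is the lowest of the four checkpoints, however; the 3-block model scores highest on that metric (0.440).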
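The 19% figure can be checked directly from the validation RMSE values in the table above. The snippet below is only a worked calculation; the variable names are illustrative.

```python
# Validation RMSE in meters, copied from the checkpoint table above.
rmse_4block = 74.29  # multi-output U-Net, 4-block encoder
rmse_5block = 59.88  # multi-output U-Net, 5-block encoder

# Relative RMSE reduction of the 5-block model versus the 4-block model.
reduction = (rmse_4block - rmse_5block) / rmse_4block
print(f"RMSE reduction: {reduction:.1%}")  # prints "RMSE reduction: 19.4%"
```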

## How to Get Started

```python
import torch
from scripts.deeplearning.unet import UNet

# Load ONE of the two models below; the second assignment replaces the first.

# Single-output (elevation only), 4-block
model = UNet(in_channels=1, out_channels=1, num_blocks=4, base_ch=32)
ckpt = torch.load("marsdemnet-unet-elevation-4block.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()

# Multi-output (elevation + slope + roughness), 5-block (lowest RMSE)
model = UNet(in_channels=1, out_channels=3, num_blocks=5, base_ch=32)
ckpt = torch.load("marsdemnet-unet-multitask-5block.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()

# Inference. The random tensor is only a stand-in for a normalized CTX patch
# of shape (1, 1, 518, 518); replace it with real data.
optical = torch.randn(1, 1, 518, 518)
with torch.no_grad():
    pred = model(optical)
    # Single-output: pred shape (1, 1, 518, 518), elevation
    # Multi-output:  pred shape (1, 3, 518, 518), [elevation, slope, roughness]
```

Input normalization: clip to 2nd–98th percentile, then z-score per patch. DEM targets are mean-subtracted per patch (relative elevation in meters).
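The normalization described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's code: the function names, the `eps` guard against zero variance, and the use of `nanmean` to skip invalid DEM pixels are all assumptions.

```python
import numpy as np

def normalize_ctx(patch: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Clip to the 2nd-98th percentile, then z-score the patch."""
    lo, hi = np.percentile(patch, [2, 98])
    clipped = np.clip(patch.astype(np.float64), lo, hi)
    # eps (assumed) avoids division by zero on constant patches.
    normed = (clipped - clipped.mean()) / (clipped.std() + eps)
    return normed.astype(np.float32)

def center_dem(dem: np.ndarray) -> np.ndarray:
    """Subtract the per-patch mean; NaN pixels are ignored (assumed)."""
    return dem - np.nanmean(dem)

# Synthetic stand-ins for a CTX patch and its DEM.
rng = np.random.default_rng(0)
ctx = rng.uniform(0, 255, size=(518, 518))
dem = rng.normal(-2000.0, 80.0, size=(518, 518))
dem[0, 0] = np.nan  # one invalid pixel

ctx_norm = normalize_ctx(ctx)
dem_rel = center_dem(dem)
print(ctx_norm.mean(), ctx_norm.std(), np.nanmean(dem_rel))
```

After this step the optical patch has roughly zero mean and unit variance, and the DEM holds elevation relative to the patch mean.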

## Training Details

### Training Data

MCTED (Mars CTX Terrain-Elevation Dataset) contains 80,898 paired CTX orthoimage and DEM patches derived from 1,122 quality-filtered stereo scenes. The train/val split is geography-aware and made at the scene level to prevent spatial leakage, giving 65,090 training patches and 15,808 validation patches.
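A scene-level split keeps every patch from a given scene on the same side of the split. The sketch below shows only that grouping idea, with scenes picked at random. The geographic criterion actually used is not described here, and the `split_by_scene` helper, the scene IDs, and the 20% validation fraction are hypothetical.

```python
import random

def split_by_scene(patch_scene_ids, val_fraction=0.2, seed=0):
    """Assign whole scenes to train or val so no scene spans both."""
    scenes = sorted(set(patch_scene_ids))
    rng = random.Random(seed)
    rng.shuffle(scenes)
    n_val = max(1, round(len(scenes) * val_fraction))
    val_scenes = set(scenes[:n_val])
    train_idx = [i for i, s in enumerate(patch_scene_ids) if s not in val_scenes]
    val_idx = [i for i, s in enumerate(patch_scene_ids) if s in val_scenes]
    return train_idx, val_idx

# Toy example: ten patches drawn from five scenes.
scene_ids = ["A", "A", "B", "B", "B", "C", "D", "D", "E", "E"]
train_idx, val_idx = split_by_scene(scene_ids)

train_scenes = {scene_ids[i] for i in train_idx}
val_scenes = {scene_ids[i] for i in val_idx}
print(sorted(train_scenes), sorted(val_scenes))
```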

### Training Procedure

- **Optimizer:** AdamW, lr=1e-4, weight_decay=1e-4
- **Schedule:** Cosine annealing to 1e-6 over 50 epochs
- **Early stopping:** Patience 10 on val RMSE
- **Batch size:** 16
- **Augmentation:** Random horizontal/vertical flips and 90° rotations applied jointly to image and labels
- **Loss:** Masked MAE (single-output); weighted sum of masked MAE losses (multi-output, uniform 1:1:1 weights)
- **Training regime:** fp32
- **Hardware:** NVIDIA H100 GPU
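Two items from the list above, the cosine schedule and the joint augmentation, can be sketched as follows. This is an illustration under assumptions rather than the training code: the closed-form cosine formula, the 0.5 flip probability, and the function names are not taken from the repository.

```python
import math
import numpy as np

def cosine_lr(epoch, total_epochs=50, lr_max=1e-4, lr_min=1e-6):
    """Closed-form cosine annealing from lr_max (epoch 0) to lr_min."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

def augment_pair(image, labels, rng):
    """Apply identical random flips and a 90-degree rotation to both inputs.

    image: (H, W) optical patch; labels: (C, H, W) target maps.
    """
    if rng.random() < 0.5:  # horizontal flip (probability is assumed)
        image, labels = image[:, ::-1], labels[:, :, ::-1]
    if rng.random() < 0.5:  # vertical flip (probability is assumed)
        image, labels = image[::-1, :], labels[:, ::-1, :]
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    image = np.rot90(image, k, axes=(0, 1))
    labels = np.rot90(labels, k, axes=(1, 2))
    return np.ascontiguousarray(image), np.ascontiguousarray(labels)

rng = np.random.default_rng(0)
img = np.arange(16, dtype=np.float32).reshape(4, 4)
lab = np.stack([img, img * 2, img + 1])  # labels derived from the image
aug_img, aug_lab = augment_pair(img, lab, rng)
print(cosine_lr(0), cosine_lr(25), cosine_lr(50))
```

Because the same flips and rotation count are applied to both arrays, each label channel stays pixel-aligned with the augmented image.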

### Preprocessing

- CTX patches: percentile clip (2nd–98th) + per-patch z-score normalization
- DEM patches: per-patch mean subtraction (relative elevation)
- Validity masking: logical AND of NaN mask and deviation mask; invalid pixels excluded from loss and metrics
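The validity masking and the masked MAE loss can be sketched in NumPy as below. The deviation mask is taken as a given boolean array, since how it is derived is not described here. The function names and the zero return for an all-invalid patch are assumptions.

```python
import numpy as np

def validity_mask(dem, deviation_mask):
    """True where the DEM pixel is not NaN AND the deviation mask is set."""
    return ~np.isnan(dem) & deviation_mask.astype(bool)

def masked_mae(pred, target, mask):
    """Mean absolute error over valid pixels only."""
    if not mask.any():
        return 0.0  # assumed convention for a patch with no valid pixels
    return float(np.abs(pred[mask] - target[mask]).mean())

# Toy 2x2 patch: one NaN pixel and one pixel rejected by the deviation mask.
dem = np.array([[1.0, np.nan], [3.0, 4.0]])
deviation_ok = np.array([[True, True], [False, True]])
mask = validity_mask(dem, deviation_ok)

pred = np.zeros_like(dem)
loss = masked_mae(pred, dem, mask)
print(mask, loss)  # two valid pixels (values 1 and 4), so MAE = 2.5
```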

## Evaluation

### Metrics

- **MAE** — mean absolute elevation error in meters
- **RMSE** — primary ranking metric; penalizes large errors
- **Delta-1** — fraction of valid pixels where max(pred/gt, gt/pred) < 1.25
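RMSE and Delta-1 over valid pixels can be computed as in the sketch below. It applies the Delta-1 formula literally, so its treatment of zero or opposite-signed values (both possible with mean-subtracted elevation) is an assumption of this sketch, not a statement about the evaluation code. The function names are illustrative.

```python
import numpy as np

def rmse(pred, gt, mask):
    """Root mean squared error over valid pixels."""
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))

def delta1(pred, gt, mask, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    p, g = pred[mask], gt[mask]
    # Division by zero yields inf/NaN, which never satisfies the comparison.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.maximum(p / g, g / p)
    return float(np.mean(ratio < threshold))

pred = np.array([10.0, 20.0, 30.0, 50.0])
gt = np.array([10.0, 18.0, 40.0, 50.0])
mask = np.ones_like(gt, dtype=bool)

# Ratios are 1.0, 1.11, 1.33, 1.0, so three of four pixels pass.
print(rmse(pred, gt, mask), delta1(pred, gt, mask))
```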

### Results

| Model | Params | Val RMSE (m) | Val MAE (m) | Delta-1 |
|---|---|---|---|---|
| Random Forest (classical baseline) | n/a | 58.39 (elev std) | 41.29 | n/a |
| Single-output U-Net (4-block) | 7.8M | 74.38 | 52.86 | 0.418 |
| Multi-output U-Net uniform (4-block) | 7.8M | 74.29 | 52.68 | 0.422 |
| Multi-output U-Net (3-block ablation) | 1.9M | 82.80 | 58.29 | 0.440 |
| Multi-output U-Net (5-block ablation) | 31.4M | **59.88** | **42.67** | 0.409 |

## Bias, Risks, and Limitations

- Models are trained on regions of Mars where stereo DEMs exist, which are geographically biased toward scientifically interesting terrain. Performance on flat, featureless plains may be lower.
- Textureless terrain with no illumination gradient provides no depth cue, a known failure mode.
- Predictions are relative elevation (mean-subtracted per patch), not absolute MOLA-referenced altitude.
- Not suitable for safety-critical mission planning without further validation.

## Technical Specifications

### Model Architecture

U-Net encoder-decoder with configurable depth. Each encoder block applies (Conv2d 3×3 → BatchNorm → ReLU) twice, followed by MaxPool. The decoder uses bilinear upsampling with lateral skip connections. The multi-output variant has three separate 1×1 conv heads for elevation, slope, and roughness.
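A single encoder block can be sketched in PyTorch as follows, reading the description as two Conv-BatchNorm-ReLU stages followed by 2×2 max pooling. The class name `EncoderBlock`, the `padding=1` setting, and the pooling size are assumptions; the repository's `UNet` may differ.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two (Conv3x3 -> BatchNorm -> ReLU) stages, then 2x2 max pooling."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # padding assumed
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)           # kept for the decoder's skip connection
        return skip, self.pool(skip)  # pooled features go to the next block

block = EncoderBlock(in_ch=1, out_ch=32)
skip, down = block(torch.randn(1, 1, 64, 64))
print(tuple(skip.shape), tuple(down.shape))
```

The block returns the pre-pooling features as well, so a decoder can concatenate them after upsampling.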

## Citation

If you use MarsDEMNet, please cite:

```bibtex
@misc{marsdemnet2026,
  title     = {MarsDEMNet: Classical and Deep Learning Approaches for Single-Image Digital Elevation Model Prediction from Mars CTX Imagery},
  author    = {Harshith Kethavath},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/harshithkethavath/MarsDEMNet}
}
```