MarsDEMNet

MarsDEMNet is a comparative deep learning study for single-image Digital Elevation Model (DEM) prediction from Mars CTX satellite imagery. Four approaches are evaluated: a classical Random Forest baseline, a single-output U-Net, a multi-output U-Net with multi-task learning, and an encoder depth ablation. All are trained on the MCTED dataset of 80,898 paired CTX orthoimage and DEM patches.

Model Details

Model Description

MarsDEMNet addresses a fundamental coverage asymmetry on Mars: while the CTX instrument has photographed ~99.5% of the Martian surface at 5–6 m/pixel, high-resolution stereo DEMs exist for only ~0.5–1% of that coverage. Models trained on MCTED learn to predict dense relative-elevation maps from single optical images, which could extend approximate DEM coverage to nearly the entire planet.

Model type: Convolutional encoder-decoder (U-Net)
License: CC-BY 4.0
Finetuned from: None (trained from scratch, no pretrained weights)

Model Sources

Repository: https://github.com/harshithkethavath/MarsDEMNet
Dataset: https://huggingface.co/datasets/ESA-Datalabs/MCTED

Checkpoints

Four model checkpoints are provided:

File	Architecture	Val RMSE	Val MAE	Delta-1
`marsdemnet-unet-elevation-4block.pt`	Single-output U-Net, 4-block encoder, 7.8M params	74.38 m	52.86 m	0.418
`marsdemnet-unet-multitask-4block.pt`	Multi-output U-Net, 4-block encoder, 7.8M params	74.29 m	52.68 m	0.422
`marsdemnet-unet-multitask-3block.pt`	Multi-output U-Net, 3-block encoder, 1.9M params	82.80 m	58.29 m	0.440
`marsdemnet-unet-multitask-5block.pt`	Multi-output U-Net, 5-block encoder, 31.4M params	59.88 m	42.67 m	0.409

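The 5-block multi-output model has the lowest RMSE and MAE of the four U-Net checkpoints: its RMSE is about 19% lower than that of the 4-block multi-output model (59.88 m vs. 74.29 m), with no overfitting observed. Its Delta-1 (0.409), however, is the lowest of the four.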

How to Get Started

import torch
from scripts.deeplearning.unet import UNet  # module from the MarsDEMNet repository

# Single-output (elevation only), 4-block
model = UNet(in_channels=1, out_channels=1, num_blocks=4, base_ch=32)
ckpt  = torch.load("marsdemnet-unet-elevation-4block.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()

# Multi-output (elevation + slope + roughness), 5-block (lowest val RMSE)
# Note: this rebinds `model`; load only the variant you need.
model = UNet(in_channels=1, out_channels=3, num_blocks=5, base_ch=32)
ckpt  = torch.load("marsdemnet-unet-multitask-5block.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()

# Inference
# optical: (1, 1, 518, 518) normalized CTX patch.
# A random tensor is used here as a placeholder; substitute a real patch.
optical = torch.randn(1, 1, 518, 518)
with torch.no_grad():
    pred = model(optical)
    # Single-output: pred shape (1, 1, 518, 518), elevation
    # Multi-output:  pred shape (1, 3, 518, 518), [elevation, slope, roughness]

Input normalization: clip to 2nd–98th percentile, then z-score per patch. DEM targets are mean-subtracted per patch (relative elevation in meters).

Training Details

Training Data

MCTED (Mars CTX Terrain-Elevation Dataset): 80,898 paired CTX orthoimage and DEM patches derived from 1,122 quality-filtered stereo scenes. The train/val split is geography-aware and made at the scene level to prevent spatial leakage. Train: 65,090 patches. Val: 15,808 patches.
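A scene-level split can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `split_by_scene`, the `val_fraction` of 0.2, and the seed are assumptions, and the actual geography-aware procedure may also group neighboring scenes, which this sketch does not attempt.

```python
import random

def split_by_scene(patch_scene_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that no scene contributes to both sets.

    patch_scene_ids[i] is the ID of the stereo scene that patch i came from.
    val_fraction is a fraction of scenes, not patches (an assumed value).
    """
    scenes = sorted(set(patch_scene_ids))
    rng = random.Random(seed)
    rng.shuffle(scenes)
    n_val = max(1, round(len(scenes) * val_fraction))
    val_scenes = set(scenes[:n_val])
    train_idx = [i for i, s in enumerate(patch_scene_ids) if s not in val_scenes]
    val_idx = [i for i, s in enumerate(patch_scene_ids) if s in val_scenes]
    return train_idx, val_idx
```

Splitting on scene IDs rather than on individual patches is what keeps overlapping or adjacent patches from the same scene out of both sets at once.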

Training Procedure

Optimizer: AdamW, lr=1e-4, weight_decay=1e-4
Schedule: Cosine annealing to 1e-6 over 50 epochs
Early stopping: Patience 10 on val RMSE
Batch size: 16
Augmentation: Random horizontal/vertical flips and 90° rotations applied jointly to image and labels
Loss: Masked MAE (single-output); weighted sum of masked MAE losses (multi-output, uniform 1:1:1 weights)
Training regime: fp32
Hardware: NVIDIA H100 GPU
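The settings above might translate into PyTorch roughly as follows. This is a sketch under stated assumptions rather than the repository's training script: the helper names (`joint_augment`, `masked_mae`, `multitask_loss`), the 0.5 flip probability, the mask having the same shape as the target, and the small stand-in model are all hypothetical.

```python
import random
import torch
import torch.nn as nn

def joint_augment(image, labels, mask):
    """Apply the same random flips and 90-degree rotation to (C, H, W) tensors."""
    tensors = (image, labels, mask)
    if random.random() < 0.5:  # horizontal flip (assumed probability)
        tensors = tuple(t.flip(-1) for t in tensors)
    if random.random() < 0.5:  # vertical flip (assumed probability)
        tensors = tuple(t.flip(-2) for t in tensors)
    k = random.randint(0, 3)   # number of 90-degree turns
    return tuple(torch.rot90(t, k, dims=(-2, -1)) for t in tensors)

def masked_mae(pred, target, mask):
    """Mean absolute error over valid pixels only."""
    mask = mask.bool()
    # Replace invalid targets first so NaNs never enter the subtraction.
    target = torch.where(mask, target, torch.zeros_like(target))
    err = (pred - target).abs() * mask
    return err.sum() / mask.sum().clamp(min=1)

def multitask_loss(pred, target, mask, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-channel masked MAE: [elevation, slope, roughness]."""
    return sum(
        w * masked_mae(pred[:, c], target[:, c], mask[:, c])
        for c, w in enumerate(weights)
    )

# Stand-in for the U-Net, used only so the optimizer has parameters.
model = nn.Conv2d(1, 3, kernel_size=3, padding=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-6  # step once per epoch
)
```

Applying one set of random draws to image, labels, and mask together is what keeps the supervision aligned with the augmented input.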

Preprocessing

CTX patches: percentile clip (2nd–98th) + per-patch z-score normalization
DEM patches: per-patch mean subtraction (relative elevation)
Validity masking: logical AND of NaN mask and deviation mask; invalid pixels excluded from loss and metrics
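The first two preprocessing steps and the NaN part of the validity mask can be sketched in NumPy as below. The function names and the small epsilon are assumptions made for illustration. The deviation mask is not defined in this card, so it is left out here.

```python
import numpy as np

def normalize_ctx(patch):
    """Clip to the 2nd-98th percentile range, then z-score within the patch."""
    patch = patch.astype(np.float32)
    lo, hi = np.percentile(patch, [2, 98])
    clipped = np.clip(patch, lo, hi)
    # The epsilon guards against division by zero on flat patches (assumption).
    return (clipped - clipped.mean()) / (clipped.std() + 1e-8)

def relative_dem(dem):
    """Subtract the mean of the valid pixels; return (relative DEM, NaN mask)."""
    valid = ~np.isnan(dem)
    rel = dem - np.nanmean(dem)  # NaN pixels stay NaN and are masked later
    return rel, valid
```

Because the DEM mean is removed per patch, the model's target is elevation relative to the patch mean, which matches the limitation noted later about predictions not being absolute altitudes.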

Evaluation

Metrics

MAE: mean absolute elevation error in meters, computed over valid pixels
RMSE: root mean squared elevation error in meters; primary ranking metric, penalizes large errors more heavily than MAE
Delta-1: fraction of valid pixels where max(pred/gt, gt/pred) < 1.25
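The three metrics can be computed over valid pixels as in the sketch below. The function name `masked_metrics` is hypothetical. One further assumption is flagged in the code: the card does not say how Delta-1 treats zero or opposite-sign values, which do occur with mean-subtracted elevations, so this sketch counts such pixels as outside the threshold.

```python
import numpy as np

def masked_metrics(pred, gt, mask, threshold=1.25):
    """Return MAE, RMSE, and Delta-1 over the pixels where mask is True."""
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    err = p - g
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    # Assumption: the ratio test is only applied where pred and gt are
    # nonzero and share a sign; all other pixels count as misses.
    same_sign = (p * g) > 0
    ratio = np.full(p.shape, np.inf)
    ratio[same_sign] = np.maximum(
        p[same_sign] / g[same_sign], g[same_sign] / p[same_sign]
    )
    delta1 = (ratio < threshold).mean()
    return {"mae": float(mae), "rmse": float(rmse), "delta1": float(delta1)}
```

For example, with predictions `[1, 2, 3, 4]` against ground truth `[1, 2, 3, 8]`, only the last pixel is wrong: its error of 4 m gives an MAE of 1 m and an RMSE of 2 m, and its ratio of 2 fails the 1.25 threshold, so Delta-1 is 0.75.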

Results

Model	Params	Val RMSE	Val MAE	Delta-1
Random Forest (classical baseline)	—	58.39 m (elev std)	41.29 m	—
Single-output U-Net (4-block)	7.8M	74.38 m	52.86 m	0.418
Multi-output U-Net, uniform loss weights (4-block)	7.8M	74.29 m	52.68 m	0.422
Multi-output U-Net (3-block ablation)	1.9M	82.80 m	58.29 m	0.440
Multi-output U-Net (5-block ablation)	31.4M	59.88 m	42.67 m	0.409

Bias, Risks, and Limitations

Models are trained on regions of Mars where stereo DEMs exist, which are geographically biased toward scientifically interesting terrain. Performance on flat, featureless plains may be lower.
Textureless terrain with no illumination gradient provides no depth cue, a known failure mode.
Predictions are relative elevation (mean-subtracted per patch), not absolute MOLA-referenced altitude.
Not suitable for safety-critical mission planning without further validation.

Technical Specifications

Model Architecture

U-Net encoder-decoder with configurable depth. Each encoder block: (Conv2d 3×3 → BatchNorm → ReLU) × 2 → MaxPool. Decoder: bilinear upsampling with lateral skip connections from the encoder. The multi-output variant has three separate 1×1 conv heads for elevation, slope, and roughness.
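One encoder block and the three-head output might look like the following in PyTorch. This is an illustrative sketch, not the repository's `UNet` class: the class names `EncoderBlock` and `ThreeHeads`, the `padding=1` choice, the `bias=False` setting, and the 2×2 pooling window are assumptions.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """(Conv 3x3 -> BatchNorm -> ReLU) x 2, followed by 2x2 max pooling."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.convs(x)  # kept for the decoder's skip connection
        return self.pool(skip), skip

class ThreeHeads(nn.Module):
    """Separate 1x1 conv heads for elevation, slope, and roughness."""

    def __init__(self, in_ch):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(in_ch, 1, kernel_size=1) for _ in range(3)]
        )

    def forward(self, feats):
        # Concatenate to (B, 3, H, W): [elevation, slope, roughness]
        return torch.cat([head(feats) for head in self.heads], dim=1)
```

Each block halves the spatial resolution, so the block count sets the depth of the encoder; the parameter counts reported in the results table come from the actual implementation, not from this sketch.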

Citation

If you use MarsDEMNet, please cite:

@misc{marsdemnet2026,
  title     = {MarsDEMNet: Classical and Deep Learning Approaches for Single-Image Digital Elevation Model Prediction from Mars CTX Imagery},
  author    = {Harshith Kethavath},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/harshithkethavath/MarsDEMNet}
}
