NCS-v1-2.5d-base

This is the 2.5D variant of the NCS-model, a seismic foundation model trained on a large share of full-stack seismic cubes from the Norwegian Continental Shelf (NCS) available through the public DISKOS database. This model has been developed by the Norwegian Computing Center (NR) in collaboration with the industry partners Equinor ASA and AkerBP ASA.

Model Description

NCS-v1-2.5d-base extends the standard ViT-MAE approach to a "2.5D" setting: instead of processing a single 2D slice, the model simultaneously encodes multiple 2D slices extracted at different azimuthal directions (inline, crossline, and two diagonal slices) through the same seismic location (Waldeland et al., 2025). A shared CLS token and direction-aware positional embeddings, which combine (x, y) sine-cosine encodings with a cyclic orientation code (Lee et al., 2022), enable the model to learn representations that integrate structural information across orientations.

This approach provides richer spatial context than a purely 2D model while being substantially more computationally efficient than full 3D tokenization. The 2.5D variant achieves the best overall accuracy/efficiency trade-off across the tested benchmarks.
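To make the direction-aware positional embedding concrete, here is a hypothetical sketch: the function names, the 50/50 split of channels between the x and y encodings, and the period-180° form of the cyclic orientation code are assumptions for illustration, not the model's exact implementation.

```python
import numpy as np

def sincos_1d(pos, dim):
    # Standard 1D sine-cosine encoding: `dim`//2 frequencies, sin and cos halves.
    omega = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = np.outer(pos, omega)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def direction_aware_pos_embed(grid, dim, azimuth_deg):
    """Sketch: (x, y) sincos embedding for a grid x grid patch layout, plus a
    cyclic code for the slice azimuth (0/45/90/135 degrees)."""
    ys, xs = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    pe = np.concatenate([sincos_1d(xs.ravel(), dim // 2),
                         sincos_1d(ys.ravel(), dim // 2)], axis=1)  # (grid**2, dim)
    theta = np.deg2rad(azimuth_deg)
    # Period-180 code: a slice and its 180-degree flip are the same plane,
    # so they share an orientation code (an assumption about the design).
    cyc = np.array([np.sin(2 * theta), np.cos(2 * theta)])
    return pe, cyc
```

The cyclic code lets the encoder distinguish the four azimuths while treating orientation as a circular, rather than ordinal, quantity.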

Usage

NCS-v1-2.5d-base has been designed to produce features that can be used for fine-tuning on downstream tasks such as seismic facies classification, salt body segmentation, geological structure detection (e.g., injectites, faults), content-based seismic image retrieval, and horizon and event tracking.

How to Use

Loading the Model

Install the NCS package from this repository before running the example below.

from NCS.models.vit25d import ViT25DModel

model = ViT25DModel.from_pretrained("NorskRegnesentralSTI/NCS-v1-2.5d-base")

Feature Extraction

import torch

# Input: batch of multi-view seismic crops (B, V, H, W), normalized by cube std and clamped to 3σ
pixel_values = torch.randn(1, 4, 224, 224)
directions = torch.tensor([[0, 1, 2, 3]], dtype=torch.int32)  # dir0, dir45, dir90, dir135

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, directions=directions)

# CLS token (image-level feature)
cls_features = outputs.last_hidden_state[:, 0, :]  # shape: (B, 768)

# Patch-level features for direction dir0 (first 196 patches after CLS)
num_patches = 196
dir0_patches = outputs.last_hidden_state[:, 1:1+num_patches, :]  # shape: (B, 196, 768)
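The same slicing generalizes to the other three views. Assuming the encoder concatenates patch tokens view by view after the CLS token (the layout implied by the dir0 example above, not independently verified), per-view features can be recovered like this; a dummy tensor stands in for the real encoder output:

```python
import torch

num_views, num_patches, dim = 4, 196, 768
# Dummy stand-in for outputs.last_hidden_state: a CLS token followed by
# num_views blocks of num_patches tokens each (assumed token layout).
hidden = torch.randn(1, 1 + num_views * num_patches, dim)

per_view = {
    name: hidden[:, 1 + v * num_patches : 1 + (v + 1) * num_patches, :]
    for v, name in enumerate(["dir0", "dir45", "dir90", "dir135"])
}

# Each view's tokens can be reshaped back onto the 14x14 patch grid.
dir0_grid = per_view["dir0"].reshape(1, 14, 14, dim)
```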

Inference on Seismic Volumes

For running inference over full seismic volumes (SEG-Y / SGZ), use the NCS inference pipeline:

uv run scripts/inference.py \
  --model-path NorskRegnesentralSTI/NCS-v1-2.5d-base \
  --input-path /path/to/volume.segy \
  --output-path ./features_25d.zarr \
  --direction dir0 \
  --densify 1 \
  --num-overlap-patches 7 \
  --overlap-filter ramp \
  --batch-size 32 \
  --device cuda:0 \
  --dtype float16
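The --overlap-filter ramp option suggests that overlapping window outputs are blended with ramp-shaped weights. A minimal pure-Python sketch of that idea in 1D follows; the pipeline's actual filter shape and normalization may differ, so treat this as illustrative only:

```python
def ramp_weights(width, overlap):
    # Weights ramp linearly up from the window edge over `overlap` positions,
    # stay flat at 1.0 in the interior, then ramp back down.
    return [min(1.0, (i + 1) / (overlap + 1), (width - i) / (overlap + 1))
            for i in range(width)]

def blend(windows, starts, total, overlap):
    # Weighted average of overlapping 1D window outputs into one trace.
    acc, wsum = [0.0] * total, [0.0] * total
    for win, s in zip(windows, starts):
        for j, (v, w) in enumerate(zip(win, ramp_weights(len(win), overlap))):
            acc[s + j] += v * w
            wsum[s + j] += w
    return [a / w for a, w in zip(acc, wsum)]
```

Down-weighting window edges this way suppresses boundary artifacts where adjacent inference windows disagree.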

Training Details

Pretraining Data

The model was pretrained on seismic reflection data from the Norwegian Continental Shelf (NCS), sourced from the DISKOS national data repository. The training corpus consists of 829 full-stack time and depth migrated 3D seismic cubes (~27 TB), spanning diverse geological settings, acquisition vintages, and processing generations across the NCS.

Preprocessing

  • Seismic amplitudes are standardized per-cube to unit variance.
  • Values are clipped at ±3 standard deviations.
  • For each training sample, 2D slices are extracted at 4 azimuthal directions (0°, 45°, 90°, 135°) through the same spatial location.
  • Diagonal slices (45°, 135°) are center-cropped and resized to correct for the √2 elongation.
  • Single-channel slices are passed as separate views to a shared patch projection layer.
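The standardization, clipping, and multi-directional slice extraction can be sketched as follows. The helper names are hypothetical, the diagonal indexing is an assumption based on the description above, and the √2 crop-and-resize correction for diagonals is omitted:

```python
import numpy as np

def preprocess_cube(cube, clip_sigma=3.0):
    # Per-cube standardization to unit variance, then clip at +/- clip_sigma.
    return np.clip(cube / cube.std(), -clip_sigma, clip_sigma)

def extract_views(cube, ix, iy, half):
    # Four azimuthal 2D slices (0, 45, 90, 135 degrees) of width 2*half through
    # trace (ix, iy); assumes the crop stays inside a cube of shape (X, Y, T).
    ks = np.arange(-half, half)
    v0 = cube[ix, iy + ks, :]        # 0 deg: along the crossline axis
    v45 = cube[ix + ks, iy + ks, :]  # 45 deg diagonal
    v90 = cube[ix + ks, iy, :]       # 90 deg: along the inline axis
    v135 = cube[ix + ks, iy - ks, :] # 135 deg diagonal
    return np.stack([v0, v45, v90, v135])  # (V=4, 2*half, T)
```

Diagonal slices sampled trace-by-trace like this are geometrically elongated by √2 in the horizontal direction, which is what the crop-and-resize step above corrects for.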

Training Procedure

  • Pretraining method: Masked Autoencoder (MAE) with 85% masking ratio (applied after concatenating patches across views; per-view mask count is not enforced)
  • Initialization: ImageNet MAE ViT weights (RGB projection channels averaged to single-channel input; original positional encodings dropped)
  • Framework: PyTorch with flash-attention kernels
  • Hardware: 16 × NVIDIA GH200 GPUs
  • Precision: bfloat16 mixed precision
  • Global batch size: 2048
  • Learning rate: Cosine schedule, base LR = 1.5 × 10⁻⁴, effective LR = base_lr × batch_size / 256, warmup ratio = 0.05
  • Epochs: 100 (~1M samples per epoch)
  • Sampling: Density-aware sampling from seismic cubes, biased toward regions with sparser spatial coverage
  • Decoder: Lightweight 8-layer MAE decoder
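The linear LR scaling rule above works out to an effective peak learning rate of 1.2 × 10⁻³. A sketch of the resulting schedule (per-epoch granularity and a zero LR floor are assumptions):

```python
import math

base_lr, batch_size, epochs, warmup_ratio = 1.5e-4, 2048, 100, 0.05
eff_lr = base_lr * batch_size / 256  # 1.5e-4 * 8 = 1.2e-3

def lr_at(epoch):
    # Linear warmup for the first warmup_ratio * epochs, then cosine decay.
    warmup = warmup_ratio * epochs  # 5 epochs
    if epoch < warmup:
        return eff_lr * epoch / warmup
    t = (epoch - warmup) / (epochs - warmup)
    return 0.5 * eff_lr * (1.0 + math.cos(math.pi * t))
```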

Evaluation Protocol

Representations are evaluated with a frozen backbone using a k-nearest-neighbor (kNN, k=5) classifier on patch-level embeddings. Four interpretation benchmarks were used: salt segmentation, package segmentation, injectite mapping, and flatspot mapping, measured by mean Intersection-over-Union (mIoU). Only 100 labeled points per class are used (or a single labeled line for injectites).
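The frozen-backbone kNN protocol can be sketched with a plain majority-vote classifier over patch embeddings; the benchmark's actual distance metric and tie-breaking rule are assumptions here:

```python
import numpy as np

def knn_predict(train_feats, train_labels, query_feats, k=5):
    # Majority-vote k-NN on frozen embeddings, squared Euclidean distance.
    d = ((query_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]       # k nearest labeled points
    votes = train_labels[idx]
    return np.array([np.bincount(v).argmax() for v in votes])
```

With only 100 labeled points per class, a non-parametric probe like this measures representation quality directly, without fitting any task-specific weights.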

Code

The model code and inference pipeline are available at: https://github.com/NorskRegnesentral/NCS_models

Citation

If you use this model, please cite:

@article{ordonez2025ncsmodel,
  title={The {NCS}-model: A seismic foundation model trained on the Norwegian repository of public seismic data},
  author={Ordo{\~n}ez, Alba and Forgaard, Theodor Johannes Line and Wade, David and Bugge, Aina Juell and Nese, H{\aa}kon and Waldeland, Anders Ueland},
  journal={arXiv preprint arXiv:2603.23211},
  year={2025}
}

Acknowledgments

This work is funded by The Research Council of Norway through the SFI Visual Intelligence (Centre for Research-based Innovation), grant no. 309439, and the industry partners Equinor ASA and AkerBP ASA. We also thank Equinor and AkerBP for providing access to the seismic data used in the evaluation.

Model size: 85.9M parameters (Safetensors, F32)