Model Overview

Description

Standard ultrasound imaging assumes sound travels at the same speed everywhere in the body. In practice, different tissues propagate sound at different speeds, causing image blur. This model estimates a spatially-varying speed-of-sound (SoS) map from raw sensor data, enabling the beamformer to correct for local tissue properties and produce sharper images -- analogous to autofocus on a camera, but for ultrasound.

NV-Raw2Insights-US estimates 2D SoS maps from raw ultrasound in-phase/quadrature (IQ) channel data -- the complex-valued signals captured by each transducer element before any image is formed. A 1D CNN RF encoder maps multi-static IQ acquisitions to a latent representation; a 2D CNN head decodes it to a 32 x 32 SoS map. Ground-truth SoS fields for training are computed by the DBUA differentiable beamforming solver, a physics-based method too slow for real-time use.
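The pipeline can be pictured with a minimal PyTorch sketch. Everything below (layer counts, channel widths, kernel sizes, the latent dimension, and the latent aggregation) is an illustrative assumption, not the released architecture; only the overall 1D-encoder / 2D-head factorization and the 32 x 32 output come from this card.

```python
import torch
import torch.nn as nn

class RFEncoder(nn.Module):
    """Stage 1: 1D CNN over the fast-time axis, applied per (tx, rx) channel."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, stride=4, padding=3),  # 2 = (I, Q)
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, iq):                 # iq: (B, 180, 180, 1024), complex
        b, n_tx, n_rx, t = iq.shape
        x = torch.view_as_real(iq)         # (B, n_tx, n_rx, T, 2)
        x = x.permute(0, 1, 2, 4, 3).reshape(b * n_tx * n_rx, 2, t)
        z = self.net(x)                    # one latent vector per (tx, rx) pair
        return z.view(b, n_tx, n_rx, -1)

class SoSHead(nn.Module):
    """Stage 2: 2D CNN decoding aggregated latents to a 32 x 32 SoS map."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((32, 32)),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, z):                  # z: (B, n_tx, n_rx, latent_dim)
        x = z.permute(0, 3, 1, 2)          # treat (tx, rx) as spatial axes
        return self.net(x).squeeze(1)      # (B, 32, 32) SoS map in m/s
```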

This model is for research and development only.

License

CC BY-NC 4.0 -- Attribution required, non-commercial use only.

Deployment Geography

Global

Use Case

Target users: researchers in ultrasound imaging, ultrasound perception, and signal processing.

Intended applications:

  • Speed-of-sound reconstruction from raw, pre-beamformed channel data
  • Adaptive beamforming using estimated speed-of-sound maps (see the sketch after this list)
  • Learned inverse problems in acoustic imaging
  • Comparison of learned vs. physics-based speed-of-sound solvers
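
To make the adaptive-beamforming use case concrete, here is a hedged sketch of computing a receive delay from an estimated SoS map instead of the conventional constant sound speed. A real beamformer integrates slowness along the acoustic path; this straight-ray, nearest-neighbor-lookup version and all names in it are simplifying assumptions for illustration only.

```python
import numpy as np

def receive_delay(pixel_xz, element_x, sos_map, extent):
    """Time of flight (s) from an image pixel back to a receive element at z = 0.

    sos_map : (32, 32) estimated speed-of-sound map in m/s
    extent  : (x0, x1, z0, z1) physical extent of the map in meters
    """
    px, pz = pixel_xz
    x0, x1, z0, z1 = extent
    # Nearest-neighbor lookup of the local sound speed on the 32 x 32 grid.
    ix = int(np.clip((px - x0) / (x1 - x0) * 32, 0, 31))
    iz = int(np.clip((pz - z0) / (z1 - z0) * 32, 0, 31))
    dist = np.hypot(px - element_x, pz)   # straight-ray path length (m)
    return dist / sos_map[iz, ix]         # vs. dist / 1540.0 in a fixed-speed beamformer
```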

This is not a clinically validated medical device and should not be used for clinical diagnostic purposes.

Release Date

Hugging Face: April 2026


References

  1. W. Simson, L. Zhuang, S. J. Sanabria, N. Antil, J. J. Dahl, and D. Hyun, "Differentiable Beamforming for Ultrasound Autofocusing," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 428--437, Springer, 2023. Code: GitHub.

Model Architecture

Architecture Type: Convolutional Neural Network (CNN)

Network Architecture: Two-stage convolutional pipeline (RF Encoder + SoS Estimation Head)

Component               Architecture                  Role
Stage 1 -- RF Encoder   1D CNN over fast-time axis    Encodes per-channel complex IQ samples into a latent representation
Stage 2 -- SoS Head     2D CNN                        Decodes aggregated latents to a 32 x 32 speed-of-sound map

Base Model: None; trained from scratch.

Parameters: 2.3M total (RF encoder 361K + SoS head 1.98M)
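
These counts can be reproduced for any PyTorch implementation with one line per module; `encoder` and `head` below refer to the hypothetical modules sketched in the Description.

```python
# Tally trainable parameters per stage and in total.
n_enc = sum(p.numel() for p in encoder.parameters())
n_head = sum(p.numel() for p in head.parameters())
print(f"encoder {n_enc:,} + head {n_head:,} = {n_enc + n_head:,} parameters")
```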

Design Choices

The RF encoder convolves along the fast-time (depth) axis independently per channel. The two-stage factorization separates representation learning from task-specific estimation; the latent space is reusable for other downstream tasks (e.g., B-mode reconstruction, aberration correction). Training minimizes MSE between predicted and DBUA-generated ground-truth SoS maps end-to-end. The architecture uses standard convolutional components only -- no diffusion, vector quantization, or autoregression.
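
A minimal training step under the stated objective might look as follows, reusing the hypothetical `encoder` and `head` modules sketched in the Description. The MSE loss against DBUA-generated maps is stated on this card; the optimizer choice and learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

# Assumed optimizer; only the end-to-end MSE objective is stated on this card.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()), lr=1e-4
)

def train_step(iq, sos_gt):
    """One end-to-end step: iq is a batch of IQ tensors, sos_gt the (B, 32, 32)
    DBUA-generated ground-truth SoS maps."""
    optimizer.zero_grad()
    sos_pred = head(encoder(iq))            # (B, 32, 32) predicted map in m/s
    loss = F.mse_loss(sos_pred, sos_gt)     # MSE against DBUA ground truth
    loss.backward()
    optimizer.step()
    return loss.item()
```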


Inputs

Input Type: Ultrasound IQ (in-phase / quadrature) channel data

Input Format: Complex-valued numerical tensor

Input Parameters: Three-dimensional (3D)

Property        Specification
Shape           180 x 180 x 1024 (transmits x receives x time samples)
Value type      Complex (baseband-demodulated IQ)
Transducer      Siemens 15L4 linear array
Acquisition     Multi-static sequence, 180 sequential single-element transmits
Preprocessing   Baseband demodulation required prior to inference (see the sketch below)

Other Properties Related to Input: Requires multi-static acquisition with a linear array. Other transducer geometries (phased, convex), transmit schemes (plane-wave, focused), and element counts are untested.
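
The Preprocessing row above requires baseband demodulation. Below is a minimal sketch of standard IQ demodulation (mix down to DC, then low-pass filter), assuming real-valued RF input sampled at rate fs. The exact pipeline used for this model, including filter design and decimation to the 13.33 MHz baseband rate listed under Training Dataset, is not specified on this card.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def baseband_demodulate(rf, fs, fc=6.5e6, bw_frac=0.7):
    """Demodulate real-valued RF traces to complex baseband IQ.

    rf : array (..., n_samples), real RF along the fast-time axis
    fs : RF sampling rate (Hz); fc : center frequency (6.5 MHz per this card)
    """
    t = np.arange(rf.shape[-1]) / fs
    mixed = rf * np.exp(-2j * np.pi * fc * t)   # shift passband at fc down to DC
    b, a = butter(5, bw_frac * fc / (fs / 2))   # low-pass to reject the 2*fc image
    iq = filtfilt(b, a, mixed, axis=-1)         # zero-phase low-pass filtering
    # Decimation to the 13.33 MHz baseband rate would follow; omitted here.
    return iq
```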


Outputs

Output Type: Speed-of-sound map

Output Format: Real-valued 2D numerical array (float32)

Output Parameters: Two-dimensional (2D)

Property         Specification
Shape            32 x 32 pixels
Units            meters per second (m/s)
Expected range   1400--1600 m/s

Other Properties Related to Output: Values outside 1400--1600 m/s indicate out-of-distribution extrapolation and should not be trusted.
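
A simple gate on the stated range might look like this; the 1400 and 1600 m/s bounds come from this card, while the reject-the-whole-map policy is an assumption.

```python
import torch

def is_in_distribution(sos_map: torch.Tensor,
                       lo: float = 1400.0, hi: float = 1600.0) -> bool:
    """True if every pixel of the predicted SoS map lies within [lo, hi] m/s."""
    return bool(((sos_map >= lo) & (sos_map <= hi)).all())
```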

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.


Software Integration

Runtime Engine: PyTorch

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere or newer (recommended)

Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.


Model Versions

Version    Status                Description
v0.1-dev   Active development    Two-stage architecture (1D CNN RF encoder + 2D CNN SoS head), MSE training via DBUA ground truth

Training and Evaluation Datasets

Training Dataset

Data Modality: Other: ultrasound IQ channel data (complex-valued 3D tensors of shape [180, 180, 1024], stored as float32 components) and scalar metadata.

Training Data Size: 640 GB on disk (2,391 samples, Apache Arrow format). Unit: ultrasound channel frames.
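
As a consistency check, assuming complex64 storage: one frame is 180 x 180 x 1024 samples x 8 bytes, roughly 265 MB, so 2,391 frames come to roughly 635 GB, in line with the 640 GB on-disk figure once Arrow metadata and overhead are included.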

Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, 180-element linear array probe)

Labeling Method: Automatic/Sensors -- all labels are derived from the acquired sensor data. No human annotation is involved.

Properties: 2,391 training samples of raw complex-valued RF ultrasound IQ data from Full Synthetic Aperture acquisitions. Machine-derived sensor data from laboratory tissue-mimicking phantoms. No personal data, no copyrighted content, no linguistic characteristics. Sensor: Siemens Healthineers research ultrasound scanner, 180-element linear array probe, 6.5 MHz center frequency, 13.33 MHz baseband sampling rate.

Evaluation Dataset

Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, demo phantom acquisitions)

Labeling Method: Automatic/Sensors -- same derived labeling pipeline as training data.

Properties: 350 validation samples. Same modality, format, and sensor as training data. Machine-derived sensor data from laboratory phantoms held out from training. No personal data, no copyrighted content, no linguistic characteristics.

This dataset will not be released.


Inference

Acceleration Engine: PyTorch (with torch.compile)

Test Hardware:

  • NVIDIA IGX Orin (Thor iGPU)
  • NVIDIA RTX PRO 6000 Blackwell (dGPU)
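
A hedged end-to-end inference sketch under the stated runtime (PyTorch with torch.compile). The composed `encoder`/`head` modules are the hypothetical ones from the Description, and no checkpoint loading is shown since this card does not publish weights or a path.

```python
import torch
import torch.nn as nn

# Compose the two stages and compile once for faster repeated inference.
model = nn.Sequential(encoder, head).eval().cuda()
model = torch.compile(model)

# Dummy multi-static IQ acquisition with the documented input shape.
iq = torch.randn(1, 180, 180, 1024, dtype=torch.complex64, device="cuda")
with torch.inference_mode():
    sos_map = model(iq)            # (1, 32, 32) speed-of-sound map in m/s
```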

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
