Model Overview
Description
Standard ultrasound imaging assumes sound travels at the same speed everywhere in the body. In practice, different tissues propagate sound at different speeds, causing image blur. This model estimates a spatially-varying speed-of-sound (SoS) map from raw sensor data, enabling the beamformer to correct for local tissue properties and produce sharper images -- analogous to autofocus on a camera, but for ultrasound.
NV-Raw2Insights-US estimates 2D SoS maps from raw ultrasound in-phase/quadrature (IQ) channel data -- the complex-valued signals captured by each transducer element before any image is formed. A 1D CNN RF encoder maps multi-static IQ acquisitions to a latent representation; a 2D CNN head decodes it to a 32 x 32 SoS map. Ground-truth SoS fields for training are computed by the DBUA differentiable beamforming solver, a physics-based method too slow for real-time use.
This model is for research and development only.
License
CC BY-NC 4.0 -- Attribution required, non-commercial use only.
Deployment Geography
Global
Use Case
Target users: researchers in ultrasound imaging, ultrasound perception, and signal processing.
Intended applications:
- Speed-of-sound reconstruction from raw, pre-beamformed channel data
- Adaptive beamforming using estimated speed-of-sound maps
- Learned inverse problems in acoustic imaging
- Comparison of learned vs. physics-based speed-of-sound solvers
This is not a clinically validated medical device and should not be used for clinical diagnostic purposes.
Release Date
Hugging Face April 2026
References
- W. Simson, L. Zhuang, S. J. Sanabria, N. Antil, J. J. Dahl, and D. Hyun, "Differentiable Beamforming for Ultrasound Autofocusing," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 428--437, Springer, 2023. GitHub
Model Architecture
Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: Two-stage convolutional pipeline (RF Encoder + SoS Estimation Head)
| Component | Architecture | Role |
|---|---|---|
| Stage 1 -- RF Encoder | 1D CNN over fast-time axis | Encodes per-channel complex IQ samples into a latent representation |
| Stage 2 -- SoS Head | 2D CNN | Decodes aggregated latents to a 32 x 32 speed-of-sound map |
Base Model: None; trained from scratch.
Parameters: 2.3M total (encoder 361K + decoder 1.98M)
Design Choices
The RF encoder convolves along the fast-time (depth) axis independently per channel. The two-stage factorization separates representation learning from task-specific estimation; the latent space is reusable for other downstream tasks (e.g., B-mode reconstruction, aberration correction). Training occurs in two phases; Phase 1 trains the full pipeline end-to-end by minimizing MSE between predicted and DBUA-generated ground-truth SoS maps. The architecture uses standard convolutional components only -- no diffusion, vector quantization, or autoregression.
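A minimal sketch of this two-stage factorization in PyTorch. All layer sizes, kernel choices, and class names here are illustrative assumptions; the released model's actual dimensions are not reproduced here.

```python
import torch
import torch.nn as nn

class RFEncoder1D(nn.Module):
    """Stage 1 (sketch): 1D CNN over the fast-time axis, applied per channel."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, stride=4, padding=3),  # real/imag as 2 input channels
            nn.ReLU(),
            nn.Conv1d(32, latent_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse fast-time to a single latent vector
        )

    def forward(self, iq):  # iq: (batch, tx, rx, time), complex-valued
        b, tx, rx, t = iq.shape
        x = torch.stack([iq.real, iq.imag], dim=-2)          # (b, tx, rx, 2, t)
        x = x.reshape(b * tx * rx, 2, t)                     # one 1D conv per channel pair
        z = self.net(x).squeeze(-1)                          # (b*tx*rx, latent)
        return z.reshape(b, tx, rx, -1).permute(0, 3, 1, 2)  # (b, latent, tx, rx)

class SoSHead2D(nn.Module):
    """Stage 2 (sketch): 2D CNN decoding aggregated latents to a 32 x 32 SoS map."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
            nn.AdaptiveAvgPool2d((32, 32)),  # force the 32 x 32 output grid
        )

    def forward(self, z):
        return self.net(z).squeeze(1)  # (b, 32, 32)

# Smoke test on a downscaled acquisition (8 transmits x 8 receives x 256 samples)
iq = torch.randn(1, 8, 8, 256, dtype=torch.complex64)
sos = SoSHead2D()(RFEncoder1D()(iq))
print(sos.shape)  # torch.Size([1, 32, 32])
```

Splitting real/imaginary parts into two convolution channels is one common way to feed complex IQ data to a real-valued CNN; the released model may handle complex inputs differently.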
Inputs
Input Type: Ultrasound IQ (in-phase / quadrature) channel data
Input Format: Complex-valued numerical tensor
Input Parameters: Three-dimensional (3D)
| Property | Specification |
|---|---|
| Shape | 180 x 180 x 1024 (transmits x receives x time samples) |
| Value type | Complex (baseband demodulated IQ) |
| Transducer | Siemens 15L4 linear array |
| Acquisition | Multi-static sequence, 180 sequential single-element transmits |
| Preprocessing | Baseband demodulation required prior to inference |
Other Properties Related to Input: Requires multi-static acquisition with a linear array. Other transducer geometries (phased, convex), transmit schemes (plane-wave, focused), and element counts are untested.
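Since baseband demodulation is required before inference, here is a minimal sketch of demodulating one RF line to complex baseband IQ. The carrier frequency matches the 6.5 MHz center frequency above, but the RF sampling rate, filter, and decimation factor are assumptions for illustration, not scanner settings.

```python
import numpy as np

FC = 6.5e6     # transducer center frequency (Hz), per the spec above
FS_RF = 40e6   # assumed RF sampling rate (Hz); 40 MHz / 3 ~= 13.33 MHz baseband

def demodulate(rf, fc=FC, fs=FS_RF, decim=3):
    """Mix RF to baseband, low-pass with a crude moving average, then decimate."""
    t = np.arange(rf.shape[-1]) / fs
    mixed = rf * np.exp(-2j * np.pi * fc * t)   # shift the passband down to DC
    kernel = np.ones(8) / 8.0                   # crude low-pass; use a real FIR in practice
    iq = np.convolve(mixed, kernel, mode="same")
    return iq[::decim].astype(np.complex64)     # complex baseband IQ samples

rf_line = np.cos(2 * np.pi * FC * np.arange(3072) / FS_RF)  # toy RF tone, one channel line
iq_line = demodulate(rf_line)
print(iq_line.shape, iq_line.dtype)  # (1024,) complex64
```

Repeating this per transmit/receive channel pair yields the 180 x 180 x 1024 complex tensor the model expects.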
Outputs
Output Type: Speed-of-sound map
Output Format: Real-valued 2D numerical array (float32)
Output Parameters: Two-dimensional (2D)
| Property | Specification |
|---|---|
| Shape | 32 x 32 pixels |
| Units | meters per second (m/s) |
| Expected range | 1400 -- 1600 m/s |
Other Properties Related to Output: Values outside 1400--1600 m/s indicate out-of-distribution extrapolation and should not be trusted.
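A minimal post-inference sanity check based on the documented 1400--1600 m/s range; `sos_map` below is a synthetic placeholder standing in for a real model output.

```python
import numpy as np

VALID_RANGE = (1400.0, 1600.0)  # documented in-distribution SoS range (m/s)

def out_of_range_fraction(sos_map, lo=VALID_RANGE[0], hi=VALID_RANGE[1]):
    """Fraction of SoS pixels outside the trusted range; nonzero values
    suggest out-of-distribution extrapolation."""
    bad = (sos_map < lo) | (sos_map > hi)
    return float(bad.mean())

sos_map = np.full((32, 32), 1540.0, dtype=np.float32)  # typical soft-tissue SoS
sos_map[0, :4] = 1750.0                                # inject 4 out-of-range pixels
frac = out_of_range_fraction(sos_map)
print(frac)  # 4 / 1024 = 0.00390625
```

A nonzero fraction does not mean the whole map is invalid, but flagged pixels should be excluded from downstream beamforming corrections.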
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
Runtime Engine: PyTorch
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere or newer (recommended)
Supported Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Versions
| Version | Status | Description |
|---|---|---|
| v0.1-dev | Active development | Two-stage architecture (1D CNN RF encoder + 2D CNN SoS head), MSE training via DBUA ground truth |
Training and Evaluation Datasets
Training Dataset
Data Modality: Other: Raw ultrasound IQ channel data (complex-valued 3D tensors of shape [180, 180, 1024], stored as float32 components) and scalar metadata.
Training Data Size: 640 GB on disk (2,391 samples, Apache Arrow format). Unit: ultrasound channel frames.
Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, 180-element linear array probe)
Labeling Method: Automatic/Sensors -- all labels are derived from the acquired sensor data. No human annotation is involved.
Properties: 2,391 training samples of raw complex-valued RF ultrasound IQ data from Full Synthetic Aperture acquisitions. Machine-derived sensor data from laboratory tissue-mimicking phantoms. No personal data, no copyrighted content, no linguistic characteristics. Sensor: Siemens Healthineers research ultrasound scanner, 180-element linear array probe, 6.5 MHz center frequency, 13.33 MHz baseband sampling rate.
Evaluation Dataset
Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, demo phantom acquisitions)
Labeling Method: Automatic/Sensors -- same derived labeling pipeline as training data.
Properties: 350 validation samples. Same modality, format, and sensor as training data. Machine-derived sensor data from laboratory phantoms held out from training. No personal data, no copyrighted content, no linguistic characteristics.
This dataset will not be released.
Inference
Acceleration Engine: PyTorch (with torch.compile)
Test Hardware:
- NVIDIA IGX Orin (Thor iGPU)
- NVIDIA RTX PRO 6000 Blackwell (dGPU)
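Since the card lists PyTorch with `torch.compile` as the acceleration engine, here is a minimal inference sketch. The model below is a stand-in module, not the released checkpoint, and `torch.compile` is shown with the `"eager"` backend so the sketch runs without a GPU or C++ toolchain; on the listed NVIDIA hardware you would use the default backend.

```python
import torch

# Stand-in for the real two-stage network: maps flattened IQ features to a 32 x 32 map.
model = torch.nn.Sequential(
    torch.nn.Linear(8 * 8 * 256 * 2, 32 * 32),  # sized for the downscaled toy input below
    torch.nn.Unflatten(1, (32, 32)),
).eval()
model = torch.compile(model, backend="eager")  # drop backend= for the default on GPU

iq = torch.randn(1, 8, 8, 256, dtype=torch.complex64)  # downscaled toy acquisition
x = torch.view_as_real(iq).flatten(1)                  # (1, 32768) real-valued features
with torch.inference_mode():
    sos = model(x)                                     # (1, 32, 32) SoS map
print(sos.shape)  # torch.Size([1, 32, 32])
```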
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.