Model Overview
Description
Standard ultrasound imaging assumes sound travels at the same speed everywhere in the body. In practice, different tissues propagate sound at different speeds, causing image blur. This model estimates a spatially-varying speed-of-sound (SoS) map from raw sensor data, enabling the beamformer to correct for local tissue properties and produce sharper images -- analogous to autofocus on a camera, but for ultrasound.
NV-Raw2Insights-US estimates 2D SoS maps from raw ultrasound in-phase/quadrature (IQ) channel data -- the complex-valued signals captured by each transducer element before any image is formed. A 1D CNN RF encoder maps multi-static IQ acquisitions to a latent representation; a 2D CNN head decodes it to a 32 x 32 SoS map. Ground-truth SoS fields for training are computed by the DBUA differentiable beamforming solver, a physics-based method too slow for real-time use.
This model is for research and development only.
License
CC BY-NC 4.0 -- Attribution required, non-commercial use only.
Deployment Geography
Global
Use Case
Target users: researchers in ultrasound imaging, ultrasound perception, and signal processing.
Intended applications:
- Speed-of-sound reconstruction from raw, pre-beamformed channel data
- Adaptive beamforming using estimated speed-of-sound maps
- Learned inverse problems in acoustic imaging
- Comparison of learned vs. physics-based speed-of-sound solvers
This is not a clinically validated medical device and should not be used for clinical diagnostic purposes.
Release Date
Hugging Face April 2026
References
- W. Simson, L. Zhuang, S. J. Sanabria, N. Antil, J. J. Dahl, and D. Hyun, "Differentiable Beamforming for Ultrasound Autofocusing," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 428--437, Springer, 2023. GitHub
Model Architecture
Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: Two-stage convolutional pipeline (RF Encoder + SoS Estimation Head)
| Component | Architecture | Role |
|---|---|---|
| Stage 1 -- RF Encoder | 1D CNN over fast-time axis | Encodes per-channel complex IQ samples into a latent representation |
| Stage 2 -- SoS Head | 2D CNN | Decodes aggregated latents to a 32 x 32 speed-of-sound map |
Base Model: None; trained from scratch.
Parameters: 2.3M total (encoder 361K + decoder 1.98M)
Design Choices
The RF encoder convolves along the fast-time (depth) axis independently per channel. The two-stage factorization separates representation learning from task-specific estimation; the latent space is reusable for other downstream tasks (e.g., B-mode reconstruction, aberration correction). Training occurs in two phases; Phase 1 trains the full pipeline end-to-end by minimizing MSE between predicted and DBUA-generated ground-truth SoS maps. The architecture uses standard convolutional components only -- no diffusion, vector quantization, or autoregression.
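A minimal sketch of this two-stage factorization in PyTorch. All layer sizes, kernel choices, and class names here are illustrative assumptions; the released model's actual dimensions are not reproduced here.

```python
import torch
import torch.nn as nn

class RFEncoder1D(nn.Module):
    """Stage 1 (sketch): 1D CNN over the fast-time axis, applied per channel."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, stride=4, padding=3),  # real/imag as 2 input channels
            nn.ReLU(),
            nn.Conv1d(32, latent_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse fast-time to a single latent vector
        )

    def forward(self, iq):  # iq: (batch, tx, rx, time), complex-valued
        b, tx, rx, t = iq.shape
        x = torch.stack([iq.real, iq.imag], dim=-2)          # (b, tx, rx, 2, t)
        x = x.reshape(b * tx * rx, 2, t)                     # one 1D conv per channel pair
        z = self.net(x).squeeze(-1)                          # (b*tx*rx, latent)
        return z.reshape(b, tx, rx, -1).permute(0, 3, 1, 2)  # (b, latent, tx, rx)

class SoSHead2D(nn.Module):
    """Stage 2 (sketch): 2D CNN decoding aggregated latents to a 32 x 32 SoS map."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_dim, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
            nn.AdaptiveAvgPool2d((32, 32)),  # force the 32 x 32 output grid
        )

    def forward(self, z):
        return self.net(z).squeeze(1)  # (b, 32, 32)

# Smoke test on a downscaled acquisition (8 transmits x 8 receives x 256 samples)
iq = torch.randn(1, 8, 8, 256, dtype=torch.complex64)
sos = SoSHead2D()(RFEncoder1D()(iq))
print(sos.shape)  # torch.Size([1, 32, 32])
```

Splitting real/imaginary parts into two convolution channels is one common way to feed complex IQ data to a real-valued CNN; the released model may handle complex inputs differently.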
Inputs
Input Type: Ultrasound IQ (in-phase / quadrature) channel data
Input Format: Complex-valued numerical tensor
Input Parameters: Three-dimensional (3D)
| Property | Specification |
|---|---|
| Shape | 180 x 180 x 1024 (transmits x receives x time samples) |
| Value type | Complex (baseband demodulated IQ) |
| Transducer | Siemens 15L4 linear array |
| Acquisition | Multi-static sequence, 180 sequential single-element transmits |
| Preprocessing | Baseband demodulation required prior to inference |
Other Properties Related to Input: Requires multi-static acquisition with a linear array. Other transducer geometries (phased, convex), transmit schemes (plane-wave, focused), and element counts are untested.
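Since baseband demodulation is required before inference, here is a minimal sketch of demodulating one RF line to complex baseband IQ. The carrier frequency matches the 6.5 MHz center frequency above, but the RF sampling rate, filter, and decimation factor are assumptions for illustration, not scanner settings.

```python
import numpy as np

FC = 6.5e6     # transducer center frequency (Hz), per the spec above
FS_RF = 40e6   # assumed RF sampling rate (Hz); 40 MHz / 3 ~= 13.33 MHz baseband

def demodulate(rf, fc=FC, fs=FS_RF, decim=3):
    """Mix RF to baseband, low-pass with a crude moving average, then decimate."""
    t = np.arange(rf.shape[-1]) / fs
    mixed = rf * np.exp(-2j * np.pi * fc * t)   # shift the passband down to DC
    kernel = np.ones(8) / 8.0                   # crude low-pass; use a real FIR in practice
    iq = np.convolve(mixed, kernel, mode="same")
    return iq[::decim].astype(np.complex64)     # complex baseband IQ samples

rf_line = np.cos(2 * np.pi * FC * np.arange(3072) / FS_RF)  # toy RF tone, one channel line
iq_line = demodulate(rf_line)
print(iq_line.shape, iq_line.dtype)  # (1024,) complex64
```

Repeating this per transmit/receive channel pair yields the 180 x 180 x 1024 complex tensor the model expects.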
Outputs
Output Type: Speed-of-sound map
Output Format: Real-valued 2D numerical array (float32)
Output Parameters: Two-dimensional (2D)
| Property | Specification |
|---|---|
| Shape | 32 x 32 pixels |
| Units | meters per second (m/s) |
| Expected range | 1400 -- 1600 m/s |
Other Properties Related to Output: Values outside 1400--1600 m/s indicate out-of-distribution extrapolation and should not be trusted.
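A minimal post-inference sanity check based on the documented 1400--1600 m/s range; `sos_map` below is a synthetic placeholder standing in for a real model output.

```python
import numpy as np

VALID_RANGE = (1400.0, 1600.0)  # documented in-distribution SoS range (m/s)

def out_of_range_fraction(sos_map, lo=VALID_RANGE[0], hi=VALID_RANGE[1]):
    """Fraction of SoS pixels outside the trusted range; nonzero values
    suggest out-of-distribution extrapolation."""
    bad = (sos_map < lo) | (sos_map > hi)
    return float(bad.mean())

sos_map = np.full((32, 32), 1540.0, dtype=np.float32)  # typical soft-tissue SoS
sos_map[0, :4] = 1750.0                                # inject 4 out-of-range pixels
frac = out_of_range_fraction(sos_map)
print(frac)  # 4 / 1024 = 0.00390625
```

A nonzero fraction does not mean the whole map is invalid, but flagged pixels should be excluded from downstream beamforming corrections.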
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration
Runtime Engine: PyTorch
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere or newer (recommended)
Supported Operating System(s):
- Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Versions
| Version | Status | Description |
|---|---|---|
| v0.1-dev | Active development | Two-stage architecture (1D CNN RF encoder + 2D CNN SoS head), MSE training via DBUA ground truth |
Training and Evaluation Datasets
Training Dataset
Data Modality: Other: Raw ultrasound IQ channel data (complex-valued 3D tensors of shape [180, 180, 1024], stored as float32 components) and scalar metadata.
Training Data Size: 640 GB on disk (2,391 samples, Apache Arrow format). Unit: ultrasound channel frames.
Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, 180-element linear array probe)
Labeling Method: Automatic/Sensors -- all labels are derived from the acquired sensor data. No human annotation is involved.
Properties: 2,391 training samples of raw complex-valued RF ultrasound IQ data from Full Synthetic Aperture acquisitions. Machine-derived sensor data from laboratory tissue-mimicking phantoms. No personal data, no copyrighted content, no linguistic characteristics. Sensor: Siemens Healthineers research ultrasound scanner, 180-element linear array probe, 6.5 MHz center frequency, 13.33 MHz baseband sampling rate.
Evaluation Dataset
Data Collection Method: Automatic/Sensors (Siemens Healthineers research ultrasound scanner, demo phantom acquisitions)
Labeling Method: Automatic/Sensors -- same derived labeling pipeline as training data.
Properties: 350 validation samples. Same modality, format, and sensor as training data. Machine-derived sensor data from laboratory phantoms held out from training. No personal data, no copyrighted content, no linguistic characteristics.
This dataset will not be released.
Inference
Acceleration Engine: PyTorch (with torch.compile)
Test Hardware:
- NVIDIA IGX Orin (Thor iGPU)
- NVIDIA RTX PRO 6000 Blackwell (dGPU)
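Since the card lists PyTorch with `torch.compile` as the acceleration engine, here is a minimal inference sketch. The model below is a stand-in module, not the released checkpoint, and `torch.compile` is shown with the `"eager"` backend so the sketch runs without a GPU or C++ toolchain; on the listed NVIDIA hardware you would use the default backend.

```python
import torch

# Stand-in for the real two-stage network: maps flattened IQ features to a 32 x 32 map.
model = torch.nn.Sequential(
    torch.nn.Linear(8 * 8 * 256 * 2, 32 * 32),  # sized for the downscaled toy input below
    torch.nn.Unflatten(1, (32, 32)),
).eval()
model = torch.compile(model, backend="eager")  # drop backend= for the default on GPU

iq = torch.randn(1, 8, 8, 256, dtype=torch.complex64)  # downscaled toy acquisition
x = torch.view_as_real(iq).flatten(1)                  # (1, 32768) real-valued features
with torch.inference_mode():
    sos = model(x)                                     # (1, 32, 32) SoS map
print(sos.shape)  # torch.Size([1, 32, 32])
```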
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.