| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" xml:lang="en-us"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"></meta> |
| <meta name="copyright" content="(C) Copyright 2005"></meta> |
| <meta name="DC.rights.owner" content="(C) Copyright 2005"></meta> |
| <meta name="DC.Type" content="concept"></meta> |
| <meta name="DC.Title" content="Release Notes"></meta> |
| <meta name="DC.Coverage" content="Nsight Systems"></meta> |
| <meta name="DC.subject" content="Release Notes"></meta> |
| <meta name="keywords" content="Release Notes"></meta> |
| <meta name="DC.Format" content="XHTML"></meta> |
| <meta name="DC.Identifier" content="abstract"></meta> |
| <link rel="stylesheet" type="text/css" href="../common/formatting/commonltr.css"></link> |
| <link rel="stylesheet" type="text/css" href="../common/formatting/site.css"></link> |
| <title>Release Notes :: Nsight Systems Documentation</title> |
| |
| |
| |
| <script type="text/javascript" charset="utf-8" src="../common/scripts/tynt/tynt.js"></script> |
| |
| <script src="https://assets.adobedtm.com/5d4962a43b79/c1061d2c5e7b/launch-191c2462b890.min.js"></script> |
| <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.min.js"></script> |
| <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.ba-hashchange.min.js"></script> |
| <script type="text/javascript" charset="utf-8" src="../common/formatting/jquery.scrollintoview.min.js"></script> |
| <script type="text/javascript" src="../search/htmlFileList.js"></script> |
| <script type="text/javascript" src="../search/htmlFileInfoList.js"></script> |
| <script type="text/javascript" src="../search/nwSearchFnt.min.js"></script> |
| <script type="text/javascript" src="../search/stemmers/en_stemmer.min.js"></script> |
| <script type="text/javascript" src="../search/index-1.js"></script> |
| <script type="text/javascript" src="../search/index-2.js"></script> |
| <script type="text/javascript" src="../search/index-3.js"></script> |
| <link rel="canonical" href="https://docs.nvidia.com/nsight-systems/ReleaseNotes/index.html"></link> |
| <link rel="stylesheet" type="text/css" href="../common/formatting/qwcode.highlight.css"></link> |
| </head> |
| <body> |
| |
| <header id="header"><span id="company">NVIDIA</span><span id="site-title">Nsight Systems Documentation</span><form id="search" method="get" action="search"> |
| <input type="text" name="search-text"></input><fieldset id="search-location"> |
| <legend>Search In:</legend> |
| <label><input type="radio" name="search-type" value="site"></input>Entire Site</label> |
| <label><input type="radio" name="search-type" value="document"></input>Just This Document</label></fieldset> |
| <button type="reset">clear search</button> |
| <button id="submit" type="submit">search</button></form> |
| </header> |
| <div id="site-content"> |
| <nav id="site-nav"> |
| <div class="category closed"><a href="../index.html" title="The root of the site.">Nsight Systems |
| v2023.1.1</a></div> |
| <div class="category"><a href="index.html" title="Release Notes">Release Notes</a></div> |
| <ul> |
| <li> |
| <div class="section-link"><a href="#whats-new">1. What's New</a></div> |
| </li> |
| <li> |
| <div class="section-link"><a href="#known-issues">2. Known Issues</a></div> |
| <ul> |
| <li> |
| <div class="section-link"><a href="#general-issues">2.1. General Issues</a></div> |
| </li> |
| <li> |
| <div class="section-link"><a href="#vgpu-issues">2.2. vGPU Issues</a></div> |
| </li> |
| <li> |
| <div class="section-link"><a href="#docker-issues">2.3. Docker Issues</a></div> |
| </li> |
| <li> |
| <div class="section-link"><a href="#cuda-trace-issues">2.4. CUDA Trace Issues</a></div> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| </nav> |
| <div id="resize-nav"></div> |
| <nav id="search-results"> |
| <h2>Search Results</h2> |
| <ol></ol> |
| </nav> |
| |
| <div id="contents-container"> |
| <div id="breadcrumbs-container"> |
| <div id="release-info">Release Notes |
| (<a href="../pdf/ReleaseNotes.pdf">PDF</a>) |
| |
| - |
| |
| v2023.1.1 |
| (<a href="https://docs.nvidia.com/nsight-systems/">older</a>) |
| - |
| Last updated January 15, 2023 |
| - |
| <a href="mailto:devtools-support@nvidia.com?subject=Nsight Systems Documentation Feedback: Release Notes">Send Feedback</a></div> |
| </div> |
| <article id="contents"> |
| <div class="topic nested0" id="abstract"><a name="abstract" shape="rect"> |
| </a><h2 class="title topictitle1"><a href="#abstract" name="abstract" shape="rect">Release Notes</a></h2> |
| <div class="body conbody"> |
| <p class="p"> |
| Release notes and known issues. |
| |
| </p> |
| </div> |
| </div> |
| <div class="topic nested0" id="whats-new"><a name="whats-new" shape="rect"> |
| </a><h2 class="title topictitle1"><a href="#whats-new" name="whats-new" shape="rect">1. What's New</a></h2> |
| <div class="body"> |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li">NVIDIA Switch event sampling. |
| |
| </li> |
| <li class="li">CUDA mempool trace.</li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <div class="topic nested0" id="known-issues"><a name="known-issues" shape="rect"> |
| </a><h2 class="title topictitle1"><a href="#known-issues" name="known-issues" shape="rect">2. Known Issues</a></h2> |
| <div class="body"> |
| <p class="p"></p> |
| </div> |
| <div class="topic nested1" id="general-issues"><a name="general-issues" shape="rect"> |
| </a><h3 class="title topictitle2"><a href="#general-issues" name="general-issues" shape="rect">2.1. General Issues</a></h3> |
| <div class="body"> |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p">The current release of the <span class="ph">Nsight Systems</span> |
| CLI does not support session names longer than 127 characters. |
| The <samp class="ph codeph"><span class="ph">nsys</span> profile</samp> command |
| also cannot profile an executable whose name exceeds 111 characters. |
| These limitations will be removed in a future version of the CLI. |
| |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p"><span class="ph">Nsight Systems</span> 2020.4 introduces collection of |
| thread scheduling information without full sampling. While this provides |
| system information at a lower cost than full sampling, it still adds overhead. |
| To turn off thread scheduling information collection, add |
| <samp class="ph codeph">--cpuctxsw=none</samp> to your command line or disable it in the GUI. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">Profiling for longer than 5 minutes is not officially supported at |
| this time. Profiling high-activity applications on high-performance |
| machines over a long analysis time can create large result files that |
| may take a very long time to load, run out of memory, or lock up the |
| system. If you have a complex application, we recommend starting with a |
| short profiling session of no more than 5 minutes for your |
| initial profile. If your application has a natural repeating pattern, |
| often referred to as a frame, you typically need to capture only a few of these. |
| This suggested limit will increase in future releases. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">Attaching or re-attaching to a process from the |
| GUI is not supported with the x86_64 Linux or IBM Power target. Equivalent |
| results can be obtained by using the interactive CLI to launch the process |
| and then starting and stopping analysis at multiple points. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">To reduce overhead, <span class="ph">Nsight Systems</span> traces only a subset of |
| API calls that are likely to impact performance, rather than all |
| possible calls. There is currently no way to change the subset being |
| traced when using the CLI. See the respective library section of this |
| documentation for a list of calls traced by default. The CLI limitation |
| will be removed in a future version of the product. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">There is an upper bound on the default size used by the tool to |
| record trace events during the collection. If you see the following |
| diagnostic error, then <span class="ph">Nsight Systems</span> hit the upper limit. |
| </p><pre xml:space="preserve">Reached the size limit on recording trace events for this process. |
| Try reducing the profiling duration or reduce the number of features |
| traced.</pre></li> |
| <li class="li"> |
| <p class="p">When profiling a framework or application that |
| uses CUPTI, like some versions of TensorFlow(tm), |
| <span class="ph">Nsight Systems</span> will not be |
| able to trace CUDA usage due to limitations in CUPTI. These limitations |
| will be corrected in a future version of CUPTI. Consider turning off the |
| application's use of CUPTI if CUDA tracing is required. |
| </p> |
| <p class="p">As an example, in the TensorFlow <samp class="ph codeph">mnist_with_summaries.py</samp> |
| tutorial, you will be able to use <span class="ph">Nsight Systems</span> to perform |
| CUDA trace if you remove usage of <samp class="ph codeph">RunOptions.FULL_TRACE</samp> |
| from the code. For more information, |
| <a class="xref" href="https://www.tensorflow.org/api_docs/python/tf/RunOptions" target="_blank" shape="rect">see RunOptions documentation</a>. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">Tracing an application that uses a memory allocator that is not |
| thread-safe is not supported. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">Tracing OS Runtime libraries in an application that preloads |
| glibc symbols is unsupported and can lead to undefined behavior. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p"><span class="ph">Nsight Systems</span> cannot profile |
| applications launched through a virtual window manager like GNU Screen. |
| </p> |
| </li> |
| <li class="li"> |
| <div class="p">Using <span class="ph">Nsight Systems</span> MPI trace functionality with the |
| Darshan runtime module can lead to segfaults. To resolve the issue, |
| unload the module. |
| <pre xml:space="preserve">module unload darshan-runtime</pre></div> |
| </li> |
| <li class="li"> |
| <p class="p">Profiling MPI Fortran APIs with MPI_Status as an argument, e.g. |
| MPI_Recv, MPI_Test[all], MPI_Wait[all], can potentially cause memory |
| corruption for MPICH versions 3.0.x. The reason is that the MPI_Status |
| structure in MPICH 3.0.x has a different memory layout than in other |
| MPICH versions (2.1.x and >=3.1.x have been tested) and the version (3.3.2) |
| we used to compile the <span class="ph">Nsight Systems</span> MPI interception |
| library. |
| </p> |
| </li> |
| <li class="li"> |
| <div class="p">Using <samp class="ph codeph"><span class="ph">nsys</span> export</samp> to export to |
| an SQLite database will fail if the destination filesystem doesn't |
| support file locking. The error message will mention: |
| <pre xml:space="preserve">std::exception::what: database is locked</pre></div> |
| </li> |
| <li class="li"> |
| <p class="p">On some Linux systems when VNC is used, some widgets can be rendered |
| incorrectly, or <span class="ph">Nsight Systems</span> can crash when opening |
| Analysis Summary or Diagnostics Summary pages. In this case, try forcing |
| a specific software renderer: <samp class="ph codeph">GALLIUM_DRIVER=llvmpipe nsys-ui</samp></p> |
| </li> |
| <li class="li"> |
| <p class="p">Due to <a class="xref" href="https://github.com/open-mpi/ompi/issues/6648" target="_blank" shape="rect">a known bug in Open MPI 4.0.1</a>, |
| the target application may crash at the end of execution when being profiled |
| by <span class="ph">Nsight Systems</span>. To avoid the issue, use a different Open MPI |
| version, or add the <samp class="ph codeph">--mca btl ^vader</samp> option to the |
| <samp class="ph codeph">mpirun</samp> command line. |
| </p> |
| </li> |
| </ul> |
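| <p class="p">As a rough illustration of the CLI limits listed above, the following Python sketch checks the documented name lengths before assembling an <samp class="ph codeph">nsys profile</samp> command line. The helper names are hypothetical, the 111-character check is assumed to apply to the executable's base name, and <samp class="ph codeph">--duration</samp> (in seconds) is assumed to be available in your CLI version. Only <samp class="ph codeph">--cpuctxsw=none</samp> and the two length limits come from the notes above.</p> |
<pre xml:space="preserve">
```python
import os

# Limits quoted in the known issues above.
MAX_SESSION_NAME = 127
MAX_EXECUTABLE_NAME = 111


def check_session_name(name):
    """Return True if the session name fits the documented 127-character limit."""
    return MAX_SESSION_NAME >= len(name)


def build_profile_command(executable, duration_s=300, context_switch=False):
    """Build an argument list for `nsys profile` (hypothetical helper).

    Assumption: the 111-character limit applies to the executable's base name.
    Assumption: --duration takes a number of seconds.
    """
    exe_name = os.path.basename(executable)
    if len(exe_name) > MAX_EXECUTABLE_NAME:
        raise ValueError("executable name exceeds 111 characters")
    command = ["nsys", "profile", "--duration=" + str(duration_s)]
    if not context_switch:
        # Skip thread scheduling collection to reduce overhead.
        command.append("--cpuctxsw=none")
    command.append(executable)
    return command
```
</pre>
| <p class="p">The returned list could be passed to <samp class="ph codeph">subprocess.run</samp>. The default of 300 seconds mirrors the suggested 5 minute limit for an initial profile.</p> |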
| </div> |
| </div> |
| </div> |
| <div class="topic nested1" id="vgpu-issues"><a name="vgpu-issues" shape="rect"> |
| </a><h3 class="title topictitle2"><a href="#vgpu-issues" name="vgpu-issues" shape="rect">2.2. vGPU Issues</a></h3> |
| <div class="body"> |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p">When running <span class="ph">Nsight Systems</span> on vGPU you should always |
| use the profiler grant. See <a class="xref" href="https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#enabling-cuda-toolkit-profilers-vgpu" target="_blank" shape="rect">Virtual GPU Software Documentation</a> |
| for details on enabling NVIDIA CUDA Toolkit profilers for NVIDIA vGPUs. |
| Without the grant, an unexpected migration may crash a running session |
| or cause it to report an error and abort. It may also silently produce a corrupted |
| report that cannot be loaded or that shows inaccurate data without warning. |
| |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">Starting with vGPU 13.0, device level metrics collection is |
| exposed to end users even on vGPU. Device level metrics will give info |
| about all the work being executed on the GPU. The work might be in the |
| same VM or some other VM running on the same physical GPU. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">As of CUDA 11.4 and R470 TRD1 driver release, |
| <span class="ph">Nsight Systems</span> is supported in a vGPU environment which |
| requires a vGPU license. If the license is not obtained after 20 minutes, |
| the tool will still work but the reported GPU performance metrics data |
| will be inaccurate. This is because of a feature in vGPU environment |
| which reduces performance but retains functionality as specified in |
| <a class="xref" href="https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#software-enforcement-grid-licensing" target="_blank" shape="rect">Grid Licensing User Guide</a>. |
| |
| </p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <div class="topic nested1" id="docker-issues"><a name="docker-issues" shape="rect"> |
| </a><h3 class="title topictitle2"><a href="#docker-issues" name="docker-issues" shape="rect">2.3. Docker Issues</a></h3> |
| <div class="body"> |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p">In a Docker container, when the host system uses a kernel older than v4.3, <span class="ph">Nsight Systems</span> cannot collect sampling data unless both the host and the container are running a RHEL or CentOS operating system with kernel version |
| 3.10.1-693 or newer. A user override for this will be made available in a future version. |
| </p> |
| </li> |
| <li class="li">When <samp class="ph codeph">docker exec</samp> is called on a running container and stdout is kept open from a command invoked inside that shell, the exec shell hangs until |
| the command exits. You can avoid this issue by running with <samp class="ph codeph">docker exec --tty</samp>. See the bug reports at: |
| |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p"><a class="xref" href="https://github.com/moby/moby/issues/33039" target="_blank" shape="rect">https://github.com/moby/moby/issues/33039</a></p> |
| </li> |
| <li class="li"> |
| <p class="p"><a class="xref" href="https://github.com/drud/ddev/issues/732" target="_blank" shape="rect">https://github.com/drud/ddev/issues/732</a></p> |
| </li> |
| </ul> |
| </div> |
| </li> |
| </ul> |
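| <p class="p">To see whether the kernel restriction above might apply, you can compare the running kernel version against v4.3. The following Python sketch is one way to do that. The function names are hypothetical, and it only inspects the major and minor version numbers, so it does not check the RHEL or CentOS exception. A container shares its host's kernel, so the release string reported inside the container is the host's.</p> |
<pre xml:space="preserve">
```python
import platform
import re


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '3.10.0-693.el7'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: " + release)
    return int(match.group(1)), int(match.group(2))


def kernel_older_than_4_3(release=None):
    """Return True if the kernel release is older than v4.3.

    With no argument, the release of the running kernel is used.
    """
    if release is None:
        release = platform.release()
    return (4, 3) > parse_kernel_version(release)
```
</pre>
| <p class="p">If <samp class="ph codeph">kernel_older_than_4_3()</samp> returns true, the sampling limitation described above may apply to your setup.</p> |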
| </div> |
| </div> |
| </div> |
| <div class="topic nested1" id="cuda-trace-issues"><a name="cuda-trace-issues" shape="rect"> |
| </a><h3 class="title topictitle2"><a href="#cuda-trace-issues" name="cuda-trace-issues" shape="rect">2.4. CUDA Trace Issues</a></h3> |
| <div class="body"> |
| <div class="p"> |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p"> |
| When using CUDA Toolkit 10.X, tracing of DtoD memory copy operations may |
| result in a crash. To avoid this issue, update CUDA Toolkit to 11.X or |
| the latest version. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p"><span class="ph">Nsight Systems</span> will not trace kernels when a CDP (CUDA |
| Dynamic Parallelism) kernel is found in a target application on |
| Volta devices or later. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">On Tegra platforms, CUDA trace requires root privileges. Use the |
| <strong class="ph b">Launch as root</strong> checkbox in project settings to make the profiled |
| application run as root. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">If the target application uses |
| multiple streams from multiple threads, CUDA event buffers may not be |
| released properly. In this case, you will see the following diagnostic |
| error: |
| </p><pre xml:space="preserve">Couldn't allocate CUPTI bufer x times. Some CUPTI events may |
| be missing.</pre><p class="p">Please contact the <span class="ph">Nsight Systems</span> team. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">In this version of |
| <span class="ph">Nsight Systems</span>, if you are starting and stopping profiling |
| inside your application using the interactive CLI, the CUDA memory |
| allocation graph generation is only guaranteed to be correct in the first |
| profiling range. This limitation will be removed in a future version of |
| the product. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">CUDA GPU trace collection requires a fraction of |
| GPU memory. If your application utilizes all available GPU memory, CUDA |
| trace might not work or can break your application. For example, a cuDNN |
| application can crash with a <samp class="ph codeph">CUDNN_STATUS_INTERNAL_ERROR</samp> |
| error if GPU memory allocation fails. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">For older Linux kernels, prior to 4.4, when profiling very |
| short-lived applications (~1 second) that exit in the middle of the |
| profiling session, it is possible that <span class="ph">Nsight Systems</span> will |
| not show the CUDA events on the timeline. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">When more than 64k serialized CUDA kernels and memory copies are |
| executed in the application, you may encounter the following exception |
| during profiling: |
| </p><pre xml:space="preserve">InvalidArgumentException: "Wrong event order detected"</pre><p class="p">Please upgrade to the CUDA 9.2 driver at minimum to avoid this problem. |
| If you cannot upgrade, you can get a partial analysis, missing |
| potentially a large fraction of CUDA events, by using the CLI. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">On Vibrante, when running a profiling session with multiple targets |
| that are guest VMs in a CCC configuration behind a NAT, you may encounter |
| an error with the following text during profiling: |
| </p><pre xml:space="preserve">Failed to sync time on device.</pre><p class="p">Please edit the group connection settings, select <strong class="ph b">Targets on the |
| same SoC</strong> checkbox there and try again. |
| </p> |
| </li> |
| <li class="li"> |
| <p class="p">When using the 455 driver, as shipped with CUDA Toolkit 11.1, and |
| tracing CUDA with <span class="ph">Nsight Systems</span>, you may encounter a crash when |
| the application exits. To avoid this issue, end your profiling session |
| before the application exits or update your driver. |
| </p> |
| </li> |
| </ul> |
| </div> |
| <p class="p hr"></p> |
| </div> |
| </div> |
| </div> |
| |
| <hr id="contents-end"></hr> |
| |
| </article> |
| </div> |
| </div> |
| <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/formatting/common.min.js"></script> |
| <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/google-analytics/google-analytics-write.js"></script> |
| <script language="JavaScript" type="text/javascript" charset="utf-8" src="../common/scripts/google-analytics/google-analytics-tracker.js"></script> |
| <script type="text/javascript">_satellite.pageBottom();</script> |
| <script type="text/javascript">var switchTo5x=true;</script><script type="text/javascript">stLight.options({publisher: "998dc202-a267-4d8e-bce9-14debadb8d92", doNotHash: false, doNotCopy: false, hashAddressBar: false});</script></body> |
| </html> |