Update Readme #11 by jbalam-nv
README.md CHANGED
|
@@ -111,7 +111,7 @@ img {
|
|
| 111 |
<!-- | [](#datasets) -->
|
| 112 |
|
| 113 |
|
| 114 |
-
[Sortformer](https://arxiv.org/abs/2409.06656)[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models.
|
| 115 |
|
| 116 |
<div align="center">
|
| 117 |
<img src="sortformer_intro.png" width="750" />
|
|
@@ -119,6 +119,18 @@ img {
|
|
| 119 |
|
| 120 |
Sortformer resolves the permutation problem in diarization by following the arrival-time order of each speaker's speech segments.
|
| 121 |
|
| 122 |
## Model Architecture
|
| 123 |
|
| 124 |
Sortformer consists of an L-size (18 layers) [NeMo Encoder for
|
|
|
|
| 111 |
<!-- | [](#datasets) -->
|
| 112 |
|
| 113 |
|
| 114 |
+
NVIDIA [Sortformer](https://arxiv.org/abs/2409.06656)[1] is a novel end-to-end neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models.
|
| 115 |
|
| 116 |
<div align="center">
|
| 117 |
<img src="sortformer_intro.png" width="750" />
|
|
|
|
| 119 |
|
| 120 |
Sortformer resolves the permutation problem in diarization by following the arrival-time order of each speaker's speech segments.
|
| 121 |
|
| 122 |
+
## Discover more from NVIDIA
|
| 123 |
+
For documentation, deployment guides, enterprise-ready APIs, and the latest open models, including Nemotron and other cutting-edge speech, translation, and generative AI models, visit the NVIDIA Developer Portal at [developer.nvidia.com](https://developer.nvidia.com).
|
| 124 |
+
Join the community to access tools, support, and resources to accelerate your development with NVIDIA’s NeMo, Riva, NIM, and foundation models.<br>
|
| 125 |
+
|
| 126 |
+
### Explore more from NVIDIA
|
| 127 |
+
What is [Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/)?<br>
|
| 128 |
+
NVIDIA Developer [Nemotron](https://developer.nvidia.com/nemotron)<br>
|
| 129 |
+
[NVIDIA Riva Speech](https://developer.nvidia.com/riva?sortBy=developer_learning_library%2Fsort%2Ffeatured_in.riva%3Adesc%2Ctitle%3Aasc#demos)<br>
|
| 130 |
+
[NeMo Documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/models.html)<br>
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
|
| 134 |
## Model Architecture
|
| 135 |
|
| 136 |
Sortformer consists of an L-size (18 layers) [NeMo Encoder for
|