| Field | Response | |
| :------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------- | |
| Intended Domain: | Speech-to-text transcription | |
| Model Type: | ASR | |
| Intended Users: | This model is intended for developers, researchers, academics, and industries building conversational applications. | |
| Output: | Text | |
| Describe how the model works: | The model transcribes audio input into text in the input language. | |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable | |
| Technical Limitations & Mitigation: | Transcripts may not be 100% accurate. Accuracy varies with the characteristics of the input audio (domain, use case, accent, noise level, speech type, speech context, etc.). | |
| Verified to have met prescribed NVIDIA quality standards: | Yes | |
| Performance Metrics: | Word Error Rate (WER), Silence Robustness (characters per minute of silent audio), Latency (milliseconds), Throughput (total audio processed per unit of time). | |
| Potential Known Risks: | Not recommended for word-for-word transcription, as accuracy varies with the characteristics of the input audio (domain, use case, accent, noise level, speech type, and speech context). | |
| Licensing: | GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement (found at [https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)) | |
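
The Performance Metrics row lists Word Error Rate (WER) as the primary accuracy measure. As a rough illustration only, WER is commonly computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical and is not taken from this model's evaluation tooling, which may normalize casing and punctuation differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"),
    # measured against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, which is one reason a single WER figure should be read alongside the audio characteristics noted in the table.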