| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Automatic Speech Recognition","local":"automatic-speech-recognition","sections":[{"title":"Recommended models","local":"recommended-models","sections":[],"depth":3},{"title":"Using the API","local":"using-the-api","sections":[],"depth":3},{"title":"API specification","local":"api-specification","sections":[{"title":"Request","local":"request","sections":[],"depth":4},{"title":"Response","local":"response","sections":[],"depth":4}],"depth":3}],"depth":2}"> | |
| <link href="/docs/inference-providers/pr_1663/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/entry/start.d5f15666.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/scheduler.ddb4e551.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/singletons.0f5b782d.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/index.ce98237b.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/paths.b324c1e2.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/entry/app.68b4644d.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/index.e16e4efa.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/nodes/0.80863911.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/nodes/9.c0e7ae90.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/Tip.20abb04f.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/index.e108c5ed.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/InferenceSnippet.8df18a84.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/CodeBlock.754e6cfc.js"> | |
| <link rel="modulepreload" href="/docs/inference-providers/pr_1663/en/_app/immutable/chunks/IconCurl.399d095b.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Automatic Speech Recognition","local":"automatic-speech-recognition","sections":[{"title":"Recommended models","local":"recommended-models","sections":[],"depth":3},{"title":"Using the API","local":"using-the-api","sections":[],"depth":3},{"title":"API specification","local":"api-specification","sections":[{"title":"Request","local":"request","sections":[],"depth":4},{"title":"Response","local":"response","sections":[],"depth":4}],"depth":3}],"depth":2}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h2 class="relative group"><a id="automatic-speech-recognition" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#automatic-speech-recognition"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Automatic Speech Recognition</span></h2> <p data-svelte-h="svelte-jdm6p4">Automatic Speech Recognition (ASR), also known as Speech to Text (STT), is the task of transcribing a given audio to text.</p> <p data-svelte-h="svelte-1iml56d">Example 
applications:</p> <ul data-svelte-h="svelte-1k9cxyb"><li>Transcribing a podcast</li> <li>Building a voice assistant</li> <li>Generating subtitles for a video</li></ul> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1rx87kg">For more details about the <code>automatic-speech-recognition</code> task, check out its <a href="https://huggingface.co/tasks/automatic-speech-recognition" rel="nofollow">dedicated page</a>! You will find examples and related materials.</p></div> <h3 class="relative group"><a id="recommended-models" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#recommended-models"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Recommended models</span></h3> <ul data-svelte-h="svelte-1gcp2h7"><li><a href="https://huggingface.co/openai/whisper-large-v3" rel="nofollow">openai/whisper-large-v3</a>: A powerful ASR model by OpenAI.</li> <li><a href="https://huggingface.co/facebook/seamless-m4t-v2-large" 
rel="nofollow">facebook/seamless-m4t-v2-large</a>: An end-to-end model that performs ASR and Speech Translation by MetaAI.</li></ul> <p data-svelte-h="svelte-uhr3pl">Explore all available models and find the one that suits you best <a href="https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending" rel="nofollow">here</a>.</p> <h3 class="relative group"><a id="using-the-api" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#using-the-api"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Using the API</span></h3> <div class="flex gap-x-2 justify-between md:items-top w-full text-sm not-prose flex-col md:flex-row"> <div><p class="font-mono text-xs opacity-50 hidden md:block" data-svelte-h="svelte-1s5bpew">Language</p> <div class="my-1.5 flex items-center gap-x-1 gap-y-0.5 flex-wrap"><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none border-gray-800 bg-black text-white dark:bg-gray-700" type="button"><svg class="mr-1.5 text-current" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" 
role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M15.84.5a16.4,16.4,0,0,0-3.57.32C9.1,1.39,8.53,2.53,8.53,4.64V7.48H16v1H5.77a4.73,4.73,0,0,0-4.7,3.74,14.82,14.82,0,0,0,0,7.54c.57,2.28,1.86,3.82,4,3.82h2.6V20.14a4.73,4.73,0,0,1,4.63-4.63h7.38a3.72,3.72,0,0,0,3.73-3.73V4.64A4.16,4.16,0,0,0,19.65.82,20.49,20.49,0,0,0,15.84.5ZM11.78,2.77a1.39,1.39,0,0,1,1.38,1.46,1.37,1.37,0,0,1-1.38,1.38A1.42,1.42,0,0,1,10.4,4.23,1.44,1.44,0,0,1,11.78,2.77Z" fill="#5a9fd4"></path><path d="M16.16,31.5a16.4,16.4,0,0,0,3.57-.32c3.17-.57,3.74-1.71,3.74-3.82V24.52H16v-1H26.23a4.73,4.73,0,0,0,4.7-3.74,14.82,14.82,0,0,0,0-7.54c-.57-2.28-1.86-3.82-4-3.82h-2.6v3.41a4.73,4.73,0,0,1-4.63,4.63H12.35a3.72,3.72,0,0,0-3.73,3.73v7.14a4.16,4.16,0,0,0,3.73,3.82A20.49,20.49,0,0,0,16.16,31.5Zm4.06-2.27a1.39,1.39,0,0,1-1.38-1.46,1.37,1.37,0,0,1,1.38-1.38,1.42,1.42,0,0,1,1.38,1.38A1.44,1.44,0,0,1,20.22,29.23Z" fill="#ffd43b"></path></svg> Python </button><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button"><svg class="mr-1.5 text-current" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><rect width="32" height="32" fill="#f7df1e"></rect><path d="M21.5,25a3.27,3.27,0,0,0,3,1.83c1.25,0,2-.63,2-1.49,0-1-.81-1.39-2.19-2L23.56,23C21.39,22.1,20,20.94,20,18.49c0-2.25,1.72-4,4.41-4a4.44,4.44,0,0,1,4.27,2.41l-2.34,1.5a2,2,0,0,0-1.93-1.29,1.31,1.31,0,0,0-1.44,1.29c0,.9.56,1.27,1.85,1.83l.75.32c2.55,1.1,4,2.21,4,4.72,0,2.71-2.12,4.19-5,4.19a5.78,5.78,0,0,1-5.48-3.07Zm-10.63.26c.48.84.91,1.55,1.94,1.55s1.61-.39,1.61-1.89V14.69h3V25c0,3.11-1.83,4.53-4.49,4.53a4.66,4.66,0,0,1-4.51-2.75Z"></path></svg> JavaScript </button><button class="text-md flex 
select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button"><svg class="mr-1.5 text-current" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><rect width="32" height="32" rx="4" fill="#1683a5"></rect><path d="M6.71,14A5,5,0,0,1,8.82,9.29l2.64-2.2c1.67-1.37,2.52-1.41,4.6-1.41H21.7c1.19,0,2.45.27,2.45,1.79s-1.4,1.78-2.45,1.78H15.44a3.31,3.31,0,0,0-2,.89L11.24,12c-.55.44-1,.81-1,1.52v4.41c0,.7.41,1.07,1,1.52l2.16,1.82a3.34,3.34,0,0,0,2,.89H21.7c1.05,0,2.45.23,2.45,1.78s-1.26,1.78-2.45,1.78H16.06c-2.08,0-2.94,0-4.6-1.4L8.82,22.09A5.05,5.05,0,0,1,6.71,17.4Z" fill="#fff"></path></svg> cURL </button></div></div> <div><p class="font-mono text-xs opacity-50 hidden md:block" data-svelte-h="svelte-1kuuf89">Client</p> <div class="my-1.5 flex items-center gap-x-1 gap-y-0.5 flex-wrap"><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none border-gray-800 bg-black text-white dark:bg-gray-700" type="button">huggingface_hub </button><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button">requests </button></div></div> <div><p class="font-mono text-xs opacity-50 hidden md:block" data-svelte-h="svelte-1p9m5m3">Provider</p> <div class="my-1.5 flex items-center gap-x-1 gap-y-0.5 flex-wrap"><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none border-gray-800 bg-black text-white dark:bg-gray-700" type="button"><svg class="mr-1.5 text-current" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" 
width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 26 26"><path fill-rule="evenodd" clip-rule="evenodd" d="M16.5899 2.37891C16.9579 2.37891 17.2529 2.67812 17.2881 3.04443C17.6019 6.31174 20.2023 8.91191 23.4698 9.22569C23.8361 9.26089 24.1353 9.55582 24.1353 9.92378V16.0761C24.1353 16.4441 23.8361 16.739 23.4698 16.7742C20.2023 17.088 17.6019 19.6881 17.2881 22.9555C17.2529 23.3218 16.9579 23.621 16.5899 23.621H10.4373C10.0692 23.621 9.77432 23.3218 9.73912 22.9555C9.42534 19.6881 6.82494 17.088 3.5574 16.7742C3.19109 16.739 2.89185 16.4441 2.89185 16.0761V9.92378C2.89185 9.55582 3.19109 9.26089 3.55741 9.22569C6.82494 8.91191 9.42534 6.31174 9.73912 3.04443C9.77432 2.67812 10.0692 2.37891 10.4373 2.37891H16.5899ZM7.15714 12.982C7.15714 16.5163 10.0192 19.3814 13.5498 19.3814C17.0804 19.3814 19.9426 16.5163 19.9426 12.982C19.9426 9.44762 17.0804 6.58248 13.5498 6.58248C10.0192 6.58248 7.15714 9.44762 7.15714 12.982Z" fill="currentColor"></path></svg> fal </button><button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button"><svg class="mr-1.5 text-current" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 26 26"><rect x="3.34856" y="3.02654" width="19.9474" height="19.9474" rx="2.95009" fill="#FFD21E" stroke="#FFB41E" stroke-width="1.18004"></rect><path fill-rule="evenodd" clip-rule="evenodd" d="M7.69336 9.74609V16.9754H9.32329V13.9595H11.8181V16.9754H13.4591V9.74609H11.8181V12.5292H9.32329V9.74609H7.69336ZM15.1646 9.74609V16.9754H16.7945V14.1702H19.3004V12.7953H16.7945V11.121H19.7217V9.74609H15.1646Z" fill="#814D00"></path></svg> HF Inference API </button> </div></div> <div><p class="font-mono text-xs invisible hidden md:block" 
data-svelte-h="svelte-hnzs25">Settings</p> <div class="flex not-prose my-1.5"><div class="relative hidden md:block "> <button class=" " type="button"> <button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button" title="Settings dropdown"><svg class="mr-1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 9 7"><path fill="currentColor" d="M8.537 1.153H7.361A1.445 1.445 0 0 0 5.954 0c-.689 0-1.263.49-1.407 1.153H.5v.576h4.047a1.445 1.445 0 0 0 1.407 1.153c.689 0 1.263-.49 1.407-1.153h1.176v-.576M5.954 2.305a.847.847 0 0 1-.861-.864c0-.49.373-.865.861-.865s.861.375.861.865-.373.864-.861.864M.5 5.764h1.177a1.445 1.445 0 0 0 1.406 1.152c.69 0 1.263-.49 1.407-1.152h4.047v-.577H4.49a1.445 1.445 0 0 0-1.407-1.152c-.688 0-1.263.49-1.406 1.152H.5v.577M3.083 4.61c.488 0 .862.375.862.864 0 .49-.374.865-.862.865a.847.847 0 0 1-.86-.865c0-.49.372-.864.86-.864"></path></svg> | |
| Settings</button> </button> </div> <div class="relative md:hidden "> <button class=" " type="button"> <button class="text-md flex select-none items-center rounded-lg border px-1.5 py-1 leading-none hover:shadow-xs cursor-pointer text-gray-500 opacity-90 hover:text-gray-700 dark:hover:text-gray-200" type="button" title="Settings dropdown"><svg class="mr-1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 9 7"><path fill="currentColor" d="M8.537 1.153H7.361A1.445 1.445 0 0 0 5.954 0c-.689 0-1.263.49-1.407 1.153H.5v.576h4.047a1.445 1.445 0 0 0 1.407 1.153c.689 0 1.263-.49 1.407-1.153h1.176v-.576M5.954 2.305a.847.847 0 0 1-.861-.864c0-.49.373-.865.861-.865s.861.375.861.865-.373.864-.861.864M.5 5.764h1.177a1.445 1.445 0 0 0 1.406 1.152c.69 0 1.263-.49 1.407-1.152h4.047v-.577H4.49a1.445 1.445 0 0 0-1.407-1.152c-.688 0-1.263.49-1.406 1.152H.5v.577M3.083 4.61c.488 0 .862.375.862.864 0 .49-.374.865-.862.865a.847.847 0 0 1-.86-.865c0-.49.372-.864.86-.864"></path></svg> | |
| Settings</button> </button> </div> <div class="flex-grow md:hidden"></div></div></div></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> InferenceClient | |
| client = InferenceClient( | |
| provider=<span class="hljs-string">"fal-ai"</span>, | |
| api_key=<span class="hljs-string">"hf_xxxxxxxxxxxxxxxxxxxxxxxx"</span>, | |
| ) | |
| output = client.automatic_speech_recognition(<span class="hljs-string">"sample1.flac"</span>, model=<span class="hljs-string">"openai/whisper-large-v3"</span>)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="api-specification" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#api-specification"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>API specification</span></h3> <h4 class="relative group"><a id="request" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#request"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 
0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Request</span></h4> <table data-svelte-h="svelte-men3hv"><thead><tr><th align="left">Payload</th> <th align="left"></th> <th align="left"></th></tr></thead> <tbody><tr><td align="left"><strong>inputs*</strong></td> <td align="left"><em>string</em></td> <td align="left">The input audio data as a base64-encoded string. If no <code>parameters</code> are provided, you can also provide the audio data as a raw bytes payload.</td></tr> <tr><td align="left"><strong>parameters</strong></td> <td align="left"><em>object</em></td> <td align="left"></td></tr> <tr><td align="left"><strong> return_timestamps</strong></td> <td align="left"><em>boolean</em></td> <td align="left">Whether to output corresponding timestamps with the generated text</td></tr> <tr><td align="left"><strong> generation_parameters</strong></td> <td align="left"><em>object</em></td> <td align="left"></td></tr> <tr><td align="left"><strong> temperature</strong></td> <td align="left"><em>number</em></td> <td align="left">The value used to modulate the next token probabilities.</td></tr> <tr><td align="left"><strong> top_k</strong></td> <td align="left"><em>integer</em></td> <td align="left">The number of highest probability vocabulary tokens to keep for top-k-filtering.</td></tr> <tr><td align="left"><strong> top_p</strong></td> <td align="left"><em>number</em></td> <td align="left">If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.</td></tr> <tr><td align="left"><strong> typical_p</strong></td> <td align="left"><em>number</em></td> <td align="left">Local typicality measures how similar the conditional probability of predicting a target token next is 
to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See <a href="https://hf.co/papers/2202.00666" rel="nofollow">this paper</a> for more details.</td></tr> <tr><td align="left"><strong> epsilon_cutoff</strong></td> <td align="left"><em>number</em></td> <td align="left">If set to float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See <a href="https://hf.co/papers/2210.15191" rel="nofollow">Truncation Sampling as Language Model Desmoothing</a> for more details.</td></tr> <tr><td align="left"><strong> eta_cutoff</strong></td> <td align="left"><em>number</em></td> <td align="left">Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See <a href="https://hf.co/papers/2210.15191" rel="nofollow">Truncation Sampling as Language Model Desmoothing</a> for more details.</td></tr> <tr><td align="left"><strong> max_length</strong></td> <td align="left"><em>integer</em></td> <td align="left">The maximum length (in tokens) of the generated text, including the input.</td></tr> <tr><td align="left"><strong> max_new_tokens</strong></td> <td align="left"><em>integer</em></td> <td align="left">The maximum number of tokens to generate. 
Takes precedence over max_length.</td></tr> <tr><td align="left"><strong> min_length</strong></td> <td align="left"><em>integer</em></td> <td align="left">The minimum length (in tokens) of the generated text, including the input.</td></tr> <tr><td align="left"><strong> min_new_tokens</strong></td> <td align="left"><em>integer</em></td> <td align="left">The minimum number of tokens to generate. Takes precedence over min_length.</td></tr> <tr><td align="left"><strong> do_sample</strong></td> <td align="left"><em>boolean</em></td> <td align="left">Whether to use sampling instead of greedy decoding when generating new tokens.</td></tr> <tr><td align="left"><strong> early_stopping</strong></td> <td align="left"><em>enum</em></td> <td align="left">Possible values: never, true, false.</td></tr> <tr><td align="left"><strong> num_beams</strong></td> <td align="left"><em>integer</em></td> <td align="left">Number of beams to use for beam search.</td></tr> <tr><td align="left"><strong> num_beam_groups</strong></td> <td align="left"><em>integer</em></td> <td align="left">Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See <a href="https://hf.co/papers/1610.02424" rel="nofollow">this paper</a> for more details.</td></tr> <tr><td align="left"><strong> penalty_alpha</strong></td> <td align="left"><em>number</em></td> <td align="left">The value that balances model confidence against the degeneration penalty in contrastive search decoding.</td></tr> <tr><td align="left"><strong> use_cache</strong></td> <td align="left"><em>boolean</em></td> <td align="left">Whether the model should reuse the past key/value attention states to speed up decoding.</td></tr></tbody></table> <p data-svelte-h="svelte-xa4wks">Some options can be configured by passing headers to the Inference API. 
Here are the available headers:</p> <table data-svelte-h="svelte-2rfiu7"><thead><tr><th align="left">Headers</th> <th align="left"></th> <th align="left"></th></tr></thead> <tbody><tr><td align="left"><strong>authorization</strong></td> <td align="left"><em>string</em></td> <td align="left">Authentication header in the form <code>'Bearer hf_****'</code>, where <code>hf_****</code> is a personal user access token with Inference API permission. You can generate one from <a href="https://huggingface.co/settings/tokens" rel="nofollow">your settings page</a>.</td></tr> <tr><td align="left"><strong>x-use-cache</strong></td> <td align="left"><em>boolean, defaults to <code>true</code></em></td> <td align="left">There is a cache layer on the Inference API that speeds up requests it has already seen. Most models can use those results because they are deterministic (the outputs will be the same anyway). However, if you use a nondeterministic model, you can set this header to <code>false</code> to bypass the cache and force a new query. Read more about caching <a href="../parameters#caching">here</a>.</td></tr> <tr><td align="left"><strong>x-wait-for-model</strong></td> <td align="left"><em>boolean, defaults to <code>false</code></em></td> <td align="left">If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done. It is advised to set this flag to true only after receiving a 503 error, so that any hanging in your application is confined to known places. 
Read more about model availability <a href="../overview#eligibility">here</a>.</td></tr></tbody></table> <p data-svelte-h="svelte-1ps9cb1">For more information about Inference API headers, check out the parameters <a href="../parameters">guide</a>.</p> <h4 class="relative group"><a id="response" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#response"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Response</span></h4> <table data-svelte-h="svelte-4hq5ft"><thead><tr><th align="left">Body</th> <th align="left"></th> <th align="left"></th></tr></thead> <tbody><tr><td align="left"><strong>text</strong></td> <td align="left"><em>string</em></td> <td align="left">The recognized text.</td></tr> <tr><td align="left"><strong>chunks</strong></td> <td align="left"><em>object[]</em></td> <td align="left">When <code>return_timestamps</code> is enabled, <code>chunks</code> contains a list of audio chunks identified by the model.</td></tr> <tr><td align="left"><strong> text</strong></td> <td align="left"><em>string</em></td> <td align="left">A chunk of text identified by the model.</td></tr> <tr><td align="left"><strong> timestamp</strong></td> <td 
align="left"><em>number[]</em></td> <td align="left">The start and end timestamps corresponding with the text</td></tr></tbody></table> <p></p> | |
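<p>The request and response tables above can be illustrated with a minimal sketch. It assumes a JSON body in which <code>inputs</code> is the base64-encoded audio and <code>generation_parameters</code> is nested under <code>parameters</code>, as the Request table lists them. The helper names <code>build_asr_request</code> and <code>format_chunks</code> are hypothetical and not part of any library. No network call is made, because the endpoint URL is not given in this page. The formatter also assumes both timestamps in each chunk are numbers.</p>

```python
import base64
import json


def build_asr_request(audio_bytes, token, return_timestamps=False, **generation_parameters):
    """Hypothetical helper: build headers and a JSON body shaped like the Request table."""
    parameters = {"return_timestamps": return_timestamps}
    if generation_parameters:
        # Assumption: generation options are nested under parameters.generation_parameters.
        parameters["generation_parameters"] = generation_parameters
    body = {
        # The table describes `inputs` as a base64-encoded string of the audio data.
        "inputs": base64.b64encode(audio_bytes).decode("ascii"),
        "parameters": parameters,
    }
    headers = {
        # The scheme name and the token are separated by a single space.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return headers, json.dumps(body)


def format_chunks(response):
    """Hypothetical helper: render a response dict as 'start-end: text' lines."""
    chunks = response.get("chunks")
    if not chunks:
        # Without timestamps only the top-level `text` field is available.
        return [response["text"]]
    return [
        f"{chunk['timestamp'][0]:.2f}-{chunk['timestamp'][1]:.2f}: {chunk['text']}"
        for chunk in chunks
    ]
```

<p>Keeping the payload builder separate from the transport makes it easy to inspect exactly what would be sent before wiring it to an HTTP client of your choice.</p>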
| <script> | |
| { | |
| __sveltekit_1o5mypj = { | |
| assets: "/docs/inference-providers/pr_1663/en", | |
| base: "/docs/inference-providers/pr_1663/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/inference-providers/pr_1663/en/_app/immutable/entry/start.d5f15666.js"), | |
| import("/docs/inference-providers/pr_1663/en/_app/immutable/entry/app.68b4644d.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 9], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |