	<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"使用LLMs进行生成","local":"使用llms进行生成","sections":[],"depth":2}">
	<link href="/docs/transformers/main/zh/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/entry/start.a61b9c50.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/scheduler.9991993c.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/singletons.2822fe91.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/index.02cfeb18.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/paths.d66588b4.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/entry/app.99775688.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/index.7fc9a5e7.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/nodes/0.f4c5a5c1.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/each.e59479a4.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/nodes/25.a5af4afa.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/Tip.9de92fc6.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/CodeBlock.e11cba92.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/DocNotebookDropdown.a0cb4c0f.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/globals.7f7f1b26.js">
	<link rel="modulepreload" href="/docs/transformers/main/zh/_app/immutable/chunks/EditOnGithub.84ab7f0e.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"使用LLMs进行生成","local":"使用llms进行生成","sections":[],"depth":2}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h2 class="relative group"><a id="使用llms进行生成" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#使用llms进行生成"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Generation with LLMs</span></h2> <div class="flex space-x-1 absolute z-10 right-0 top-0"> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Colab" class="!m-0" src="https://colab.research.google.com/assets/colab-badge.svg"> </button> </div> <div class="relative colab-dropdown "> <button class=" " type="button"> <img alt="Open In Studio Lab" class="!m-0" src="https://studiolab.sagemaker.aws/studiolab.svg"> </button> </div></div> <p data-svelte-h="svelte-8lgfa7">LLMs, or Large Language Models, are the key component behind text generation. In a nutshell, they consist of large pretrained transformer models trained to predict the next word (or, more precisely, the next <code>token</code>) given some input text. Because they predict one <code>token</code> at a time, generating new sentences takes more than a single call to the model: you need to perform autoregressive generation.</p> <p 
data-svelte-h="svelte-lef7ht">Autoregressive generation is the inference-time procedure of iteratively calling a model with its own generated outputs, given a few initial inputs. In 🤗 Transformers, this is handled by the <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin.generate">generate()</a> method, which is available to all models with generative capabilities.</p> <p data-svelte-h="svelte-lx14wd">This tutorial will show you how to:</p> <ul data-svelte-h="svelte-13m7zir"><li>Generate text with an LLM</li> <li>Avoid common pitfalls</li> <li>Find next steps to help you get the most out of your LLM</li></ul> <p data-svelte-h="svelte-noc1o7">Before you begin, make sure you have all the necessary libraries installed:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install transformers "bitsandbytes>=0.39.0" -q<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="生成文本" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#生成文本"><span><svg class="" 
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Generate text</span></h2> <p data-svelte-h="svelte-1wbwdf2">A language model trained for <a href="tasks/language_modeling">causal language modeling</a> takes a sequence of text <code>tokens</code> as input and returns the probability distribution for the next <code>token</code>.</p> <figure class="image table text-center m-0 w-full" data-svelte-h="svelte-4ssllr"><video style="max-width: 90%; margin: auto;" autoplay="" loop="" muted="" playsinline="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/assisted-generation/gif_1_1080p.mov"></video> <figcaption>"Forward pass of an LLM"</figcaption></figure> <p data-svelte-h="svelte-1ikd8ml">A critical aspect of autoregressive generation with LLMs is how to select the next <code>token</code> from this probability distribution. Anything goes in this step, as long as you end up with a <code>token</code> for the next iteration. It can be as simple as selecting the most likely <code>token</code> from the probability distribution, or as complex as applying several transformations before sampling from the resulting distribution, depending on your needs.</p> <figure class="image table text-center m-0 w-full" data-svelte-h="svelte-bzy8d0"><video style="max-width: 90%; margin: auto;" autoplay="" loop="" muted="" playsinline="" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/assisted-generation/gif_2_1080p.mov"></video> <figcaption>"Autoregressive generation iteratively selects the next token from a probability distribution to generate text"</figcaption></figure> <p data-svelte-h="svelte-yw7aac">The process depicted above is repeated iteratively until some stopping condition is reached. Ideally, the stopping condition is dictated by the model, which should learn when to output an end-of-sequence (<code>EOS</code>) token. If this is not the case, generation stops when some predefined maximum length is reached.</p> <p 
data-svelte-h="svelte-scvmxm">Properly setting up the <code>token</code> selection step and the stopping condition is essential to make your model behave as you expect on your task. That is why each model has an associated <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationConfig">GenerationConfig</a> file, which contains a good default set of generation parameters and is loaded alongside your model.</p> <p data-svelte-h="svelte-23apfz">Let's talk code!</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1jb7moc">If you're interested in basic LLM usage, our high-level <a href="pipeline_tutorial"><code>Pipeline</code></a> interface is a great starting point. However, LLMs often require advanced features like <code>quantization</code> and <code>fine control of the token selection step</code>, which is best done through <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin.generate">generate()</a>. Autoregressive generation with LLMs is also resource-intensive and should be executed on a GPU for adequate throughput.</p></div> <p data-svelte-h="svelte-11xv34o">First, you need to load the model.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; 
"></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForCausalLM

	<span class="hljs-meta">>>> </span>model = AutoModelForCausalLM.from_pretrained(
	<span class="hljs-meta">... </span> <span class="hljs-string">"mistralai/Mistral-7B-v0.1"</span>, device_map=<span class="hljs-string">"auto"</span>, load_in_4bit=<span class="hljs-literal">True</span>
	<span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-6j494d">You'll notice two flags in the <code>from_pretrained</code> call:</p> <ul data-svelte-h="svelte-4d7bnv"><li><code>device_map</code> ensures the model is moved to your GPU(s)</li> <li><code>load_in_4bit</code> applies <a href="main_classes/quantization">4-bit dynamic quantization</a> to greatly reduce the resource requirements</li></ul> <p data-svelte-h="svelte-seh5kq">There are other ways to initialize a model, but this is a good baseline to begin with an LLM.</p> <p data-svelte-h="svelte-57ytyl">Next, you need to preprocess your text input with a <a href="tokenizer_summary">tokenizer</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer

	<span class="hljs-meta">>>> </span>tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"mistralai/Mistral-7B-v0.1"</span>, padding_side=<span class="hljs-string">"left"</span>)
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer([<span class="hljs-string">"A list of colors: red, blue"</span>], return_tensors=<span class="hljs-string">"pt"</span>).to(<span class="hljs-string">"cuda"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1b81zmv">The <code>model_inputs</code> variable holds the tokenized text input as well as the attention mask. Although <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin.generate">generate()</a> does its best to infer the attention mask when it is not passed, we recommend passing it whenever possible for optimal results.</p> <p data-svelte-h="svelte-1d607k0">After tokenizing the inputs, you can call the <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin.generate">generate()</a> method to return the generated <code>tokens</code>. The generated <code>tokens</code> should then be converted to text before printing.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'A list of colors: red, blue, green, yellow, orange, purple, pink,'</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-dnai18">Finally, you don't need to process one sequence at a time! You can batch your inputs, which greatly improves throughput at a small latency and memory cost. You only need to make sure your inputs are padded properly (more on that below).</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>tokenizer.pad_token = tokenizer.eos_token <span class="hljs-comment"># Most LLMs don't have a pad token by default</span>
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer(
	<span class="hljs-meta">... </span> [<span class="hljs-string">"A list of colors: red, blue"</span>, <span class="hljs-string">"Portugal is"</span>], return_tensors=<span class="hljs-string">"pt"</span>, padding=<span class="hljs-literal">True</span>
	<span class="hljs-meta">... </span>).to(<span class="hljs-string">"cuda"</span>)
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)
	[<span class="hljs-string">'A list of colors: red, blue, green, yellow, orange, purple, pink,'</span>,
	<span class="hljs-string">'Portugal is a country in southwestern Europe, on the Iber'</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1eqj8jy">And that's it! In a few lines of code, you can harness the power of an LLM.</p> <h2 class="relative group"><a id="常见陷阱" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#常见陷阱"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Common pitfalls</span></h2> <p data-svelte-h="svelte-1awc9fu">There are many <a href="generation_strategies">generation strategies</a>, and sometimes the default values may not be appropriate for your use case. If your outputs don't match what you expect, the list below covers the most common pitfalls and how to avoid them.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path 
d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoModelForCausalLM, AutoTokenizer

	<span class="hljs-meta">>>> </span>tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"mistralai/Mistral-7B-v0.1"</span>)
	<span class="hljs-meta">>>> </span>tokenizer.pad_token = tokenizer.eos_token <span class="hljs-comment"># Most LLMs don't have a pad token by default</span>
	<span class="hljs-meta">>>> </span>model = AutoModelForCausalLM.from_pretrained(
	<span class="hljs-meta">... </span> <span class="hljs-string">"mistralai/Mistral-7B-v0.1"</span>, device_map=<span class="hljs-string">"auto"</span>, load_in_4bit=<span class="hljs-literal">True</span>
	<span class="hljs-meta">... </span>)<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="生成的输出太短太长" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#生成的输出太短太长"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Generated output is too short/long</span></h3> <p data-svelte-h="svelte-a1qysq">If not specified in the <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationConfig">GenerationConfig</a> file, <code>generate</code> returns up to 20 tokens by default. We strongly recommend manually setting <code>max_new_tokens</code> in your <code>generate</code> call to control the maximum number of new tokens it can return. Keep in mind that LLMs (more precisely, <a href="https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt" rel="nofollow">decoder-only models</a>) also return the input prompt as part of the output.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 
32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>model_inputs = tokenizer([<span class="hljs-string">"A sequence of numbers: 1, 2"</span>], return_tensors=<span class="hljs-string">"pt"</span>).to(<span class="hljs-string">"cuda"</span>)

	<span class="hljs-meta">>>> </span><span class="hljs-comment"># By default, the output will contain up to 20 tokens</span>
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'A sequence of numbers: 1, 2, 3, 4, 5'</span>
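
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># Added sketch, not part of the original example: the returned ids begin with the prompt,</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># so slicing off the prompt length keeps only the newly generated tokens. The names</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># `prompt_length`, `new_token_ids` and `continuation` are illustrative, not library API.</span>
	<span class="hljs-meta">>>> </span>prompt_length = model_inputs[<span class="hljs-string">"input_ids"</span>].shape[<span class="hljs-number">1</span>]
	<span class="hljs-meta">>>> </span>new_token_ids = generated_ids[:, prompt_length:]
	<span class="hljs-meta">>>> </span>continuation = tokenizer.batch_decode(new_token_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]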

	<span class="hljs-meta">>>> </span><span class="hljs-comment"># Setting `max_new_tokens` allows you to control the maximum length</span>
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs, max_new_tokens=<span class="hljs-number">50</span>)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'A sequence of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,'</span><!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="错误的生成模式" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#错误的生成模式"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Incorrect generation mode</span></h3> <p data-svelte-h="svelte-14l9x7e">By default, and unless specified in the <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationConfig">GenerationConfig</a> file, <code>generate</code> selects the most likely token at each iteration (greedy decoding). Depending on your task, this may be undesirable: creative tasks like chatbots or essay writing benefit from sampling. On the other hand, input-grounded tasks like audio transcription or translation benefit from greedy decoding. Enable sampling with <code>do_sample=True</code>; you can learn more about this topic in this <a href="https://huggingface.co/blog/how-to-generate" rel="nofollow">blog post</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" 
width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-comment"># Set seed for reproducibility -- you don't need this unless you want full reproducibility</span>
	<span class="hljs-meta">>>> </span><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> set_seed
	<span class="hljs-meta">>>> </span>set_seed(<span class="hljs-number">42</span>)

	<span class="hljs-meta">>>> </span>model_inputs = tokenizer([<span class="hljs-string">"I am a cat."</span>], return_tensors=<span class="hljs-string">"pt"</span>).to(<span class="hljs-string">"cuda"</span>)
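
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># Added sketch, not part of the original example: the defaults that `generate` falls back on</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># live in `model.generation_config`, so you can check whether sampling is on before calling it.</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># `default_do_sample` is an illustrative name; this assumes the `model` loaded above.</span>
	<span class="hljs-meta">>>> </span>default_do_sample = model.generation_config.do_sample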

	<span class="hljs-meta">>>> </span><span class="hljs-comment"># LLM + greedy decoding = repetitive, boring output</span>
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'I am a cat. I am a cat. I am a cat. I am a cat'</span>

	<span class="hljs-meta">>>> </span><span class="hljs-comment"># With sampling, the output becomes more creative!</span>
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs, do_sample=<span class="hljs-literal">True</span>)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'I am a cat. Specifically, I am an indoor-only cat. I'</span><!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="错误的填充位置" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#错误的填充位置"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Wrong padding side</span></h3> <p data-svelte-h="svelte-pao9hp">LLMs are <a href="https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt" rel="nofollow">decoder-only</a> architectures, meaning they continue to iterate on your input prompt. If your inputs do not have the same length, they need to be padded. Since LLMs are not trained to continue from <code>pad tokens</code>, your inputs need to be left-padded. Make sure you also pass the attention mask to generate!</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" 
transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span><span class="hljs-comment"># The tokenizer initialized above has right-padding active by default: the 1st sequence,</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># which is shorter, has padding on the right side. Generation fails to capture the logic.</span>
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer(
	<span class="hljs-meta">... </span> [<span class="hljs-string">"1, 2, 3"</span>, <span class="hljs-string">"A, B, C, D, E"</span>], padding=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>
	<span class="hljs-meta">... </span>).to(<span class="hljs-string">"cuda"</span>)
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'1, 2, 33333333333'</span>

	<span class="hljs-meta">>>> </span><span class="hljs-comment"># With left-padding, it works as expected!</span>
	<span class="hljs-meta">>>> </span>tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"mistralai/Mistral-7B-v0.1"</span>, padding_side=<span class="hljs-string">"left"</span>)
	<span class="hljs-meta">>>> </span>tokenizer.pad_token = tokenizer.eos_token <span class="hljs-comment"># Most LLMs don't have a pad token by default</span>
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer(
	<span class="hljs-meta">... </span> [<span class="hljs-string">"1, 2, 3"</span>, <span class="hljs-string">"A, B, C, D, E"</span>], padding=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>
	<span class="hljs-meta">... </span>).to(<span class="hljs-string">"cuda"</span>)
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs)
	<span class="hljs-meta">>>> </span>tokenizer.batch_decode(generated_ids, skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>]
	<span class="hljs-string">'1, 2, 3, 4, 5, 6,'</span><!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="错误的提示" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#错误的提示"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Wrong prompt</span></h3> <p data-svelte-h="svelte-15btsci">Some models and tasks expect a certain input prompt format to work properly. When this format is not applied, performance degrades silently: the model still runs, but not as well as it would with the expected prompt. More information about prompting, including which models and tasks need extra care, is available in the <a href="tasks/prompting">guide</a>. Let's look at an example with a chat LLM that uses a <a href="chat_templating">chat template</a>:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" 
transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-meta">>>> </span>tokenizer = AutoTokenizer.from_pretrained(<span class="hljs-string">"HuggingFaceH4/zephyr-7b-alpha"</span>)
	<span class="hljs-meta">>>> </span>model = AutoModelForCausalLM.from_pretrained(
	<span class="hljs-meta">... </span> <span class="hljs-string">"HuggingFaceH4/zephyr-7b-alpha"</span>, device_map=<span class="hljs-string">"auto"</span>, load_in_4bit=<span class="hljs-literal">True</span>
	<span class="hljs-meta">... </span>)
	<span class="hljs-meta">>>> </span>set_seed(<span class="hljs-number">0</span>)
	<span class="hljs-meta">>>> </span>prompt = <span class="hljs-string">"""How many helicopters can a human eat in one sitting? Reply as a thug."""</span>
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer([prompt], return_tensors=<span class="hljs-string">"pt"</span>).to(<span class="hljs-string">"cuda"</span>)
	<span class="hljs-meta">>>> </span>input_length = model_inputs.input_ids.shape[<span class="hljs-number">1</span>]
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(**model_inputs, max_new_tokens=<span class="hljs-number">20</span>)
	<span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>])
	<span class="hljs-string">"I'm not a thug, but i can tell you that a human cannot eat"</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># Oh no, it did not follow our instruction to reply as a thug! Let's see what happens when we write</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># a better prompt and use the right template for this model (through `tokenizer.apply_chat_template`)</span>

	<span class="hljs-meta">>>> </span>set_seed(<span class="hljs-number">0</span>)
	<span class="hljs-meta">>>> </span>messages = [
	<span class="hljs-meta">... </span> {
	<span class="hljs-meta">... </span> <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
	<span class="hljs-meta">... </span> <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are a friendly chatbot who always responds in the style of a thug"</span>,
	<span class="hljs-meta">... </span> },
	<span class="hljs-meta">... </span> {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"How many helicopters can a human eat in one sitting?"</span>},
	<span class="hljs-meta">... </span>]
	<span class="hljs-meta">>>> </span>model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=<span class="hljs-literal">True</span>, return_tensors=<span class="hljs-string">"pt"</span>).to(<span class="hljs-string">"cuda"</span>)
	<span class="hljs-meta">>>> </span>input_length = model_inputs.shape[<span class="hljs-number">1</span>]
	<span class="hljs-meta">>>> </span>generated_ids = model.generate(model_inputs, do_sample=<span class="hljs-literal">True</span>, max_new_tokens=<span class="hljs-number">20</span>)
	<span class="hljs-meta">>>> </span><span class="hljs-built_in">print</span>(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=<span class="hljs-literal">True</span>)[<span class="hljs-number">0</span>])
	<span class="hljs-string">'None, you thug. How bout you try to focus on more useful questions?'</span>
	<span class="hljs-meta">>>> </span><span class="hljs-comment"># As we can see, it followed a proper thug style 😎</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="更多资源" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#更多资源"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Further resources</span></h2> <p data-svelte-h="svelte-d2e6yq">While the autoregressive generation process is relatively straightforward, making the most of an LLM can be challenging because many of its components are complex and closely interrelated. The resources below are next steps for deepening your understanding and use of LLMs:</p> <h3 class="relative group"><a id="高级生成用法" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#高级生成用法"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 
79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Advanced generate usage</span></h3> <ol data-svelte-h="svelte-18r5ovd"><li><a href="generation_strategies">Guide</a> on how to control different generation methods, how to set up the generation configuration file, and how to stream the output;</li> <li><a href="chat_templating">Guide</a> on prompt templates for chat LLMs;</li> <li><a href="tasks/prompting">Guide</a> on how to get the most out of prompt design;</li> <li>API reference for <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationConfig">GenerationConfig</a>, <a href="/docs/transformers/main/zh/main_classes/text_generation#transformers.GenerationMixin.generate">generate()</a>, and the <a href="internal/generation_utils">generation-related classes</a>.</li></ol> <h3 class="relative group"><a id="llm排行榜" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#llm排行榜"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>LLM leaderboards</span></h3> <ol data-svelte-h="svelte-e1h2fe"><li><a href="https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard" 
rel="nofollow">Open LLM Leaderboard</a>, which focuses on the quality of open-source models;</li> <li><a href="https://huggingface.co/spaces/optimum/llm-perf-leaderboard" rel="nofollow">Open LLM-Perf Leaderboard</a>, which focuses on LLM throughput.</li></ol> <h3 class="relative group"><a id="延迟吞吐量和内存利用率" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#延迟吞吐量和内存利用率"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Latency, throughput and memory utilization</span></h3> <ol data-svelte-h="svelte-1aq8t2a"><li><a href="llm_tutorial_optimization">Guide</a> on how to optimize LLMs for speed and memory use;</li> <li><a href="main_classes/quantization">Guide</a> on <code>quantization</code>, such as bitsandbytes and autogptq, which shows you how to drastically reduce your memory requirements.</li></ol> <h3 class="relative group"><a id="相关库" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#相关库"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 
1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Related libraries</span></h3> <ol data-svelte-h="svelte-1ubnemc"><li><a href="https://github.com/huggingface/text-generation-inference" rel="nofollow"><code>text-generation-inference</code></a>, a production-ready server for LLMs;</li> <li><a href="https://github.com/huggingface/optimum" rel="nofollow"><code>optimum</code></a>, an extension of 🤗 Transformers that optimizes performance for specific hardware devices.</li></ol> <p></p>

	<script>
	{
	__sveltekit_173uja2 = {
	assets: "/docs/transformers/main/zh",
	base: "/docs/transformers/main/zh",
	env: {}
	};

	const element = document.currentScript.parentElement;

	const data = [null,null];

	Promise.all([
	import("/docs/transformers/main/zh/_app/immutable/entry/start.a61b9c50.js"),
	import("/docs/transformers/main/zh/_app/immutable/entry/app.99775688.js")
	]).then(([kit, app]) => {
	kit.start(app, element, {
	node_ids: [0, 25],
	data,
	form: null,
	error: null
	});
	});
	}
	</script>
