Buckets:
| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"El Trainer","local":"el-trainer","sections":[{"title":"Uso básico","local":"uso-básico","sections":[{"title":"Los puntos de control","local":"los-puntos-de-control","sections":[],"depth":3}],"depth":2},{"title":"Personaliza el Trainer","local":"personaliza-el-trainer","sections":[{"title":"Callbacks","local":"callbacks","sections":[],"depth":3}],"depth":2},{"title":"Logging","local":"logging","sections":[],"depth":2},{"title":"NEFTune","local":"neftune","sections":[],"depth":2},{"title":"Accelerate y Trainer","local":"accelerate-y-trainer","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/transformers/main/es/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/entry/start.b4301699.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/scheduler.36a0863c.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/singletons.a6ab9d4a.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/index.733708bb.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/paths.815a7736.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/entry/app.eaf9d9af.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/index.f891bdb2.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/nodes/0.e9e5a018.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/nodes/43.6fea1fa8.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/Tip.a8272f7f.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/CodeBlock.3ec784ea.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/EditOnGithub.a58e27a9.js"> | |
| <link rel="modulepreload" href="/docs/transformers/main/es/_app/immutable/chunks/stores.300cf1d0.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"El Trainer","local":"el-trainer","sections":[{"title":"Uso básico","local":"uso-básico","sections":[{"title":"Los puntos de control","local":"los-puntos-de-control","sections":[],"depth":3}],"depth":2},{"title":"Personaliza el Trainer","local":"personaliza-el-trainer","sections":[{"title":"Callbacks","local":"callbacks","sections":[],"depth":3}],"depth":2},{"title":"Logging","local":"logging","sections":[],"depth":2},{"title":"NEFTune","local":"neftune","sections":[],"depth":2},{"title":"Accelerate y Trainer","local":"accelerate-y-trainer","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="el-trainer" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#el-trainer"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>El Trainer</span></h1> <p data-svelte-h="svelte-1kg49yy">El <code>Trainer</code> es un bucle completo de entrenamiento y evaluación para modelos de PyTorch implementado en la biblioteca Transformers. Solo necesitas pasarle las piezas necesarias para el entrenamiento (modelo, tokenizador, conjunto de datos, función de evaluación, hiperparámetros de entrenamiento, etc.), y la clase <code>Trainer</code> se encarga del resto. Esto facilita comenzar a entrenar más rápido sin tener que escribir manualmente tu propio bucle de entrenamiento. Pero al mismo tiempo, <code>Trainer</code> es muy personalizable y ofrece una gran cantidad de opciones de entrenamiento para que puedas adaptarlo a tus necesidades exactas de entrenamiento.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-14g3i1f">Además de la clase <code>Trainer</code>, Transformers también proporciona una clase <code>Seq2SeqTrainer</code> para tareas de secuencia a secuencia como traducción o resumen. También está la clase [~trl.SFTTrainer] de la biblioteca <a href="https://hf.co/docs/trl" rel="nofollow">TRL</a> que envuelve la clase <code>Trainer</code> y está optimizada para entrenar modelos de lenguaje como Llama-2 y Mistral con técnicas autoregresivas. <code>SFTTrainer</code> también admite funciones como el empaquetado de secuencias, LoRA, cuantización y DeepSpeed para escalar eficientemente a cualquier tamaño de modelo.</p> <br> <p data-svelte-h="svelte-uuzxeg">Siéntete libre de consultar <a href="./main_classes/trainer">la referencia de API</a> para estas otras clases tipo <code>Trainer</code> para aprender más sobre cuándo usar cada una. En general, <code>Trainer</code> es la opción más versátil y es apropiada para una amplia gama de tareas. <code>Seq2SeqTrainer</code> está diseñado para tareas de secuencia a secuencia y <code>SFTTrainer</code> está diseñado para entrenar modelos de lenguaje.</p></div> <p data-svelte-h="svelte-1y4mdbs">Antes de comenzar, asegúrate de tener instalado <a href="https://hf.co/docs/accelerate" rel="nofollow">Accelerate</a>, una biblioteca para habilitar y ejecutar el entrenamiento de PyTorch en entornos distribuidos.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install accelerate | |
| <span class="hljs-comment"># upgrade</span> | |
| pip install accelerate --upgrade<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-tddi8r">Esta guía proporciona una visión general de la clase <code>Trainer</code>.</p> <h2 class="relative group"><a id="uso-básico" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#uso-básico"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Uso básico</span></h2> <p data-svelte-h="svelte-58ly8j"><code>Trainer</code> incluye todo el código que encontrarías en un bucle de entrenamiento básico:</p> <ol data-svelte-h="svelte-jtunc9"><li>Realiza un paso de entrenamiento para calcular la pérdida</li> <li>Calcula los gradientes con el método [~accelerate.Accelerator.backward]</li> <li>Actualiza los pesos basados en los gradientes</li> <li>Repite este proceso hasta alcanzar un número predeterminado de épocas</li></ol> <p data-svelte-h="svelte-e0nc9r">La clase <code>Trainer</code> abstrae todo este código para que no tengas que preocuparte por escribir manualmente un bucle de entrenamiento cada vez o si estás empezando con PyTorch y el entrenamiento. Solo necesitas proporcionar los componentes esenciales requeridos para el entrenamiento, como un modelo y un conjunto de datos, y la clase <code>Trainer</code> maneja todo lo demás.</p> <p data-svelte-h="svelte-lmsc47">Si deseas especificar opciones de entrenamiento o hiperparámetros, puedes encontrarlos en la clase <code>TrainingArguments</code>. Por ejemplo, vamos a definir dónde guardar el modelo en output_dir y subir el modelo al Hub después del entrenamiento con <code>push_to_hub=True</code>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments | |
| training_args = TrainingArguments( | |
| output_dir=<span class="hljs-string">"your-model"</span>, | |
| learning_rate=<span class="hljs-number">2e-5</span>, | |
| per_device_train_batch_size=<span class="hljs-number">16</span>, | |
| per_device_eval_batch_size=<span class="hljs-number">16</span>, | |
| num_train_epochs=<span class="hljs-number">2</span>, | |
| weight_decay=<span class="hljs-number">0.01</span>, | |
| eval_strategy=<span class="hljs-string">"epoch"</span>, | |
| save_strategy=<span class="hljs-string">"epoch"</span>, | |
| load_best_model_at_end=<span class="hljs-literal">True</span>, | |
| push_to_hub=<span class="hljs-literal">True</span>, | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1h5wkew">Pase <code>training_args</code> al <code>Trainer</code> con un modelo, un conjunto de datos o algo para preprocesar el conjunto de datos (dependiendo en el tipo de datos pueda ser un tokenizer, extractor de caracteristicas o procesor del imagen), un recopilador de datos y una función para calcular las métricas que desea rastrear durante el entrenamiento.</p> <p data-svelte-h="svelte-1glhf6g">Finalmente, llame <code>train()</code> para empezar entrenamiento!</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| trainer = Trainer( | |
| model=model, | |
| args=training_args, | |
| train_dataset=dataset[<span class="hljs-string">"train"</span>], | |
| eval_dataset=dataset[<span class="hljs-string">"test"</span>], | |
| tokenizer=tokenizer, | |
| data_collator=data_collator, | |
| compute_metrics=compute_metrics, | |
| ) | |
| trainer.train()<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="los-puntos-de-control" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#los-puntos-de-control"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Los puntos de control</span></h3> <p data-svelte-h="svelte-1s1k7yc">La clase <code>Trainer</code> guarda los puntos de control del modelo en el directorio especificado en el parámetro <code>output_dir</code> de <code>TrainingArguments</code>. Encontrarás los puntos de control guardados en una subcarpeta checkpoint-000 donde los números al final corresponden al paso de entrenamiento. Guardar puntos de control es útil para reanudar el entrenamiento más tarde.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># resume from latest checkpoint</span> | |
| trainer.train(resume_from_checkpoint=<span class="hljs-literal">True</span>) | |
| <span class="hljs-comment"># resume from specific checkpoint saved in output directory</span> | |
| trainer.train(resume_from_checkpoint=<span class="hljs-string">"your-model/checkpoint-1000"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1mc41j3">Puedes guardar tus puntos de control (por defecto, el estado del optimizador no se guarda) en el Hub configurando <code>push_to_hub=True</code> en <code>TrainingArguments</code> para confirmar y enviarlos. Otras opciones para decidir cómo se guardan tus puntos de control están configuradas en el parámetro <a href="https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.hub_strategy" rel="nofollow"><code>hub_strategy</code></a>:</p> <ul data-svelte-h="svelte-912qys"><li><p>hub_strategy=“checkpoint” envía el último punto de control a una subcarpeta llamada “last-checkpoint” desde la cual puedes reanudar el entrenamiento.</p></li> <li><p>hub_strategy=“all_checkpoints” envía todos los puntos de control al directorio definido en <code>output_dir</code> (verás un punto de control por carpeta en tu repositorio de modelos).</p></li></ul> <p data-svelte-h="svelte-146p59t">Cuando reanudas el entrenamiento desde un punto de control, el <code>Trainer</code> intenta mantener los estados de los generadores de números aleatorios (RNG) de Python, NumPy y PyTorch iguales a como estaban cuando se guardó el punto de control. Pero debido a que PyTorch tiene varias configuraciones predeterminadas no determinísticas, no se garantiza que los estados de RNG sean los mismos. Si deseas habilitar la plena determinismo, echa un vistazo a la guía <a href="https://pytorch.org/docs/stable/notes/randomness#controlling-sources-of-randomness" rel="nofollow">“Controlling sources of randomness”</a> para aprender qué puedes habilitar para hacer que tu entrenamiento sea completamente determinista. Sin embargo, ten en cuenta que al hacer ciertas configuraciones deterministas, el entrenamiento puede ser más lento.</p> <h2 class="relative group"><a id="personaliza-el-trainer" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#personaliza-el-trainer"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Personaliza el Trainer</span></h2> <p data-svelte-h="svelte-jne7w1">Si bien la clase <code>Trainer</code> está diseñada para ser accesible y fácil de usar, también ofrece mucha capacidad de personalización para usuarios más aventureros. Muchos de los métodos del <code>Trainer</code> pueden ser subclasificados y sobrescritos para admitir la funcionalidad que deseas, sin tener que reescribir todo el bucle de entrenamiento desde cero para adaptarlo. Estos métodos incluyen:</p> <ul data-svelte-h="svelte-6j0203"><li>[~Trainer.get_train_dataloader] crea un entrenamiento de DataLoader</li> <li>[~Trainer.get_eval_dataloader] crea una evaluación DataLoader</li> <li>[~Trainer.get_test_dataloader] crea una prueba de DataLoader</li> <li>[~Trainer.log] anota la información de los objetos varios que observa el entrenamiento</li> <li>[~Trainer.create_optimizer_and_scheduler] crea un optimizador y la tasa programada de aprendizaje si no lo pasaron en <strong>init</strong>; estos pueden ser personalizados independientes con [~Trainer.create_optimizer] y [~Trainer.create_scheduler] respectivamente</li> <li>[~Trainer.compute_loss] computa la pérdida en lote con las aportes del entrenamiento</li> <li>[~Trainer.training_step] realiza el paso del entrenamiento</li> <li>[~Trainer.prediction_step] realiza la predicción y paso de prueba</li> <li>[~Trainer.evaluate] evalua el modelo y da las metricas evaluativas</li> <li>[~Trainer.predict] hace las predicciones (con las metricas si hay etiquetas disponibles) en lote de prueba</li></ul> <p data-svelte-h="svelte-h7bj6z">Por ejemplo, si deseas personalizar el método <code>compute_loss()</code> para usar una pérdida ponderada en su lugar, puedes hacerlo de la siguiente manera:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> torch <span class="hljs-keyword">import</span> nn | |
| <span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| <span class="hljs-keyword">class</span> <span class="hljs-title class_">CustomTrainer</span>(<span class="hljs-title class_ inherited__">Trainer</span>): | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">compute_loss</span>(<span class="hljs-params">self, model, inputs, return_outputs=<span class="hljs-literal">False</span></span>): | |
| labels = inputs.pop(<span class="hljs-string">"labels"</span>) | |
| <span class="hljs-comment"># forward pass</span> | |
| outputs = model(**inputs) | |
| logits = outputs.get(<span class="hljs-string">"logits"</span>) | |
| <span class="hljs-comment"># compute custom loss for 3 labels with different weights</span> | |
| loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([<span class="hljs-number">1.0</span>, <span class="hljs-number">2.0</span>, <span class="hljs-number">3.0</span>], device=model.device)) | |
| loss = loss_fct(logits.view(-<span class="hljs-number">1</span>, self.model.config.num_labels), labels.view(-<span class="hljs-number">1</span>)) | |
| <span class="hljs-keyword">return</span> (loss, outputs) <span class="hljs-keyword">if</span> return_outputs <span class="hljs-keyword">else</span> loss<!-- HTML_TAG_END --></pre></div> <h3 class="relative group"><a id="callbacks" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#callbacks"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Callbacks</span></h3> <p data-svelte-h="svelte-ii4ao">Otra opción para personalizar el <code>Trainer</code> es utilizar <a href="callbacks">callbacks</a>. Los callbacks <em>no cambian nada</em> en el bucle de entrenamiento. Inspeccionan el estado del bucle de entrenamiento y luego ejecutan alguna acción (detención anticipada, registro de resultados, etc.) según el estado. En otras palabras, un callback no puede usarse para implementar algo como una función de pérdida personalizada y necesitarás subclasificar y sobrescribir el método <code>compute_loss()</code> para eso.</p> <p data-svelte-h="svelte-1efutac">Por ejemplo, si deseas agregar un callback de detención anticipada al bucle de entrenamiento después de 10 pasos.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainerCallback | |
| <span class="hljs-keyword">class</span> <span class="hljs-title class_">EarlyStoppingCallback</span>(<span class="hljs-title class_ inherited__">TrainerCallback</span>): | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">__init__</span>(<span class="hljs-params">self, num_steps=<span class="hljs-number">10</span></span>): | |
| self.num_steps = num_steps | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">on_step_end</span>(<span class="hljs-params">self, args, state, control, **kwargs</span>): | |
| <span class="hljs-keyword">if</span> state.global_step >= self.num_steps: | |
| <span class="hljs-keyword">return</span> {<span class="hljs-string">"should_training_stop"</span>: <span class="hljs-literal">True</span>} | |
| <span class="hljs-keyword">else</span>: | |
| <span class="hljs-keyword">return</span> {} | |
| <!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1m0tpah">Luego, pásalo al parámetro <code>callback</code> del <code>Trainer</code>:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> Trainer | |
| trainer = Trainer( | |
| model=model, | |
| args=training_args, | |
| train_dataset=dataset[<span class="hljs-string">"train"</span>], | |
| eval_dataset=dataset[<span class="hljs-string">"test"</span>], | |
| tokenizer=tokenizer, | |
| data_collator=data_collator, | |
| compute_metrics=compute_metrics, | |
| callback=[EarlyStoppingCallback()], | |
| )<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="logging" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#logging"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Logging</span></h2> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-13mq4uk">Comprueba el API referencia <a href="./main_classes/logging">logging</a> para mas información sobre los niveles differentes de logging.</p></div> <p data-svelte-h="svelte-1vzhuch">El <code>Trainer</code> está configurado a <code>logging.INFO</code> de forma predeterminada el cual informa errores, advertencias y otra información basica. Un <code>Trainer</code> réplica - en entornos distributos - está configurado a <code>logging.WARNING</code> el cual solamente informa errores y advertencias. Puedes cambiar el nivel de logging con los parametros <a href="https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level" rel="nofollow"><code>log_level</code></a> y <a href="https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level_replica" rel="nofollow"><code>log_level_replica</code></a> en <code>TrainingArguments</code>.</p> <p data-svelte-h="svelte-1thl5vc">Para configurar el nivel de registro para cada nodo, usa el parámetro <a href="https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.log_on_each_node" rel="nofollow"><code>log_on_each_node</code></a> para determinar si deseas utilizar el nivel de registro en cada nodo o solo en el nodo principal.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1byrfxg"><code>Trainer</code> establece el nivel de registro por separado para cada nodo en el método <code>Trainer.init</code>, por lo que es posible que desees considerar establecer esto antes si estás utilizando otras funcionalidades de Transformers antes de crear el objeto <code>Trainer</code>.</p></div> <p data-svelte-h="svelte-1ily46r">Por ejemplo, para establecer que tu código principal y los módulos utilicen el mismo nivel de registro según cada nodo:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->logger = logging.getLogger(__name__) | |
| logging.basicConfig( | |
| <span class="hljs-built_in">format</span>=<span class="hljs-string">"%(asctime)s - %(levelname)s - %(name)s - %(message)s"</span>, | |
| datefmt=<span class="hljs-string">"%m/%d/%Y %H:%M:%S"</span>, | |
| handlers=[logging.StreamHandler(sys.stdout)], | |
| ) | |
| log_level = training_args.get_process_log_level() | |
| logger.setLevel(log_level) | |
| datasets.utils.logging.set_verbosity(log_level) | |
| transformers.utils.logging.set_verbosity(log_level) | |
| trainer = Trainer(...)<!-- HTML_TAG_END --></pre></div> <div class="flex space-x-2 items-center my-1.5 mr-8 h-7 !pl-0 -mx-3 md:mx-0"><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd border-gray-800 bg-black dark:bg-gray-700 text-white">single node </div><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd text-gray-500 cursor-pointer opacity-90 hover:text-gray-700 dark:hover:text-gray-200 hover:shadow-sm">multi-node </div></div> <div class="language-select"><p data-svelte-h="svelte-1sh5o9n">Usa diferentes combinaciones de <code>log_level</code> y <code>log_level_replica</code> para configurar qué se registra en cada uno de los nodos.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->my_app.py ... --log_level warning --log_level_replica error<!-- HTML_TAG_END --></pre></div> </div> <h2 class="relative group"><a id="neftune" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#neftune"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>NEFTune</span></h2> <p data-svelte-h="svelte-11n3370"><a href="https://hf.co/papers/2310.05914" rel="nofollow">NEFTune</a> es una técnica que puede mejorar el rendimiento al agregar ruido a los vectores de incrustación durante el entrenamiento. Para habilitarlo en <code>Trainer</code>, establece el parámetro <code>neftune_noise_alpha</code> en <code>TrainingArguments</code> para controlar cuánto ruido se agrega.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments, Trainer | |
| training_args = TrainingArguments(..., neftune_noise_alpha=<span class="hljs-number">0.1</span>) | |
| trainer = Trainer(..., args=training_args)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-c2b79h">NEFTune se desactiva después del entrenamiento para restaurar la capa de incrustación original y evitar cualquier comportamiento inesperado.</p> <h2 class="relative group"><a id="accelerate-y-trainer" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#accelerate-y-trainer"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Accelerate y Trainer</span></h2> <p data-svelte-h="svelte-dfwow9">La clase <code>Trainer</code> está impulsada por <a href="https://hf.co/docs/accelerate" rel="nofollow">Accelerate</a>, una biblioteca para entrenar fácilmente modelos de PyTorch en entornos distribuidos con soporte para integraciones como <a href="https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/" rel="nofollow">FullyShardedDataParallel (FSDP)</a> y <a href="https://www.deepspeed.ai/" rel="nofollow">DeepSpeed</a>.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-49ofm">Aprende más sobre las estrategias de fragmentación FSDP, descarga de CPU y más con el <code>Trainer</code> en la guía <a href="fsdp">Paralela de Datos Completamente Fragmentados</a>.</p></div> <p data-svelte-h="svelte-1eskknk">Para usar Accelerate con <code>Trainer</code>, ejecuta el comando <a href="https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-config" rel="nofollow"><code>accelerate.config</code></a> para configurar el entrenamiento para tu entorno de entrenamiento. Este comando crea un <code>config_file.yaml</code> que se utilizará cuando inicies tu script de entrenamiento. Por ejemplo, algunas configuraciones de ejemplo que puedes configurar son:</p> <div class="flex space-x-2 items-center my-1.5 mr-8 h-7 !pl-0 -mx-3 md:mx-0"><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd border-gray-800 bg-black dark:bg-gray-700 text-white">DistributedDataParallel </div><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd text-gray-500 cursor-pointer opacity-90 hover:text-gray-700 dark:hover:text-gray-200 hover:shadow-sm">FSDP </div><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd text-gray-500 cursor-pointer opacity-90 hover:text-gray-700 dark:hover:text-gray-200 hover:shadow-sm">DeepSpeed </div><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd text-gray-500 cursor-pointer opacity-90 hover:text-gray-700 dark:hover:text-gray-200 hover:shadow-sm">DeepSpeed with Accelerate plugin </div></div> <div class="language-select"><div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attr">compute_environment:</span> <span class="hljs-string">LOCAL_MACHINE</span> | |
| <span class="hljs-attr">distributed_type:</span> <span class="hljs-string">MULTI_GPU</span> | |
| <span class="hljs-attr">downcast_bf16:</span> <span class="hljs-string">'no'</span> | |
| <span class="hljs-attr">gpu_ids:</span> <span class="hljs-string">all</span> | |
| <span class="hljs-attr">machine_rank:</span> <span class="hljs-number">0</span> <span class="hljs-comment">#change rank as per the node</span> | |
| <span class="hljs-attr">main_process_ip:</span> <span class="hljs-number">192.168</span><span class="hljs-number">.20</span><span class="hljs-number">.1</span> | |
| <span class="hljs-attr">main_process_port:</span> <span class="hljs-number">9898</span> | |
| <span class="hljs-attr">main_training_function:</span> <span class="hljs-string">main</span> | |
| <span class="hljs-attr">mixed_precision:</span> <span class="hljs-string">fp16</span> | |
| <span class="hljs-attr">num_machines:</span> <span class="hljs-number">2</span> | |
| <span class="hljs-attr">num_processes:</span> <span class="hljs-number">8</span> | |
| <span class="hljs-attr">rdzv_backend:</span> <span class="hljs-string">static</span> | |
| <span class="hljs-attr">same_network:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">tpu_env:</span> [] | |
| <span class="hljs-attr">tpu_use_cluster:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">tpu_use_sudo:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">use_cpu:</span> <span class="hljs-literal">false</span><!-- HTML_TAG_END --></pre></div> </div> <p data-svelte-h="svelte-j3vgm6">El comando <a href="https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-launch" rel="nofollow"><code>accelerate_launch</code></a> es la forma recomendada de lanzar tu script de entrenamiento en un sistema distribuido con Accelerate y <code>Trainer</code> con los parámetros especificados en <code>config_file.yaml</code>. Este archivo se guarda en la carpeta de caché de Accelerate y se carga automáticamente cuando ejecutas <code>accelerate_launch</code>.</p> <p data-svelte-h="svelte-13vmkyy">Por ejemplo, para ejecutar el script de entrenamiento <a href="https://github.com/huggingface/transformers/blob/f4db565b695582891e43a5e042e5d318e28f20b8/examples/pytorch/text-classification/run_glue.py#L4" rel="nofollow"><code>run_glue.py</code></a> con la configuración de FSDP:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->accelerate launch \ | |
| ./examples/pytorch/text-classification/run_glue.py \ | |
| --model_name_or_path bert-base-cased \ | |
| --task_name <span class="hljs-variable">$TASK_NAME</span> \ | |
| --do_train \ | |
| --do_eval \ | |
| --max_seq_length 128 \ | |
| --per_device_train_batch_size 16 \ | |
| --learning_rate 5e-5 \ | |
| --num_train_epochs 3 \ | |
| --output_dir /tmp/<span class="hljs-variable">$TASK_NAME</span>/ \ | |
| --overwrite_output_dir<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-iryjdg">También puedes especificar los parámetros del archivo config_file.yaml directamente en la línea de comandos:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->accelerate launch --num_processes=2 \ | |
| --use_fsdp \ | |
| --mixed_precision=bf16 \ | |
| --fsdp_auto_wrap_policy=TRANSFORMER_BASED_WRAP \ | |
| --fsdp_transformer_layer_cls_to_wrap=<span class="hljs-string">"BertLayer"</span> \ | |
| --fsdp_sharding_strategy=1 \ | |
| --fsdp_state_dict_type=FULL_STATE_DICT \ | |
| ./examples/pytorch/text-classification/run_glue.py | |
| --model_name_or_path bert-base-cased \ | |
| --task_name <span class="hljs-variable">$TASK_NAME</span> \ | |
| --do_train \ | |
| --do_eval \ | |
| --max_seq_length 128 \ | |
| --per_device_train_batch_size 16 \ | |
| --learning_rate 5e-5 \ | |
| --num_train_epochs 3 \ | |
| --output_dir /tmp/<span class="hljs-variable">$TASK_NAME</span>/ \ | |
| --overwrite_output_dir<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-v4s2pp">Consulta el tutorial <a href="https://huggingface.co/docs/accelerate/basic_tutorials/launch" rel="nofollow">Lanzamiento de tus scripts con Accelerate</a> para obtener más información sobre <code>accelerate_launch</code> y las configuraciones personalizadas.</p> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/transformers/blob/main/docs/source/es/trainer.md" target="_blank"><span data-svelte-h="svelte-1kd6by1"><</span> <span data-svelte-h="svelte-x0xyl0">></span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p> | |
| <script> | |
| { | |
| __sveltekit_8nzhgd = { | |
| assets: "/docs/transformers/main/es", | |
| base: "/docs/transformers/main/es", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/transformers/main/es/_app/immutable/entry/start.b4301699.js"), | |
| import("/docs/transformers/main/es/_app/immutable/entry/app.eaf9d9af.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 43], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |
Xet Storage Details
- Size:
- 54 kB
- Xet hash:
- 2f86919cac22ab6bd54bcec67847ee14dbf396d5d7907996d26f6e8c4b3e8e83
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.