devdz committed
Commit 5eae308 · verified · 1 Parent(s): 9538b62

Upload 369 files
This view is limited to 50 files because the commit contains too many changes. See the raw diff for the full set.

Files changed (50)
  1. .gitattributes +4 -0
  2. t5x-main/.github/workflows/build.yaml +39 -0
  3. t5x-main/CONTRIBUTING.md +1 -0
  4. t5x-main/LICENSE +202 -0
  5. t5x-main/README.md +525 -0
  6. t5x-main/docs/_static/t5x_theme.css +23 -0
  7. t5x-main/docs/_templates/autosummary/t5x_module.rst +23 -0
  8. t5x-main/docs/api_reference/index.rst +100 -0
  9. t5x-main/docs/api_reference/t5x.adafactor.rst +7 -0
  10. t5x-main/docs/api_reference/t5x.binary_search.rst +7 -0
  11. t5x-main/docs/api_reference/t5x.checkpoint_importer.rst +7 -0
  12. t5x-main/docs/api_reference/t5x.checkpoint_utils.rst +7 -0
  13. t5x-main/docs/api_reference/t5x.checkpoints.rst +7 -0
  14. t5x-main/docs/api_reference/t5x.config_utils.rst +7 -0
  15. t5x-main/docs/api_reference/t5x.decoding.rst +7 -0
  16. t5x-main/docs/api_reference/t5x.eval.rst +7 -0
  17. t5x-main/docs/api_reference/t5x.gin_utils.rst +7 -0
  18. t5x-main/docs/api_reference/t5x.infer.rst +7 -0
  19. t5x-main/docs/api_reference/t5x.interactive_model.rst +7 -0
  20. t5x-main/docs/api_reference/t5x.losses.rst +7 -0
  21. t5x-main/docs/api_reference/t5x.main.rst +7 -0
  22. t5x-main/docs/api_reference/t5x.metrics.rst +7 -0
  23. t5x-main/docs/api_reference/t5x.models.rst +7 -0
  24. t5x-main/docs/api_reference/t5x.optimizers.rst +7 -0
  25. t5x-main/docs/api_reference/t5x.partitioning.rst +7 -0
  26. t5x-main/docs/api_reference/t5x.state_utils.rst +7 -0
  27. t5x-main/docs/api_reference/t5x.test_utils.rst +7 -0
  28. t5x-main/docs/api_reference/t5x.train.rst +7 -0
  29. t5x-main/docs/api_reference/t5x.train_state.rst +7 -0
  30. t5x-main/docs/api_reference/t5x.trainer.rst +7 -0
  31. t5x-main/docs/api_reference/t5x.utils.rst +7 -0
  32. t5x-main/docs/conf.py +132 -0
  33. t5x-main/docs/conf_sphinx_patch.py +202 -0
  34. t5x-main/docs/contributions.md +64 -0
  35. t5x-main/docs/index.md +65 -0
  36. t5x-main/docs/index.rst +24 -0
  37. t5x-main/docs/models.md +318 -0
  38. t5x-main/docs/overview.md +2 -0
  39. t5x-main/docs/requirements.txt +8 -0
  40. t5x-main/docs/t5x.png +3 -0
  41. t5x-main/docs/tutorials.md +51 -0
  42. t5x-main/docs/usage/auxiliary.md +204 -0
  43. t5x-main/docs/usage/decoding.md +199 -0
  44. t5x-main/docs/usage/eval.md +226 -0
  45. t5x-main/docs/usage/finetune.md +286 -0
  46. t5x-main/docs/usage/gin.md +395 -0
  47. t5x-main/docs/usage/gpu-usage.md +87 -0
  48. t5x-main/docs/usage/index.rst +16 -0
  49. t5x-main/docs/usage/infer-files.md +217 -0
  50. t5x-main/docs/usage/infer-seqio.md +241 -0
.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ t5x-main/docs/t5x.png filter=lfs diff=lfs merge=lfs -text
+ t5x-main/t5x/testdata/mtf_tiny_t5/model.ckpt-0.data-00001-of-00002 filter=lfs diff=lfs merge=lfs -text
+ t5x-main/t5x/testdata/mtf_tiny_t5/model.ckpt-0.meta filter=lfs diff=lfs merge=lfs -text
+ t5x-main/t5x/testdata/test_t5_tiny.checkpoint_0 filter=lfs diff=lfs merge=lfs -text
t5x-main/.github/workflows/build.yaml ADDED
@@ -0,0 +1,39 @@
+ name: build
+
+ on: [push]
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+     steps:
+     - uses: actions/checkout@v2
+     - name: Set up Python
+       uses: actions/setup-python@v4
+       with:
+         python-version: '3.10.x'
+         cache: 'pip'
+         cache-dependency-path: setup.py
+     - name: Install dependencies
+       run: |
+         pip install -e .[test] -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
+     - name: Test with pytest
+       run: |
+         pytest
+     # The below step just reports the success or failure of tests as a "commit status".
+     # This is needed for copybara integration.
+     - name: Report success or failure as github status
+       if: always()
+       shell: bash
+       run: |
+         status="${{ job.status }}"
+         lowercase_status=$(echo $status | tr '[:upper:]' '[:lower:]')
+         curl -sS --request POST \
+           --url https://api.github.com/repos/${{ github.repository }}/statuses/${{ github.sha }} \
+           --header 'authorization: Bearer ${{ secrets.GITHUB_TOKEN }}' \
+           --header 'content-type: application/json' \
+           --data '{
+             "state": "'$lowercase_status'",
+             "target_url": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}",
+             "description": "'$status'",
+             "context": "github-actions/build"
+           }'
t5x-main/CONTRIBUTING.md ADDED
@@ -0,0 +1 @@
+ External contributions are not accepted, sorry!
t5x-main/LICENSE ADDED
@@ -0,0 +1,202 @@
+
+                                  Apache License
+                            Version 2.0, January 2004
+                         http://www.apache.org/licenses/
+
+    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+    1. Definitions.
+
+       "License" shall mean the terms and conditions for use, reproduction,
+       and distribution as defined by Sections 1 through 9 of this document.
+
+       "Licensor" shall mean the copyright owner or entity authorized by
+       the copyright owner that is granting the License.
+
+       "Legal Entity" shall mean the union of the acting entity and all
+       other entities that control, are controlled by, or are under common
+       control with that entity. For the purposes of this definition,
+       "control" means (i) the power, direct or indirect, to cause the
+       direction or management of such entity, whether by contract or
+       otherwise, or (ii) ownership of fifty percent (50%) or more of the
+       outstanding shares, or (iii) beneficial ownership of such entity.
+
+       "You" (or "Your") shall mean an individual or Legal Entity
+       exercising permissions granted by this License.
+
+       "Source" form shall mean the preferred form for making modifications,
+       including but not limited to software source code, documentation
+       source, and configuration files.
+
+       "Object" form shall mean any form resulting from mechanical
+       transformation or translation of a Source form, including but
+       not limited to compiled object code, generated documentation,
+       and conversions to other media types.
+
+       "Work" shall mean the work of authorship, whether in Source or
+       Object form, made available under the License, as indicated by a
+       copyright notice that is included in or attached to the work
+       (an example is provided in the Appendix below).
+
+       "Derivative Works" shall mean any work, whether in Source or Object
+       form, that is based on (or derived from) the Work and for which the
+       editorial revisions, annotations, elaborations, or other modifications
+       represent, as a whole, an original work of authorship. For the purposes
+       of this License, Derivative Works shall not include works that remain
+       separable from, or merely link (or bind by name) to the interfaces of,
+       the Work and Derivative Works thereof.
+
+       "Contribution" shall mean any work of authorship, including
+       the original version of the Work and any modifications or additions
+       to that Work or Derivative Works thereof, that is intentionally
+       submitted to Licensor for inclusion in the Work by the copyright owner
+       or by an individual or Legal Entity authorized to submit on behalf of
+       the copyright owner. For the purposes of this definition, "submitted"
+       means any form of electronic, verbal, or written communication sent
+       to the Licensor or its representatives, including but not limited to
+       communication on electronic mailing lists, source code control systems,
+       and issue tracking systems that are managed by, or on behalf of, the
+       Licensor for the purpose of discussing and improving the Work, but
+       excluding communication that is conspicuously marked or otherwise
+       designated in writing by the copyright owner as "Not a Contribution."
+
+       "Contributor" shall mean Licensor and any individual or Legal Entity
+       on behalf of whom a Contribution has been received by Licensor and
+       subsequently incorporated within the Work.
+
+    2. Grant of Copyright License. Subject to the terms and conditions of
+       this License, each Contributor hereby grants to You a perpetual,
+       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+       copyright license to reproduce, prepare Derivative Works of,
+       publicly display, publicly perform, sublicense, and distribute the
+       Work and such Derivative Works in Source or Object form.
+
+    3. Grant of Patent License. Subject to the terms and conditions of
+       this License, each Contributor hereby grants to You a perpetual,
+       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+       (except as stated in this section) patent license to make, have made,
+       use, offer to sell, sell, import, and otherwise transfer the Work,
+       where such license applies only to those patent claims licensable
+       by such Contributor that are necessarily infringed by their
+       Contribution(s) alone or by combination of their Contribution(s)
+       with the Work to which such Contribution(s) was submitted. If You
+       institute patent litigation against any entity (including a
+       cross-claim or counterclaim in a lawsuit) alleging that the Work
+       or a Contribution incorporated within the Work constitutes direct
+       or contributory patent infringement, then any patent licenses
+       granted to You under this License for that Work shall terminate
+       as of the date such litigation is filed.
+
+    4. Redistribution. You may reproduce and distribute copies of the
+       Work or Derivative Works thereof in any medium, with or without
+       modifications, and in Source or Object form, provided that You
+       meet the following conditions:
+
+       (a) You must give any other recipients of the Work or
+           Derivative Works a copy of this License; and
+
+       (b) You must cause any modified files to carry prominent notices
+           stating that You changed the files; and
+
+       (c) You must retain, in the Source form of any Derivative Works
+           that You distribute, all copyright, patent, trademark, and
+           attribution notices from the Source form of the Work,
+           excluding those notices that do not pertain to any part of
+           the Derivative Works; and
+
+       (d) If the Work includes a "NOTICE" text file as part of its
+           distribution, then any Derivative Works that You distribute must
+           include a readable copy of the attribution notices contained
+           within such NOTICE file, excluding those notices that do not
+           pertain to any part of the Derivative Works, in at least one
+           of the following places: within a NOTICE text file distributed
+           as part of the Derivative Works; within the Source form or
+           documentation, if provided along with the Derivative Works; or,
+           within a display generated by the Derivative Works, if and
+           wherever such third-party notices normally appear. The contents
+           of the NOTICE file are for informational purposes only and
+           do not modify the License. You may add Your own attribution
+           notices within Derivative Works that You distribute, alongside
+           or as an addendum to the NOTICE text from the Work, provided
+           that such additional attribution notices cannot be construed
+           as modifying the License.
+
+       You may add Your own copyright statement to Your modifications and
+       may provide additional or different license terms and conditions
+       for use, reproduction, or distribution of Your modifications, or
+       for any such Derivative Works as a whole, provided Your use,
+       reproduction, and distribution of the Work otherwise complies with
+       the conditions stated in this License.
+
+    5. Submission of Contributions. Unless You explicitly state otherwise,
+       any Contribution intentionally submitted for inclusion in the Work
+       by You to the Licensor shall be under the terms and conditions of
+       this License, without any additional terms or conditions.
+       Notwithstanding the above, nothing herein shall supersede or modify
+       the terms of any separate license agreement you may have executed
+       with Licensor regarding such Contributions.
+
+    6. Trademarks. This License does not grant permission to use the trade
+       names, trademarks, service marks, or product names of the Licensor,
+       except as required for reasonable and customary use in describing the
+       origin of the Work and reproducing the content of the NOTICE file.
+
+    7. Disclaimer of Warranty. Unless required by applicable law or
+       agreed to in writing, Licensor provides the Work (and each
+       Contributor provides its Contributions) on an "AS IS" BASIS,
+       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+       implied, including, without limitation, any warranties or conditions
+       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+       PARTICULAR PURPOSE. You are solely responsible for determining the
+       appropriateness of using or redistributing the Work and assume any
+       risks associated with Your exercise of permissions under this License.
+
+    8. Limitation of Liability. In no event and under no legal theory,
+       whether in tort (including negligence), contract, or otherwise,
+       unless required by applicable law (such as deliberate and grossly
+       negligent acts) or agreed to in writing, shall any Contributor be
+       liable to You for damages, including any direct, indirect, special,
+       incidental, or consequential damages of any character arising as a
+       result of this License or out of the use or inability to use the
+       Work (including but not limited to damages for loss of goodwill,
+       work stoppage, computer failure or malfunction, or any and all
+       other commercial damages or losses), even if such Contributor
+       has been advised of the possibility of such damages.
+
+    9. Accepting Warranty or Additional Liability. While redistributing
+       the Work or Derivative Works thereof, You may choose to offer,
+       and charge a fee for, acceptance of support, warranty, indemnity,
+       or other liability obligations and/or rights consistent with this
+       License. However, in accepting such obligations, You may act only
+       on Your own behalf and on Your sole responsibility, not on behalf
+       of any other Contributor, and only if You agree to indemnify,
+       defend, and hold each Contributor harmless for any liability
+       incurred by, or claims asserted against, such Contributor by reason
+       of your accepting any such warranty or additional liability.
+
+    END OF TERMS AND CONDITIONS
+
+    APPENDIX: How to apply the Apache License to your work.
+
+       To apply the Apache License to your work, attach the following
+       boilerplate notice, with the fields enclosed by brackets "[]"
+       replaced with your own identifying information. (Don't include
+       the brackets!) The text should be enclosed in the appropriate
+       comment syntax for the file format. We also recommend that a
+       file or class name and description of purpose be included on the
+       same "printed page" as the copyright notice for easier
+       identification within third-party archives.
+
+    Copyright [yyyy] [name of copyright owner]
+
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
t5x-main/README.md ADDED
@@ -0,0 +1,525 @@
+ # T5X
+
+ *Go to [T5X ReadTheDocs Documentation Page](https://t5x.readthedocs.io/).*
+
+ T5X is a modular, composable, research-friendly framework for high-performance,
+ configurable, self-service training, evaluation, and inference of sequence
+ models (starting with language) at many scales.
+
+ It is essentially a new and improved implementation of the
+ [T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer)
+ (based on [Mesh TensorFlow](https://github.com/tensorflow/mesh)) in
+ [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax).
+ To learn more, see the [T5X Paper](https://arxiv.org/abs/2203.17189).
+
+ Below is a quick start guide for training models with TPUs on Google Cloud. For
+ additional tutorials and background, see the [complete documentation](docs/index.md).
+
+ ## Quickstart (Recommended)
+
+ T5X can be run with [XManager](https://github.com/deepmind/xmanager) on
+ [Vertex AI](https://cloud.google.com/vertex-ai). Vertex AI is a platform for
+ training that creates TPU instances and runs code on the TPUs. Vertex AI will
+ also shut down the TPUs when the jobs terminate. This is significantly easier
+ than managing GCE VMs and TPU VM instances.
+
+ 1. Follow the prerequisites and directions to install [XManager](https://github.com/deepmind/xmanager).
+
+ 2. Request TPU quota as required. GCP projects come with 8 cores by default,
+    which is enough to run one training experiment on a single TPU host. If you
+    want to run multi-host training or run multiple trials in parallel, you will
+    need more quota. Navigate to [Quotas](https://console.cloud.google.com/quotas).
+
+    The quota you want is:
+
+    * Service: `Vertex AI API`
+    * Dimensions (location): `us-central1`
+    * If you want to run single-host experiments:
+      * `Custom model training TPU V2 cores per region`
+      * `Custom model training TPU V3 cores per region`
+    * If you want to run multi-host experiments:
+      * `Custom model training TPU V2 pod cores per region`
+      * `Custom model training TPU V3 pod cores per region`
+
+    TIP: You won't be able to run single-host experiments with multi-host quota
+    (i.e., you can't run `tpu_v2=8` using `TPU V2 pod`).
+
+ 3. Launch the xmanager script located at `t5x/scripts/xm_launch.py`.
+
+    As a running example, we use the WMT14 En-De translation, which is described
+    in more detail in the Examples section below.
+
+    ```sh
+    export GOOGLE_CLOUD_BUCKET_NAME=...
+    export TFDS_DATA_DIR=gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/data
+    export MODEL_DIR=gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/$(date +%Y%m%d)
+
+    # Pre-download dataset in multi-host experiments.
+    tfds build wmt_t2t_translate --data_dir=$TFDS_DATA_DIR
+
+    git clone https://github.com/google-research/t5x
+    cd ./t5x/
+
+    python3 ./t5x/scripts/xm_launch.py \
+      --gin_file=t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin \
+      --model_dir=$MODEL_DIR \
+      --tfds_data_dir=$TFDS_DATA_DIR
+    ```
+
+ Check `gs://$GOOGLE_CLOUD_BUCKET_NAME/t5x/` for the output artifacts, which can
+ be read by TensorBoard.
+
+ ## GPU Usage
+
+ Note: NVIDIA has released an updated version of this repository with H100 FP8 support and broad GPU performance improvements. Please visit the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository for more details and usage instructions.
+
+ T5X can be run easily on GPUs, either in single-node or multi-node configurations with a SLURM+pyxis cluster. Further instructions are at [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md). The `t5x/contrib/gpu/scripts_gpu` folder contains example scripts for pretraining T5X on [The Pile](https://pile.eleuther.ai/) and for finetuning on SQuAD and MNLI. These scripts and the associated `gin` configurations also contain additional GPU optimizations for better throughput. More examples and instructions can be found in the [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x) repository maintained by NVIDIA.
+
+ ## Installation
+
+ Note that all the commands in this document should be run in the commandline of
+ the TPU VM instance unless otherwise stated.
+
+ 1. Follow the
+    [instructions](https://cloud.google.com/tpu/docs/jax-quickstart-tpu-vm#install_the_google_cloud_sdk)
+    to set up a Google Cloud Platform (GCP) account and enable the Cloud TPU
+    API.
+
+    **Note:** T5X also works with GPUs. Please follow the instructions in
+    [t5x/contrib/gpu](https://github.com/google-research/t5x/blob/main/t5x/contrib/gpu/README.md)
+    if you'd like to use the GPU version.
+
+ 2. Create a
+    [Cloud TPU VM instance](https://cloud.google.com/blog/products/compute/introducing-cloud-tpu-vms)
+    following
+    [this instruction](https://cloud.google.com/tpu/docs/jax-quickstart-tpu-vm#create-vm).
+    We recommend that you develop your workflow on a single v3-8 TPU (i.e.,
+    `--accelerator-type=v3-8`) and scale up to pod slices once the pipeline is
+    ready. In this README, we focus on using a single v3-8 TPU. See
+    [here](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm) to
+    learn more about TPU architectures.
+
+ 3. With Cloud TPU VMs, you ssh directly into the host machine of the TPU VM.
+    You can install packages, run your code, etc. on the host machine. Once
+    the TPU instance is created, ssh into it with
+
+    ```sh
+    gcloud alpha compute tpus tpu-vm ssh ${TPU_NAME} --zone=${ZONE}
+    ```
+
+    where `TPU_NAME` and `ZONE` are the name and the zone used in step 2.
+
+ 4. Install T5X and its dependencies.
+
+    ```sh
+    git clone --branch=main https://github.com/google-research/t5x
+    cd t5x
+
+    python3 -m pip install -e '.[tpu]' -f \
+      https://storage.googleapis.com/jax-releases/libtpu_releases.html
+    ```
+
+ 5. Create a Google Cloud Storage (GCS) bucket to store the dataset and model
+    checkpoints. To create a GCS bucket, see these
+    [instructions](https://cloud.google.com/storage/docs/creating-buckets).
+
+ 6. (optional) If you prefer working with a Jupyter/Colab style environment,
+    you can set up a custom Colab runtime by following the steps in
+    [t5x/notebooks](https://github.com/google-research/t5x/blob/main/t5x/notebooks/README.md).
+
+ ## Example: English to German translation
+
+ As a running example, we use the WMT14 En-De translation. The raw dataset is
+ available in TensorFlow Datasets as
+ ["wmt_t2t_translate"](https://www.tensorflow.org/datasets/catalog/wmt_t2t_translate).
+
+ T5 casts a translation example such as
+
+ ```py
+ {'en': 'That is good.', 'de': 'Das ist gut.'}
+ ```
+
+ into the "text-to-text" form:
+
+ ```py
+ {'inputs': 'translate English to German: That is good.', 'targets': 'Das ist gut.'}
+ ```
+
+ This formulation allows many different classes of language tasks to be expressed
+ in a uniform manner, so a single encoder-decoder architecture can handle them
+ without any task-specific parameters. For more detail, refer to the [T5 paper
+ (Raffel et al. 2019)][t5_paper].
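As an illustration, the cast above can be written as a small pure-Python function. This is only a sketch: the name `to_text_to_text` is ours, and in T5X the cast is actually performed by SeqIO preprocessors operating on `tf.data` pipelines.

```python
def to_text_to_text(example, source_lang="English", target_lang="German"):
    """Cast a raw translation pair into the text-to-text form used by T5."""
    return {
        "inputs": f"translate {source_lang} to {target_lang}: {example['en']}",
        "targets": example["de"],
    }

raw = {"en": "That is good.", "de": "Das ist gut."}
print(to_text_to_text(raw))
# {'inputs': 'translate English to German: That is good.', 'targets': 'Das ist gut.'}
```

The task prefix (`translate English to German:`) is what tells the single shared model which task to perform.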
+
+ For a scalable data pipeline and an evaluation framework, we use
+ [`SeqIO`](https://github.com/google/seqio), which was factored out of the [T5
+ library][t5_github]. A `seqio.Task` packages together the raw dataset, the
+ vocabulary, preprocessing such as tokenization, and evaluation metrics such as
+ [BLEU](https://aclanthology.org/P02-1040.pdf), and provides a
+ [`tf.data`](https://www.tensorflow.org/guide/data) instance.
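To make the BLEU metric mentioned above concrete, here is a toy sketch of its core idea: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. The function names and whitespace tokenization are our simplifications; real evaluations should use SeqIO's metric implementations or SacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU with whitespace tokenization."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("Das ist gut .", "Das ist gut ."))  # 1.0
```

Production BLEU additionally handles corpus-level aggregation, multiple references, and smoothing for zero n-gram counts, which this sketch omits.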
+
+ [The T5 library][t5_github] provides a number of `seqio.Task`s that were used in the
+ [T5 paper][t5_paper]. In this example, we use [wmt_t2t_ende_v003](https://github.com/google-research/text-to-text-transfer-transformer/blob/d81c0bab2a41b4d5dfbe4971de32f7d67df65f31/t5/data/tasks.py#L212).
+
+ Before training or fine-tuning, you need to download the
+ ["wmt_t2t_translate"](https://www.tensorflow.org/datasets/catalog/wmt_t2t_translate)
+ dataset first.
+
+ ```sh
+ # Data dir to save the processed dataset in "gs://data_dir" format.
+ TFDS_DATA_DIR="..."
+
+ # Make sure that the dataset package is up-to-date.
+ python3 -m pip install --upgrade tfds-nightly
+
+ # Pre-download dataset.
+ tfds build wmt_t2t_translate --data_dir=${TFDS_DATA_DIR}
+ ```
+
+ ### Training
+
+ To run a training job, we use the `t5x/train.py` script.
+
+ ```sh
+ # Model dir to save logs, ckpts, etc. in "gs://model_dir" format.
+ MODEL_DIR="..."
+ T5X_DIR="..."  # directory where the T5X repo is cloned.
+ TFDS_DATA_DIR="..."
+
+ python3 ${T5X_DIR}/t5x/train.py \
+   --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin" \
+   --gin.MODEL_DIR=\"${MODEL_DIR}\" \
+   --tfds_data_dir=${TFDS_DATA_DIR}
+ ```
+
+ The configuration for this training run is defined in the Gin file
+ [base_wmt_from_scratch.gin](t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin).
+ [Gin-config](https://github.com/google/gin-config) is a library to handle
+ configurations based on dependency injection. Among many benefits, Gin allows
+ users to pass custom components, such as a custom model, to the T5X library
+ without having to modify the core library. The [custom
+ components](#custom-components) section shows how this is done.
+
+ While the core library is independent of Gin, it is central to the examples we
+ provide. Therefore, we provide a short [introduction][gin-primer] to Gin in the
+ context of T5X. All the configurations are written to a file "config.gin" in
+ `MODEL_DIR`. This makes debugging as well as reproducing the experiment much
+ easier.
+
+ In addition to the `config.gin`, a `model-info.txt` file summarizes the model
+ parameters (shape, names of the axes, partitioning info) as well as the
+ optimizer states.
+
+ #### TensorBoard
+
+ To monitor the training in [TensorBoard](https://www.tensorflow.org/tensorboard), it is much easier (due to
+ authentication issues) to launch TensorBoard on your own machine and _not_ in
+ the TPU VM. So, in a commandline on your own machine (not the one where you
+ ssh'ed into the TPU VM), launch TensorBoard with the `logdir` pointing to the
+ `MODEL_DIR`.
+
+ ```sh
+ # NB: run this on your machine, not the TPU VM!
+ MODEL_DIR="..."  # Copy from the TPU VM.
+ tensorboard --logdir=${MODEL_DIR}
+ ```
+
+ Or you can launch TensorBoard inside a Colab. In a Colab cell, run
+
+ ```python
+ from google.colab import auth
+ auth.authenticate_user()
+ ```
+
+ to authorize the Colab to access the GCS bucket, and then launch TensorBoard.
+
+ ```python
+ %load_ext tensorboard
+ model_dir = "..."  # Copy from the TPU VM.
+ %tensorboard --logdir=model_dir
+ ```
+
+ ### Fine-tuning
+
+ We can leverage the benefits of self-supervised pre-training by initializing
+ from one of our pre-trained models. Here we use the T5.1.1 Base checkpoint.
+
+ ```sh
+ # Model dir to save logs, ckpts, etc. in "gs://model_dir" format.
+ MODEL_DIR="..."
+
+ # Data dir to save the processed dataset in "gs://data_dir" format.
+ TFDS_DATA_DIR="..."
+ T5X_DIR="..."  # directory where the T5X repo is cloned.
+
+ python3 ${T5X_DIR}/t5x/train.py \
+   --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt_finetune.gin" \
+   --gin.MODEL_DIR=\"${MODEL_DIR}\" \
+   --tfds_data_dir=${TFDS_DATA_DIR}
+ ```
+
+ **Note:** when supplying a string, dict, list, tuple value, or a bash variable
+ via a flag, you must put it in quotes. In the case of strings, this requires
+ escaped quotes (`\"<string>\"`). For example:
+ `--gin.utils.DatasetConfig.split=\"validation\"` or
+ `--gin.MODEL_DIR=\"${MODEL_DIR}\"`.
+
+ Gin makes it easy to change a number of configurations. For example, you can
+ change `partitioning.PjitPartitioner.num_partitions` (overriding
+ the value in
+ [base_wmt_from_scratch.gin](t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin))
+ to change the parallelism strategy and pass it as a commandline arg.
+
+ ```sh
+ --gin.partitioning.PjitPartitioner.num_partitions=8
+ ```
+
+ ### Evaluation
+
+ To run the offline (i.e., without training) evaluation, you can use the
+ `t5x/eval.py` script.
+
+ ```sh
+ EVAL_OUTPUT_DIR="..."  # directory to write eval output
+ T5X_DIR="..."  # directory where the t5x is cloned, e.g., ${HOME}"/t5x".
+ TFDS_DATA_DIR="..."
+ CHECKPOINT_PATH="..."
+
+ python3 ${T5X_DIR}/t5x/eval.py \
+   --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt_eval.gin" \
+   --gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\" \
+   --gin.EVAL_OUTPUT_DIR=\"${EVAL_OUTPUT_DIR}\" \
+   --tfds_data_dir=${TFDS_DATA_DIR}
+ ```
+
+ ### Inference
+
+ To run inference, you can use the `t5x/infer.py` script. Here we use the same
+ `seqio.Task`, but for inference we do not use the targets features other than
+ logging them alongside the predictions in a JSON file.
+
+ ```sh
+ INFER_OUTPUT_DIR="..."  # directory to write infer output
+ T5X_DIR="..."  # directory where the t5x is cloned, e.g., ${HOME}"/t5x".
+ TFDS_DATA_DIR="..."
+ CHECKPOINT_PATH="..."
+
+ python3 ${T5X_DIR}/t5x/infer.py \
+   --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt_infer.gin" \
+   --gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\" \
+   --gin.INFER_OUTPUT_DIR=\"${INFER_OUTPUT_DIR}\" \
+   --tfds_data_dir=${TFDS_DATA_DIR}
+ ```
314
+
315
+ ### Exporting as TensorFlow Saved Model
316
+
317
+ A pretrained model can be exported as a TensorFlow SavedModel and deployed
318
+ to the Vertex AI Prediction service using the
319
+ [Optimized TensorFlow Runtime](https://cloud.google.com/vertex-ai/docs/predictions/optimized-tensorflow-runtime).
320
+ Please note that the exported model won't work on the OSS-based
321
+ [TensorFlow Model Server](https://github.com/tensorflow/serving).
322
+
323
+ ```sh
324
+ T5X_DIR="..." # directory where the t5x is cloned, e.g., ${HOME}"/t5x".
325
+ CHECKPOINT_PATH="..."
326
+
327
+ BATCH_SIZE=None
328
+ BEAM_SIZE=1
329
+
330
+ # Use 'bfloat16' if you plan to run exported model on NVIDIA A100 or newer GPUs,
331
+ # for other GPUs use 'float32'.
332
+ ACTIVATION_DTYPE=bfloat16
333
+
334
+ # Version numbers must be numeric. We generate one based on datetime.
335
+ VERSION=$(date +%Y%m%d%H%M%S)
336
+
337
+ NAME=t5x_base_${ACTIVATION_DTYPE} # Model name.
338
+
339
+ # Path to export the model to. Note that the export script will add a _cpu
340
+ # suffix after the model name.
341
+ OUTPUT=${CHECKPOINT_PATH}/saved_model.${NAME}/${VERSION}
342
+
343
+ declare -a ARGS=(
344
+ --gin_file=t5x/examples/t5/t5_1_1/base.gin
345
+ --gin_file=t5x/t5x/configs/runs/export.gin
346
+ --gin.TASK_FEATURE_LENGTHS="{'inputs': 256, 'targets': 256}"
347
+ --gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\"
348
+ --gin.MODEL_NAME=\"/ml/${USER}/t5x_base\"
349
+ --gin.MODEL_OUTPUT_DIR=\"${OUTPUT}\"
350
+ --gin.BEAM_SIZE=${BEAM_SIZE}
351
+ --gin.BATCH_SIZE=${BATCH_SIZE}
352
+ --gin.export_lib.save.partitioner=None
353
+ --gin.export_lib.save.warmup_examples="['hello world']"
354
+ --gin.export_lib.ExportableModule.use_batch_function=False
355
+ --gin.export_lib.ExportableModule.use_gpu=False
356
+ --gin.export_lib.ExportableModule.jit_compile=False
357
+ --gin.ACTIVATION_DTYPE=\"${ACTIVATION_DTYPE}\"
358
+ --gin.network.T5Config.dtype=\"${ACTIVATION_DTYPE}\"
359
+ --gin.utils.RestoreCheckpointConfig.dtype=\"${ACTIVATION_DTYPE}\"
360
+ --gin.DROPOUT_RATE=0.0
361
+ )
362
+
363
+ (python3 ${T5X_DIR}/t5x/export.py "${ARGS[@]}")
364
+ ```
365
+
366
+ For detailed argument definitions, refer to
367
+ [export.gin](t5x/configs/runs/export.gin).
368
+
369
+ You can run XL and smaller models on NVIDIA A100 40GB, and XXL models on
370
+ NVIDIA A100 80GB.
371
+
372
+ ## Custom components
373
+
374
+ [The translation example](#example-english-to-german-translation) uses the
375
+ encoder-decoder model that T5X provides as well as the dataset from the T5
376
+ library. This section shows how you can use your own dataset and model and
377
+ pass them in via Gin.
378
+
379
+ ### Example: custom dataset in a user directory
380
+
381
+ For this example, we have the following directory structure with
382
+ `${HOME}/dir1/user_dir` representing a user directory with custom components.
383
+
384
+ ```
385
+ ${HOME}
386
+ └── dir1
387
+    └── user_dir
388
+    ├── t5_1_1_base_de_en.gin
389
+    └── tasks.py
390
+ ```
391
+
392
+ As an example, let's define a new dataset. Here we use the same Translation
393
+ dataset but we define the translation task in the opposite direction, i.e.,
394
+ German to English instead of English to German. We define this task in `tasks.py`:
395
+
396
+ ```py
397
+ # ${HOME}/dir1/user_dir/tasks.py
398
+
399
+ import functools
400
+ import seqio
401
+ import tensorflow_datasets as tfds
402
+ from t5.evaluation import metrics
403
+ from t5.data import preprocessors
404
+
405
+ vocabulary = seqio.SentencePieceVocabulary(
406
+ 'gs://t5-data/vocabs/cc_all.32000/sentencepiece.model', extra_ids=100)
407
+ output_features = {
408
+ 'inputs': seqio.Feature(vocabulary=vocabulary),
409
+ 'targets': seqio.Feature(vocabulary=vocabulary)
410
+ }
411
+
412
+ seqio.TaskRegistry.add(
413
+ 'wmt_t2t_de_en_v003',
414
+ source=seqio.TfdsDataSource(tfds_name='wmt_t2t_translate/de-en:1.0.0'),
415
+ preprocessors=[
416
+ functools.partial(
417
+ preprocessors.translate,
418
+ source_language='de', target_language='en'),
419
+ seqio.preprocessors.tokenize,
420
+ seqio.CacheDatasetPlaceholder(),
421
+ seqio.preprocessors.append_eos_after_trim,
422
+ ],
423
+ metric_fns=[metrics.bleu],
424
+ output_features=output_features)
425
+ ```
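+ Before launching a full run, you can optionally sanity-check the registration
+ by loading a few examples directly with `seqio`. This is only a sketch: it
+ assumes the `wmt_t2t_translate` TFDS data is already available locally, and the
+ sequence lengths are arbitrary illustrative values.
+
+ ```py
+ import seqio
+ import tasks  # Registers 'wmt_t2t_de_en_v003'.
+
+ task = seqio.get_mixture_or_task('wmt_t2t_de_en_v003')
+ ds = task.get_dataset(sequence_lengths={'inputs': 64, 'targets': 64},
+                       split='train')
+ for ex in ds.take(1):
+   print({k: v.shape for k, v in ex.items()})
+ ```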
426
+
427
+ In the Gin file, most of the settings are equivalent to those used in the
428
+ [En->De example](#example-english-to-german-translation). So we include the Gin
429
+ file from that example. To use the "wmt_t2t_de_en_v003" task we just defined,
431
+ we need to import the task module "tasks.py". Note that we use a relative path
431
+ defined with respect to the user directory. This will be specified as a
432
+ flag.
433
+
434
+ ```py
435
+ # ${HOME}/dir1/user_dir/t5_1_1_base_de_en.gin
436
+ from __gin__ import dynamic_registration
437
+ import tasks # This imports the task defined in dir1/user_dir/tasks.py.
438
+
439
+ include "t5x-tmp/t5x/examples/t5/t5_1_1/examples/base_wmt_from_scratch.gin"
440
+ MIXTURE_OR_TASK_NAME = "wmt_t2t_de_en_v003"
441
+ ```
442
+
443
+ Finally, we launch training, passing the user directory via the
444
+ `--gin_search_paths` flag so that the Gin file and Python modules can be
445
+ specified with relative paths.
446
+
447
+ ```sh
448
+ PROJECT_DIR=${HOME}"/dir1/user_dir"
449
+ T5X_DIR="..." # directory where the t5x is cloned.
450
+ TFDS_DATA_DIR="..."
451
+ MODEL_DIR="..."
452
+ export PYTHONPATH=${PROJECT_DIR}
453
+
454
+ python3 ${T5X_DIR}/t5x/train.py \
455
+ --gin_search_paths=${PROJECT_DIR} \
456
+ --gin_file="t5_1_1_base_de_en.gin" \
457
+ --gin.MODEL_DIR=\"${MODEL_DIR}\" \
458
+ --tfds_data_dir=${TFDS_DATA_DIR}
459
+ ```
460
+
461
+ ## Checkpoints
462
+
463
+ ### Native Checkpoints
464
+
465
+ We have released the checkpoints of many of the original T5 models and their
466
+ variants in a native T5X format for maximal efficiency.
467
+ See the [complete list](https://github.com/google-research/t5x/blob/main/docs/models.md) including the
468
+ matching Gin configuration files.
469
+
470
+ These are converted from the public [Mesh TensorFlow
471
+ checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511).
473
+
474
+
475
+ ### Compatibility with the Mesh TensorFlow checkpoints
476
+ The Mesh TensorFlow checkpoints trained using the [T5 library][t5_github] can be
477
+ directly loaded into T5X. For example, we can rerun the fine-tuning example
478
+ initializing from the MTF checkpoint by changing the `INIT_CHECKPOINT` Gin
479
+ macro.
480
+
481
+ ```sh
482
+ # Model dir to save logs, ckpts, etc. in "gs://model_dir" format.
483
+ MODEL_DIR="..."
484
+
485
+ # Data dir to save the processed dataset in "gs://data_dir" format.
486
+ TFDS_DATA_DIR="..."
487
+ T5X_DIR="..." # directory where the T5X repo is cloned.
488
+
489
+ python3 ${T5X_DIR}/t5x/train.py \
490
+ --gin_file="t5x/examples/t5/t5_1_1/examples/base_wmt19_ende_train.gin" \
491
+ --gin.MODEL_DIR=\"${MODEL_DIR}\" \
492
+ --gin.MIXTURE_OR_TASK_NAME=\"wmt_t2t_ende_v003\" \
493
+ --gin.INIT_CHECKPOINT=\"gs://t5-data/pretrained_models/t5.1.1.base/model.ckpt-1000000\" \
494
+ --tfds_data_dir=${TFDS_DATA_DIR}
495
+ ```
496
+
497
+ Note that restoring directly from the Mesh TensorFlow checkpoints can be
498
+ inefficient if heavy model parallelism is used for large models. This is
499
+ because each host loads the entire copy of the model first and then keeps only
500
+ the relevant slices dictated by the model parallelism specification. If you have
501
+ Mesh TensorFlow checkpoints that you run often, we recommend converting the
502
+ checkpoints to T5X native format using the
503
+ [convert_tf_checkpoint script](t5x/scripts/convert_tf_checkpoint.py).
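+ As a rough sketch, a conversion run follows the same pattern as the other
+ scripts in this README. The flag names below are assumptions for illustration,
+ not the script's documented interface; check the script's `--help` output for
+ the actual flags.
+
+ ```sh
+ T5X_DIR="..."  # directory where the t5x is cloned.
+
+ # Hypothetical flag names shown for illustration only; verify with --help.
+ python3 ${T5X_DIR}/t5x/scripts/convert_tf_checkpoint.py \
+   --tf_checkpoint_path="gs://t5-data/pretrained_models/t5.1.1.base/model.ckpt-1000000" \
+   --output_dir="gs://your-bucket/t5_1_1_base_t5x"
+ ```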
504
+
505
+
506
+ ## Citing T5X
507
+ Please use the following BibTeX entry to cite T5X.
508
+
509
+ ```
510
+ @article{roberts2022t5x,
511
+ url = {https://arxiv.org/abs/2203.17189},
512
+ author = {Roberts, Adam and Chung, Hyung Won and Levskaya, Anselm and Mishra, Gaurav and Bradbury, James and Andor, Daniel and Narang, Sharan and Lester, Brian and Gaffney, Colin and Mohiuddin, Afroz and Hawthorne, Curtis and Lewkowycz, Aitor and Salcianu, Alex and van Zee, Marc and Austin, Jacob and Goodman, Sebastian and Soares, Livio Baldini and Hu, Haitang and Tsvyashchenko, Sasha and Chowdhery, Aakanksha and Bastings, Jasmijn and Bulian, Jannis and Garcia, Xavier and Ni, Jianmo and Chen, Andrew and Kenealy, Kathleen and Clark, Jonathan H. and Lee, Stephan and Garrette, Dan and Lee-Thorp, James and Raffel, Colin and Shazeer, Noam and Ritter, Marvin and Bosma, Maarten and Passos, Alexandre and Maitin-Shepard, Jeremy and Fiedel, Noah and Omernick, Mark and Saeta, Brennan and Sepassi, Ryan and Spiridonov, Alexander and Newlan, Joshua and Gesmundo, Andrea},
513
+ title = {Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$},
514
+ journal={arXiv preprint arXiv:2203.17189},
515
+ year = {2022},
516
+ }
517
+ ```
518
+
519
+
520
+ ## Note
521
+ This is not an officially supported Google product.
522
+
523
+ [t5_paper]: https://arxiv.org/abs/1910.10683
524
+ [t5_github]: https://github.com/google-research/text-to-text-transfer-transformer
525
+ [gin-primer]: docs/usage/gin.md
t5x-main/docs/_static/t5x_theme.css ADDED
@@ -0,0 +1,23 @@
1
+ @import url("theme.css");
2
+
3
+ .wy-nav-content {
4
+ max-width: 1290px;
5
+ }
6
+
7
+ .rst-content table.docutils {
8
+ width: 100%;
9
+ }
10
+
11
+ .rst-content table.docutils td {
12
+ vertical-align: top;
13
+ padding: 0;
14
+ }
15
+
16
+ .rst-content table.docutils td p {
17
+ padding: 8px;
18
+ }
19
+
20
+ .rst-content div[class^=highlight] {
21
+ border: 0;
22
+ margin: 0;
23
+ }
t5x-main/docs/_templates/autosummary/t5x_module.rst ADDED
@@ -0,0 +1,23 @@
1
+ {{ fullname | escape | underline}}
2
+
3
+ .. currentmodule:: {{ module }}
4
+
5
+ .. autoclass:: {{ objname }}
6
+ :exclude-members:
7
+
8
+ {% block methods %}
9
+
10
+ .. automethod:: __call__
11
+
12
+ {% if methods %}
13
+ .. rubric:: Methods
14
+
15
+ .. autosummary::
16
+
17
+ {% for item in methods %}
18
+ {%- if item not in inherited_members and item not in annotations and not item in ['__init__'] %}
19
+ ~{{ name }}.{{ item }}
20
+ {%- endif %}
21
+ {%- endfor %}
22
+ {% endif %}
23
+ {% endblock %}
t5x-main/docs/api_reference/index.rst ADDED
@@ -0,0 +1,100 @@
1
+ API Reference
2
+ =============
3
+
4
+ Binaries
5
+ --------
6
+
7
+ .. toctree::
8
+ :maxdepth: 3
9
+
10
+ t5x.train
11
+ t5x.infer
12
+ t5x.eval
13
+ t5x.main
14
+
15
+ Training
16
+ ---------
17
+
18
+ .. toctree::
19
+ :maxdepth: 3
20
+
21
+ t5x.trainer
22
+ t5x.optimizers
23
+ t5x.interactive_model
24
+ t5x.train_state
25
+ t5x.state_utils
26
+ t5x.losses
27
+ t5x.metrics
28
+ t5x.utils
29
+ t5x.adafactor
30
+
31
+ Inference
32
+ ---------
33
+
34
+ .. toctree::
35
+ :maxdepth: 3
36
+
37
+ t5x.decoding
38
+
39
+ Models
40
+ ------
41
+
42
+ .. toctree::
43
+ :maxdepth: 3
44
+
45
+ t5x.models
46
+
47
+ Checkpointing
48
+ -------------
49
+
50
+ .. toctree::
51
+ :maxdepth: 3
52
+
53
+ t5x.checkpoints
54
+ t5x.checkpoint_utils
55
+ t5x.checkpoint_importer
56
+
57
+
58
+ Partitioning
59
+ ------------
60
+
61
+ .. toctree::
62
+ :maxdepth: 3
63
+
64
+ t5x.partitioning
65
+
66
+ Config
67
+ ------
68
+
69
+ .. toctree::
70
+ :maxdepth: 3
71
+
72
+ t5x.config_utils
73
+ t5x.gin_utils
74
+
75
+ Utils
76
+ -----
77
+
78
+ .. toctree::
79
+ :maxdepth: 3
80
+
81
+ t5x.test_utils
82
+ t5x.binary_search
83
+
84
+
85
+
86
+
87
+
88
+
89
+
90
+
91
+
92
+
93
+
94
+
95
+
96
+
97
+
98
+
99
+
100
+
t5x-main/docs/api_reference/t5x.adafactor.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.adafactor package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.adafactor
5
+
6
+ .. automodule:: t5x.adafactor
7
+ :members:
t5x-main/docs/api_reference/t5x.binary_search.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.binary_search package
2
+ =========================
3
+
4
+ .. currentmodule:: t5x.binary_search
5
+
6
+ .. automodule:: t5x.binary_search
7
+ :members:
t5x-main/docs/api_reference/t5x.checkpoint_importer.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.checkpoint_importer package
2
+ ===============================
3
+
4
+ .. currentmodule:: t5x.checkpoint_importer
5
+
6
+ .. automodule:: t5x.checkpoint_importer
7
+ :members:
t5x-main/docs/api_reference/t5x.checkpoint_utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.checkpoint_utils package
2
+ ============================
3
+
4
+ .. currentmodule:: t5x.checkpoint_utils
5
+
6
+ .. automodule:: t5x.checkpoint_utils
7
+ :members:
t5x-main/docs/api_reference/t5x.checkpoints.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.checkpoints package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.checkpoints
5
+
6
+ .. automodule:: t5x.checkpoints
7
+ :members:
t5x-main/docs/api_reference/t5x.config_utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.config_utils package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.config_utils
5
+
6
+ .. automodule:: t5x.config_utils
7
+ :members:
t5x-main/docs/api_reference/t5x.decoding.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.decoding package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.decoding
5
+
6
+ .. automodule:: t5x.decoding
7
+ :members:
t5x-main/docs/api_reference/t5x.eval.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.eval binary
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.eval
5
+
6
+ .. automodule:: t5x.eval
7
+ :members:
t5x-main/docs/api_reference/t5x.gin_utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.gin_utils package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.gin_utils
5
+
6
+ .. automodule:: t5x.gin_utils
7
+ :members:
t5x-main/docs/api_reference/t5x.infer.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.infer binary
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.infer
5
+
6
+ .. automodule:: t5x.infer
7
+ :members:
t5x-main/docs/api_reference/t5x.interactive_model.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.interactive_model package
2
+ =============================
3
+
4
+ .. currentmodule:: t5x.interactive_model
5
+
6
+ .. automodule:: t5x.interactive_model
7
+ :members:
t5x-main/docs/api_reference/t5x.losses.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.losses package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.losses
5
+
6
+ .. automodule:: t5x.losses
7
+ :members:
t5x-main/docs/api_reference/t5x.main.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.main binary
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.main
5
+
6
+ .. automodule:: t5x.main
7
+ :members:
t5x-main/docs/api_reference/t5x.metrics.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.metrics package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.metrics
5
+
6
+ .. automodule:: t5x.metrics
7
+ :members:
t5x-main/docs/api_reference/t5x.models.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.models package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.models
5
+
6
+ .. automodule:: t5x.models
7
+ :members:
t5x-main/docs/api_reference/t5x.optimizers.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.optimizers package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.optimizers
5
+
6
+ .. automodule:: t5x.optimizers
7
+ :members:
t5x-main/docs/api_reference/t5x.partitioning.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.partitioning package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.partitioning
5
+
6
+ .. automodule:: t5x.partitioning
7
+ :members:
t5x-main/docs/api_reference/t5x.state_utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.state_utils package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.state_utils
5
+
6
+ .. automodule:: t5x.state_utils
7
+ :members:
t5x-main/docs/api_reference/t5x.test_utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.test_utils package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.test_utils
5
+
6
+ .. automodule:: t5x.test_utils
7
+ :members:
t5x-main/docs/api_reference/t5x.train.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.train binary
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.train
5
+
6
+ .. automodule:: t5x.train
7
+ :members:
t5x-main/docs/api_reference/t5x.train_state.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.train_state package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.train_state
5
+
6
+ .. automodule:: t5x.train_state
7
+ :members:
t5x-main/docs/api_reference/t5x.trainer.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.trainer package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.trainer
5
+
6
+ .. automodule:: t5x.trainer
7
+ :members:
t5x-main/docs/api_reference/t5x.utils.rst ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ t5x.utils package
2
+ ========================
3
+
4
+ .. currentmodule:: t5x.utils
5
+
6
+ .. automodule:: t5x.utils
7
+ :members:
t5x-main/docs/conf.py ADDED
@@ -0,0 +1,132 @@
1
+ # Copyright 2024 The T5X Authors.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """Configuration file for the Sphinx documentation builder.
16
+
17
+ This file only contains a selection of the most common options. For a full
18
+ list see the documentation:
19
+ https://www.sphinx-doc.org/en/master/usage/configuration.html
20
+ """
21
+
22
+ # pylint:disable=all
23
+ # -- Path setup --------------------------------------------------------------
24
+
25
+ # If extensions (or modules to document with autodoc) are in another directory,
26
+ # add these directories to sys.path here. If the directory is relative to the
27
+ # documentation root, use os.path.abspath to make it absolute, like shown here.
28
+ #
29
+ import os
30
+ import sys
31
+
32
+ sys.path.insert(0, os.path.abspath('..'))
33
+
34
+ # patch sphinx
35
+ import docs.conf_sphinx_patch
36
+
37
+ # -- Project information -----------------------------------------------------
38
+
39
+ project = 'T5X'
40
+ copyright = '2023, The T5X authors' # pylint: disable=redefined-builtin
41
+ author = 'The T5X authors'
42
+
43
+ # -- General configuration ---------------------------------------------------
44
+
45
+ # Add any Sphinx extension module names here, as strings. They can be
46
+ # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
47
+ # ones.
48
+ extensions = [
49
+ 'sphinx.ext.autodoc',
50
+ 'sphinx.ext.autosummary',
51
+ 'sphinx.ext.autosectionlabel',
52
+ 'sphinx.ext.doctest',
53
+ 'sphinx.ext.intersphinx',
54
+ 'sphinx.ext.mathjax',
55
+ 'sphinx.ext.napoleon',
56
+ 'sphinx.ext.viewcode',
57
+ 'myst_nb',
58
+ 'sphinx_design',
59
+ ]
60
+
61
+ # The suffix(es) of source filenames.
62
+ # You can specify multiple suffix as a list of string:
63
+ #
64
+ source_suffix = ['.rst', '.ipynb', '.md']
65
+
66
+ autosummary_generate = True
67
+
68
+ master_doc = 'index'
69
+
70
+ autodoc_typehints = 'none'
71
+
72
+ # Add any paths that contain templates here, relative to this directory.
73
+ templates_path = ['_templates']
74
+
75
+ # List of patterns, relative to source directory, that match files and
76
+ # directories to ignore when looking for source files.
77
+ # This pattern also affects html_static_path and html_extra_path.
78
+ exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
79
+
80
+ # -- Options for HTML output -------------------------------------------------
81
+
82
+ # The theme to use for HTML and HTML Help pages. See the documentation for
83
+ # a list of builtin themes.
84
+ #
85
+ # html_theme = 'pydata_sphinx_theme'
86
+ html_theme = 'sphinx_book_theme'
87
+ html_css_files = ['css/t5x_theme.css']
88
+
89
+ # The name of an image file (relative to this directory) to place at the top
90
+ # of the sidebar.
91
+ html_logo = './t5x.png'
92
+ html_favicon = './t5x.png'
93
+
94
+ # title of the website
95
+ html_title = ''
96
+
97
+ # Add any paths that contain custom static files (such as style sheets) here,
98
+ # relative to this directory. They are copied after the builtin static files,
99
+ # so a file named 'default.css' will overwrite the builtin 'default.css'.
100
+ html_static_path = ['_static']
101
+
102
+ html_theme_options = {
103
+ 'repository_url': 'https://github.com/google-research/t5x',
104
+ 'use_repository_button': True, # add a 'link to repository' button
105
+ 'use_issues_button': False, # add an 'Open an Issue' button
106
+ 'path_to_docs': (
107
+ 'docs'
108
+ ), # used to compute the path to launch notebooks in colab
109
+ 'launch_buttons': {
110
+ 'colab_url': 'https://colab.research.google.com/',
111
+ },
112
+ 'prev_next_buttons_location': None,
113
+ 'show_navbar_depth': 1,
114
+ }
115
+
116
+ # -- Options for myst ----------------------------------------------
117
+ # uncomment line below to avoid running notebooks during development
118
+ # nb_execution_mode = 'off'
119
+ # Notebook cell execution timeout; defaults to 30.
120
+ nb_execution_timeout = 100
121
+ # List of patterns, relative to source directory, that match notebook
122
+ # files that will not be executed.
123
+ myst_enable_extensions = ['dollarmath']
124
+ # raise exceptions on execution so CI can catch errors
125
+ nb_execution_allow_errors = False
126
+ nb_execution_raise_on_error = True
127
+
128
+ # -- Extension configuration -------------------------------------------------
129
+
130
+ # Tell sphinx-autodoc-typehints to generate stub parameter annotations including
131
+ # types, even if the parameters aren't explicitly documented.
132
+ always_document_param_types = True
t5x-main/docs/conf_sphinx_patch.py ADDED
@@ -0,0 +1,202 @@
1
+ # Copyright 2024 The T5X Authors.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ """Patch Sphinx to improve documentation aesthetics."""
16
+
17
+ # TODO(cgarciae): Send a PR to sphinx to upstream this fix.
18
+ # Issue: https://github.com/google/flax/issues/2196
19
+ # This patch is needed to make autosummary provide the "annotations"
20
+ # variable so we can exclude function attributes from the methods list
21
+ # in flax_module.rst. The patch as such only adds this single line:
22
+ #
23
+ # ns['annotations'] = list(getattr(obj, '__annotations__', {}).keys())'
24
+ #
25
+ # We should consider sending a PR to sphinx so we can get rid of this.
26
+ # Original source:
27
+ # https://github.com/sphinx-doc/sphinx/blob/0aedcc9a916daa92d477226da67d33ce1831822e/sphinx/ext/autosummary/generate.py#L211-L351
28
+ from typing import Any, Dict, List, Set, Tuple
29
+ import sphinx.ext.autodoc
30
+ import sphinx.ext.autosummary.generate as ag
+ from sphinx.locale import __  # Needed for the translated warning message below.
31
+
32
+
33
+ # pylint:disable=all
34
+ def generate_autosummary_content(
35
+ name: str,
36
+ obj: Any,
37
+ parent: Any,
38
+ template: ag.AutosummaryRenderer,
39
+ template_name: str,
40
+ imported_members: bool,
41
+ app: Any,
42
+ recursive: bool,
43
+ context: Dict,
44
+ modname: str = None,
45
+ qualname: str = None,
46
+ ) -> str:
47
+ doc = ag.get_documenter(app, obj, parent)
48
+
49
+ def skip_member(obj: Any, name: str, objtype: str) -> bool:
50
+ try:
51
+ return app.emit_firstresult(
52
+ 'autodoc-skip-member', objtype, name, obj, False, {}
53
+ )
54
+ except Exception as exc:
55
+ ag.logger.warning(
56
+ __(
57
+ 'autosummary: failed to determine %r to be documented, '
58
+ 'the following exception was raised:\n%s'
59
+ ),
60
+ name,
61
+ exc,
62
+ type='autosummary',
63
+ )
64
+ return False
65
+
66
+ def get_class_members(obj: Any) -> Dict[str, Any]:
67
+ members = sphinx.ext.autodoc.get_class_members(
68
+ obj, [qualname], ag.safe_getattr
69
+ )
70
+ return {name: member.object for name, member in members.items()}
71
+
72
+ def get_module_members(obj: Any) -> Dict[str, Any]:
73
+ members = {}
74
+ for name in ag.members_of(obj, app.config):
75
+ try:
76
+ members[name] = ag.safe_getattr(obj, name)
77
+ except AttributeError:
78
+ continue
79
+ return members
80
+
81
+ def get_all_members(obj: Any) -> Dict[str, Any]:
82
+ if doc.objtype == 'module':
83
+ return get_module_members(obj)
84
+ elif doc.objtype == 'class':
85
+ return get_class_members(obj)
86
+ return {}
87
+
88
+ def get_members(
89
+ obj: Any,
90
+ types: Set[str],
91
+ include_public: List[str] = [],
92
+ imported: bool = True,
93
+ ) -> Tuple[List[str], List[str]]:
94
+ items: List[str] = []
95
+ public: List[str] = []
96
+
97
+ all_members = get_all_members(obj)
98
+ for name, value in all_members.items():
99
+ documenter = ag.get_documenter(app, value, obj)
100
+ if documenter.objtype in types:
101
+ # skip imported members if expected
102
+ if imported or getattr(value, '__module__', None) == obj.__name__:
103
+ skipped = skip_member(value, name, documenter.objtype)
104
+ if skipped is True:
105
+ pass
106
+ elif skipped is False:
107
+ # show the member forcedly
108
+ items.append(name)
109
+ public.append(name)
110
+ else:
111
+ items.append(name)
112
+ if name in include_public or not name.startswith('_'):
113
+ # considers member as public
114
+ public.append(name)
115
+ return public, items
116
+
117
+ def get_module_attrs(members: Any) -> Tuple[List[str], List[str]]:
118
+ """Find module attributes with docstrings."""
119
+ attrs, public = [], []
120
+ try:
121
+ analyzer = ag.ModuleAnalyzer.for_module(name)
122
+ attr_docs = analyzer.find_attr_docs()
123
+ for namespace, attr_name in attr_docs:
124
+ if namespace == '' and attr_name in members:
125
+ attrs.append(attr_name)
126
+ if not attr_name.startswith('_'):
127
+ public.append(attr_name)
128
+ except ag.PycodeError:
129
+ pass # give up if ModuleAnalyzer fails to parse code
130
+ return public, attrs
131
+
132
+ def get_modules(obj: Any) -> Tuple[List[str], List[str]]:
133
+ items: List[str] = []
134
+ for _, modname, _ispkg in ag.pkgutil.iter_modules(obj.__path__):
135
+ fullname = name + '.' + modname
136
+ try:
137
+ module = ag.import_module(fullname)
138
+ if module and hasattr(module, '__sphinx_mock__'):
139
+ continue
140
+ except ImportError:
141
+ pass
142
+
143
+ items.append(fullname)
144
+ public = [x for x in items if not x.split('.')[-1].startswith('_')]
145
+ return public, items
146
+
147
+ ns: Dict[str, Any] = {}
148
+ ns.update(context)
149
+
150
+ if doc.objtype == 'module':
151
+ scanner = ag.ModuleScanner(app, obj)
152
+ ns['members'] = scanner.scan(imported_members)
153
+ ns['functions'], ns['all_functions'] = get_members(
154
+ obj, {'function'}, imported=imported_members
155
+ )
156
+ ns['classes'], ns['all_classes'] = get_members(
157
+ obj, {'class'}, imported=imported_members
158
+ )
159
+ ns['exceptions'], ns['all_exceptions'] = get_members(
160
+ obj, {'exception'}, imported=imported_members
161
+ )
162
+ ns['attributes'], ns['all_attributes'] = get_module_attrs(ns['members'])
163
+ ispackage = hasattr(obj, '__path__')
164
+ if ispackage and recursive:
165
+ ns['modules'], ns['all_modules'] = get_modules(obj)
166
+ elif doc.objtype == 'class':
167
+ ns['members'] = dir(obj)
168
+ ns['inherited_members'] = set(dir(obj)) - set(obj.__dict__.keys())
169
+ ns['methods'], ns['all_methods'] = get_members(
170
+ obj, {'method'}, ['__init__']
171
+ )
172
+ ns['attributes'], ns['all_attributes'] = get_members(
173
+ obj, {'attribute', 'property'}
174
+ )
175
+ ns['annotations'] = list(getattr(obj, '__annotations__', {}).keys())
176
+
177
+ if modname is None or qualname is None:
178
+ modname, qualname = ag.split_full_qualified_name(name)
179
+
180
+ if doc.objtype in ('method', 'attribute', 'property'):
181
+ ns['class'] = qualname.rsplit('.', 1)[0]
182
+
183
+ if doc.objtype in ('class',):
184
+ shortname = qualname
185
+ else:
186
+ shortname = qualname.rsplit('.', 1)[-1]
187
+
188
+ ns['fullname'] = name
189
+ ns['module'] = modname
190
+ ns['objname'] = qualname
191
+ ns['name'] = shortname
192
+
193
+ ns['objtype'] = doc.objtype
194
+ ns['underline'] = len(name) * '='
195
+
196
+ if template_name:
197
+ return template.render(template_name, ns)
198
+ else:
199
+ return template.render(doc.objtype, ns)
200
+
201
+
202
+ ag.generate_autosummary_content = generate_autosummary_content
t5x-main/docs/contributions.md ADDED
@@ -0,0 +1,64 @@
1
+ # Contributions
2
+
3
+ T5X was developed as part of the T5 Infrastructure effort at Google Research.
4
+
5
+ Adam Roberts founded and leads the project, designed and wrote much of `seqio`
6
+ and `t5x`, and co-authored the
7
+ [T5X and SeqIO paper](https://arxiv.org/abs/2203.17189). Hyung Won Chung
8
+ designed and wrote much of `t5x`, led its open sourcing, and co-authored the
9
+ paper. Anselm Levskaya built the initial prototype for `t5x` and wrote much of
10
+ the code. Gaurav Mishra leads `seqio`, implemented deterministic pipelines, and
11
+ co-authored the paper. James Bradbury implemented partitioning in `t5x` and
12
+ co-wrote the paper.
13
+
14
+ Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin,
15
+ Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin,
16
+ Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko,
17
+ Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni,
18
+ Andrew Chen, Kathleen Kenealy, Kehang Han, Jonathan H. Clark, Stephan Lee, Dan
19
+ Garrette, and James Lee-Thorp made substantial code contributions.
20
+
21
+ Colin Raffel and Noam Shazeer helped design `seqio`. Marvin Ritter advised on
22
+ deterministic pipelines and the use of CLU Metrics. Maarten Bosma helped design
23
+ deterministic pipelines. Jeremy Maitin-Shepard advised on the use of
24
+ TensorStore. Alexandre Passos and Ryan Sepassi advised on overall technical
25
+ design.
26
+
27
+ Noah Fiedel is a member of the leadership team, contributed to the high level
28
+ design and roadmap, and co-wrote the paper. Mark Omernick, Brennan Saeta, Ryan
29
+ Sepassi, Alexander Spiridonov (Product Manager), and Josh Newlan (Technical
30
+ Program Manager) are members of the leadership team and co-wrote the paper.
31
+ Andrea Gesmundo is a member of the leadership team and contributed to the
32
+ internal infrastructure component.
33
+
34
+ Thanks to the many other contributors to the project: Ian Simon, Reiner Pope,
35
+ Vincent Zhao, Pierre Ruyssen, Linting Xue, Junwhan Ahn, Barret Zoph, David
36
+ Dohan, Masumi Parekh, Chang Lan, Frederick Liu, Julien Amelot, Luheng He, Fede
37
+ Lebron, Rebecca Chen, Anosh Raj, Mandy Guo, Ethan Dyer, Mihai Tiuca, Hongkun Yu,
38
+ Kevin Brooks, David Soergel, Kelvin Guu, Joshua Ainslie, Luyao Xu, Ji Ma, Josh
39
+ Gardner, Daphne Ippolito, Peter Hawkins, Bo Pang, Marc Rasi, Wei Li, Wenhu Chen,
40
+ Iulia Turc, John Wieting, Alex Passos, Zonglin Li, Katie Everett, Olivier
41
+ Bachem, Francesco Piccinno, Jakub Adamek, Jonathan Heek, Parker Schuh, Hexiang
42
+ Hu, Du Phan, Max Moroz, David Miller, Ryan Doherty, David Elworthy, Alfonso
43
+ Castaño, Julian Eisenschlos, Vlad-Doru Ion, Lucas Dixon, Ron Shapiro, Dinghua
44
+ Li, Aaron Parisi, Xi Chen, Nan Ding, Chung-ching Chang, Timothy Dozat, Natalia
45
+ Ponomareva, Delesley Hutchins, Ankush Garg, Yu-Han Liu, Mehrdad Khatir, Costanza
46
+ Conforti, Philipp Keck, Raphaël Marinier, Marie Pellat, Raghuram Vadapalli,
47
+ Joshua Maynez, Yi Tay, Xihui Wu, David Belanger, Luke Metz, Dan Zheng, Deepti
48
+ Bhatia, Hariharan Shanmugavadivel, Rewon Child, Rigel Swavely, Mihir Sanjay
49
+ Kale, Arash Afkanpour, Roberto Rama, Juro Gottweis, Jonathan Herzig, Yilei Yang,
50
+ Elias Mizan, Pedram Pejman, Jiayu Ye, Smit Sanghavi, Rahul Joshi, Ziqiang Feng,
51
+ Charles Sutton, Weikang Zhou, Liam Fedus, Shanqing Cai, Ginger Perng, Yash
52
+ Katariya, Urvashi Khandelwal, Sebastian Gehrmann, Edward Loper, Tianze Shi, Luke
53
+ Vilnis, Amelia Archer, Tom Weingarten, David Zats, Murtaza Dhuliawala, Xin Xie,
54
+ Sahil Dua, André Susano Pinto, Piotr Padlewski, Sascha Rothe, Erik Aas, Felix
55
+ Stahlberg, Ken Durden, Christina Sorokin, Jaehoon Lee, Roy Frostig, Jacob
56
+ Devlin, Jorge Gonzalez Mendez, Deepak Ramachandran, Santiago Ontanon, Karthik
57
+ Raman, Yi Sun, Ali Elqursh, Reuben La Haye, Adam Fahrenkopf, Alex Polozov, Vinay
58
+ Ramasesh, Ian Tenney.
59
+
60
+ Thanks to NVIDIA for GPU contributions: Sahil Jain, Terry Kong, Yu-Hang Tang,
61
+ Ming Huang, Frederic Bastien, Sharath Turuvekere Sreenivas, Xiaowei Ren, Ryan Jeng,
62
+ Reese Wang.
63
+
64
+ Thanks to Douglas Eck and Zoubin Ghahramani for sponsoring the project.
t5x-main/docs/index.md ADDED
@@ -0,0 +1,65 @@
1
+ # T5X
2
+
3
+
4
+ Note: T5X is community-supported since ~2023. For critical use cases, consider
5
+ using libraries like TuneLab (go/tunelab) and Gemax Prod (go/gemax-prod). See
6
+ https://github.com/google-research/text-to-text-transfer-transformer/blob/main/README.mdx-to-gemax-prod for useful tips on transitioning.
7
+
8
+ ## Overview
9
+
10
+ T5X is a modular, composable, research-friendly framework for high-performance,
11
+ configurable, self-service training, evaluation, and inference of sequence
12
+ models (starting with language) at many scales.
13
+
14
+ It is essentially a new and improved implementation of the
15
+ [T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/README.md) (based on Mesh TensorFlow) in JAX and Flax. To learn
16
+ more, see the [T5X Paper](https://arxiv.org/abs/2203.17189).
17
+
18
+ ## Getting Started
19
+
20
+ Here are some quick tutorials to help you get started with common use-cases on
21
+ T5X:
22
+
23
+ #### [Introductory Colabs](tutorials.md)
24
+
25
+ If you are new to T5X, we recommend starting with our introductory Colab series,
26
+ which introduces core concepts of both T5X and SeqIO. More colabs will be added
27
+ to this series regularly!
28
+
29
+ #### [Fine-tuning a model](usage/finetune.md)
30
+
31
+ This tutorial outlines the steps to fine-tune an existing pre-trained model with
32
+ T5X on common downstream Tasks/Mixtures available on SeqIO. This is one of the
33
+ simplest and most common use cases of T5X. If you're new to T5X, this tutorial
34
+ is the recommended starting point.
35
+
36
+ #### [Running evaluation on a model](usage/eval.md)
37
+
38
+ This tutorial outlines the steps to evaluate a model with T5X on downstream
39
+ Tasks/Mixtures defined in SeqIO.
40
+
41
+ #### [Running inference on a model](usage/infer.md)
42
+
43
+ This tutorial outlines the steps to run inference on a model with T5X.
44
+
45
+ #### [Training a model from scratch](usage/pretrain.md)
46
+
47
+ This tutorial outlines the steps to pretrain a model with T5X on Tasks/Mixtures
48
+ defined in SeqIO.
49
+
50
+ #### [Gin Primer](usage/gin.md)
51
+
52
+ This tutorial provides a quick introduction to Gin, a lightweight configuration
53
+ framework for Python that is used to configure training, eval and inference jobs
54
+ on T5X.
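
To give a flavor before the tutorial, here is a minimal sketch of a Gin file. The include path and the `TRAIN_STEPS`/`DROPOUT_RATE` bindings mirror names used in the T5X example configs, but treat the specific values as illustrative:

```gin
# Illustrative Gin file: start from a base model config, then override bindings.
include 't5x/examples/t5/t5_1_1/small.gin'

# Top-level macros that run configs reference elsewhere.
TRAIN_STEPS = 1020000
DROPOUT_RATE = 0.1
```

Bindings set later in a file (or via `--gin.NAME=value` on the command line) override earlier ones, which is how a small override file can customize a large base config.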
55
+
56
+ #### [Partitioning Primer](usage/partitioning.md)
57
+
58
+ This tutorial provides background on what model and data partitioning are and
59
+ how they can be configured in T5X.
60
+
61
+ #### [Metrics Overview](usage/metrics.md)
62
+
63
+ This tutorial provides an overview of how metrics can be used and customized to
64
+ evaluate T5X models.
65
+
t5x-main/docs/index.rst ADDED
@@ -0,0 +1,24 @@
1
+ ******************************
2
+ T5X
3
+ ******************************
4
+
5
+
6
+ T5X is a modular, composable, research-friendly framework for high-performance,
7
+ configurable, self-service training, evaluation, and inference of sequence
8
+ models (starting with language) at many scales.
9
+
10
+ It is essentially a new and improved implementation of the
11
+ `T5 codebase <https://github.com/google-research/text-to-text-transfer-transformer/blob/main/README.md>`__
12
+ (based on Mesh TensorFlow) in JAX and Flax. To learn more, see the
13
+ `T5X Paper <https://arxiv.org/abs/2203.17189>`__.
14
+
15
+ .. toctree::
16
+ :maxdepth: 2
17
+ :caption: Table of Contents
18
+
19
+ Quick Start <overview>
20
+ Tutorials <tutorials>
21
+ Usage Guides <usage/index>
22
+ Models <models>
23
+ api_reference/index
24
+ contributions
t5x-main/docs/models.md ADDED
@@ -0,0 +1,318 @@
1
+ # Models
2
+
3
+
4
+ This page lists the available pre-trained T5 models. To use a pre-trained model,
5
+ you need a Gin config file that defines the model params, and the model
6
+ checkpoint to load from. For your convenience, TensorFlow checkpoints and Gin
7
+ configs for common T5 pre-trained models have been made available for use in
8
+ T5X. The following is a list of these pre-trained models and their Gin and
9
+ checkpoint locations.
10
+
11
+ + All checkpoints:
12
+ [`gs://t5-data/pretrained_models/t5x/`](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/)
13
+ + All Gin files:
14
+ [`t5x/configs/models/`](https://github.com/google-research/t5x/blob/main/t5x/configs/)
15
+
16
+ ### Selecting a Model
17
+
18
+ Publicly Available Models:
19
+
20
+ Model | Use Case
21
+ ---------------------------------------------------- | --------
22
+ [T5 1.1](#t5-11-checkpoints) | Improved T5, recommended for most research. English only.
23
+ [T5](#t5-checkpoints) | The original T5 work for reproducibility. English only.
24
+ [T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [prompt tuning paper](https://arxiv.org/abs/2104.08691).
25
+ [mT5](#mt5-checkpoints) | Multilingual T5. Recommended for multilingual research. Note that at smaller scales (at least through XL), mT5 performance is lower than T5 on English tasks.
26
+ [mT5 LM-Adapted](#mt5-lm-adapted-checkpoints) | Trained for 100k additional steps on the LM objective, per [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
27
+ [umT5](#umt5-checkpoints) | umT5, an updated mT5 model trained using a more uniform language distribution, per [the UniMax paper](https://openreview.net/forum?id=kXwdL1cWOAi).
28
+ [ByT5](#byt5-checkpoints) | ByT5. A "token-free" model that uses UTF-8 bytes for input and output. Recommended for tasks involving word-internal phenomena such as spelling, pronunciation, or morphology.
29
+ [LongT5](#longt5-checkpoints) | Recommended checkpoints to fine-tune for long input sequence tasks
30
+ [MoE](#mixture-of-experts-moe-checkpoints) | Useful for MoE experimentation.
31
+ [Flan-T5](#flan-t5-checkpoints) | General purpose T5 checkpoints for few-shot and finetuning. We recommend Flan-T5 over vanilla T5 and T5 LM-adapted
32
+ [UL2](#ul2-checkpoints) | Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the UL2 objective from [UL2 paper](https://arxiv.org/abs/2205.05131)
33
+ [BigScience](#bigscience-checkpoints) | Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832)
34
+ [FLIP](#flip-checkpoints) | Language-Image models trained with an alternative to CLIP, presented in the [FLIP paper](https://arxiv.org/abs/2212.00794)
35
+ [RankGen](#rankgen-checkpoints) | 1.2B parameter encoder model for English to score model generations given a prefix for decoding from the [RankGen paper](https://arxiv.org/abs/2205.09726)
36
+ [Dipper](#dipper-checkpoints) | 11B parameter paraphrase generation model from the [Dipper paper](https://arxiv.org/abs/2303.13408)
37
+
38
+
39
+ ### Public Research Models
40
+
41
+ #### T5 Checkpoints
42
+
43
+ These are the checkpoints used in the paper [Exploring the Limits of Transfer
44
+ Learning with a Unified Text-to-Text
45
+ Transformer](https://arxiv.org/abs/1910.10683). They are encoder-decoder models
46
+ pre-trained on [C4](https://www.tensorflow.org/datasets/catalog/c4) with a "span
47
+ corruption" denoising objective, in addition to a mixture of downstream tasks
48
+ including: GLUE, SuperGLUE, CNN/Daily Mail, SQuAD, and WMT.
49
+
50
+ **Vocabulary:**
51
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
52
+
53
+ Model | Gin File Location | Checkpoint Location
54
+ -------- | ------------------------------------------------------------------------------ | -------------------
55
+ T5 Small | [t5_small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/small.gin) | [gs://t5-data/pretrained_models/t5x/t5_small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_small)
56
+ T5 Base | [t5_base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/base.gin) | [gs://t5-data/pretrained_models/t5x/t5_base/checkpoint_999900](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_base)
57
+ T5 Large | [t5_large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/large.gin) | [gs://t5-data/pretrained_models/t5x/t5_large/checkpoint_1000700](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_large)
58
+ T5 3B | [t5_3B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/3B.gin) | [gs://t5-data/pretrained_models/t5x/t5_3B/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_3B)
59
+ T5 11B | [t5_11B.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_0/11B.gin) | [gs://t5-data/pretrained_models/t5x/t5_11B/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_11B)
60
+
61
+ #### T5 1.1 Checkpoints
62
+
63
+ These are similar to the models from [Exploring the Limits of Transfer Learning
64
+ with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683), but
65
+ with the following improvements:
66
+
67
+ * GEGLU activation in the feed-forward hidden layer, rather than ReLU (see
68
+ https://arxiv.org/abs/2002.05202).
69
+ * Dropout was turned off in pre-training (quality win). Dropout should be
70
+ re-enabled during fine-tuning.
71
+ * Pre-trained on C4 only without mixing in the downstream tasks.
72
+ * No parameter sharing between the embedding and classifier layers.
73
+ * "xl" and "xxl" replace "3B" and "11B". The model shapes are a bit
74
+ different: larger d_model and smaller num_heads and d_ff.
75
+
76
+ For English-language, sequence-to-sequence-style tasks (ones where the goal is
77
+ to map from an input text sequence to a target sequence) these are usually the
78
+ best models to fine-tune.
79
+
80
+ **Vocabulary:**
81
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
82
+
83
+ Model | Gin File Location | Checkpoint Location
84
+ ------------ | ---------------------------------------------------------------------------------- | -------------------
85
+ T5 1.1 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin) | [gs://t5-data/pretrained_models/t5x/t5_1_1_small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_small)
86
+ T5 1.1 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin) | [gs://t5-data/pretrained_models/t5x/t5_1_1_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_base)
87
+ T5 1.1 Large | [t5_1_1_large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin) | [gs://t5-data/pretrained_models/t5x/t5_1_1_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_large)
88
+ T5 1.1 XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin) | [gs://t5-data/pretrained_models/t5x/t5_1_1_xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_xl)
89
+ T5 1.1 XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/t5_1_1_xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_xxl)
90
+
91
+ #### T5 1.1 LM-Adapted Checkpoints
92
+
93
+ These "LM-adapted" models are initialized from T5 1.1 (above) and trained for an
94
+ additional 100K steps on the LM objective discussed in the
95
+ [T5 paper](https://arxiv.org/abs/1910.10683). This adaptation improves the
96
+ ability of the model to be used for
97
+ [prompt tuning](https://arxiv.org/abs/2104.08691). These checkpoints were also
98
+ used within the BigScience [T0](https://arxiv.org/abs/2110.08207) project.
99
+
100
+ **Vocabulary:**
101
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
102
+
103
+ Model | Gin File Location | Checkpoint Location
104
+ -------------------- | ------------------------------------------------------------------------------------------------------------------- | -------------------
105
+ T5 1.1 LM-100K Small | [t5_1_1_small.gin](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin) | [t5_1_1_lm100k_small/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_lm100k_small)
106
+ T5 1.1 LM-100K Base | [t5_1_1_base.gin](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_base.gin) | [t5_1_1_lm100k_base/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_lm100k_base)
107
+ T5 1.1 LM-100K Large | [t5_1_1_large.gin](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_large.gin) | [t5_1_1_lm100k_large/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_lm100k_large)
108
+ T5 1.1 LM-100K XL | [t5_1_1_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_xl.gin) | [t5_1_1_lm100k_xl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_lm100k_xl)
109
+ T5 1.1 LM-100K XXL | [t5_1_1_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_xxl.gin) | [t5_1_1_lm100k_xxl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_lm100k_xxl)
110
+
111
+
112
+ #### mT5 Checkpoints
113
+
114
+ These are the checkpoints used in the paper
115
+ [mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer](https://aclanthology.org/2021.naacl-main.41/).
116
+ They are encoder-decoder models trained on
117
+ [multilingual C4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual)
118
+ with a denoising objective. These are the best checkpoints to fine-tune for
119
+ non-English sequence-to-sequence tasks.
120
+
121
+ **Vocabulary:**
122
+ [mc4.250000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/mc4.250000.100extra)
123
+
124
+ Model | Gin File Location | Checkpoint Location
125
+ --------- | ---------------------------------------------------------------------------- | -------------------
126
+ mT5 Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin) | [gs://t5-data/pretrained_models/t5x/mt5_small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_small)
127
+ mT5 Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin) | [gs://t5-data/pretrained_models/t5x/mt5_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_base)
128
+ mT5 Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin) | [gs://t5-data/pretrained_models/t5x/mt5_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_large)
129
+ mT5 XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin) | [gs://t5-data/pretrained_models/t5x/mt5_xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_xl)
130
+ mT5 XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin) | [gs://t5-data/pretrained_models/t5x/mt5_xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_xxl)
131
+
132
+ #### mT5 LM-Adapted Checkpoints
133
+
134
+ These are the checkpoints released as part of the
135
+ [zero-shot cross-lingual generation (XGen) paper](https://arxiv.org/abs/2205.12647).
136
+
137
+ These "LM-adapted" models are initialized from mT5 (above) and trained for an
138
+ additional 100K steps on the LM objective discussed in the
139
+ [T5 paper](https://arxiv.org/abs/1910.10683).
140
+
141
+ This adaptation improves the ability of the model to be used for
142
+ [prompt tuning](https://arxiv.org/abs/2104.08691).
143
+
144
+ **Vocabulary:**
145
+ [mc4.250000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/mc4.250000.100extra)
146
+
147
+ Model | Gin File Location | Checkpoint Location
148
+ -------------------- | ---------------------------------------------------------------------------- | -------------------
149
+ mT5 LM-Adapted Small | [mt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/small.gin) | [mt5_lm_adapted/small/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/small/checkpoint_1100000)
150
+ mT5 LM-Adapted Base | [mt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/base.gin) | [mt5_lm_adapted/base/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/base/checkpoint_1100000)
151
+ mT5 LM-Adapted Large | [mt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/large.gin) | [mt5_lm_adapted/large/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/large/checkpoint_1100000)
152
+ mT5 LM-Adapted XL | [mt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xl.gin) | [mt5_lm_adapted/xl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xl/checkpoint_1100000)
153
+ mT5 LM-Adapted XXL | [mt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/mt5/xxl.gin) | [mt5_lm_adapted/xxl/checkpoint_1100000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/mt5_lm_adapted/xxl/checkpoint_1100000)
154
+
155
+ #### umT5 Checkpoints
156
+
157
+ These are the checkpoints described in the paper [UniMax: Fairer and More
158
+ Effective Language Sampling for Large-Scale Multilingual
159
+ Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi). umT5 is similar to
160
+ mT5 (see above); both are multilingual encoder-decoder models ranging from 300M
161
+ to 13B parameters, trained on the mC4 corpus using a denoising objective. umT5
162
+ is trained on a fresher version of the mC4 corpus (3.1.0), and with a more
163
+ uniform language balancing strategy.
164
+
165
+ **Vocabulary:** [umt5.256000](https://console.cloud.google.com/storage/browser/t5-data/vocabs/umt5.256000)
166
+
167
+ Model | Gin File Location | Checkpoint Location
168
+ ---------- | --------------------------------------------------------------------------------------------------------- | -------------------
169
+ umT5 Small | [umt5/pretrain_small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_small.gin) | [umt5/small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/small/checkpoint_1000000)
170
+ umT5 Base | [umt5/pretrain_base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_base.gin) | [umt5/base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/base/checkpoint_1000000)
171
+ umT5 XL | [umt5/pretrain_xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xl.gin) | [umt5/xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xl/checkpoint_1000000)
172
+ umT5 XXL | [umt5/pretrain_xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/scalable_t5/umt5/pretrain_xxl.gin) | [umt5/xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/umt5/xxl/checkpoint_1000000)
173
+
174
+ #### ByT5 Checkpoints
175
+
176
+ These are the checkpoints used in the paper
177
+ [ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models](https://aclanthology.org/2022.tacl-1.17/).
178
+ They are similar to mT5 (above), but are "token-free", processing text as raw
179
+ UTF-8 bytes, as opposed to using a pretrained subword vocabulary. These models
180
+ are more robust to character-level noise, and outperform parameter-matched mT5
181
+ models in many settings, particularly on word-level tasks sensitive to spelling,
182
+ pronunciation, or morphology. However inference is significantly slower, up to
183
+ 10x depending on the task.
184
+
185
+ **Vocabulary:** None
186
+
187
+ Model | Gin File Location | Checkpoint Location
188
+ ---------- | ------------------------------------------------------------------------------ | -------------------
189
+ ByT5 Small | [byt5/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/small.gin) | [gs://t5-data/pretrained_models/t5x/byt5_small/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_small)
190
+ ByT5 Base | [byt5/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/base.gin) | [gs://t5-data/pretrained_models/t5x/byt5_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_base)
191
+ ByT5 Large | [byt5/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/large.gin) | [gs://t5-data/pretrained_models/t5x/byt5_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_large)
192
+ ByT5 XL | [byt5/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/xl.gin) | [gs://t5-data/pretrained_models/t5x/byt5_xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_xl)
193
+ ByT5 XXL | [byt5/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/byt5/xxl.gin) | [gs://t5-data/pretrained_models/t5x/byt5_xxl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/byt5_xxl)
194
+
195
+ #### LongT5 Checkpoints
196
+
197
+ These are the checkpoints used in the paper
198
+ [LongT5: Efficient Text-to-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916).
199
+ They are encoder-decoder models trained on
200
+ [C4](https://www.tensorflow.org/datasets/catalog/c4) using the PEGASUS Principle
201
+ Sentences Generation objective. These are the recommended checkpoints to
202
+ fine-tune for long input sequence tasks.
203
+
204
+ ##### LongT5 Local Attention Checkpoints
205
+
206
+ The checkpoints below use local attention, which uses a sliding window to reduce
207
+ training time from quadratic (with regard to input length) to linear. These are
208
+ the recommended checkpoints to use for faster training/inference time.
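
The quadratic-versus-linear claim can be sanity-checked by counting query-key pairs. This is a back-of-the-envelope sketch; the window size below is arbitrary, not the value LongT5 actually uses.

```python
def full_attention_pairs(n: int) -> int:
    """Query-key pairs scored by full self-attention: every token vs. every token."""
    return n * n

def local_attention_pairs(n: int, window: int) -> int:
    """Pairs scored when each token attends only to a fixed window of neighbors."""
    return n * min(window, n)

# Quadrupling the sequence length quadruples the local-attention cost
# (linear in n) but multiplies the full-attention cost by sixteen
# (quadratic in n).
print(full_attention_pairs(4096) // full_attention_pairs(1024))              # 16
print(local_attention_pairs(4096, 128) // local_attention_pairs(1024, 128))  # 4
```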
209
+
210
+ **Vocabulary:**
211
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
212
+
213
+ Model | Gin File Location | Checkpoint Location
214
+ ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | -------------------
215
+ LongT5 Local Attention Base | [longt5/models/longt5_1_1_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_base.gin) | [gs://t5-data/pretrained_models/t5x/longt5/local_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/local_base)
216
+ LongT5 Local Attention Large | [longt5/models/longt5_1_1_large.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_large.gin) | [gs://t5-data/pretrained_models/t5x/longt5/local_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/local_large)
217
+
218
+ ##### LongT5 Transient Global Attention Checkpoints
219
+
220
+ The checkpoints below use transient global attention, which introduces global
221
+ tokens at each encoder layer to allow tokens to interact with each other at
222
+ longer distances. These are the recommended checkpoints to use for increased
223
+ performance on long input sequence tasks.
224
+
225
+ **Vocabulary:**
226
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
227
+
228
+ Model | Gin File Location | Checkpoint Location
229
+ ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------
230
+ LongT5 Base | [longt5/models/longt5_1_1_transient_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_transient_global_base.gin) | [gs://t5-data/pretrained_models/t5x/longt5/tglobal_base/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/tglobal_base)
231
+ LongT5 Large | [longt5/models/longt5_1_1_transient_large.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_transient_global_large.gin) | [gs://t5-data/pretrained_models/t5x/longt5/tglobal_large/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/tglobal_large)
232
+ LongT5 XL | [longt5/models/longt5_1_1_transient_xl.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/longt5/models/longt5_1_1_transient_global_xl.gin) | [gs://t5-data/pretrained_models/t5x/longt5/tglobal_xl/checkpoint_1000000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/longt5/tglobal_xl)
233
+
234
+ #### Mixture of Experts (MoE) Checkpoints
235
+
236
+ These MoE checkpoints need to be used with T5X MoE overrides -- specifically,
237
+ the MoeTrainer and the MoePjitPartitioner. For example, for fine-tuning, use the
238
+ [MoE fine-tune run config](https://github.com/google-research/t5x/blob/main/t5x/contrib/moe/configs/runs/finetune.gin).
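
Putting those overrides together might look like the Gin sketch below. The include paths follow the repository layout of the linked configs and the checkpoint path comes from the table below, but treat the exact paths as illustrative:

```gin
# Illustrative sketch: layer the MoE run config over a Switch model config.
include 'flaxformer/t5x/configs/moe/models/switch_base.gin'
include 't5x/contrib/moe/configs/runs/finetune.gin'

INITIAL_CHECKPOINT_PATH = 'gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e8/checkpoint_500100'
```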
239
+
240
+
241
+ ##### Converted Mesh Tensorflow checkpoints
242
+
243
+ [Switch Transformer model](https://arxiv.org/abs/2101.03961).
244
+
245
+ **Vocabulary:**
246
+ [cc_all.32000.100extra](https://console.cloud.google.com/storage/browser/t5-data/vocabs/cc_all.32000.100extra)
247
+
248
+
249
+ Model | Gin File Location | Checkpoint Location
250
+ ---------------------------------------- | ------------------------------------------------------------------------------------------------------------ | -------------------
251
+ Switch Transformer Base 8 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e8/checkpoint_500100](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e8)
252
+ Switch Transformer Base 16 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e16/checkpoint_550000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e16)
253
+ Switch Transformer Base 32 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e32/checkpoint_550000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e32)
254
+ Switch Transformer Base 64 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e64/checkpoint_550000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e64)
255
+ Switch Transformer Base 128 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e128/checkpoint_550000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e128)
256
+ Switch Transformer Base 256 Experts | [switch_base.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_base.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/base/e256/checkpoint_550000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/base/e256)
257
+ Switch Transformer Large 128 Experts | [switch_large.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_large.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/large/e128/checkpoint_483100](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/large/e128)
258
+ Switch Transformer XXL 128 Experts | [switch_xxl.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_xxl.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/xxl/e128/checkpoint_634600](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/xxl/e128)
259
+ Switch Transformer C 2048 Experts (1.6T) | [switch_c.gin](https://github.com/google/flaxformer/tree/main/flaxformer/t5x/configs/moe/models/switch_c.gin) | [gs://t5-data/pretrained_models/t5x/moe/switch_classic/c/e2048/checkpoint_611800](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/moe/switch_classic/c/e2048)
260
+
261
+
262
+ #### Flan-T5 Checkpoints
263
+
264
+ These are the checkpoints released as part of the paper
265
+ [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416).
266
+ They were initialized from the
267
+ [T5 1.1 LM-Adapted](#t5-11-lm-adapted-checkpoints) and instruction-finetuned.
268
+
269
+ They significantly outperform the LM-adapted checkpoints. For example,
270
+ Flan-T5-XXL outperforms T5-LM-XXL by 26.6% absolute on the normalized average
271
+ score. It even outperforms a much larger PaLM 62B model on
272
+ [BigBench Hard](https://arxiv.org/abs/2210.09261), a set of challenging
273
+ BIG-Bench tasks.
274
+
275
+ Unlike the vanilla T5 checkpoints, these can be directly used for few-shot
276
+ prompting as well as standard finetuning. See
277
+ [Chung et al. 2022](https://arxiv.org/abs/2210.11416) for details.
278
+
279
+ Model | Gin File Location | Checkpoint Location
280
+ ------------- | ---------------------------------------------------------------------------------- | -------------------
281
+ Flan-T5 Small | [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_small/checkpoint_1198000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_small/checkpoint_1198000)
282
+ Flan-T5 Base | [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_base/checkpoint_1184000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_base/checkpoint_1184000)
283
+ Flan-T5 Large | [t5_1_1/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_large/checkpoint_1164000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_large/checkpoint_1164000)
284
+ Flan-T5 XL    | [t5_1_1/xl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xl/checkpoint_1138000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xl/checkpoint_1138000)
285
+ Flan-T5 XXL   | [t5_1_1/xxl.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/xxl.gin) | [gs://t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/flan_t5_xxl/checkpoint_1114000)
286
+
287
+ #### UL2 Checkpoints
288
+
289
+ Checkpoints for 20B pretrained and FLAN-based instruction-tuned models using the
290
+ UL2 objective from [UL2 paper](https://arxiv.org/abs/2205.05131). Checkpoints
291
+ are released at
292
+ https://github.com/google-research/google-research/tree/master/ul2#checkpoints.
293
+
294
+ #### BigScience Checkpoints
295
+
296
+ Checkpoints from the [BigScience paper](https://arxiv.org/abs/2204.05832),
297
+ released at
298
+ https://github.com/bigscience-workshop/architecture-objective/tree/main#checkpoints.
299
+
300
+ #### FLIP Checkpoints
301
+
302
+ Language-Image models trained with an alternative to CLIP, presented in the
303
+ [FLIP paper](https://arxiv.org/abs/2212.00794). Checkpoints are released at
304
+ https://github.com/facebookresearch/flip#results-and-pre-trained-flip-models.
305
+
306
+ #### RankGen Checkpoints
307
+
308
+ 1.2B parameter encoder model for English to score model generations given a
309
+ prefix for decoding from the [RankGen paper](https://arxiv.org/abs/2205.09726).
310
+ Checkpoints are released at
311
+ https://github.com/google-research/google-research/tree/master/rankgen.
312
+
313
+ #### Dipper Checkpoints
314
+
315
+ 11B parameter paraphrase generation model from the
316
+ [Dipper paper](https://arxiv.org/abs/2303.13408). Checkpoints are released at
317
+ https://github.com/google-research/google-research/tree/master/dipper.
318
+
t5x-main/docs/overview.md ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ ```{include} ../README.md
2
+ ```
t5x-main/docs/requirements.txt ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ sphinx>=4.4.0
2
+ myst_parser>=0.16.1
3
+ myst_nb
4
+ sphinx-design
5
+ sphinx-book-theme
6
+
7
+ # Must install t5x itself for notebook execution and autodocs to work.
8
+ .
t5x-main/docs/t5x.png ADDED

Git LFS Details

  • SHA256: 5e903d6a7cb99b192a23b895cd30157d5661cd0e895b3f1d6f2027fdfb1b66dd
  • Pointer size: 132 Bytes
  • Size of remote file: 1.84 MB
t5x-main/docs/tutorials.md ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # T5X Introductory Tutorial Series
2
+
3
+
4
+ ## Overview
5
+
6
+ This series of guides is a self-contained introduction to T5X, a modular,
7
+ composable, research-friendly framework for high-performance, configurable,
8
+ self-service training, evaluation, and inference of sequence models (starting
9
+ with language) at many scales.
10
+
11
+
12
+ ## How to Use These Guides
13
+
14
+ Most entries in this series are colab notebooks (click the blue banners to the
15
+ right of each heading below), allowing you to run our tutorial code
16
+ interactively. We encourage you to do that! Play around, change things, see what
17
+ happens!
18
+
19
+
20
+ ## T5X Guides
21
+
22
+ ### Codelab 1: An Introduction to T5X
23
+
24
+ <a href="https://colab.research.google.com/github/google-research/t5x/blob/main/t5x/notebooks/introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in colab" style="float:left"/></a><br>
25
+
26
+ In this colab, you will learn about some of the basic T5X components and put
27
+ them to use to run training, inference, and evaluation on natural text inputs.
28
+
29
+ ### Codelab 2: Training Deep Dive
30
+
31
+ <a href="https://colab.research.google.com/github/google-research/t5x/blob/main/t5x/notebooks/training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in colab" style="float:left"/></a><br>
32
+
33
+ In this colab, you will dive into how to restore T5X models from checkpoints and
34
+ run training, while also getting an introduction to the T5X trainer.
35
+
36
+ ### Codelab 3: Inference Deep Dive
37
+
38
+ <a href="https://colab.research.google.com/github/google-research/t5x/blob/main/t5x/notebooks/inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in colab" style="float:left"/></a><br>
39
+
40
+ In this colab, you will dive into how the Interactive Model does decoding to
41
+ generate predictions and scores for a given input.
42
+
43
+ ### Codelab 4: Evaluation Deep Dive
44
+
45
+ <a href="https://colab.research.google.com/github/google-research/t5x/blob/main/t5x/notebooks/evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in colab" style="float:left"/></a><br>
46
+
47
+ In this colab, you will dive into how the InteractiveModel takes a batch of
48
+ inputs and targets and runs evaluation to produce various metrics.
49
+
50
+
51
+ ### More Colabs coming soon!
t5x-main/docs/usage/auxiliary.md ADDED
@@ -0,0 +1,204 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Auxiliary Job
2
+
3
+
4
+ ## Introduction
5
+
6
+ This page outlines the steps needed to use the auxiliary job capabilities
7
+ available in T5X.
8
+
9
+ ## Overview
10
+
11
+ There are a variety of situations in which running a single job is insufficient
12
+ or suboptimal. For example, consider the following scenarios:
13
+
14
+ + You want to keep track of evaluation (`infer_eval` or `train_eval`) metrics
15
+ per checkpoint, but evaluation takes a very long time due to having a large
16
+ eval dataset, slow decoding, or multiple tasks to evaluate.
17
+
18
+ + You want to finetune every checkpoint on a downstream task as you train.
19
+
20
+ + You have customized evaluation code that you want to run on every checkpoint
21
+ as you train, but that does not naturally fit within a `seqio.Evaluator`
22
+ framework.
23
+
24
+ In cases like these, users can make use of the auxiliary job functionality. At a
25
+ high-level, the auxiliary job will launch a new job every time a new checkpoint
26
+ is saved. This new job can either re-use the `train.py` binary (e.g. for
27
+ continuous finetuning) or a different one. For example, this allows users to
28
+ perform continuous evaluation (using `eval.py`) without slowing down the
29
+ training job. We will provide detailed examples showing how to use the auxiliary
30
+ job for these use-cases.
31
+
32
+ When this new job is launched, the controller will replace four gin macros:
33
+ `MODEL_DIR`, `MIXTURE_OR_TASK_NAME`, `INITIAL_CHECKPOINT_PATH`, `TRAIN_STEPS`.
34
+ The second of these is set by the user-controlled flag (more on this below), and
35
+ the third one is equal to the last checkpoint seen. Aside from this, users are
36
+ free to modify the configuration as needed. Beyond gin macros, the auxiliary job
37
+ can also have different resource requirements, priority, and even cell placement
38
+ from the train job.
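As a rough illustration (the file name and `%gin.REQUIRED` placeholders below are assumptions for this sketch, not taken from the T5X repo), an auxiliary job gin file typically declares these four macros so the controller can substitute them:

```gin
# Hypothetical auxiliary job gin file: the controller overrides these macros.
MODEL_DIR = %gin.REQUIRED                # output dir for the auxiliary job
MIXTURE_OR_TASK_NAME = %gin.REQUIRED     # set from the auxiliary mixtures flag
INITIAL_CHECKPOINT_PATH = %gin.REQUIRED  # set to the last checkpoint seen
TRAIN_STEPS = %gin.REQUIRED              # extended by final_auxiliary_job_steps
```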
39
+
40
+ ## Example 1: Separate evaluation job.
41
+
42
+ ### Step 1: Choose a model architecture.
43
+
44
+ Similar to pretraining, we will need some gin configuration. For this example,
45
+ we will use the T5-1.1-Base model.
46
+
47
+ ### Step 2: Choose a SeqIO Task/Mixture for training and evaluation.
48
+
49
+ In this example, we will use the classic task of English-French translation from
50
+ WMT14, which is conveniently available as a SeqIO task in the tasks file from
51
+ the T5 tasks under the name `'wmt14_enfr_v003'`.
52
+
53
+ ### Step 3: Write a Gin config.
54
+
55
+ Unlike pretraining or finetuning, we will need two gin files for this setup: one
56
+ for the training job, and one for the auxiliary job. The train gin file will
57
+ have the same requirements as the gin file for pretraining or finetuning. The
58
+ auxiliary job gin file can leverage these gin files or be its own independent
59
+ gin file, depending on the user’s choice. For this example, we will make a new
60
+ gin file that is mostly a wrapper around `pretrain.gin` with some additional
61
+ hardcoded features. We will use this gin file for the train job and `eval.gin`
62
+ for the auxiliary job.
63
+
64
+ ### Step 4: Launch your experiment.
65
+
66
+ Our sample script will be quite similar to the one used in pretraining and
67
+ finetuning, but with a few additional flags which we describe below.
68
+
69
+ + `auxiliary_job_mixtures`: This is a comma-separated list of mixtures. A
70
+ separate auxiliary job will be run for each mixture and will replace the gin
71
+ macro `MIXTURE_OR_TASK_NAME`. Note that you must pass this flag even if you
72
+ are using a custom binary that does not need a mixture; otherwise no
73
+ auxiliary job will run.
74
+
75
+ + `auxiliary_job_gin_file`: This is identical to `gin_file`, except it is used
76
+ for the auxiliary job instead of the train job.
77
+
78
+ + `replace_gin_file`: If True, this auxiliary launcher will not use any of the
79
+ gin files from train job. This is necessary when using a binary different
80
+ from `train.py`, since the top-level functions will not match.
81
+
82
+ + `auxiliary_job_cell`: The cell in which to run your job. Note that this can
83
+ be different from the training cell.
84
+
85
+ + `auxiliary_job_platform`: The platform to use for the auxiliary. Note that
86
+ this can be different from the one used for the train job, allowing users to
87
+ use smaller configurations for evaluation than needed for training.
88
+
89
+ `auxiliary_job_build_target_path`: The binary to use for the auxiliary job.
90
+
91
+ + `final_auxiliary_job_steps`: This flag controls how many additional steps to
92
+ take when using the auxiliary job for finetuning. Setting to 0 enables
93
+ continuous evaluation.
94
+
95
+ We provide the sample script below.
96
+
97
+ ```sh
98
+ declare -a ARGS=(
99
+ --cell=iz
100
+ --platform=jd=2x2
101
+ --final_auxiliary_job_steps=0
102
+ --replace_gin_file=True
103
+ --auxiliary_job_mixtures=wmt14_enfr_v003
104
+ --auxiliary_job_gin_file=t5x/examples/t5/t5_1_1/examples/base_wmt14enfr_eval.gin
105
+ --auxiliary_job_cell=iz
106
+ --auxiliary_job_platform=jd=2x2
107
+ --auxiliary_job_build_target_path=//t5x:eval
108
+ --gin_file=t5x/examples/t5/t5_1_1/examples/base_wmt14enfr_train.gin
109
+ )
110
+
111
+ gxm t5x/google/xm_launch.py "${ARGS[@]}"
112
+ ```
113
+
114
+ ## Example 2: Continuous finetuning job.
115
+
116
+ In this example, we will be pretraining a model on a span corruption task on the
117
+ C4 dataset, and finetuning it on the WMT'14 English-French translation task. As
118
+ before, we will launch a new auxiliary job each time a checkpoint is saved.
119
+ However, instead of using the `eval.py` binary, we will use the `train.py`
120
+ binary.
121
+
122
+ ### Step 1: Choose a model architecture.
123
+
124
+ We will use the T5-1.1-Base model as in the previous example.
125
+
126
+ ### Step 2: Choose a SeqIO Task/Mixture for training and evaluation.
127
+
128
+ For pretraining, we re-use the span corruption task `c4_v220_span_corruption`
129
+ available in the T5 mixtures `tasks.py` file.
130
+
131
+ ### Step 3: Write a Gin config.
132
+
133
+ As before, we need our gin files to contain all the desired macros in them. We
134
+ thus create two new gin files: `base_c4_pretrain.gin` for the train job and
135
+ `base_wmtenfr14_finetune.gin` for the auxiliary job.
136
+
137
+ ### Step 4: Launch your experiment.
138
+
139
+ Our script is quite similar to the first example, with the same flags as before
140
+ but with the appropriate changes. The main distinction is that we must change the
141
+ flag `final_auxiliary_job_steps` to be non-zero to start finetuning. We will
142
+ settle for a modest 200 steps for the sake of demonstration (and evaluate every
143
+ 100 steps), but users should use more steps in realistic scenarios. We also
144
+ use the `train.py` binary instead of `eval.py`.
145
+
146
+ We provide the sample script below.
147
+
148
+ ```sh
149
+ declare -a ARGS=(
150
+ --cell=iz
151
+ --platform=jd=2x2
152
+ --final_auxiliary_job_steps=200
153
+ --replace_gin_file=True
154
+ --auxiliary_job_mixtures=wmt14_enfr_v003
155
+ --auxiliary_job_gin_file=t5x/examples/t5/t5_1_1/examples/base_wmt14enfr_finetune.gin
156
+ --auxiliary_job_cell=iz
157
+ --auxiliary_job_platform=jd=2x2
158
+ --auxiliary_job_build_target_path=//t5x:train
159
+ --gin_file=t5x/examples/t5/t5_1_1/examples/base_c4_pretrain.gin
160
+ )
161
+
162
+ gxm t5x/google/xm_launch.py "${ARGS[@]}"
163
+ ```
164
+
165
+ ## Common Gotchas.
166
+
167
+ We outline a few common error patterns that we have encountered.
168
+
169
+ + **Not passing a value for the `auxiliary_job_mixtures` flag.** Even if you
170
+ have the desired task in your gin file, or you use a differently named macro,
171
+ you should still pass a value for this flag, since the launch script launches
172
+ one new job per value of this flag.
173
+
174
+ + **Not setting `replace_gin_file=True` when using a different binary from
175
+ train.py.** This will usually yield an error that there is no `train`
176
+ function.
177
+
178
+ + **No metrics being logged.** It can be tempting to use gin files usually
179
+ used for evaluation. However, one must ensure that the corresponding SeqIO
180
+ evaluators still log to TensorBoard, otherwise you won’t see the
181
+ metrics.
182
+
183
+ + **Slow `train_eval`.** While the approach outlined above separates out the
184
+ infer_eval job, it may be that even train_eval is too slow. In these
185
+ situations, we suggest adding the metrics from train_eval into the
186
+ `metrics_fn` argument of the SeqIO task and have them be computed in the
187
+ auxiliary job as well. To do this with teacher forcing, you will have to use
188
+ `train.py` instead of `eval.py`.
189
+
190
+ + **Using `CHECKPOINT_PATH` rather than `INITIAL_CHECKPOINT_PATH`.** For legacy
191
+ reasons, the auxiliary job uses the macro `INITIAL_CHECKPOINT_PATH` rather
192
+ than `CHECKPOINT_PATH` as found in `eval.gin`. Make sure to use
193
+ `INITIAL_CHECKPOINT_PATH` when building your gin scripts.
194
+
195
+ + **Gin macros being ignored when passed through the format
196
+ `gin.{MACRO}={VAL}`.** In the current setup, you must include all gin macros
197
+ in the gin script. Attempting to pass them as additional flags will usually
198
+ not work.
199
+
200
+ + **Not setting `final_auxiliary_job_steps=0` when performing continuous
201
+ evaluation.** The current parameter controller uses this as a check. When
202
+ this is true, it will replace the `EVAL_OUTPUT_DIR` folder with the current
203
+ `MODEL_DIR`, so that the evaluation metrics are saved in the right place and
204
+ the metrics are shown correctly in TensorBoard.
t5x-main/docs/usage/decoding.md ADDED
@@ -0,0 +1,199 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Decoding
2
+
3
+
4
+ This page outlines the decoding functions that T5X provides out-of-the-box and
5
+ how custom decoding functions can be used for a Transformer model, i.e., an
6
+ instance of
7
+ [`BaseTransformerModel`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=symbol:%5CbBaseTransformerModel%5Cb).
8
+ Here we refer to decoding as a process of generating a sequence of items from a
9
+ fixed alphabet (e.g., generating token ids from the vocabulary).
10
+
11
+ There are two major ways to configure the decoding routine. The first method is
12
+ to define a decode function that follows the `DecodeFnCallable` signature. This
13
+ is more restrictive as it enforces the call signature but users don't need to
14
+ modify the model code.
15
+
16
+ The second method is to subclass a model class and override
17
+ `predict_batch_with_aux` method. While this provides more flexibility, it
18
+ requires rewriting the method.
19
+
20
+ ## Option 1: defining a decoding function
21
+
22
+ If a desired decoding process can follow `DecodeFnCallable`, it can be
23
+ registered as a private attribute of a
24
+ [`BaseTransformerModel`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=symbol:%5CbBaseTransformerModel%5Cb)
25
+ by passing it as a `decode_fn` argument to its constructor.
26
+
27
+ ### Decoding function call signature
28
+
29
+ `DecodeFnCallable` has the following call signature.
30
+
31
+
32
+ It takes in `inputs`, which is an int32 array with a shape `[batch_size,
33
+ max_decode_len]`. These are the input tokens to the decoder. For standard
34
+ encoder-decoder models like T5, this is initialized as zeros with the desired
35
+ decoding length. The decoding function populates the array with the sampled
36
+ token ids and returns it.
37
+
38
+ For decoder-only architectures such as a Prefix Language Model, `inputs` can
39
+ be a concatenated sequence of "inputs" and "targets" tokens ids.
40
+
41
+ `tokens_to_logits` is a callable that takes in a batch of token ids and the
42
+ current autoregressive cache, performs the forward pass and returns the
43
+ resulting logits and an updated cache. Note that for incremental
44
+ decoding, this function operates with a single token, i.e., the length dimension
45
+ is assumed to be 1.
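To make this contract concrete, here is a toy greedy decoder in plain Python that follows the shape described above. The argument names (`inputs`, `cache`, `tokens_to_logits`, `eos_id`, `num_decodes`) mirror the description, but this is an illustrative sketch, not the exact `DecodeFnCallable` signature from `t5x/decoding.py`:

```python
import math

def greedy_decode(inputs, cache, tokens_to_logits, eos_id=1, num_decodes=1,
                  **kwargs):
    """Toy greedy decoder with a DecodeFnCallable-like shape (illustrative).

    inputs: [batch][max_decode_len] int token ids (all zeros for a standard
        encoder-decoder model, per the description above).
    tokens_to_logits: maps (current token ids [batch], cache) to
        (logits [batch][vocab], updated cache), one position at a time.
    Returns (sequences [batch][num_decodes][len], log_probs [batch][num_decodes]).
    """
    batch, max_len = len(inputs), len(inputs[0])
    sequences = [[0] * max_len for _ in range(batch)]
    log_probs = [0.0] * batch
    cur = [0] * batch  # decoding starts from token id 0 (BOS)
    for t in range(max_len):
        logits, cache = tokens_to_logits(cur, cache)
        for b in range(batch):
            # Log-softmax normalizer, then greedy argmax over the vocabulary.
            z = math.log(sum(math.exp(x) for x in logits[b]))
            best = max(range(len(logits[b])), key=lambda i: logits[b][i])
            log_probs[b] += logits[b][best] - z
            sequences[b][t] = best
        cur = [sequences[b][t] for b in range(batch)]
    # Greedy search is deterministic, so the num_decodes "samples" coincide.
    return ([[list(sequences[b])] * num_decodes for b in range(batch)],
            [[log_probs[b]] * num_decodes for b in range(batch)])
```

The populated array and the per-sequence log probabilities are returned together, matching the description of how `inputs` is filled in and sorted outputs are produced.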
46
+
47
+ `DecodeFnCallable` is designed to be as general as possible. This results in
48
+ some of the arguments being somewhat generic for a specialized decoding
49
+ algorithm. For example, `num_decodes` refers to the number of decoded samples to
50
+ be returned. In the case of beam search, `num_decodes` corresponds to what is
51
+ commonly known as `beam_size`, with returned sequences sorted by the beam
52
+ scores. For temperature sampling, we perform `num_decodes` *independent*
53
+ sampling procedures with different random seeds and sort them by the log
54
+ probability of the generated sequences.
55
+
56
+ For custom decoding functions, there might be additional arguments. To support
57
+ these, we provide `**kwargs`.
58
+
59
+ Another usage of `**kwargs` is calling `decoding_fn` multiple times without
60
+ recompiling the model. This pattern is used in
61
+ [Prediction Service](https://github.com/google-research/t5x/blob/main/t5x/google/prediction_service/README.md).
62
+ For a compiled model, different values of `alpha` can be passed e.g.,
63
+ `decoder_params = {"alpha": 0.7}` where `decoder_params` is the argument to
64
+ `predict_batch_with_aux`. It is unpacked and passed to `beam_search` function.
65
+ Note that the Prediction Service uses
66
+ [`predict_batch_with_aux`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=func:%5Cbpredict_batch_with_aux%5Cb),
67
+ which is one of the two public methods. This method is useful if auxiliary
68
+ outputs (e.g., scores of the predictions) are to be returned. The other method
69
+ is
70
+ [`predict_batch`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=func:%5Cbpredict_batch%5Cb),
71
+ which simply returns the predictions.
72
+
73
+ ### Beam search
74
+
75
+ The following lines can be added to a gin file in order to use
76
+ [beam search](https://github.com/google-research/t5x/blob/main/t5x/decoding.py;l=881;rcl=446762159)
77
+ as a decoding function for an encoder-decoder model.
78
+
79
+ ```gin
80
+ models.EncoderDecoderModel.predict_batch_with_aux.num_decodes = 4
81
+ models.EncoderDecoderModel.decode_fn = @decoding.beam_search
82
+ decoding.beam_search.alpha = 0.6
83
+ ```
84
+
85
+ Note that we skip the gin boilerplate code such as gin dynamic registration.
86
+ Please refer to [T5X Gin Primer](gin.md) for more details.
87
+
88
+ The beam search behavior is controlled by the arguments passed to `beam_search`.
89
+ We provide details for a few of them below.
90
+
91
+ #### `num_decodes`
92
+
93
+ If `num_decodes` is configured with `gin.register`, it is overridden by the
94
+ value explicitly passed by the caller e.g.,
95
+ `models.EncoderDecoderModel.predict_batch_with_aux`. This is because the
96
+ information about `num_decodes` is needed to prepare the encoder inputs and
97
+ outputs, which are expanded `num_decodes` times in the batch dimension.
98
+
99
+ We recommend that `num_decodes` be specified *only* in
100
+ `models.EncoderDecoderModel.predict_batch_with_aux`.
101
+
102
+ #### `alpha`
103
+
104
+ This is the brevity penalty introduced in
105
+ [Wu et al. 2016](https://arxiv.org/abs/1609.08144) to penalize short sequences.
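Concretely, the length normalization in Wu et al. 2016 divides each candidate's log-probability by a penalty of the form

$$\mathrm{lp}(Y) = \frac{(5 + |Y|)^\alpha}{(5 + 1)^\alpha}$$

where $$|Y|$$ is the candidate length; `alpha = 0` disables normalization, while larger values increasingly favor longer sequences.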
106
+
107
+ #### `max_decode_len`
108
+
109
+ For evaluation, we typically don't want to truncate the examples by a specified
110
+ sequence length. Therefore, we dynamically obtain the length information from
111
+ the batch of examples. The default behavior of `seqio.Evaluator` is to use the
112
+ maximum length of a task, but this can be overridden.
113
+
114
+ Since the length information is provided dynamically, we don't set
115
+ `max_decode_len` in gin. Instead we pass the relevant `inputs` array to
116
+ `beam_search` whose length is the dynamically determined maximum length.
117
+
118
+ If `max_decode_len` is explicitly specified via gin, this will override the
119
+ implicitly determined length information unless it is passed by
120
+ `predict_batch_with_aux`.
121
+
122
+ ### Temperature sampling
123
+
124
+ [Temperature sampling](https://github.com/google-research/t5x/blob/main/t5x/decoding.py;l=37;rcl=446762159)
125
+ can be used for multiple decoding strategies. The following lines configure
126
+ temperature sampling as a `decode_fn`.
127
+
128
+ ```gin
129
+ models.EncoderDecoderModel.predict_batch_with_aux.num_decodes = 1
130
+ models.EncoderDecoderModel.decode_fn = @decoding.temperature_sample
131
+ decoding.temperature_sample:
132
+ temperature = 0.5
133
+ topk = 20
134
+ ```
135
+
136
+ A similar specification can be used for other model types by replacing
137
+ `models.EncoderDecoderModel` with the relevant model class, e.g.
138
+ `models.PrefixLanguageModel`.
139
+
140
+ The sampling behavior is controlled by the arguments passed to
141
+ `temperature_sample`. We provide details for a few of them below.
142
+
143
+ #### `temperature`
144
+
145
+ A probabilistic model outputs a probability distribution over a pre-defined
146
+ alphabet. For example, a language model outputs *logits*, which are unnormalized
147
+ log-probability values for each item in the vocabulary. We use a language model as a
148
+ running example. A sampling process involves *sampling* from the predicted
149
+ distribution one item at a time conditioned on the previously generated items
150
+ until a given number of items are generated or a sentinel token that represents
151
+ the end of sequence is generated.
152
+
153
+ Temperature modifies the unnormalized probability distribution at each step. For
154
+ each item $$i$$ in the vocabulary, its probability predicted by the model is
155
+ given by
156
+
157
+ $$p_i \propto \exp\left(\frac{x_i}{T} \right)$$
158
+
159
+ where $$T$$ is the temperature and $$x_i$$ is the logits value corresponding to
160
+ item $$i$$. As $$T \to 0$$, the distribution puts all probability mass to the
161
+ item with the highest probability. In other words, the sampling process becomes
162
+ a greedy search.
163
+
164
+ In the other extreme, as $$T \to \infty$$, the predicted distribution becomes
165
+ uniform.
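As a minimal sketch of this formula (plain Python for illustration only; the actual T5X implementation in `decoding.py` operates on JAX arrays):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random.Random(0)):
    """Draw one item id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(logits) - 1  # fallback for floating-point edge cases
```

As `temperature` approaches 0, the scaled logits concentrate all mass on the argmax (greedy search); as it grows, the draw approaches uniform sampling, matching the two limits above.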
166
+
167
+ #### `topk`
168
+
169
+ By specifying a strictly positive integer value for `topk`, the sampling process
170
+ in each step is limited to the `k` items with highest probabilities. `topk` also
171
+ uses `temperature` to modify the logits corresponding to the top `k` items.
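A minimal sketch of the top-k restriction (illustrative only, not the T5X implementation; ties at the threshold are all kept):

```python
def top_k_filter(logits, k):
    """Keep the k largest logits; mask the rest to -inf so they can't be sampled."""
    kth = sorted(logits, reverse=True)[k - 1]
    return [x if x >= kth else float("-inf") for x in logits]
```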
172
+
173
+ #### `topp`
174
+
175
+ By specifying a positive float value for `topp`, the sampling process is
176
+ limited to a subset of the vocabulary $$V^{(p)} \subset V$$, which is defined by
177
+ the smallest set such that
178
+
179
+ $$\sum_{i \in V^{(p)}} p_i \ge p$$
180
+
181
+ where $$p_i$$ is the conditional distribution at each time step for item $$i$$.
182
+ This is called "Nucleus sampling", which was introduced by
183
+ [Holtzman et al. ICLR 2020](https://openreview.net/forum?id=rygGQyrFvH).
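The nucleus restriction can be sketched as follows (illustrative only; it takes an already-normalized distribution and renormalizes over the kept set):

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability items with mass >= p,
    then renormalize that set into a new distribution."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}
```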
184
+
185
+ IMPORTANT: Only one of `topk` or `topp` can be used.
186
+
187
+ ## Option 2: subclassing a model class
188
+
189
+ If `DecodeFnCallable` is not flexible enough for your custom decoding function,
190
+ you can subclass the model class and override the `predict_batch_with_aux` method.
191
+ While the model class can be any instance of
192
+ [`BaseTransformerModel`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=symbol:%5CbBaseTransformerModel%5Cb),
193
+ we recommend that you subclass the existing models such as
194
+ [`EncoderDecoderModel`](https://github.com/google-research/t5x/blob/main/t5x/models.py?q=symbol:%5CbEncoderDecoderModel%5Cb)
195
+ and only override the `predict_batch_with_aux` method.
196
+
197
+ The `predict_batch_with_aux` method also has a required call signature, but it
198
+ is significantly more flexible. It should return a tuple of the predicted
199
+ sequence array and auxiliary outputs such as scores.
t5x-main/docs/usage/eval.md ADDED
@@ -0,0 +1,226 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
# Evaluating a Model

## Introduction

This page outlines the steps to evaluate a model with T5X on downstream tasks
defined with [SeqIO](https://github.com/google/seqio/blob/main/README.md).

Refer to this tutorial when you have an existing model that you want to
evaluate. If you would like to fine-tune your model before evaluation, please
refer to the [fine-tuning](finetune.md) tutorial. You can also run evals as
part of your fine-tuning run.

## Overview

Evaluating a model with T5X consists of the following steps:

1.  Choose the model to evaluate.
1.  Choose the SeqIO Task/Mixture to evaluate the model on.
1.  Write a Gin file that configures the model, SeqIO Task/Mixture and other
    details of your eval run.
1.  Launch your experiment locally or on XManager.
1.  Monitor your experiment and parse metrics.

These steps are explained in detail in the following sections. An example run
that evaluates a fine-tuned T5-1.1-Small checkpoint on the
[(Open Domain) Natural Questions benchmark](https://ai.google.com/research/NaturalQuestions/)
is also showcased.

## Step 1: Choose a model

To evaluate a model, you need a Gin config file that defines the model params,
and the model checkpoint to load from. For this example, a T5-1.1-Small model
fine-tuned on the
[`natural_questions_open_test`](https://github.com/google-research/google-research/tree/master/t5_closed_book_qa/t5_cbqa/tasks.py?l=141&rcl=370261021)
SeqIO Task will be used:

+   Model checkpoint -
    [`cbqa/small_ssm_nq/model.ckpt-1110000`](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/cbqa/small_ssm_nq/)
+   Model Gin file -
    [`t5x/configs/models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).

If you would like to fine-tune your model before evaluation, please follow the
[fine-tuning](finetune.md) tutorial, and continue to Step 2. A list of all
available pre-trained models (with model checkpoints and Gin config files) is
available in the [Models](https://github.com/google-research/t5x/blob/main/docs/models.md) documentation.

## Step 2: Choose a SeqIO Task/Mixture

A SeqIO Task encapsulates the data source, the preprocessing logic to be
performed on the data before querying the model, the postprocessing logic to be
performed on model outputs, and the metrics to be computed given the
postprocessed outputs and targets. A SeqIO Mixture denotes a collection of Tasks
and enables fine-tuning a model on multiple Tasks simultaneously.

Many common datasets and benchmarks, e.g. [GLUE](https://gluebenchmark.com/),
[SuperGLUE](https://super.gluebenchmark.com/),
[WMT](https://www.tensorflow.org/datasets/catalog/wmt_t2t_translate),
[SQUAD](https://rajpurkar.github.io/SQuAD-explorer/),
[CNN/Daily Mail](https://github.com/abisee/cnn-dailymail), etc. have been
implemented as SeqIO Tasks/Mixtures and can be used directly. These
Tasks/Mixtures are defined in
[`t5/data/tasks.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/tasks.py) and
[`t5/data/mixtures.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/mixtures.py).

For the example run, you will evaluate the model on the Natural Questions
benchmark, which has been implemented as the `natural_questions_open` Task in
[`/third_party/google_research/google_research/t5_closed_book_qa/t5_cbqa/tasks.py`](https://github.com/google-research/google-research/tree/master/t5_closed_book_qa/t5_cbqa/tasks.py?l=98&rcl=370261021).
Here's an example of a single row of preprocessed data from this Task:

```python
{
    'inputs_pretokenized': 'nq question: what was the main motive of salt march',
    'inputs': [3, 29, 1824, 822, 10, 125, 47, 8, 711, 10280, 13, 3136, 10556, 1],
    'targets_pretokenized': 'challenge to British authority',
    'targets': [1921, 12, 2390, 5015, 1],
    'answers': ['challenge to British authority']
}
```

## Step 3: Write a Gin Config

After choosing the model and SeqIO Task/Mixture for your run, the next step is
to configure your run using Gin. If you're not familiar with Gin, reading the
[T5X Gin Primer](gin.md) is recommended.

T5X provides a Gin file that configures the T5X eval job (located at
[`t5x/configs/runs/eval.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/eval.gin)),
and expects a few params from you. These params can be specified in a separate
Gin file, or via command-line flags. Following are the required params:

+   `CHECKPOINT_PATH`: This is the path to the model checkpoint (from Step 1).
    For the example run, set this to
    `'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'`.
+   `MIXTURE_OR_TASK_NAME`: This is the SeqIO Task or Mixture name to run eval
    on (from Step 2). For the example run, set this to
    `'natural_questions_open'`.
+   `EVAL_OUTPUT_DIR`: A path to write eval outputs to. When launching using
    XManager, this path is automatically set and can be accessed from the
    XManager Artifacts page. When running locally using Blaze, you can
    explicitly pass a directory using a flag. Launch commands are provided in
    the next step.

In addition to the above params, you will need to import
[`eval.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/eval.gin) and the
Gin file for the model, which for the example run is
[`t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).

```gin
include 'runs/eval.gin'
include 'models/t5_1_1_small.gin'
```

Note that the `include` statements use relative paths in this example. You will
pass an appropriate `gin_search_paths` flag to locate these files when launching
your run. Absolute paths to Gin files can also be used, e.g.

```gin
include 't5x/configs/runs/eval.gin'
include 't5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin'
```

You will also need to import the Python module(s) that register the SeqIO Tasks
and Mixtures used in your run. For the example run, we add `import
google_research.t5_closed_book_qa.t5_cbqa.tasks`, since that is where
`natural_questions_open` is registered.

If you choose a module that is not included as a dependency in the T5X trainer
[binary](https://github.com/google-research/t5x/blob/main/t5x/BUILD;l=76;rcl=398627055), or if you
have defined your gin config file in a location other than the
[T5X config directory](https://github.com/google-research/t5x/blob/main/t5x/configs/), you will
need to follow the instructions in the
[Advanced Topics section](#custom-t5x-binaries) to link in the custom gin file
and/or task definition.

Note that for many common Tasks/Mixtures, the necessary modules are already
included. It is also possible to skip writing a Gin file and instead pass the
params as flags when launching the eval job (see instructions in Step 4).
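
For instance, a flag-only launch could look like the sketch below. It follows the
`--gin.<PARAM>` override pattern used in Step 4; the exact flag values are
illustrative, and the module registering the Task must still be linked into the
binary:

```sh
python -m t5x.eval \
  --gin_file=t5x/configs/runs/eval.gin \
  --gin_file=t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin \
  --gin.CHECKPOINT_PATH=\"gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000\" \
  --gin.MIXTURE_OR_TASK_NAME=\"natural_questions_open\" \
  --gin.EVAL_OUTPUT_DIR=\"/tmp/model-eval\" \
  --alsologtostderr
```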

Finally, your Gin file should look like this:

```gin
include 't5x/configs/runs/eval.gin'
include 't5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin'

# Register necessary SeqIO Tasks/Mixtures.
import google_research.t5_closed_book_qa.t5_cbqa.tasks

CHECKPOINT_PATH = 'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'
MIXTURE_OR_TASK_NAME = 'natural_questions_open'
```

See
[`t5_1_1_small_cbqa_natural_questions.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/examples/eval/t5_1_1_small_cbqa_natural_questions.gin)
for this example.

In this example, we run the evaluation on one checkpoint. It is common to
evaluate with multiple checkpoints, and T5X provides an easy way to do so
*without* having to recompile the model graph for each checkpoint: simply add
`utils.RestoreCheckpointConfig.mode = 'all'` to your gin file. Our
`t5x/configs/runs/eval.gin` uses the `'specific'` mode by default.
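
Concretely (a sketch; the binding name follows the `RestoreCheckpointConfig`
usage in `eval.gin`), the override is a one-line addition to your Gin file:

```gin
# Evaluate every checkpoint found under CHECKPOINT_PATH instead of a single one.
utils.RestoreCheckpointConfig.mode = 'all'
```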

## Step 4: Launch your experiment

To launch your experiment locally (for debugging only; larger checkpoints may
cause issues), run the following on the command line:

```sh
EVAL_OUTPUT_DIR="/tmp/model-eval/"
python -m t5x.eval \
  --gin_file=t5x/google/examples/flaxformer_t5/configs/examples/eval/t5_1_1_small_cbqa_natural_questions.gin \
  --gin.EVAL_OUTPUT_DIR=\"${EVAL_OUTPUT_DIR}\" \
  --alsologtostderr
```

Note that relative paths can be used to locate the gin files. For that, multiple
comma-separated paths can be passed to the `gin_search_paths` flag, and these
paths should contain all Gin files used or included in your experiment.

You can have a look inside
[`eval.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/eval.gin) to see
other useful parameters that can be passed in, including the dataset split,
batch size, and random seed.

## Step 5: Monitor your experiment and parse metrics

After evaluation has completed, you can parse metrics into CSV format using the
following script:

```sh
EVAL_OUTPUT_DIR= # from Step 4 if running locally, from XManager Artifacts otherwise
VAL_DIR="$EVAL_OUTPUT_DIR/inference_eval"
python -m t5.scripts.parse_tb \
  --summary_dir="$VAL_DIR" \
  --seqio_summaries \
  --out_file="$VAL_DIR/results.csv" \
  --alsologtostderr
```
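
The resulting `results.csv` can then be inspected with standard tooling. The
sketch below uses only the Python standard library; the column names in the
sample data are hypothetical, since the actual columns depend on the metrics
your Task defines:

```python
import csv
import io

# Hypothetical sample of the parsed metrics CSV; the real column names
# depend on the SeqIO metrics defined by the Task being evaluated.
SAMPLE = """\
step,em,f1
1110000,28.4,36.1
"""

def read_results(fileobj):
    """Parse a metrics CSV into a list of {column: value} dicts."""
    return list(csv.DictReader(fileobj))

rows = read_results(io.StringIO(SAMPLE))
for row in rows:
    print(row["step"], row["em"], row["f1"])
```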

## Next Steps

Now that you have successfully evaluated a model on the Natural Questions
benchmark, here are some topics you might want to explore next:

+   [Running inference on a model.](infer.md)
+   [Fine-tuning a model.](finetune.md)
+   [Training a model from scratch.](pretrain.md)

We also touch upon a few advanced topics related to evaluations below that might
be useful, especially when customizing your eval job.

## Advanced Topics

### Defining a custom SeqIO Task/Mixture to evaluate on {.no-toc}

Refer to [SeqIO documentation](https://github.com/google/seqio/blob/main/README.md).

### Defining a custom metric to evaluate {.no-toc}

The best way to define a custom metric is to define a new SeqIO Task/Mixture
that contains this custom metric. Please refer to the SeqIO documentation on
[custom metrics](https://github.com/google/seqio/blob/main/README.md#metrics).
t5x-main/docs/usage/finetune.md ADDED
@@ -0,0 +1,286 @@
# Fine Tuning a Model

## Introduction

This page outlines the steps to fine-tune an existing pre-trained model with T5X
on common downstream tasks defined with [SeqIO](https://github.com/google/seqio/blob/main/README.md). This is one of
the simplest and most common use cases of T5X. If you're new to T5X, this
tutorial is the recommended starting point.

## Overview

Fine-tuning a model with T5X consists of the following steps:

1.  Choose the pre-trained model to fine-tune.
2.  Choose the SeqIO Task/Mixture to fine-tune the model on.
3.  Write a Gin file that configures the pre-trained model, SeqIO Task/Mixture
    and other details of your fine-tuning run.
4.  Launch your experiment locally or on XManager.
5.  Monitor your experiment and parse metrics.

These steps are explained in detail in the following sections. An example run
that fine-tunes a T5-small checkpoint on the WMT14 English to German translation
benchmark is also showcased.

## Step 1: Choose a pre-trained model

To use a pre-trained model, you need a Gin config file that defines the model
params, and the model checkpoint to load from. For your convenience, TensorFlow
checkpoints and Gin configs for common T5 pre-trained models have been made
available for use in T5X. A list of all the available pre-trained models (with
model checkpoints and Gin config files) is available in the
[Models](https://github.com/google-research/t5x/blob/main/docs/models.md) documentation.

For the example run, you will use the T5 1.1 Small model. The Gin file for this
model is located at
[`/t5x/examples/t5/t5_1_1/small.gin`](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin),
and the checkpoint is located at
[`gs://t5-data/pretrained_models/t5x/t5_1_1_small`](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/t5x/t5_1_1_small).

## Step 2: Choose a SeqIO Task/Mixture

A SeqIO Task encapsulates the data source, the preprocessing logic to be
performed on the data before querying the model, the postprocessing logic to be
performed on model outputs, and the metrics to be computed given the
postprocessed outputs and targets. A SeqIO Mixture denotes a collection of Tasks
and enables fine-tuning a model on multiple Tasks simultaneously.

### Standard Tasks

Many common datasets and benchmarks, e.g. [GLUE](https://gluebenchmark.com/),
[SuperGLUE](https://super.gluebenchmark.com/),
[WMT](https://www.tensorflow.org/datasets/catalog/wmt_t2t_translate),
[SQUAD](https://rajpurkar.github.io/SQuAD-explorer/),
[CNN/Daily Mail](https://github.com/abisee/cnn-dailymail), etc. have been
implemented as SeqIO Tasks/Mixtures and can be used directly. These
Tasks/Mixtures are defined in
[`third_party/py/t5/data/tasks.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/tasks.py)
and
[`third_party/py/t5/data/mixtures.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/mixtures.py).

For the example run, you will fine-tune the model on the WMT14 English to German
translation benchmark, which has been implemented as the
[`wmt_t2t_ende_v003`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/tasks.py;l=209;rcl=417815592)
Task.

### Custom Tasks

It is also possible to define your own custom task. See the
[SeqIO documentation](https://github.com/google/seqio/blob/main/README.md) for how to do this. As a note, Tasks
defined using the
[old T5 codebase](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/dataset_providers.py)
may also be used by T5X. If using a custom Task, you will need to follow the
instructions in the [Advanced Topics section](#custom-t5x-binaries) at the end
of this tutorial to make sure the module containing your task is included.

When defining a custom task, you have the option to cache it on disk before
fine-tuning. The instructions for this are
[here](https://github.com/google/seqio/blob/main/README.md#optional-offline-caching). Caching may improve
performance for tasks with expensive pre-processing. By default, T5X expects
tasks to be cached. To fine-tune on a task that has not been cached, set
`--gin.USE_CACHED_TASKS=False`.

## Step 3: Write a Gin Config

After choosing the pre-trained model and SeqIO Task/Mixture for your run, the
next step is to configure your run using Gin. If you're not familiar with Gin,
reading the [T5X Gin Primer](gin.md) is recommended.

T5X provides a Gin file that configures the T5X trainer for fine-tuning (located
at
[`t5x/configs/runs/finetune.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin)),
and expects a few params from you. These params can be specified in a separate
Gin file, or via command-line flags. Following are the required params:

+   `INITIAL_CHECKPOINT_PATH`: This is the path to the pre-trained checkpoint
    (from Step 1). For the example run, set this to
    `'gs://t5-data/pretrained_models/t5x/t5_1_1_small/checkpoint_1000000'`.
+   `TRAIN_STEPS`: Number of fine-tuning steps. This includes the number of
    steps that the model was pre-trained for, so make sure to add the step
    number from the `INITIAL_CHECKPOINT_PATH`. For the example run, to fine-tune
    for `20_000` steps, set this to `1_020_000`, since the initial checkpoint is
    the `1_000_000`th step.
+   `MIXTURE_OR_TASK_NAME`: This is the SeqIO Task or Mixture name to run (from
    Step 2). For the example run, set this to `'wmt_t2t_ende_v003'`.
+   `TASK_FEATURE_LENGTHS`: This is a dict mapping feature key to maximum int
    length for that feature. After preprocessing, features are truncated to the
    provided value. For the example run, set this to `{'inputs': 256, 'targets':
    256}`.
+   `MODEL_DIR`: A path to write fine-tuned checkpoints to. When launching using
    XManager, this path is automatically set and can be accessed from the
    XManager Artifacts page. When running locally using Blaze, you can
    explicitly pass a directory using a flag. Launch commands are provided in
    the next step.
+   `LOSS_NORMALIZING_FACTOR`: When fine-tuning a model that was pre-trained
    using Mesh Tensorflow (e.g. the public T5 / mT5 / ByT5 models), this should
    be set to `pretraining batch_size` * `pretrained target_token_length`. For
    T5 and T5.1.1: `2048 * 114`. For mT5: `1024 * 229`. For ByT5: `1024 * 189`.
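
As a sanity check on the two derived values above (a standalone sketch, not T5X
code), the step count and the T5 1.1 loss-normalizing factor work out as
follows:

```python
# TRAIN_STEPS includes the pre-training steps baked into the checkpoint name.
pretrain_steps = 1_000_000   # step number in INITIAL_CHECKPOINT_PATH
finetune_steps = 20_000
train_steps = pretrain_steps + finetune_steps
print(train_steps)  # 1020000

# LOSS_NORMALIZING_FACTOR for T5 / T5.1.1:
# pre-training batch size * pre-trained target token length.
loss_normalizing_factor = 2048 * 114
print(loss_normalizing_factor)  # 233472
```

The second value matches the `LOSS_NORMALIZING_FACTOR = 233472` setting in the
example Gin file below.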

In addition to the above params, you will need to include
[`finetune.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin)
and the Gin file for the pre-trained model, which for the example run is
[`t5_1_1/small.gin`](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin).

```gin
include 't5x/configs/runs/finetune.gin'
include 't5x/examples/t5/t5_1_1/small.gin'
```

You will also need to import the Python module(s) that register SeqIO Tasks and
Mixtures used in your run. For the example run, we add `import t5.data.tasks`
since it is where `wmt_t2t_ende_v003` is registered.

Finally, your Gin file should look like this:

```gin
include 't5x/configs/runs/finetune.gin'
include 't5x/examples/t5/t5_1_1/small.gin'

# Register necessary SeqIO Tasks/Mixtures.
import t5.data.tasks

MIXTURE_OR_TASK_NAME = "wmt_t2t_ende_v003"
TASK_FEATURE_LENGTHS = {"inputs": 256, "targets": 256}
TRAIN_STEPS = 1_020_000  # 1000000 pre-trained steps + 20000 fine-tuning steps.
DROPOUT_RATE = 0.0
INITIAL_CHECKPOINT_PATH = "gs://t5-data/pretrained_models/t5x/t5_1_1_small/checkpoint_1000000"
LOSS_NORMALIZING_FACTOR = 233472
```

See
[`t5x/examples/t5/t5_1_1/examples/small_wmt_finetune.gin`](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/examples/small_wmt_finetune.gin)
for this example.

## Step 4: Launch your experiment

To launch your experiment locally (for debugging only; larger checkpoints may
cause issues), run the following on the command line:

```sh
MODEL_DIR="/tmp/finetune-model/"
python -m t5x.train \
  --gin_file=t5x/examples/t5/t5_1_1/examples/small_wmt_finetune.gin \
  --gin.MODEL_DIR=\"${MODEL_DIR}\" \
  --alsologtostderr
```

Note that multiple comma-separated paths can be passed to the `gin_search_paths`
flag, and these paths should contain all Gin files used or included in your
experiment.

After fine-tuning has completed, you can parse metrics into CSV format using the
following script:

```sh
MODEL_DIR= # from Step 4 if running locally, from XManager Artifacts otherwise
VAL_DIR="$MODEL_DIR/inference_eval"
python -m t5.scripts.parse_tb \
  --summary_dir="$VAL_DIR" \
  --seqio_summaries \
  --out_file="$VAL_DIR/results.csv" \
  --alsologtostderr
```

### Metric Explanations

By default, T5X logs many metrics to TensorBoard. Many of these seem similar but
have important distinctions.

The first two graphs you will see are the `accuracy` and `cross_ent_loss`
graphs. These are the *token-level teacher-forced* accuracy and cross entropy
loss, respectively. Each of these graphs can have multiple curves on them. The
first curve is the `train` curve. This is calculated as a running sum that is
then normalized over the whole training set. The second class of curves has the
form `training_eval/${task_name}`. These curves are created by running a subset
(controlled by the `eval_steps` parameter of the main train function) of the
validation split of `${task_name}` through the model and calculating these
metrics using teacher-forcing. These graphs can commonly be used to find
"failure to learn" cases and as a warning sign of overfitting, but they are
often not the final metrics one would report.
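
To make the distinction concrete, token-level teacher-forced accuracy can be
sketched as follows (plain Python, not the T5X implementation; padding positions
are assumed to carry a weight of 0):

```python
def teacher_forced_accuracy(predictions, targets, weights):
    """Fraction of non-padding target tokens predicted exactly.

    predictions, targets: sequences of token ids.
    weights: 1.0 for real tokens, 0.0 for padding.
    """
    correct = sum(w for p, t, w in zip(predictions, targets, weights) if p == t)
    total = sum(weights)
    return correct / total if total else 0.0

# The model got 3 of the 4 real tokens right; the padded position is ignored.
acc = teacher_forced_accuracy(
    predictions=[5, 8, 2, 9, 0],
    targets=[5, 8, 2, 7, 0],
    weights=[1.0, 1.0, 1.0, 1.0, 0.0],
)
print(acc)  # 0.75
```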

The second set of graphs are the ones under the collapsible `eval` section in
TensorBoard. These graphs are created based on the `metric_fns` defined in the
SeqIO task. The curves on these graphs have the form
`inference_eval/${task_name}`. Values are calculated by running the whole
validation split through the model in inference mode, commonly auto-regressive
decoding or output scoring. Most likely, these are the metrics that will be
reported.

More information about the configuration of the datasets used for these
different metrics can be found [here](#train-train-eval-and-infer-eval).

In summary, the metric you actually care about most likely lives under the
`eval` tab, rather than in the `accuracy` graph.

## Next Steps

Now that you have successfully fine-tuned a pre-trained model on WMT, here are
some topics you might want to explore next:

+   [Evaluating a fine-tuned model.](eval.md)
+   [Running inference on a fine-tuned model.](infer.md)
+   [Training a model from scratch.](pretrain.md)

We also touch upon a few advanced topics related to fine-tuning below that might
be useful, especially when customizing your fine-tuning job.

## Advanced Topics

### `train`, `train_eval` and `infer_eval` {.no-toc}

A
[`DatasetConfig`](https://github.com/google-research/t5x/blob/main/t5x/utils.py?l=113&rcl=375475889)
object is used to configure loading SeqIO Tasks/Mixtures for training and eval.
If you take a closer look at
[`runs/finetune.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin),
you will see that there are three `DatasetConfig` objects defined and passed to
the train function: `train_dataset_cfg`, `train_eval_dataset_cfg`, and
`infer_eval_dataset_cfg`. Here's a brief description of these configs:

+   `train`: This configures the Task/Mixture that the model will be fine-tuned
    on.
+   `train_eval`: This configures the Task/Mixture that is used to compute
    training metrics on the eval split, e.g. perplexity. These metrics are
    defined in the
    [`Model`](https://github.com/google-research/t5x/blob/main/t5x/models.py;l=257-267;rcl=394045248)
    class and the eval fn is located
    [here](https://github.com/google-research/t5x/blob/main/t5x/trainer.py;l=257;rcl=398487394).
+   `infer_eval`: This configures the Task/Mixture that is used to compute
    metrics on inferred model outputs (e.g., comparing decoded model outputs and
    targets). These metrics are defined in the SeqIO Task/Mixture and the eval
    fn is located
    [here](https://github.com/google/seqio/tree/main/seqio/evaluation.py?l=423&rcl=373643592).

### Using separate SeqIO Tasks/Mixtures for fine-tuning and eval {.no-toc}

Commonly, the same SeqIO Task/Mixture is used for training and eval. It is set
by the `MIXTURE_OR_TASK_NAME` macro in your fine-tune Gin file from Step 3
above, and is passed to `train_dataset_cfg`, `train_eval_dataset_cfg`, and
`infer_eval_dataset_cfg`. The `train` split is used for training and the
`validation` split is used for evals. However, you can override these params in
your fine-tune Gin config. For example, if you want to fine-tune on all GLUE
tasks but evaluate only on the GLUE STS benchmark, you can override the SeqIO
Task/Mixture used for `infer_eval` in your fine-tune Gin file as follows:

```gin
include 'runs/finetune.gin'
include 'models/t5_small.gin'

MIXTURE_OR_TASK_NAME = 'glue_v002_proportional'
MIXTURE_OR_TASK_MODULE = 't5.data.tasks'
TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 84}
TRAIN_STEPS = 1_500_000  # includes 1_000_000 pretrain steps
INITIAL_CHECKPOINT_PATH = 'gs://t5-data/pretrained_models/t5x/t5_small/checkpoint_1000000'
infer_eval/utils.DatasetConfig.mixture_or_task_name = 'glue_stsb_v002'
```

Other params in `finetune.gin` can be overridden in the same way.

### Defining a custom SeqIO Task/Mixture to fine-tune on {.no-toc}

Refer to [SeqIO documentation](https://github.com/google/seqio/blob/main/README.md).
t5x-main/docs/usage/gin.md ADDED
@@ -0,0 +1,395 @@
# Gin Primer

[Gin](https://github.com/google/gin-config/blob/main/README.md) is a lightweight configuration framework for Python,
based on dependency injection. While T5X does not employ gin in its core
libraries, it is used to configure runs of the `train`, `eval`, and `infer`
scripts. This usage is a bit different (and more limited) than how gin is
typically applied, so this primer should be useful even for those who may be
familiar with gin from other libraries (e.g., T5 or Mesh TensorFlow).

Nevertheless, you may still find it helpful to refer to the
[gin documentation](https://github.com/google/gin-config/blob/main/README.md) for more background.

[TOC]

## Gin in T5X Scripts

Rather than plumbing run arguments and hyperparameters through a limited set
of command-line flags or a flat configuration schema, T5X's gin integration
allows you to parameterize the top-level run functions (`train`, `evaluate`, and
`infer`) as well as any object or function that is passed to them. This enables
a vast amount of flexibility over your runs without needing to modify any code
within the core T5X library.

For example, you can implement a Python class in your own codebase (e.g., a
custom model or trainer) and use gin to pass an instance of it to the T5X XM
launcher without having to fork any code. Previously, you needed to implement
every experimental idea in the core library (no matter how widely used it would
be) and add a ConfigDict flag to enable/disable it, resulting in significant
code debt over time.

On the other hand, gin can sometimes be too powerful, allowing users the ability
to bind arguments throughout a codebase, which makes it difficult or impossible
to update "private" internal interfaces. However, by limiting configurability to
a single top-level function and its arguments, we can better control the
configurable surface to public interfaces and user-owned code, and also avoid
unintended side effects.

### An Example

Let's look at the `evaluate` call signature from
[eval.py](https://github.com/google-research/t5x/blob/main/t5x/eval.py) as an example:

```py
def evaluate(*,
             model: models.BaseModel,
             dataset_cfg: utils.DatasetConfig,
             restore_checkpoint_cfg: utils.RestoreCheckpointConfig,
             partitioner: partitioning.BasePartitioner,
             output_dir: str):
  """Evaluation function.

  Args:
    model: The model object to use for inference.
    dataset_cfg: Specification for the dataset to infer based on.
    restore_checkpoint_cfg: Specification for the model parameter checkpoint to
      load.
    partitioner: The partitioner for the model parameters and data across
      devices.
    output_dir: Path to directory to write temporary files and final results.
  """
  ...
```

In the binary, the user-provided gin configuration file will be parsed. It
specifies which values should be bound to the `evaluate` arguments, after which
we can directly call the fully-bound function without any arguments. Basically,
we are creating a custom closure of `evaluate` (a la `functools.partial`), but
specifying the arguments via gin instead of Python.
+
71
+ Furthermore, this ability to bind custom arguments is recursive. Not only can we
72
+ bind the arguments of `evaluate`, but we can also bind the constructor and
73
+ method arguments of the instance of `models.BaseModel` that we pass to
74
+ `evaluate`.
75
+
76
+ Let's now look at an example of a gin configuration for parameterizing
77
+ `evaluate`, specifically evaluating a
78
+ [T5 model fine-tuned for closed book question answering](http://goo.gle/t5-cbqa)
79
+ on [Natural Questions Open](https://ai.google.com/research/NaturalQuestions):
80
+
81
+ ```py
82
+ from __gin__ import dynamic_registration
83
+
84
+ import __main__ as eval_script
85
+ from t5x import models
86
+ from t5x import partitioning
87
+ from t5x import utils
88
+
89
+ MODEL = %gin.REQUIRED
90
+
91
+ eval_script.evaluate:
92
+ model = %MODEL
93
+ output_dir = '/tmp/t5x_eval'
94
+ dataset_cfg = @utils.DatasetConfig()
95
+ partitioner = @partitioning.PjitPartitioner()
96
+ restore_checkpoint_cfg = @utils.RestoreCheckpointConfig()
97
+
98
+ # Load model with overrides.
99
+ include 'models/t5_large.gin'
100
+ models.EncoderDecoderModel.predict_batch_with_aux.num_decodes = 1
101
+
102
+ utils.DatasetConfig:
103
+ mixture_or_task_name = 'natural_questions_open'
104
+ split = 'test'
105
+ task_feature_lengths = None
106
+ batch_size = 32
107
+ shuffle = False
108
+ seed = 0
109
+ use_cached = False
110
+ pack = False
111
+ use_custom_packing_ops = False
112
+ module = 'google_research.t5_closed_book_qa.t5_cbqa.tasks'
113
+
114
+ partitioning.PjitPartitioner:
115
+ num_partitions = 1
116
+
117
+ utils.RestoreCheckpointConfig:
118
+ mode = 'specific'
119
+ path = 'gs://t5-data/pretrained_models/cbqa/large_ssm_nqo'
120
+ assignment_map = None
121
+ strict = True
122
+ dtype = None
123
+ ```
124
+
125
+ Let's go through this block-by-block.
126
+
127
+ ```py
128
+ from __gin__ import dynamic_registration
129
+ ```
130
+
131
+ The first line imports a new gin feature (see cl/372624800 for more details) to
132
+ allow us to register functions and objects for configuration from within the gin
133
+ file itself without having to modify or decorate functions from the imported
134
+ packages.
135
+
136
+ ```py
137
+ import __main__ as eval_script
138
+ from t5x import models
139
+ from t5x import utils
140
+ ```
141
+
142
+ The second block imports the modules containing the components we plan to
143
+ configure in this file and is required for dynamic registration. Note that only
144
+ those functions and objects that we specify below will actually be configured,
145
+ not everything in the module. Also, as is the case in Python, the binary module
146
+ is referred as `__main__`, although we rename it to `eval_script` for clarity in
147
+ the rest of the config.
148
+
149
+ ```py
150
+ MODEL = %gin.REQUIRED
151
+ ```
152
+
153
+ The third block creates a
154
+ [gin macro](https://github.com/google/gin-config/tree/master/docs/index.md#gin-macros)
155
+ (essentially a lazy reference) and for now sets it to refer to the special macro
156
+ `gin.REQUIRED`, which will cause a failure during parsing of the configuration
157
+ if not updated via a later assignment in the config file or command-line flags
158
+ (see [below](#command-line-usage)).
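As a hypothetical illustration (not part of the config above), a macro is assigned once and then referenced elsewhere with a `%` prefix:

```py
MODEL_DIR = '/tmp/my_run'  # satisfies a `MODEL_DIR = %gin.REQUIRED` declaration
eval_script.evaluate.output_dir = %MODEL_DIR
```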
159
+
160
+ ```py
161
+ eval_script.evaluate:
162
+ model = %MODEL
163
+ output_dir = '/tmp/t5x_eval'
164
+ dataset_cfg = @utils.DatasetConfig()
165
+ partitioner = @partitioning.PjitPartitioner()
166
+ restore_checkpoint_cfg = @utils.RestoreCheckpointConfig()
167
+ ```
168
+
169
+ The fourth block specifies the binding for the `evaluate` function. For `model`,
170
+ we pass the value of the `MODEL` macro (to be defined later). For `output_dir`
171
+ we pass a string path. For `dataset_cfg`, `restore_checkpoint_cfg`, and
172
+ `partitioner`, we pass instantiations of `DatasetConfig` and
173
+ `RestoreCheckpointConfig`, both defined in
174
+ [utils.py](https://github.com/google-research/t5x/blob/main/t5x/utils.py), and of
175
+ `PjitPartitioner`, defined in
176
+ [partitioning.py](https://github.com/google-research/t5x/blob/main/t5x/partitioning.py).
+ The '@' prefix tells gin that the following is a configured
177
+ function or class, and the '()' suffix signifies that it should be called (in
178
+ the case of a class, this means calling its constructor). If we wanted to pass in
179
+ the closure (or a partially bound) function instead of its return value, we
180
+ would leave off the parentheses.
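To make the distinction concrete, here is a hypothetical pair of bindings (not from the config above):

```py
# Passes the instance returned by calling the constructor:
eval_script.evaluate.dataset_cfg = @utils.DatasetConfig()

# Would instead pass the (partially bound) DatasetConfig class itself,
# leaving the caller to invoke it:
eval_script.evaluate.dataset_cfg = @utils.DatasetConfig
```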
181
+
182
+ The remainder of the file deals with defining the `MODEL` macro and fully
183
+ binding these constructors.
184
+
185
+ ```py
186
+ # Load model with overrides.
187
+ include 't5x/examples/t5/t5_1_1/large.gin'
188
+ models.EncoderDecoderModel.predict_batch_with_aux.num_decodes = 1
189
+ ```
190
+
191
+ Although we could define `MODEL = model.EncoderDecoderModel()` here, we prefer
192
+ to create a separate gin file that defines it. This makes it easier to reuse
193
+ parts of the common configurations. All of the bindings in the newly included
194
+ file are read and override any conflicting ones defined so far in this file.
195
+ It's equivalent to copying and pasting the contents of the included file at this
196
+ location in the config. If you want to see how the model itself is instantiated,
197
+ you can refer to
198
+ [t5_1_1/large.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/large.gin)
199
+ (which simply overrides a few values from
200
+ [t5_1_1/base.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/base.gin)).
201
+
202
+ The final line of this block shows an example of how you can modify the default
203
+ arguments of the `EncoderDecoderModel` instance referenced by `%MODEL`, in this
204
+ case changing the default beam size it will use during prediction. Notice that
205
+ since we are only binding one argument here, we choose to write it on a single
206
+ line instead of using the block binding syntax used elsewhere in the file.
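The two syntaxes are interchangeable; the single-line binding above could equally be written in block form:

```py
models.EncoderDecoderModel.predict_batch_with_aux:
  num_decodes = 1
```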
207
+
208
+ ```py
209
+ utils.DatasetConfig:
210
+ mixture_or_task_name = 'natural_questions_open'
211
+ split = 'test'
212
+ task_feature_lengths = None
213
+ batch_size = 32
214
+ shuffle = False
215
+ seed = 0
216
+ use_cached = False
217
+ pack = False
218
+ use_custom_packing_ops = False
219
+ module = 'google_research.t5_closed_book_qa.t5_cbqa.tasks'
220
+
221
+ partitioning.PjitPartitioner:
222
+ num_partitions = 1
223
+
224
+ utils.RestoreCheckpointConfig:
225
+ mode = 'specific'
226
+ path = 'gs://t5-data/pretrained_models/cbqa/large_ssm_nqo'
227
+ assignment_map = None
228
+ strict = True
229
+ dtype = None
230
+ ```
231
+
232
+ The last 3 blocks are fairly straightforward. They are effectively setting the
233
+ attributes of these dataclasses by binding values to their constructors that
234
+ will be used when they are instantiated and passed to `evaluate`, as specified
235
+ in the fourth block.
236
+
237
+ ### Scoping
238
+
239
+ The above example lacks one key component of gin:
240
+ [scopes](https://github.com/google/gin-config/blob/main/README.md#4-configuring-the-same-function-in-different-ways-scopes).
241
+
242
+ What happens if you need to use a class or function multiple times but with
243
+ different bound values?
244
+
245
+ A clear example of this is in the top-level `train` function (in
246
+ [train.py](https://github.com/google-research/t5x/blob/main/t5x/train.py)). The call signature
247
+ includes 3 different instances of `utils.DatasetConfig`: one for the train
248
+ dataset, one for the "train-eval" dataset (used for evaluation with teacher
249
+ forcing), and one for the "infer-eval" dataset (used for evaluation with
250
+ inference/decoding).
251
+
252
+ The solution is to prefix each instance with a unique identifier both when
253
+ specifying where it is to be passed to `train` and when binding its arguments.
254
+ For example, the gin file might look like the following (skipping the irrelevant
255
+ bits):
256
+
257
+ ```py
258
+ ...
259
+
260
+ train_script.train:
261
+ train_dataset_cfg = @train/utils.DatasetConfig()
262
+ train_eval_dataset_cfg = @train_eval/utils.DatasetConfig()
263
+ infer_eval_dataset_cfg = @infer_eval/utils.DatasetConfig()
264
+ ...
265
+
266
+ train/utils.DatasetConfig:
267
+ mixture_or_task_name = 'train_mixture'
268
+ split = 'train'
269
+ ...
270
+
271
+ train_eval/utils.DatasetConfig:
272
+ mixture_or_task_name = 'eval_mixture'
273
+ split = 'validation'
274
+ ...
275
+
276
+ infer_eval/utils.DatasetConfig:
277
+ mixture_or_task_name = 'eval_mixture'
278
+ split = 'test'
279
+ ...
280
+ ```
281
+
282
+ We have therefore configured 3 different scoped-versions of
283
+ `utils.DatasetConfig` producing 3 separate instances that are passed to `train`.
284
+
285
+ Note that these three scopes will all inherit from the base scope, so if you
286
+ want to set a shared binding, you may directly configure `utils.DatasetConfig`
287
+ without a scope prefix.
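For example (an illustrative fragment), a binding on the unscoped name is shared by all scoped instances, while a scoped binding applies to just one:

```py
# Inherited by train/, train_eval/, and infer_eval/:
utils.DatasetConfig.seed = 42

# Applies only to the train/ instance:
train/utils.DatasetConfig.shuffle = True
```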
288
+
289
+ ## Command-Line Usage
290
+
291
+ So now that you have a gin config, how do you pass it to the script? There are
292
+ two ways: gin files and override flags.
293
+
294
+ 1. **Gin Files** You have already seen an example of a gin file above. You can
295
+ specify the gin file(s) to use in your script via the `--gin_file` flag. If
296
+ you want to load multiple gin files, you can set the flag multiple times and
297
+ the files will be loaded in order, with the second potentially overriding
298
+ the first when there are conflicts. It is possible to supply a
299
+ comma-separated list of search prefixes via `--gin_search_paths` and then
300
+ only specify the relative path to the `--gin_file` flags. However, we
301
+ strongly recommend against using `--gin_search_paths`. Using absolute paths
302
+ via the `--gin_file` flags will reduce sources of ambiguity and improve the
303
+ consistency of your scripts.
304
+
305
+ 1. **Override Flags** Gin flags allow for more fine-grained overrides of any
306
+ configurable aspect of your run. These flags follow the single-line binding
307
+ format from the above example with the addition of a `--gin.` prefix. For
308
+ example, if you want to override the dataset shuffling, you can set
309
+ `--gin.utils.DatasetConfig.shuffle=False`. In the train setting where there
310
+ are multiple datasets, you must supply the appropriate scope, e.g.,
311
+ `--gin.train/utils.DatasetConfig.shuffle=False`. These bindings are
312
+ processed in order *after* the gin files are loaded, and therefore overwrite
313
+ any previously assigned value in the gin files.
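A rough mental model of this precedence (a sketch only, not gin's actual implementation) is a dictionary of bindings updated in order, with files first and flags last:

```python
file_bindings = {'DatasetConfig.shuffle': True, 'DatasetConfig.seed': 0}
flag_bindings = {'DatasetConfig.shuffle': False}  # from --gin. overrides

merged = {}
for source in (file_bindings, flag_bindings):  # later sources win
    merged.update(source)

assert merged == {'DatasetConfig.shuffle': False, 'DatasetConfig.seed': 0}
```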
314
+
315
+ **Note:** when supplying a string, dict, list, or tuple value via a flag, you
316
+ must put it in quotes. In the case of strings, it requires escaped quotes
317
+ (`\"<string>\"`). For example: `--gin.utils.DatasetConfig.split=\"validation\"`,
318
+ `--gin.utils.DatasetConfig.task_feature_lengths="{'inputs': 512, 'targets':
319
+ 84}"`, and `--gin.dense.MlpBlock.activations="('dense', 'gelu')"`
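To see what actually reaches the binary after the shell strips its layer of quoting, you can `echo` the flag (illustrative):

```sh
echo --gin.utils.DatasetConfig.split=\"validation\"
# prints: --gin.utils.DatasetConfig.split="validation"
```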
320
+
321
+ ### An Example
322
+
323
+ An example where you may need multiple files is with the `train` script.
324
+
325
+ You can first specify which model you want to train by supplying a gin file
326
+ containing its definition, for example:
327
+ [t5_1_1/small.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin).
328
+
329
+ You may then specify a run config that supplies some of the common defaults. For
330
+ example, if you are doing pretraining you can use
331
+ [runs/pretrain.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/pretrain.gin),
332
+ and if you are doing finetuning, you can use
333
+ [runs/finetune.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin).
334
+
335
+ We can apply these two files with the following command:
336
+
337
+ ```sh
338
+ python -m t5x.train_unfragmented \
339
+ --gin_file=t5x/examples/t5/t5_1_1/small.gin \
340
+ --gin_file=t5x/configs/runs/finetune.gin \
341
+ --logtostderr
342
+ ```
343
+
344
+ However, running this command will give you an error like the following:
345
+
346
+ ```sh
347
+ ValueError: MODEL_DIR/macro.value set to `%gin.REQUIRED` but not subsequently overridden.
348
+ ```
349
+
350
+ This is because the config still includes some `gin.REQUIRED` macros that you'll
351
+ need to override with the details of your run. At the top of
352
+ [runs/finetune.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin)
353
+ you'll see the list of required overrides, which we will populate for finetuning
354
+ on WMT in the updated launch command here:
355
+
356
+ ```sh
357
+ python -m t5x.train_unfragmented \
358
+ --gin_file=t5x/examples/t5/t5_1_1/small.gin \
359
+ --gin_file=t5x/configs/runs/finetune.gin \
360
+ --gin.MIXTURE_OR_TASK_NAME=\"wmt_t2t_ende_v003\" \
361
+ --gin.MIXTURE_OR_TASK_MODULE=\"t5.data.mixtures\" \
362
+ --gin.TASK_FEATURE_LENGTHS="{'inputs': 256, 'targets': 256}" \
363
+ --gin.TRAIN_STEPS=1_020_000 \
364
+ --gin.MODEL_DIR=\"/tmp/t5_1_1_base_finetune_gin\" \
365
+ --gin.INITIAL_CHECKPOINT_PATH=\"gs://t5-data/pretrained_models/t5x/t5_1_1_small/checkpoint_1000000\" \
366
+ --logtostderr
367
+ ```
368
+
369
+ Note you may still override any registered bindings. For example, to disable
370
+ inference evaluation you may add `--gin.train.infer_eval_dataset_cfg=None`.
371
+
372
+ ### A File-only Example
373
+
374
+ At the beginning of the primer, we saw a fully-specified run config. We can do
375
+ something similar with the previous example to create a self-contained run
376
+ configuration.
377
+ [t5_1_1/examples/small_wmt_finetune.gin](https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/examples/small_wmt_finetune.gin)
378
+ is just such an example that allows you to exactly duplicate the previous launch
379
+ command simply by calling:
380
+
381
+ ```sh
382
+ python -m t5x.train_unfragmented \
383
+ --gin_file=t5x/examples/t5/t5_1_1/examples/small_wmt_finetune.gin \
384
+ --gin.MODEL_DIR=\"/tmp/t5_1_1_small_finetune_gin\" \
385
+ --logtostderr
386
+ ```
387
+
388
+ ## Logging
389
+
390
+ After your gin files and flag overrides are parsed, the complete configuration
391
+ will be logged to INFO, written to `config.gin` in the output directory, and
392
+ added to a TensorBoard summary.
393
+
394
+ It is highly recommended that you review this generated config to ensure that
395
+ your overrides are working as expected.
t5x-main/docs/usage/gpu-usage.md ADDED
@@ -0,0 +1,87 @@
1
+ # GPU Scripts
2
+
3
+ # Warning!
4
+ An updated version of T5x with optimized GPU performance (18-80% perf gains!) and new features, including FP8 with [Transformer Engine](https://github.com/NVIDIA/TransformerEngine) and H100 support can be found here: [NVIDIA Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x).
5
+ -----
6
+ **NVIDIA no longer recommends using this repository and won't be updating it further.**
7
+ -----
8
+
9
+ The [t5x/contrib/gpu](../../t5x/contrib/gpu) directory contains scripts optimized for GPU usage.
10
+
11
+ Install the Pile dependencies with `pip install -r pile_requirements.txt`.
12
+
13
+ ## Building the container
14
+ The Dockerfile in `t5x/contrib/gpu` will build a container with all GPU/Pile dependencies. Build it with `t5x/contrib/gpu/docker/build.sh <name>`.
15
+
16
+ ## Running interactively
17
+ Note: this should only be done with singlenode jobs and/or for downloading the pile. Use `t5x/contrib/gpu/docker/interactive_pull_and_launch.sh`. This takes arguments for the URL to pull a container from and the location of the dataset directory to mount. For example:
18
+
19
+ `t5x/contrib/gpu/docker/interactive_pull_and_launch.sh [URL] /my/dataset/dir`
20
+
21
+ ## Downloading The Pile
22
+ Run `download_the_pile.py` to download the Pile. It downloads to the directory set in the `TFDS_DATA_DIR` environment variable. Afterwards, set `TFDS_DATA_DIR` to the same directory when running your scripts.
23
+
24
+ ## Single Node runs
25
+ Pretraining and finetuning can be done with `singlenode_*.sh`. These scripts build a T5X model with the Adam optimizer and relevant parameters, and support multi-GPU training on a single host.
26
+
27
+ ## Multi Node runs
28
+ For a SLURM+pyxis cluster, the `example*.sub` files provide example Slurm submit files (edit them with your details), which call `multiprocess*.sh` to execute training. You can add a binding script in the `.sub` file for your cluster, or remove it entirely (at the cost of some throughput).
29
+
30
+ ## Convergence
31
+ For our Pile convergence runs, we used a global batch size (GBS) of 2304 for XXL and 2048 for all other models, where GBS is defined as #GPUs * BS/GPU / Tensor Parallel (TP). Below are example (tested) hardware topologies on NVIDIA DGX A100 (8x A100 80G) nodes.
32
+
33
+ | size | #GPUs | TP | BS / GPU | Sequences/Sec | Estimated Walltime | MNLI 2.0 - matched | SQuAD v1.1 (EM/F1) | Convergence Log |
34
+ | ---- | ----- | ----- | -------- | ------------- | ------------------ | ------------------ | ------------------ | --------------- |
35
+ | small| 8 | 1 | 256 | ~3168 | 7.48 days | 83.06% | 78.33 / 86.63 | [log](https://tensorboard.dev/experiment/lWnHal7PRnOLeZuewyWVxQ/#scalars&_smoothingWeight=0) |
36
+ | large| 64 | 1 | 32 | ~3886 | 6.10 days | 90.50% | 87.31 / 94.04 | [log](https://tensorboard.dev/experiment/aOxJBIvTQBeTJ8XGXxaL6Q/#scalars&_smoothingWeight=0) |
37
+ | xl | 256 | 1 | 8 | ~3652 | 6.49 days | 91.15% | 89.36 / 95.29 | [log](https://tensorboard.dev/experiment/vuRoEYgkRgWiEtbvgxlOqw/#scalars&_smoothingWeight=0) |
38
+ | xxl | 512 | 8 | 36 | ~1346 | 19.81 days | N/A(partial run) | N/A(partial run) | N/A(partial run)|
39
+
40
+ Note: Convergence (as shown in the logs) was not necessarily achieved with the hardware topology listed, but the listed topology is tested. Estimated walltime assumes full throughput (seq/sec) is sustained continuously. In practice, there are compilation overheads at the beginning of each run/restart (in cluster settings), plus checkpointing overheads (if any).
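As a quick sanity check (not part of the training scripts), the GBS formula reproduces the table's batch sizes:

```python
def global_batch_size(num_gpus, bs_per_gpu, tensor_parallel):
    # GBS = #GPUs * BS/GPU / TP, as defined above.
    return num_gpus * bs_per_gpu // tensor_parallel

assert global_batch_size(8, 256, 1) == 2048    # small
assert global_batch_size(64, 32, 1) == 2048    # large
assert global_batch_size(256, 8, 1) == 2048    # xl
assert global_batch_size(512, 36, 8) == 2304   # xxl
```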
41
+
42
+ (More perf improvements coming soon!)
43
+
44
+ Other hyperparameters are specified in the associated pile `gin` files in the `contrib/gpu/t5/t5_1_1/examples` directory.
45
+
46
+ ## Pretraining run commands
47
+
48
+ ### Singlenode
49
+ small:
50
+
51
+ `t5x/contrib/gpu/t5/scripts_gpu/singlenode_pretrain_pile.sh small bfloat16 8 256 {LOGDIR - create before running} {MODEL_DIR} {GRADIENT_ACCUMULATION (1 by default)}`
52
+
53
+ Finetuning:
54
+ MNLI v2:
55
+ `t5x/contrib/gpu/t5/scripts_gpu/singlenode_ft_frompile.sh mnli2 small bfloat16 8 256 {LOGDIR - create before running} {MODEL_DIR(to restore pretrained checkpoint from)} {GRADIENT_ACCUMULATION}`
56
+
57
+
58
+ ### Multinode
59
+ Arguments are as such:
60
+
61
+ `sbatch -N {NODE_CT} t5x/contrib/gpu/t5/scripts_gpu/example_slurm_pretrain_pile.sub {MODEL_SIZE} {MODEL_PREC} {GPU/NODE} {BS/GPU} {MODEL_DIR} {GRADIENT_ACCUMULATION} {TENSOR_PARALLEL}`
62
+
63
+ small:
64
+
65
+ `sbatch -N 1 t5x/contrib/gpu/t5/scripts_gpu/example_slurm_pretrain_pile.sub small bfloat16 8 256 {MODEL_DIR} 1 1`
66
+
67
+ large:
68
+
69
+ `sbatch -N 8 t5x/contrib/gpu/t5/scripts_gpu/example_slurm_pretrain_pile.sub large bfloat16 8 32 {MODEL_DIR} 1 1`
70
+
71
+ xl:
72
+
73
+ `sbatch -N 32 t5x/contrib/gpu/t5/scripts_gpu/example_slurm_pretrain_pile.sub xl bfloat16 8 8 {MODEL_DIR} 1 1`
74
+
75
+ Finetuning commands simply change the script and have an additional `{FT_TASK}` as the first argument (along with relevant hyperparameter changes). Your `MODEL_DIR` should contain the pretrained checkpoint to restore from.
76
+
77
+ MNLI v2:
78
+
79
+ `sbatch -N {NODE_CT} t5x/contrib/gpu/t5/scripts_gpu/example_slurm_ft_frompile.sub mnli2 {MODEL_SIZE} {MODEL_PREC} {GPU/NODE} {BS/GPU} {MODEL_DIR} {GRADIENT_ACCUMULATION} {TENSOR_PARALLEL}`
80
+
81
+ SQuAD v1.1
82
+
83
+ `sbatch -N {NODE_CT} t5x/contrib/gpu/t5/scripts_gpu/example_slurm_ft_frompile.sub squad1 {MODEL_SIZE} {MODEL_PREC} {GPU/NODE} {BS/GPU} {MODEL_DIR} {GRADIENT_ACCUMULATION} {TENSOR_PARALLEL}`
84
+
85
+ On all finetuning runs, we use a Global Batch Size of 128 with bfloat16 precision.
86
+
87
+ WARNING: Finetuning is configured by default to save every checkpoint and delete none (to avoid accidentally deleting your pretrained checkpoint). Watch your disk space! This behavior can be changed in `t5x/configs/runs/finetune_{TASK}.gin`, however this puts the pretrained checkpoint at risk unless backed up.
t5x-main/docs/usage/index.rst ADDED
@@ -0,0 +1,16 @@
1
+ T5X Usage Guides
2
+ ================
3
+
4
+ .. toctree::
5
+ :maxdepth: 2
6
+
7
+ pretrain.md
8
+ finetune.md
9
+ eval.md
10
+ infer.md
11
+ auxiliary.md
12
+ decoding.md
13
+ metrics.md
14
+ partitioning.md
15
+ gin.md
16
+
t5x-main/docs/usage/infer-files.md ADDED
@@ -0,0 +1,217 @@
1
+ # Running inference on a Model
2
+
3
+
4
+ ## Introduction
5
+
6
+ This page outlines the steps to run inference on a model with T5X on files
7
+ containing
8
+ [TensorFlow Examples](https://www.tensorflow.org/api_docs/python/tf/train/Example).
9
+
10
+ ## Overview
11
+
12
+ Running inference on a model with T5X using TF Example files consists of the
13
+ following steps:
14
+
15
+ 1. Choose the model to run inference on.
16
+ 1. Choose the TF Example files to run inference on.
17
+ 1. Write a Gin file that configures the model, file source and other details of
18
+ your inference run.
19
+ 1. Launch your experiment locally or on XManager.
20
+ 1. Monitor your experiment and access predictions.
21
+
22
+ These steps are explained in detail in the following sections. An example run
23
+ that runs inference on a fine-tuned T5-1.1-Small checkpoint on `tfrecord` files
24
+ containing the
25
+ [(Open Domain) Natural Questions benchmark](https://ai.google.com/research/NaturalQuestions/)
26
+ is also showcased.
27
+
28
+ ## Step 1: Choose a model
29
+
30
+ To run inference on a model, you need a Gin config file that defines the model
31
+ params, and the model checkpoint to load from. For this example, a T5-1.1-Small
32
+ model fine-tuned on the
33
+ [`natural_questions_open_test`](https://github.com/google-research/google-research/tree/master/t5_closed_book_qa/t5_cbqa/tasks.py?l=141&rcl=370261021)
34
+ SeqIO Task will be used:
35
+
36
+ + Model checkpoint -
37
+ [`cbqa/small_ssm_nq/model.ckpt-1110000`](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/cbqa/small_ssm_nq/)
38
+ + Model Gin file -
39
+ [`models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).
40
+
41
+ If you would like to fine-tune your model before inference, please follow the
42
+ [fine-tuning](finetune.md) tutorial, and continue to Step 2.
43
+
44
+ ## Step 2: Choose a TF Example file source
45
+
46
+ T5X supports running inference on `tfrecord`, `recordio` and `sstable` files
47
+ containing TF Examples. For the example run, you will run inference on
48
+ `tfrecord` files containing the `'natural_questions_open'` dataset located here:
49
+ `/path/to/tfds/data/dir/natural_questions_open/1.0.0/natural_questions_open-validation.tfrecord*`.
50
+ Here's an example of a single row of data from this file:
52
+
53
+ ```json
54
+ { # (tensorflow.Example) size=101B
55
+ features: { # (tensorflow.Features) size=99B
56
+ feature: { # (tensorflow.Features.FeatureEntry) size=27B
57
+ key: "answer" # size=6
58
+ value: { # (tensorflow.Feature) size=17B
59
+ bytes_list: { # (tensorflow.BytesList) size=15B
60
+ value: [ "Jason Flemyng" ] # size=13
61
+ } # features.feature[0].value.bytes_list
62
+ } # features.feature[0].value
63
+ } # features.feature[0]
64
+ feature: { # (tensorflow.Features.FeatureEntry) size=68B
65
+ key: "question" # size=8
66
+ value: { # (tensorflow.Feature) size=56B
67
+ bytes_list: { # (tensorflow.BytesList) size=54B
68
+ value: [ "who played hyde in league of extraordinary gentlemen" ] # size=52
69
+ } # features.feature[1].value.bytes_list
70
+ } # features.feature[1].value
71
+ } # features.feature[1]
72
+ } # features
73
+ }
74
+ ```
75
+
76
+ ## Step 3: Write a Gin Config
77
+
78
+ After choosing the model and file source for your run, the next step is to
79
+ configure your run using Gin. If you're not familiar with Gin, reading the
80
+ [T5X Gin Primer](gin.md) is recommended. T5X provides a Gin file that configures
81
+ the T5X inference job (located at
82
+ [`t5x/configs/runs/infer_from_tfexample_file.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer_from_tfexample_file.gin))
83
+ to run inference on TF Example files, and expects a few params from you. These
84
+ params can be specified in a separate Gin file, or via commandline flags.
85
+ Following are the required params:
86
+
87
+ + `CHECKPOINT_PATH`: This is the path to the model checkpoint (from Step 1).
88
+ For the example run, set this to
89
+ `'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'`.
90
+ + `TF_EXAMPLE_FILE_PATHS`: This is a list of paths or glob patterns to read TF
91
+ Examples from. For the example run, set this to
92
+ `['/path/to/tfds/data/dir/natural_questions_open/1.0.0/natural_questions_open-validation.tfrecord*']`.
93
+ + `TF_EXAMPLE_FILE_TYPE`: This is the TF Example file format. Currently
94
+ supported file formats are `tfrecord`, `recordio` and `sstable`. For the
95
+ example run, set this to `'tfrecord'`.
96
+ + `FEATURE_LENGTHS`: This is a dict mapping feature key to maximum int length
97
+ for that feature. The TF Example features are truncated to the provided
98
+ value. For the example run, set this to `{'inputs': 38, 'targets': 18}`,
99
+ which is the maximum token length for the test set.
100
+ + `INFER_OUTPUT_DIR`: A path to write inference outputs to. When launching
101
+ using XManager, this path is automatically set and can be accessed from the
102
+ XManager Artifacts page. When running locally using Blaze, you can
103
+ explicitly pass a directory using a flag. Launch commands are provided in
104
+ the next step.
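The truncation behavior of `FEATURE_LENGTHS` can be sketched as follows (an illustrative sketch only, not T5X's actual preprocessing code):

```python
def truncate_features(example, feature_lengths):
    # Clip each tokenized feature to its configured maximum length;
    # features without a configured length pass through unchanged.
    return {key: values[:feature_lengths[key]] if key in feature_lengths
            else values
            for key, values in example.items()}

example = {'inputs': list(range(50)), 'targets': list(range(10))}
out = truncate_features(example, {'inputs': 38, 'targets': 18})
assert len(out['inputs']) == 38   # truncated to the maximum
assert len(out['targets']) == 10  # shorter than the maximum, unchanged
```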
105
+
106
+ In addition to the above params, you may also need to override the
107
+ `create_task_from_tfexample_file.inputs_key` param based on the data format (it
108
+ is set to `'inputs'` by default). For the example run, the `'question'` key
109
+ contains the input (see Step 2), so add the following to your Gin config:
110
+
111
+ ```gin
112
+ create_task_from_tfexample_file.inputs_key = 'question'
113
+ ```
114
+
115
+ Additionally, you will need to import the
116
+ [`infer_from_tfexample_file.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer_from_tfexample_file.gin)
117
+ and the Gin file for the model, which for the example run is
118
+ [`t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).
119
+
120
+ ```gin
121
+ include 'runs/infer_from_tfexample_file.gin'
122
+ include 'models/t5_1_1_small.gin'
123
+ ```
124
+
125
+ Note that the `include` statements use relative paths in this example. You will
126
+ pass an appropriate `gin_search_paths` flag to locate these files when launching
127
+ your run. Absolute paths to Gin files can also be used, e.g.
128
+
129
+ ```gin
130
+ include 't5x/configs/runs/infer_from_tfexample_file.gin'
131
+ include 't5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin'
132
+ ```
133
+
134
+ Finally, your Gin file should look like this:
135
+
136
+ ```gin
137
+ include 'runs/infer_from_tfexample_file.gin'
138
+ include 'models/t5_1_1_small.gin'
139
+
140
+ CHECKPOINT_PATH = 'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'
141
+ TF_EXAMPLE_FILE_PATHS = ['/path/to/tfds/data/dir/natural_questions_open/1.0.0/natural_questions_open-validation.tfrecord*']
142
+ TF_EXAMPLE_FILE_TYPE = 'tfrecord'
143
+ FEATURE_LENGTHS = {'inputs': 38, 'targets': 18}
144
+ create_task_from_tfexample_file.inputs_key = 'question'
145
+ ```
146
+
147
+ See
148
+ [`t5x/configs/examples/inference/t5_1_1_small_cbqa_natural_questions_tfexample.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/examples/inference/t5_1_1_small_cbqa_natural_questions_tfexample.gin)
149
+ for this example. Make sure that your Gin file is linked as a data dependency to
150
+ the T5X inference
151
+ [binary](https://github.com/google-research/t5x/blob/main/t5x/BUILD;l=74;rcl=398627055). If your
152
+ Gin file is not included, see the
153
+ [Advanced Topics section](#custom-t5x-binaries) at the end of this tutorial for
154
+ instructions to add it, or skip writing a Gin file and pass the above params as
155
+ flags when launching the inference job (see instructions in Step 4).
156
+
157
+ ## Step 4: Launch your experiment
158
+
159
+ To launch your experiment locally (for debugging only; larger checkpoints may
160
+ cause issues), run the following on the command line:
161
+
162
+ ```sh
163
+ INFER_OUTPUT_DIR="/tmp/model-infer/"
164
+ python -m t5x.infer_unfragmented \
165
+ --gin_file=t5x/google/examples/flaxformer_t5/configs/examples/inference/t5_1_1_small_cbqa_natural_questions_tfexample.gin \
166
+ --gin.INFER_OUTPUT_DIR=\"${INFER_OUTPUT_DIR}\" \
167
+ --alsologtostderr
168
+ ```
169
+
170
+ Note that multiple comma-separated paths can be passed to the `gin_search_paths`
171
+ flag, and these paths should contain all Gin files used or included in your
172
+ experiment.
173
+
174
+
175
+ After inference has completed, you can view predictions in the `jsonl` files in
176
+ the output dir. JSON data is written in chunks and combined at the end of the
177
+ inference run. Refer to [Sharding](#sharding) and
178
+ [Checkpointing](#checkpointing) sections for more details.
179
+
180
+ ## Next Steps
181
+
182
+ Now that you have successfully run inference on a model, here are some topics
183
+ you might want to explore next:
184
+
185
+ + [Fine-tuning a model.](finetune.md)
186
+ + [Evaluating a model.](eval.md)
187
+ + [Training a model from scratch.](pretrain.md)
188
+
189
+ We also touch upon a few advanced topics related to inference below that might
190
+ be useful, especially when customizing your inference job.
191
+
192
+ ## Advanced Topics
193
+
194
+ ### Dataset Sharding {#sharding .no-toc}
195
+
196
+ You can run inference in parallel across multiple TPU slices by setting the
197
+ `num_shards` flag when running using XManager. When `num_shards > 1`, the
198
+ dataset is interleaved among the shards and the predictions are combined in the
199
+ end; hence the order of examples in the data source and the predictions in the
200
+ output json files will not match (order is guaranteed to match for `num_shards =
201
+ 1` or the number of input file shards).
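The interleaving can be sketched as follows (illustrative; the exact shard assignment in T5X may differ):

```python
def shard(examples, shard_id, num_shards):
    # Shard i takes examples i, i + num_shards, i + 2 * num_shards, ...
    return examples[shard_id::num_shards]

data = list(range(10))
shards = [shard(data, i, 3) for i in range(3)]
assert shards[0] == [0, 3, 6, 9]
assert sorted(x for s in shards for x in s) == data  # every example covered once
```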
202
+
203
+ ### Dataset Checkpointing {#checkpointing .no-toc}
204
+
205
+ You can control dataset checkpointing frequency by overriding the
206
+ `infer.checkpoint_period` in
207
+ [runs/infer.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin),
208
+ which is set to `100` by default. This means that the dataset is checkpointed
209
+ after running inferences on `checkpoint_period` batches (batches, not examples;
210
+ you can control batch size by overriding `utils.DatasetConfig.batch_size` in
211
+ [runs/infer.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin), it
212
+ is set to `32` by default).
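With the defaults above, that works out to one dataset checkpoint every 3,200 examples:

```python
checkpoint_period = 100  # batches, default in runs/infer.gin
batch_size = 32          # default utils.DatasetConfig.batch_size
examples_per_checkpoint = checkpoint_period * batch_size
assert examples_per_checkpoint == 3200
```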
213
+
214
+
215
+ ### Defining a custom SeqIO Task/Mixture to run inference on {.no-toc}
216
+
217
+ Refer to [SeqIO documentation](https://github.com/google/seqio/blob/main/README.md).
t5x-main/docs/usage/infer-seqio.md ADDED
@@ -0,0 +1,241 @@
1
+ # Running inference on a Model
2
+
3
+
4
+ ## Introduction
5
+
6
+ This page outlines the steps to run inference on a model with T5X on Tasks/Mixtures
7
+ defined with [SeqIO](https://github.com/google/seqio/blob/main/README.md).
8
+
9
+ ## Overview
10
+
11
+ Running inference on a model with T5X using SeqIO Task/Mixtures consists of the
12
+ following steps:
13
+
14
+ 1. Choose the model to run inference on.
15
+ 1. Choose the SeqIO Task/Mixture to run inference on.
16
+ 1. Write a Gin file that configures the model, SeqIO Task/Mixture and other
17
+ details of your inference run.
18
+ 1. Launch your experiment locally or on XManager.
19
+ 1. Monitor your experiment and access predictions.
20
+
21
+ These steps are explained in detail in the following sections. An example run
22
+ that runs inference on a fine-tuned T5-1.1-Small checkpoint on the
23
+ [(Open Domain) Natural Questions benchmark](https://ai.google.com/research/NaturalQuestions/)
24
+ is also showcased.
25
+
26
+ ## Step 1: Choose a model
27
+
28
+ To run inference on a model, you need a Gin config file that defines the model
29
+ params, and the model checkpoint to load from. For this example, a T5-1.1-Small
30
+ model fine-tuned on the
31
+ [`natural_questions_open_test`](https://github.com/google-research/google-research/tree/master/t5_closed_book_qa/t5_cbqa/tasks.py?l=141&rcl=370261021)
32
+ SeqIO Task will be used:
33
+
34
+ + Model checkpoint -
35
+ [`cbqa/small_ssm_nq/model.ckpt-1110000`](https://console.cloud.google.com/storage/browser/t5-data/pretrained_models/cbqa/small_ssm_nq/)
36
+ + Model Gin file -
37
+ [`models/t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin).
38
+
39
+ If you would like to fine-tune your model before inference, please follow the
40
+ [fine-tuning](finetune.md) tutorial, and continue to Step 2.
41
+
42
+ ## Step 2: Choose a SeqIO Task/Mixture
43
+
44
+ A SeqIO Task encapsulates the data source, the preprocessing logic to be
45
+ performed on the data before querying the model, the postprocessing logic to be
46
+ performed on model outputs, and the metrics to be computed given the
47
+ postprocessed outputs and targets (for inference, post-processing and metrics
48
+ are irrelevant). A SeqIO Mixture denotes a collection of Tasks and enables
49
+ fine-tuning a model on multiple Tasks.
50
+
51
+ Many common datasets and benchmarks, e.g. [GLUE](https://gluebenchmark.com/),
52
+ [SuperGLUE](https://super.gluebenchmark.com/),
53
+ [WMT](https://www.tensorflow.org/datasets/catalog/wmt_t2t_translate),
54
+ [SQUAD](https://rajpurkar.github.io/SQuAD-explorer/),
55
+ [CNN/Daily Mail](https://github.com/abisee/cnn-dailymail), etc. have been
56
+ implemented as SeqIO Tasks/Mixtures and can be used directly. These
57
+ Tasks/Mixtures are defined in
58
+ [`third_party/py/t5/data/tasks.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/tasks.py)
59
+ and
60
+ [`third_party/py/t5/data/mixtures.py`](https://github.com/google-research/text-to-text-transfer-transformer/tree/main/t5/data/mixtures.py).
61
+
62
+ For the example run, you will run inference on the (Open Domain) Natural
63
+ Questions benchmark, which has been implemented as the `natural_questions_open`
64
+ Task in
65
+ [`/third_party/google_research/google_research/t5_closed_book_qa/t5_cbqa/tasks.py`](https://github.com/google-research/google-research/tree/master/t5_closed_book_qa/t5_cbqa/tasks.py?l=98&rcl=370261021).
66
+ Here's an example of a single row of preprocessed data from this Task:
67
+
68
+ ```json
69
+ {
70
+ 'inputs_pretokenized': 'nq question: what was the main motive of salt march',
71
+ 'inputs': [3, 29, 1824, 822, 10, 125, 47, 8, 711, 10280, 13, 3136, 10556, 1]
72
+ 'targets_pretokenized': 'challenge to British authority',
73
+ 'targets': [1921, 12, 2390, 5015, 1],
74
+ 'answers': ['challenge to British authority']
75
+ }
76
+ ```
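
The shape of that preprocessed row can be sketched in plain Python. This is a
toy illustration only: the real Task applies its preprocessors inside SeqIO and
tokenizes with a SentencePiece vocabulary, which is mocked here with a made-up
word-to-id table (the ids will not match the ones above).

```python
# Toy sketch of SeqIO-style preprocessing for an open-domain QA example.
# The vocabulary below is fabricated; real ids come from SentencePiece.

TOY_VOCAB = {}  # assigns a fresh id to each new whitespace-separated word
EOS_ID = 1      # SeqIO appends an EOS token to tokenized features

def tokenize(text):
    """Map each word to a stable toy id and append EOS."""
    ids = []
    for word in text.split():
        if word not in TOY_VOCAB:
            TOY_VOCAB[word] = len(TOY_VOCAB) + 2  # 0 = pad, 1 = eos reserved
        ids.append(TOY_VOCAB[word])
    return ids + [EOS_ID]

def preprocess(question, answer):
    """Build a feature dict shaped like the example row above."""
    inputs_pretokenized = 'nq question: ' + question
    return {
        'inputs_pretokenized': inputs_pretokenized,
        'inputs': tokenize(inputs_pretokenized),
        'targets_pretokenized': answer,
        'targets': tokenize(answer),
        'answers': [answer],
    }

example = preprocess('what was the main motive of salt march',
                     'challenge to British authority')
```

The key takeaway is the feature layout: `inputs`/`targets` are token id lists
ending in EOS, and the `*_pretokenized` fields keep the raw strings around for
inspection and postprocessing.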
## Step 3: Write a Gin Config

After choosing the model and SeqIO Task/Mixture for your run, the next step is
to configure your run using Gin. If you're not familiar with Gin, reading the
[T5X Gin Primer](gin.md) is recommended. T5X provides a Gin file that configures
the T5X inference job (located at
[`runs/infer.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin)) to
run inference on SeqIO Tasks/Mixtures, and expects a few params from you. These
params can be specified in a separate Gin file, or via commandline flags. The
following params are required:

+   `CHECKPOINT_PATH`: This is the path to the model checkpoint (from Step 1).
    For the example run, set this to
    `'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'`.
+   `MIXTURE_OR_TASK_NAME`: This is the SeqIO Task or Mixture name to run
    inference on (from Step 2). For the example run, set this to
    `'natural_questions_open'`.
+   `MIXTURE_OR_TASK_MODULE`: This is the Python module that contains the SeqIO
    Task or Mixture. For the example run, set this to
    `'google_research.t5_closed_book_qa.t5_cbqa.tasks'`. Note that this module
    must be included as a dependency in the T5X inference
    [binary](https://github.com/google-research/t5x/blob/main/t5x/BUILD;l=74;rcl=398627055).
    Most common Task modules, including `t5_closed_book_qa`, are already
    included. If your module is not included, see the
    [Advanced Topics section](#custom-t5x-binaries) at the end of this tutorial
    for instructions to add it.
+   `TASK_FEATURE_LENGTHS`: This is a dict mapping each feature key to its
    maximum length. After preprocessing, features are truncated to the provided
    values. For the example run, set this to `{'inputs': 38, 'targets': 18}`,
    which is the maximum token length for the test set.
+   `INFER_OUTPUT_DIR`: A path to write inference outputs to. When launching
    using XManager, this path is automatically set and can be accessed from the
    XManager Artifacts page. When running locally using Blaze, you can
    explicitly pass a directory using a flag. Launch commands are provided in
    the next step.

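
As an illustration of what `TASK_FEATURE_LENGTHS` does, the sketch below
truncates tokenized features to the configured maximums. It is a simplification
of what happens inside the SeqIO pipeline (which also handles EOS and padding):

```python
# Simplified sketch: cap each tokenized feature at its configured maximum.
# SeqIO applies this truncation (plus EOS/padding handling) internally.

TASK_FEATURE_LENGTHS = {'inputs': 38, 'targets': 18}

def truncate_features(example, lengths):
    """Truncate every feature that has a configured maximum length."""
    return {key: (value[:lengths[key]] if key in lengths else value)
            for key, value in example.items()}

example = {
    'inputs': list(range(50)),   # 50 tokens: longer than the 38 allowed
    'targets': list(range(10)),  # already within the 18-token limit
}
trimmed = truncate_features(example, TASK_FEATURE_LENGTHS)
```

Features longer than the configured maximum are cut down to it; shorter
features pass through unchanged, which is why picking the dataset's maximum
token lengths (38/18 here) avoids losing any tokens.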
In addition to the above params, you will need to import
[`infer.gin`](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin) and the
Gin file for the model, which for the example run is
[`t5_1_1_small.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin):

```gin
include 'runs/infer.gin'
include 'models/t5_1_1_small.gin'
```

Note that the `include` statements use relative paths in this example. You will
pass an appropriate `gin_search_paths` flag to locate these files when launching
your run. Absolute paths to Gin files can also be used, e.g.

```gin
include 't5x/configs/runs/infer.gin'
include 't5x/google/examples/flaxformer_t5/configs/models/t5_1_1_small.gin'
```

Finally, your Gin file should look like this:

```gin
include 'runs/infer.gin'
include 'models/t5_1_1_small.gin'

CHECKPOINT_PATH = 'gs://t5-data/pretrained_models/cbqa/small_ssm_nq/model.ckpt-1110000'
MIXTURE_OR_TASK_NAME = 'natural_questions_open'
MIXTURE_OR_TASK_MODULE = 'google_research.t5_closed_book_qa.t5_cbqa.tasks'
TASK_FEATURE_LENGTHS = {'inputs': 38, 'targets': 18}
```
See
[`t5_1_1_small_cbqa_natural_questions.gin`](https://github.com/google-research/t5x/blob/main/t5x/google/examples/flaxformer_t5/configs/examples/inference/t5_1_1_small_cbqa_natural_questions.gin)
for this example. Make sure that your Gin file is linked as a data dependency to
the T5X inference
[binary](https://github.com/google-research/t5x/blob/main/t5x/BUILD;l=74;rcl=398627055). If your
Gin file is not included, see the
[Advanced Topics section](#custom-t5x-binaries) at the end of this tutorial for
instructions to add it, or skip writing a Gin file and pass the above params as
flags when launching the inference job (see instructions in Step 4).

## Step 4: Launch your experiment

To launch your experiment locally (for debugging only; larger checkpoints may
cause issues), run the following on commandline:

```sh
INFER_OUTPUT_DIR="/tmp/model-infer/"
python -m t5x.infer \
  --gin_file=t5x/google/examples/flaxformer_t5/configs/examples/inference/t5_1_1_small_cbqa_natural_questions.gin \
  --gin.INFER_OUTPUT_DIR=\"${INFER_OUTPUT_DIR}\" \
  --alsologtostderr
```

Note that multiple comma-separated paths can be passed to the `gin_search_paths`
flag, and these paths should contain all Gin files used or included in your
experiment.
## Step 5: Monitor your experiment and parse results

After inference has completed, you can view predictions in the `jsonl` files in
the output dir. JSON data is written in chunks and combined at the end of the
inference run. Refer to the [Sharding](#sharding) and
[Checkpointing](#checkpointing) sections below for more details.
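
The combined `jsonl` files can be parsed with the standard library, one JSON
object per line. The file name used in the demo below is fabricated; read
whatever files appear in your `INFER_OUTPUT_DIR`:

```python
import json
import os
import tempfile

def read_predictions(path):
    """Yield one prediction dict per non-empty line of a .jsonl file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demo with a small fabricated file; real files are written by the
# inference job into INFER_OUTPUT_DIR.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'predictions.jsonl')
with open(path, 'w') as f:
    f.write(json.dumps({'inputs': 'nq question: ...',
                        'prediction': 'challenge to British authority'}) + '\n')

rows = list(read_predictions(path))
```

Streaming line by line (rather than loading the whole file) keeps memory flat
even for large prediction sets.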
## Next Steps

Now that you have successfully run inference on a model, here are some topics
you might want to explore next:

+   [Fine-tuning a model.](finetune)
+   [Evaluating a model.](eval)
+   [Training a model from scratch.](pretrain)

We also touch upon a few advanced topics related to inference below that might
be useful, especially when customizing your inference job.
## Advanced Topics

### Dataset Sharding {#sharding .no-toc}

You can run inference in parallel across multiple TPU slices by setting the
`num_shards` flag when running using XManager. When `num_shards > 1`, the
dataset is interleaved among the shards and the predictions are combined at the
end; hence the order of examples in the data source and the predictions in the
output json files will not match (order is guaranteed to match only when
`num_shards = 1` or when `num_shards` equals the number of input file shards).
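
A minimal sketch of why the output order changes: assume examples are dealt
round-robin to shards (an assumption about the interleaving scheme, for
illustration), each shard writes predictions in its own order, and the
per-shard outputs are concatenated at the end:

```python
def shard_round_robin(examples, num_shards):
    """Interleave examples among shards: shard i gets examples i, i+n, i+2n, ..."""
    return [examples[i::num_shards] for i in range(num_shards)]

examples = list(range(10))
shards = shard_round_robin(examples, num_shards=3)
# Each shard runs inference independently; concatenating the per-shard
# outputs preserves every example but not the original dataset order.
combined = [x for shard in shards for x in shard]
```

With `num_shards = 1` there is a single shard, so concatenation trivially
preserves the source order, which matches the guarantee stated above.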
### Dataset Checkpointing {#checkpointing .no-toc}

You can control dataset checkpointing frequency by overriding
`infer.checkpoint_period` in
[runs/infer.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin),
which is set to `100` by default. This means that the dataset is checkpointed
after running inference on every `checkpoint_period` batches (batches, not
examples; you can control the batch size by overriding
`utils.DatasetConfig.batch_size` in
[runs/infer.gin](https://github.com/google-research/t5x/blob/main/t5x/configs/runs/infer.gin),
which is set to `32` by default).
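
With the defaults above, the interval between dataset checkpoints works out as
follows:

```python
checkpoint_period = 100  # batches between dataset checkpoints (default)
batch_size = 32          # examples per batch (default)

# Checkpoints happen every checkpoint_period batches, so with the defaults
# inference progress is saved once every 3200 examples.
examples_per_checkpoint = checkpoint_period * batch_size
```

Lowering `checkpoint_period` saves progress more often at the cost of extra
checkpointing overhead.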
### Changing Length and Decoding Strategy {#decoding-strategies .no-toc}

By default, T5X runs inference with a greedy (argmax) decoding strategy, always
picking the most likely next token. To use random sampling instead, you can
override any of the following parameters in your Gin config:

```gin
decoding.temperature_sample.temperature = 1.0
decoding.temperature_sample.topk = 1
decoding.temperature_sample.topp = 0.0
```
You can also control the number of tokens that get generated by specifying:

```gin
decoding.temperature_sample.max_decode_steps = 50
```
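
For intuition, here is a self-contained sketch of temperature sampling with
`topk`/`topp` filtering over one step's logits. It mirrors the shape of the
parameters above but is not T5X's implementation, which operates on batched JAX
arrays inside the decode loop:

```python
import math
import random

def sample_token(logits, temperature=1.0, topk=0, topp=0.0, rng=None):
    """Sample one token id from logits with top-k / top-p (nucleus) filtering.

    topk=0 / topp=0.0 disable the respective filter; topk=1 is greedy argmax.
    """
    rng = rng or random.Random(0)
    # Rank candidate token ids by logit, highest first.
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if topk:
        ids = ids[:topk]
    # Softmax over the surviving candidates, scaled by temperature.
    scaled = [logits[i] / max(temperature, 1e-6) for i in ids]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    if topp:
        # Keep the smallest prefix of candidates whose mass reaches topp.
        kept, mass = [], 0.0
        for i, p in zip(ids, probs):
            kept.append((i, p))
            mass += p
            if mass >= topp:
                break
        norm = sum(p for _, p in kept)
        ids = [i for i, _ in kept]
        probs = [p / norm for _, p in kept]
    return rng.choices(ids, weights=probs, k=1)[0]

logits = [0.1, 2.5, 0.3, 1.7]
greedy = sample_token(logits, topk=1)  # topk=1 always picks the argmax
```

This makes the defaults above concrete: `topk = 1` collapses sampling to
argmax, while raising `topk`, setting `topp`, or increasing `temperature`
broadens the distribution the next token is drawn from.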
More detailed documentation on defining a decoding strategy can be found
[here](https://github.com/google-research/t5x/blob/main/docs/usage.md/decoding).

### Defining a custom SeqIO Task/Mixture to run inference on {.no-toc}

Refer to the
[SeqIO documentation](https://github.com/google/seqio/blob/main/README.md).