# Model Garden overview

The TensorFlow Model Garden provides implementations of many state-of-the-art
machine learning (ML) models for vision and natural language processing (NLP),
as well as workflow tools to let you quickly configure and run those models on
standard datasets. Whether you are looking to benchmark performance for a
well-known model, verify the results of recently released research, or extend
existing models, the Model Garden can help you drive your ML research and
applications forward.

The Model Garden includes the following resources for machine learning
developers:

-   [**Official models**](#official) for vision and NLP, maintained by Google
    engineers

-   [**Research models**](#research) published as part of ML research papers

-   [**Training experiment framework**](#training_framework) for fast,
    declarative training configuration of official models

-   [**Specialized ML operations**](#ops) for vision and natural language
    processing (NLP)

-   [**Model training loop**](#orbit) management with Orbit


These resources are built to be used with the TensorFlow Core framework and
integrate with your existing TensorFlow development projects. Model
Garden resources are also provided under an [open
source](https://github.com/tensorflow/models/blob/master/LICENSE) license, so
you can freely extend and distribute the models and tools.

Practical ML models are computationally intensive to train and run, and may
require accelerators such as graphics processing units (GPUs) and Tensor
Processing Units (TPUs). Most of the models in the Model Garden were trained on
large datasets using TPUs. However, you can also train and run these models on
GPU and CPU processors.

## Model Garden models

The machine learning models in the Model Garden include full code so you can
test, train, or re-train them for research and experimentation. The Model Garden
includes two primary categories of models: *official models* and *research
models*.

### Official models {:#official}

The [Official Models](https://github.com/tensorflow/models/tree/master/official)
repository is a collection of state-of-the-art models, with a focus on
vision and natural language processing (NLP).
These models are implemented using current TensorFlow 2.x high-level
APIs. Model libraries in this repository are optimized for fast performance and
actively maintained by Google engineers. The official models include additional
metadata you can use to quickly configure experiments using the Model Garden
[training experiment framework](#training_framework).

### Research models {:#research}

The [Research Models](https://github.com/tensorflow/models/tree/master/research)
repository is a collection of models published as code resources for research
papers. These models are implemented using both TensorFlow 1.x and 2.x. Model
libraries in the research folder are supported by the code owners and the
research community.

## Training experiment framework {:#training_framework}



The Model Garden training experiment framework lets you quickly assemble and run
training experiments using its official models and standard datasets. The
training framework uses additional metadata included with the Model Garden's
official models to allow you to configure models quickly using a declarative
programming model. You can define a training experiment using Python commands in
the
[TensorFlow Model library](https://www.tensorflow.org/api_docs/python/tfm/core)
or configure training using a YAML configuration file, like this
[example](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml).

The training framework uses
[`tfm.core.base_trainer.ExperimentConfig`](https://www.tensorflow.org/api_docs/python/tfm/core/base_trainer/ExperimentConfig)
as the configuration object, which contains the following top-level
configuration objects:

-   [`runtime`](https://www.tensorflow.org/api_docs/python/tfm/core/base_task/RuntimeConfig):
    Defines the processing hardware, distribution strategy, and other
    performance optimizations
-   [`task`](https://www.tensorflow.org/api_docs/python/tfm/core/config_definitions/TaskConfig):
    Defines the model, training data, losses, and initialization
-   [`trainer`](https://www.tensorflow.org/api_docs/python/tfm/core/base_trainer/TrainerConfig):
    Defines the optimizer, training loops, evaluation loops, summaries, and
    checkpoints
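
These three objects map directly onto the top-level keys of a YAML experiment
file. The fragment below is an illustrative sketch only; the field names follow
the linked ResNet-50 example, but the exact schema depends on the experiment
type, so refer to that file for an authoritative configuration:

```yaml
# Illustrative sketch of an experiment configuration file. Values are
# examples, not a complete or canonical configuration.
runtime:                              # maps to the `runtime` config object
  distribution_strategy: 'tpu'
  mixed_precision_dtype: 'bfloat16'
task:                                 # maps to the `task` config object
  model:
    num_classes: 1001
    input_size: [224, 224, 3]
  train_data:
    is_training: true
    global_batch_size: 4096
trainer:                              # maps to the `trainer` config object
  train_steps: 112590
  validation_interval: 6255
```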



For a complete example using the Model Garden training experiment framework, see
the [Image classification with Model Garden](vision/image_classification.ipynb)
tutorial. For information on the training experiment framework, check out the
[TensorFlow Models API documentation](https://tensorflow.org/api_docs/python/tfm/core).
If you are looking for a solution to manage training loops for your model
training experiments, check out [Orbit](#orbit).



## Specialized ML operations {:#ops}



The Model Garden contains many vision and NLP operations specifically designed
to execute state-of-the-art models that run efficiently on GPUs and TPUs. Review
the TensorFlow Models Vision library API docs for a list of specialized
[vision operations](https://www.tensorflow.org/api_docs/python/tfm/vision).
Review the TensorFlow Models NLP library API docs for a list of
[NLP operations](https://www.tensorflow.org/api_docs/python/tfm/nlp). These
libraries also include additional utility functions used for vision and NLP data
processing, training, and model execution.



## Training loops with Orbit {:#orbit}



There are two default options for training TensorFlow models:

* Use the high-level Keras
  [Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit)
  function. If your model and training procedure fit the assumptions of the
  `Model.fit` method (incremental gradient descent on batches of data), this
  approach can be very convenient.
* Write a custom training loop,
  [with Keras](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch)
  or [without](https://www.tensorflow.org/guide/core/logistic_regression_core),
  using low-level TensorFlow methods such as `tf.GradientTape` or
  `tf.function`. This approach gives you full control, but it requires a lot of
  boilerplate code and does nothing to simplify distributed training.

Orbit provides a third option between these two extremes.



Orbit is a flexible, lightweight library designed to make it easier to write
custom training loops in TensorFlow 2.x, and it works well with the Model
Garden [training experiment framework](#training_framework). Orbit handles
common model training tasks such as saving checkpoints, running model
evaluations, and setting up summary writing. It integrates seamlessly with
`tf.distribute` and supports running on different device types, including CPU,
GPU, and TPU hardware. The Orbit tool is also [open
source](https://github.com/tensorflow/models/blob/master/orbit/LICENSE), so you
can extend and adapt it to your model training needs.
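
The core idea is a controller that alternates between chunks of training steps
and periodic evaluation and checkpointing. The framework-agnostic sketch below
illustrates only that control flow; the `Controller` class and its attributes
here are simplified stand-ins invented for illustration, not Orbit's actual API
(in Orbit itself, roughly analogous roles are played by `orbit.Controller`,
`orbit.StandardTrainer`, and `orbit.StandardEvaluator`):

```python
# Simplified, framework-agnostic sketch of the controller pattern that a
# training-loop manager implements. All names here are illustrative.

class Controller:
    """Alternates training with periodic evaluation and checkpointing."""

    def __init__(self, trainer, evaluator, steps_per_loop, eval_interval):
        self.trainer = trainer            # has a train(num_steps) method
        self.evaluator = evaluator        # has an evaluate() method
        self.steps_per_loop = steps_per_loop
        self.eval_interval = eval_interval  # assumed a multiple of steps_per_loop
        self.global_step = 0
        self.checkpoints = []             # steps at which state was "saved"

    def train_and_evaluate(self, total_steps):
        while self.global_step < total_steps:
            # Run a chunk of training steps (in a real library, this is where
            # tf.function-compiled, tf.distribute-aware steps would execute).
            num_steps = min(self.steps_per_loop, total_steps - self.global_step)
            self.trainer.train(num_steps)
            self.global_step += num_steps
            # Periodically evaluate and record a checkpoint.
            if self.global_step % self.eval_interval == 0:
                self.evaluator.evaluate()
                self.checkpoints.append(self.global_step)
        return self.global_step
```

The point of the pattern is that checkpointing, evaluation scheduling, and
summary writing live in the controller, so the trainer and evaluator stay
focused on the per-step computation.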



For more detail, see the [Orbit guide](orbit/index.ipynb).



Note: You can customize how the Keras API executes training, typically by
overriding the `Model.train_step` method or by using `keras.callbacks` such as
`callbacks.ModelCheckpoint` or `callbacks.TensorBoard`. For more information
about modifying the behavior of `train_step`, check out the
[Customize what happens in Model.fit](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit)
page.