---
title: Loss
---
# Loss

In Caffe, as in most of machine learning, learning is driven by a **loss** function (also known as an **error**, **cost**, or **objective** function).
A loss function specifies the goal of learning by mapping parameter settings (i.e., the current network weights) to a scalar value measuring the "badness" of those settings.
Hence, the goal of learning is to find a setting of the weights that *minimizes* the loss function.

The loss in Caffe is computed by the Forward pass of the network.
Each layer takes a set of input (`bottom`) blobs and produces a set of output (`top`) blobs.
Some of these layers' outputs may be used in the loss function.
A typical choice of loss function for one-of-many (multiclass) classification tasks is the `SoftmaxWithLoss` function, used in a network definition as follows, for example:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "pred"
      bottom: "label"
      top: "loss"
    }

In a `SoftmaxWithLoss` layer, the `top` blob is a scalar (empty shape) that averages the loss (computed from the predicted labels `pred` and the actual labels `label`) over the entire mini-batch.
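
To make the computation concrete, here is a small pure-Python sketch of what `SoftmaxWithLoss` computes (the function name and data layout here are illustrative, not the pycaffe API): a softmax over each prediction vector, the cross-entropy of the true class, averaged over the mini-batch.

```python
import math

def softmax_with_loss(pred, label):
    """Mean softmax cross-entropy over a mini-batch (illustrative sketch).

    pred:  list of per-example score vectors (the "pred" bottom blob)
    label: list of ground-truth class indices (the "label" bottom blob)
    """
    total = 0.0
    for scores, y in zip(pred, label):
        m = max(scores)                           # shift scores for numerical stability
        exps = [math.exp(s - m) for s in scores]  # unnormalized softmax
        total += -math.log(exps[y] / sum(exps))   # cross-entropy of the true class
    return total / len(pred)                      # average over the mini-batch

# Two examples, three classes each; labels are class indices.
loss = softmax_with_loss([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]], [0, 1])
```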

### Loss weights

For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a `SoftmaxWithLoss` layer and reconstructs it using an `EuclideanLoss` layer), *loss weights* can be used to specify their relative importance.

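For example, such a classification-plus-reconstruction net might weight its two objectives as follows (the layer, blob names, and weight value here are illustrative):

```
layer {
  name: "class_loss"
  type: "SoftmaxWithLoss"
  bottom: "pred"
  bottom: "label"
  top: "class_loss"    # implicit loss_weight: 1 (Loss-suffixed layer)
}
layer {
  name: "recon_loss"
  type: "EuclideanLoss"
  bottom: "reconstruction"
  bottom: "data"
  top: "recon_loss"
  loss_weight: 0.01    # reconstruction error counts 100x less than classification
}
```
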
By convention, Caffe layer types with the suffix `Loss` contribute to the loss function, while other layers are assumed to be used purely for intermediate computations.
However, any layer can be used as a loss by adding a field `loss_weight: <float>` to the layer definition for each `top` blob it produces.
Layers with the suffix `Loss` have an implicit `loss_weight: 1` for the first `top` blob (and `loss_weight: 0` for any additional `top`s); other layers have an implicit `loss_weight: 0` for all `top`s.
So, the above `SoftmaxWithLoss` layer could be equivalently written as:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "pred"
      bottom: "label"
      top: "loss"
      loss_weight: 1
    }

However, *any* layer able to backpropagate may be given a non-zero `loss_weight`, allowing one, for example, to regularize the activations produced by some intermediate layer(s) of the network.
For non-singleton outputs with an associated non-zero `loss_weight`, the loss is computed simply by summing over all entries of the blob.
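
As a small sketch of that rule, suppose an intermediate 2x3 activation blob is given a non-zero loss weight (the values below are made up): its contribution to the loss is the weight times the sum of its entries.

```python
# Hypothetical 2x3 top blob of an intermediate layer.
activations = [[0.5, -1.0, 2.0],
               [1.5,  0.0, 0.25]]
loss_weight = 0.05  # illustrative non-zero loss weight

# Loss contribution: loss_weight times the sum over all entries of the blob.
contribution = loss_weight * sum(sum(row) for row in activations)
```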

The final loss in Caffe is then computed by summing the weighted loss over all `top` blobs in the network, as in the following pseudo-code:

    loss := 0
    for layer in layers:
        for top, loss_weight in zip(layer.tops, layer.loss_weights):
            loss += loss_weight * sum(top)
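
The pseudo-code above can be turned into a runnable Python sketch (plain dicts stand in for layers here; this is not the pycaffe API):

```python
def total_loss(layers):
    """Sum the weighted loss over every top blob in the network."""
    loss = 0.0
    for layer in layers:
        for top, loss_weight in zip(layer["tops"], layer["loss_weights"]):
            loss += loss_weight * sum(top)  # sum over all entries of the blob
    return loss

# Illustrative net: one intermediate layer (weight 0) and one loss layer.
net = [
    {"tops": [[0.2, 0.3, -0.1]], "loss_weights": [0.0]},  # ignored in the loss
    {"tops": [[0.75]],           "loss_weights": [1.0]},  # scalar loss blob
]
loss = total_loss(net)
```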