---
title: Forward and Backward for Inference and Learning
---
# Forward and Backward

The forward and backward passes are the essential computations of a [Net](net_layer_blob.html).

<img src="fig/forward_backward.png" alt="Forward and Backward" width="480">

Let's consider a simple logistic regression classifier.

The **forward** pass computes the output given the input for inference.
In the forward pass Caffe composes the computation of each layer to compute the "function" represented by the model.
This pass goes from bottom to top.

<img src="fig/forward.jpg" alt="Forward pass" width="320">

The data $$x$$ is passed through an inner product layer for $$g(x)$$, then through a softmax for $$h(g(x))$$, and through a softmax loss to give $$f_W(x)$$.
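
To make this composition explicit, let $$W_{\text{ip}}$$ be the inner product weights and $$y$$ the ground truth label consumed by the loss (the label input is left implicit above); writing the softmax loss in its usual negative log-likelihood form, the forward pass evaluates

$$
g(x) = W_{\text{ip}} x, \qquad h(g(x)) = \operatorname{softmax}(g(x)), \qquad f_W(x) = -\log h(g(x))_y.
$$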

The **backward** pass computes the gradient given the loss for learning.
In the backward pass Caffe reverse-composes the gradient of each layer to compute the gradient of the whole model by automatic differentiation.
This is back-propagation.
This pass goes from top to bottom.

<img src="fig/backward.jpg" alt="Backward pass" width="320">

The backward pass begins with the loss and computes the gradient with respect to the output $$\frac{\partial f_W}{\partial h}$$. The gradient with respect to the rest of the model is computed layer-by-layer through the chain rule. Layers with parameters, like the `INNER_PRODUCT` layer, compute the gradient with respect to their parameters $$\frac{\partial f_W}{\partial W_{\text{ip}}}$$ during the backward step.
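
Spelled out with the chain rule, the gradients propagated for this model are

$$
\frac{\partial f_W}{\partial g} = \frac{\partial f_W}{\partial h} \frac{\partial h}{\partial g}
\qquad \text{and} \qquad
\frac{\partial f_W}{\partial W_{\text{ip}}} = \frac{\partial f_W}{\partial g} \frac{\partial g}{\partial W_{\text{ip}}}
$$

so each layer only needs its own local derivative and the gradient handed down from the layer above.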

These computations follow immediately from defining the model: Caffe plans and carries out the forward and backward passes for you.

- The `Net::Forward()` and `Net::Backward()` methods carry out the respective passes while `Layer::Forward()` and `Layer::Backward()` compute each step, as sketched after this list.
- Every layer type has `forward_{cpu,gpu}()` and `backward_{cpu,gpu}()` methods to compute its steps according to the mode of computation. A layer may implement only the CPU or only the GPU mode due to constraints or convenience.
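
A minimal sketch of these calls through the C++ API, assuming a hypothetical `train.prototxt` that defines a data layer and a loss layer (the current `Net::Forward()` takes an optional pointer that receives the loss):

```cpp
#include <caffe/caffe.hpp>

int main() {
  caffe::Caffe::set_mode(caffe::Caffe::CPU);  // or caffe::Caffe::GPU

  // A TRAIN-phase net so both passes are set up; "train.prototxt" is a
  // hypothetical model definition with a data layer and a loss layer.
  caffe::Net<float> net("train.prototxt", caffe::TRAIN);

  float loss;
  net.Forward(&loss);  // bottom-to-top: compute the outputs and the loss
  net.Backward();      // top-to-bottom: compute the gradients
  return 0;
}
```

`Backward()` stores the gradients in the `diff` fields of the net's blobs and parameters, where the solver reads them for the weight update.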

The [Solver](solver.html) optimizes a model by first calling forward to yield the output and loss, then calling backward to generate the gradient of the model, and then incorporating the gradient into a weight update that attempts to minimize the loss. This division of labor between the Solver, Net, and Layer keeps Caffe modular and open to development.
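
In the simplest case of plain stochastic gradient descent that update is $$W \leftarrow W - \alpha \frac{\partial f_W}{\partial W}$$ for a learning rate $$\alpha$$. A sketch of how the pieces are wired together, assuming a hypothetical `solver.prototxt` and following the same registry calls as the stock `caffe train` tool:

```cpp
#include <caffe/caffe.hpp>

int main() {
  caffe::SolverParameter solver_param;
  caffe::ReadSolverParamsFromTextFileOrDie("solver.prototxt", &solver_param);

  // Instantiate the solver type named in the file (SGD by default) and run
  // the forward / backward / update loop for the configured iterations.
  caffe::Solver<float>* solver =
      caffe::SolverRegistry<float>::CreateSolver(solver_param);
  solver->Solve();
  delete solver;
  return 0;
}
```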

For the details of the forward and backward steps of Caffe's layer types, refer to the [layer catalogue](layers.html).