Update README.md
README.md
@@ -128,7 +128,7 @@ Example 3:
 
 ## Model description
 
-The architecture is a modification of a standard decoder-only transformer.
+The architecture is a modification of a standard decoder-only transformer and was trained as a causal language model (clm).
 
 The llama-2-70b models have been modified from a standard transformer in the following ways:
 * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
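The line added in this commit says the model "was trained as a causal language model (clm)". Concretely, that means each position may only see earlier tokens, and the training loss is the average negative log-likelihood of every next token. The plain-Python sketch below illustrates that idea only. The function names (`causal_mask`, `next_token_pairs`, `causal_lm_loss`) are hypothetical and this is not the training code used for the llama-2-70b models.

```python
import math
from typing import List, Tuple


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def next_token_pairs(tokens: List[int]) -> List[Tuple[List[int], int]]:
    """Pair every prefix tokens[:t+1] with the token that follows it."""
    return [(tokens[: t + 1], tokens[t + 1]) for t in range(len(tokens) - 1)]


def causal_lm_loss(log_probs: List[List[float]], tokens: List[int]) -> float:
    """Mean negative log-likelihood of each next token.

    Assumes log_probs[t][k] is the model's log-probability of vocabulary id k
    at position t, computed only from tokens[:t+1] (enforced by the causal mask).
    """
    losses = [-log_probs[t][tokens[t + 1]] for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    print(causal_mask(3))
    print(next_token_pairs([5, 7, 9]))
    # A model that is uniform over a 10-token vocabulary scores log(10) per token.
    uniform = [[math.log(0.1)] * 10 for _ in range(3)]
    print(causal_lm_loss(uniform, [1, 2, 3]))
```

The mask and the shifted targets are two views of the same constraint: the prediction at position t never depends on tokens after t.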
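The README's first listed modification is the SwiGLU activation from the linked paper (arXiv 2002.05202), where the feed-forward block is FFN_SwiGLU(x) = (Swish_1(xW) ⊙ xV) W2 with biases omitted, and Swish_1(z) = z · sigmoid(z) (also called SiLU). The sketch below is a minimal pure-Python illustration of that formula under the assumption of row-major weight lists. The names `swiglu_ffn`, `silu` and `matvec` are hypothetical and this is not the model's actual implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (in_dim, out_dim)


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute the row vector x @ w."""
    out_dim = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(out_dim)]


def silu(z: float) -> float:
    """Swish with beta = 1: z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu_ffn(x: Vector, w: Matrix, v: Matrix, w2: Matrix) -> Vector:
    """(Swish_1(x W) * (x V)) W2, elementwise product in the middle, no biases."""
    gate = [silu(z) for z in matvec(x, w)]   # gated branch
    value = matvec(x, v)                      # linear branch
    hidden = [g * u for g, u in zip(gate, value)]
    return matvec(hidden, w2)                 # project back to model width


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(swiglu_ffn([1.0, 2.0], identity, identity, identity))
```

Compared with a single-activation FFN, the extra matrix V gives the block a learned multiplicative gate, which is the variant the linked paper reports as improving transformer quality.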