Update README.md
README.md CHANGED
@@ -51,7 +51,8 @@ former = svd_transformer(
 # "sigmoid" = Sigmoid, 1 / (1 + exp(-x)), can be effective for certain tasks as it allows for values between 0 and 1 and can capture more complex relationships in the data.
 # "leaky_relu" = Leaky ReLU, max(0.01 * x, x), can be effective for certain tasks as it allows for a small gradient when the input is negative, which can help prevent dead neurons in the network.
 # "swilu" = Sigmoid Weighted Linear Unit, x * sigmoid(x), similar to silu but with a slightly different formulation, can also be effective for certain tasks.
-token_out="
+token_out="all", # the format of token expected out
+# "all" or None will return all tokens, which applies transformer logic automatically.
 # "QKV" standard attention token, applies transformer logic internally and can accept rotary behavior
 # "SUVt" or "SUV" geometric tokens returned only, QKV transformation learning not applied.
 target="SVD", # "SVD" targets all 3, good for complex tasks.
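For reference, the three activation formulas quoted in the context lines of this hunk can be written out as a minimal sketch. The function names below are hypothetical and only mirror the option strings in the README comments; this is not the library's implementation, and nothing here is taken from `svd_transformer` itself. Note that the expression `x * sigmoid(x)`, as written in the README comment, is the same expression usually given for SiLU, so any difference in the library's "swilu" is not visible from this diff.

```python
import math


def sigmoid(x: float) -> float:
    """1 / (1 + exp(-x)), evaluated in a form that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return e / (1.0 + e)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """max(slope * x, x): identity for x >= 0, small slope for x < 0."""
    return max(slope * x, x)


def swilu(x: float) -> float:
    """x * sigmoid(x), following the formula in the README comment (assumed, not verified against the library)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 1.0):
        print(value, sigmoid(value), leaky_relu(value), swilu(value))
```

The sketch works on plain floats to stay dependency-free; a tensor version would apply the same expressions elementwise.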