AbstractPhil
/

geolip-svd-encoder-sweeps

Model card Files Files and versions

AbstractPhil commited on Apr 18

Commit

91eb06a

·

verified ·

1 Parent(s): eddd60f

Update README.md

Files changed (1) hide show

README.md +2 -1

README.md CHANGED Viewed

@@ -51,7 +51,8 @@ former = svd_transformer(
                           # "sigmoid" = Sigmoid, 1 / (1 + exp(-x)), can be effective for certain tasks as it allows for values between 0 and 1 and can capture more complex relationships in the data.
                           # "leaky_relu" = Leaky ReLU, max(0.01 * x, x), can be effective for certain tasks as it allows for a small gradient when the input is negative, which can help prevent dead neurons in the network.
                           # "swilu" = Sigmoid Weighted Linear Unit, x * sigmoid(x), similar to silu but with a slightly different formulation, can also be effective for certain tasks.
-  token_out="QKV",        # the format of token expected out
                           # "QKV" standard attention token, applies transformer logic internally and can accept rotary behavior
                           # "SUVt" or "SUV" geometric tokens returned only, QKV transformation learning not applied.
   target="SVD",           # "SVD" targets all 3, good for complex tasks.

                           # "sigmoid" = Sigmoid, 1 / (1 + exp(-x)), can be effective for certain tasks as it allows for values between 0 and 1 and can capture more complex relationships in the data.
                           # "leaky_relu" = Leaky ReLU, max(0.01 * x, x), can be effective for certain tasks as it allows for a small gradient when the input is negative, which can help prevent dead neurons in the network.
                           # "swilu" = Sigmoid Weighted Linear Unit, x * sigmoid(x), similar to silu but with a slightly different formulation, can also be effective for certain tasks.
+  token_out="all",        # the format of token expected out
+                          # "all" or None will return all tokens, which applies transformer logic automatically.
                           # "QKV" standard attention token, applies transformer logic internally and can accept rotary behavior
                           # "SUVt" or "SUV" geometric tokens returned only, QKV transformation learning not applied.
   target="SVD",           # "SVD" targets all 3, good for complex tasks.