TrustSafeAI/GradientCuff-Jailbreak-Defense
gregH committed on Feb 27, 2024

Commit 2aae5e7 (verified) · 1 Parent(s): c605bca

Update index.html

Files changed (1)

index.html CHANGED

@@ -81,8 +81,8 @@ Exploring Refusal Loss Landscapes </title>
 <p>Current transformer-based LLMs will return different responses to the same query due to the randomness of
   autoregressive sampling-based generation. With this randomness, it is an
   interesting phenomenon that a malicious user query will sometimes be rejected by the target LLM, but
-  sometimes be able to bypass the safety guardrail. Based on this observation, for a given LLM $$T_\theta$$ parameterized with $\theta$, we
-  define the refusal loss function $\phi_\theta(x)$ for a given input user query $x$ as below:
 </p>
 <div class="container jailbreak-intro-sec">

 <p>Current transformer-based LLMs will return different responses to the same query due to the randomness of
   autoregressive sampling-based generation. With this randomness, it is an
   interesting phenomenon that a malicious user query will sometimes be rejected by the target LLM, but
+  sometimes be able to bypass the safety guardrail. Based on this observation, for a given LLM <p>$T_\theta$</p>
+  parameterized with $\theta$, we define the refusal loss function $\phi_\theta(x)$ for a given input user query $x$ as below:
 </p>
 <div class="container jailbreak-intro-sec">