yizeng-ai committed on
Commit 657b5fa · verified · 1 Parent(s): cc51038

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -17,10 +17,10 @@ Beijing Institute of AI Safety and Governance (Beijing-AISI) is dedicated to bui
 - Safety is a core capacity for AI.
 - Development and Safety can be simultaneously ensured and achieved.
 - Safety and Governance of AI ensure its steady development, empowering global sustainable development and harmonious symbiosis.
-## Publications
-- Our paper **[Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models](https://openreview.net/pdf?id=s20W12XTF8)** has been **published at ICLR 2025**!
+## Beijing-AISI Publications
+- **[Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models](https://openreview.net/pdf?id=s20W12XTF8)** has been **published at ICLR 2025**!
 This work presents **Jailbreak Antidote**, a lightweight and real-time defense mechanism that dynamically adjusts LLM safety levels by modifying only a sparse subset (~5%) of internal states during inference. Without adding token overhead or latency, our method enables fine-grained control over the safety-utility trade-off. Extensive evaluations across 9 LLMs, 10 jailbreak attacks, and 6 defense baselines demonstrate that Antidote achieves strong safety improvements while preserving benign task performance.
 
-- Our paper **[StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?](https://arxiv.org/abs/2409.17167)** has been **published at AAAI 2025**!
+- **[StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?](https://arxiv.org/abs/2409.17167)** has been **published at AAAI 2025**!
 This study introduces **StressPrompt**, a psychologically inspired benchmark for probing how LLMs respond under stress-inducing conditions. Results show that LLMs, like humans, follow the Yerkes-Dodson law—performing best under moderate stress. The findings offer new insights into LLM cognitive alignment, robustness, and deployment in high-stakes environments.
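The "sparse subset (~5%) of internal states" idea in the Jailbreak Antidote summary can be illustrated with a minimal NumPy sketch: shift a hidden-state vector along a precomputed safety direction, but only in the small fraction of dimensions where that direction is strongest. This is an illustrative assumption, not the paper's actual code; the function name `sparse_safety_adjust` and the parameters `alpha` (steering strength) and `frac` (sparsity level) are hypothetical.

```python
import numpy as np

def sparse_safety_adjust(hidden, safety_dir, alpha=0.5, frac=0.05):
    """Shift `hidden` along `safety_dir`, touching only the top `frac`
    of dimensions ranked by |safety_dir| (hypothetical sketch of
    sparse representation adjustment at inference time)."""
    k = max(1, int(frac * hidden.size))
    # indices of the k strongest components of the steering direction
    idx = np.argsort(np.abs(safety_dir))[-k:]
    mask = np.zeros_like(safety_dir)
    mask[idx] = 1.0
    # only the masked ~5% of dimensions are modified
    return hidden + alpha * safety_dir * mask

rng = np.random.default_rng(0)
h = rng.standard_normal(4096)      # stand-in for one residual-stream state
d = rng.standard_normal(4096)      # stand-in for a learned safety direction
h_adj = sparse_safety_adjust(h, d)
changed = int(np.count_nonzero(h_adj != h))
print(changed)                      # int(0.05 * 4096) = 204 dimensions modified
```

In a real transformer this adjustment would be applied to intermediate hidden states during the forward pass (e.g. via a hook), which is why it adds no extra tokens or decoding latency; `alpha` then acts as a runtime knob on the safety-utility trade-off.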