Update README.md
README.md CHANGED

```diff
@@ -17,10 +17,10 @@ Beijing Institute of AI Safety and Governance (Beijing-AISI) is dedicated to bui
 - Safety is a core capacity for AI.
 - Development and Safety can be simultaneously ensured and achieved.
 - Safety and Governance of AI ensure its steady development, empowering global sustainable development and harmonious symbiosis.
-## Publications
--
+## Beijing-AISI Publications
+- **[Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models](https://openreview.net/pdf?id=s20W12XTF8)** has been **published at ICLR 2025**!
 This work presents **Jailbreak Antidote**, a lightweight and real-time defense mechanism that dynamically adjusts LLM safety levels by modifying only a sparse subset (~5%) of internal states during inference. Without adding token overhead or latency, our method enables fine-grained control over the safety-utility trade-off. Extensive evaluations across 9 LLMs, 10 jailbreak attacks, and 6 defense baselines demonstrate that Antidote achieves strong safety improvements while preserving benign task performance.
 
--
+- **[StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?](https://arxiv.org/abs/2409.17167)** has been **published at AAAI 2025**!
 This study introduces **StressPrompt**, a psychologically inspired benchmark for probing how LLMs respond under stress-inducing conditions. Results show that LLMs, like humans, follow the Yerkes-Dodson law—performing best under moderate stress. The findings offer new insights into LLM cognitive alignment, robustness, and deployment in high-stakes environments.
 
```
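The sparse internal-state adjustment that the Jailbreak Antidote summary describes can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the precomputed `safety_direction`, and the random inputs are all hypothetical; only the idea of shifting a hidden state along a direction while touching ~5% of its components comes from the text.

```python
import numpy as np

def sparse_safety_shift(hidden, safety_direction, alpha=1.0, density=0.05):
    """Shift `hidden` along `safety_direction`, modifying only the
    top `density` fraction of the direction's components by magnitude.
    A sketch of sparse representation adjustment, not the paper's code."""
    k = max(1, int(density * safety_direction.size))
    # Indices of the k largest-magnitude components of the direction.
    idx = np.argsort(np.abs(safety_direction))[-k:]
    sparse_dir = np.zeros_like(safety_direction)
    sparse_dir[idx] = safety_direction[idx]
    # alpha would control the safety-utility trade-off at inference time.
    return hidden + alpha * sparse_dir

rng = np.random.default_rng(0)
hidden = rng.standard_normal(4096)     # one token's hidden state (illustrative size)
direction = rng.standard_normal(4096)  # assumed precomputed safety direction
shifted = sparse_safety_shift(hidden, direction, alpha=0.8)
# Only about 5% of 4096 dimensions (204 here) are changed.
print(np.count_nonzero(shifted - hidden))
```

In a real deployment this kind of shift would be applied inside a forward hook on selected transformer layers; the sketch only shows the sparse update itself.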
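The Yerkes-Dodson law that StressPrompt reports is an inverted-U relationship: performance rises with stress up to a moderate level, then falls. A toy quadratic curve makes the shape concrete; the functional form and its numbers are illustrative only, not the benchmark's data.

```python
# Toy inverted-U (Yerkes-Dodson) curve: performance peaks at moderate stress.
# The quadratic form and the 0..1 stress scale are illustrative assumptions.
def performance(stress):
    return 1.0 - (stress - 0.5) ** 2  # peak at stress = 0.5

levels = [i / 10 for i in range(11)]   # stress levels 0.0, 0.1, ..., 1.0
best = max(levels, key=performance)
print(best)  # the moderate level scores highest
```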