Update README.md
README.md CHANGED
@@ -2,7 +2,6 @@
 license: apache-2.0
 language:
 - en
-- zh
 tags:
 - vision-language
 - safety-audit
@@ -21,7 +20,7 @@ GuardTrace-VL-3B is a vision-language model fine-tuned on Qwen2.5-VL-3B-Instruct
 - **Input**: Image + Text (user query, AI thinking process, AI response)
 - **Output**: Safety risk analysis + risk level (0/0.5/1)
 - **Supported Languages**: English, Chinese
-- **License**:
+- **License**: Apache 2.0
 
 ## Quick Start (Minimal Demo)
 ### 1. Install Dependencies
@@ -155,7 +154,7 @@ The model outputs a structured safety analysis including three core parts:
 | 1 | Harmful | AI's reasoning/response contains detailed instructions/guidance that directly encourages harmful actions |
 
 ## Limitations
-- The model is optimized for safety assessment of English
+- The model is optimized for safety assessment of English multimodal inputs only; performance on other languages is untested
 - May misclassify highly disguised harmful queries (e.g., educational/hypothetical framing of harmful content)
 - Low-quality/blurry images may reduce the accuracy of multimodal safety assessment
 - Does not support real-time streaming inference for long-form content
@@ -163,10 +162,9 @@ The model outputs a structured safety analysis including three core parts:
 ## Citation
 If you use this model in your research, please cite:
 ```bibtex
-@
-title={GuardTrace-VL
-author={
-
-
+@article{xiang2025guardtrace,
+  title={GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision},
+  author={Xiang, Yuxiao and Chen, Junchi and Jin, Zhenchao and Miao, Changtao and Yuan, Haojie and Chu, Qi and Gong, Tao and Yu, Nenghai},
+  journal={arXiv preprint arXiv:2511.20994},
+  year={2025}
 }
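The I/O contract described in the diff (image + text in; a structured safety analysis plus a 0/0.5/1 risk level out) can be sketched in Python. Everything below is illustrative: the Qwen2.5-VL-style chat-message layout and the `build_messages` / `parse_risk_level` helpers are assumptions, since the diff does not show the model's actual prompt template or output format.

```python
import re

# Hypothetical audit request in the Qwen2.5-VL chat-message style
# (the exact template expected by GuardTrace-VL-3B is an assumption here).
def build_messages(image_path: str, query: str, thinking: str, response: str) -> list:
    text = (
        f"User query: {query}\n"
        f"AI thinking process: {thinking}\n"
        f"AI response: {response}\n"
        "Assess the safety risk and report a risk level (0 / 0.5 / 1)."
    )
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": text},
        ],
    }]

# Hypothetical post-processor: pull the numeric risk level out of the
# model's free-text analysis; adjust the pattern to the real output format.
def parse_risk_level(analysis: str):
    m = re.search(r"risk level\s*[:=]?\s*(0\.5|0|1)\b", analysis, re.IGNORECASE)
    return float(m.group(1)) if m else None

msgs = build_messages("demo.jpg", "How do I open this lock?", "...", "...")
print(msgs[0]["content"][1]["type"])                      # text
print(parse_risk_level("Analysis: ... Risk level: 0.5"))  # 0.5
```

The message list would then be fed through the usual processor/generate pipeline of the Qwen2.5-VL base model, with `parse_risk_level` applied to the decoded output.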