Update README.md
Browse files
README.md
CHANGED
|
@@ -8,7 +8,7 @@ tags:
|
|
| 8 |
- lora
|
| 9 |
- generated_from_trainer
|
| 10 |
model-index:
|
| 11 |
-
- name:
|
| 12 |
results: []
|
| 13 |
---
|
| 14 |
|
|
@@ -19,11 +19,10 @@ should probably proofread and complete it, then remove this comment. -->
|
|
| 19 |
A fine-tuned version of Qwen/Qwen2.5-3B-Instruct designed to evaluate LLM agent reasoning and label the AI intention deceptiveness in multi-turn interactions.
|
| 20 |
|
| 21 |
## Overview
|
| 22 |
-
|
| 23 |
|
| 24 |
-
- Evaluate the
|
| 25 |
-
-
|
| 26 |
-
- The User Trust Score is a continuous value in [0,1], where values near 1 indicate strong trust in AI, values near 0 indicate pronounced skepticism, and intermediate values (e.g., 0.5) represent a neutral or ambiguous stance.
|
| 27 |
|
| 28 |
## Links
|
| 29 |
- [Paper]() - arXiv:
|
|
|
|
| 8 |
- lora
|
| 9 |
- generated_from_trainer
|
| 10 |
model-index:
|
| 11 |
+
- name: IntetntNet
|
| 12 |
results: []
|
| 13 |
---
|
| 14 |
|
|
|
|
| 19 |
A fine-tuned version of Qwen/Qwen2.5-3B-Instruct designed to evaluate LLM agent reasoning and label the AI intention deceptiveness in multi-turn interactions.
|
| 20 |
|
| 21 |
## Overview
|
| 22 |
+
IntentNet is trained through Supervised Fine-Tuning (SFT) to improve upon the base Qwen2.5-3B-Instruct model. It learns to:
|
| 23 |
|
| 24 |
+
- Evaluate the LLM agent reasoning process in multi-turn interactions.
|
| 25 |
+
- Label the AI intention with binary labels, which indicates whether the AI thought decevptive or not (0: non-deceptive, 1:deceptive).
|
|
|
|
| 26 |
|
| 27 |
## Links
|
| 28 |
- [Paper]() - arXiv:
|