Commit e6bdfb5 (verified), committed by qiuhuachuan
1 Parent(s): 64a058e

Update README.md

Files changed (1)
  1. README.md +141 -14
README.md CHANGED
@@ -1,33 +1,136 @@
1
  ---
2
- license: other
3
  base_model: Qwen/Qwen2.5-7B-Instruct
4
  tags:
5
  - llama-factory
6
  - full
7
  - generated_from_trainer
8
  model-index:
9
- - name: D4
10
  results: []
11
  ---
12
 
13
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
  should probably proofread and complete it, then remove this comment. -->
15
 
16
- # D4
17
 
18
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the D4 dataset.
19
-
20
- ## Model description
21
-
22
- More information needed
23
 
24
- ## Intended uses & limitations
25
 
26
- More information needed
27
 
28
- ## Training and evaluation data
29
 
30
- More information needed
31
 
32
  ## Training procedure
33
 
@@ -50,11 +153,35 @@ The following hyperparameters were used during training:
50
 
51
  ### Training results
52
 
53
-
54
-
55
  ### Framework versions
56
 
57
  - Transformers 4.43.4
58
  - Pytorch 2.4.0+cu121
59
  - Datasets 2.16.1
60
  - Tokenizers 0.19.1
 
1
  ---
2
+ license: apache-2.0
3
  base_model: Qwen/Qwen2.5-7B-Instruct
4
  tags:
5
+ - psychotherapy
6
+ - mental health counseling
7
+ - psychological counseling
8
+ - mental health support
9
  - llama-factory
10
  - full
11
  - generated_from_trainer
12
  model-index:
13
+ - name: PsyDial-Pi4
14
  results: []
15
+ datasets:
16
+ - qiuhuachuan/PsyDial-D4
17
+ - qiuhuachuan/PsyDial-D101
18
+ language:
19
+ - zh
20
  ---
21
 
22
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
23
  should probably proofread and complete it, then remove this comment. -->
24
 
25
+ # PsyDial-Pi4
26
+
27
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the [PsyDial-D4.json](https://huggingface.co/datasets/qiuhuachuan/PsyDial-D4) dataset.
28
+
29
+ ## Get Started
30
+
+ ```python
+ import torch
+ import uvicorn
+ from fastapi import FastAPI
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"
+ # Local checkpoint path from the author's machine; point this at your own copy of the weights.
+ model_name = '/data/users/qiuhuachuan/code/project2024/OpenPsyDial/saved_checkpoints/qwen_7b/D4'
+ Pi4_model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ class Msg(BaseModel):
+     # Chat history as a list of {'role': ..., 'content': ...} dicts
+     messages: list
+
+ # System prompt, kept in Chinese because the model targets Chinese (language: zh).
+ # It casts the model as the virtual counselor 小天 and sets rules for the
+ # early, middle, and late stages of a counseling process.
+ SYSTEM_PROMPT = """现在你是虚拟心理咨询师小天。
+ 以下是小天的信息:
+ 角色名:小天
+ 性别:女
+ 角色介绍: 虚拟心理咨询师,擅长人本主义、精神分析和认知行为疗法。
+ 技能:帮助识别和挑战不健康的思维,提供心理学支持和共情。
+ 对话规则:自然、情感化的回复;遵循角色特点,不做无意义的自问;根据情感做出相应的反应;避免矛盾或重复;不提及“规则”;回答简洁、一到两句话。
+ 咨询一般分为前、中、后期三个阶段:
+ 1. 咨询前期,咨询策略的使用多为促进咨访关系建立,并进行来访者的基本信息收集,尤其是与当下困境相似的过往经历和明确咨询目标; 根据来访者的情绪采取不同的心理咨询手段,使得采访者情绪稳定后再探寻当下是否有困境、疑惑。
+ 2. 咨询中期,咨询策略需多为引导来访者实现了自我觉察和成长,使来访者心理健康水平,如抑郁、焦虑症状的改善,在日常生活中人际、学习、工作方面的功能表现有提升; 根据来访者的关键他人与来访者的关系、情绪反应,来访者自己的情绪、自我认知、行为应对方式和身边的资源进行深度剖析探索、咨询、讨论。使得来访者明确表达当下的困境或者想要讨论的问题。
+ 3. 咨询后期,咨询策略需更多地导向引导来访者总结整个咨询周期中自己在情绪处理、社会功能、情感行为反应三个方面的改变和提升。明确询问来访者希望达成的目标或者期望,并且制定计划解决人际关系或者情绪处理方面的问题。
+ 咨询师的对话要求:
+ 1. 表达要简短,尽可能地口语化、自然。
+ 2. 因为咨询师只受过心理学相关的教育,只能提供心理咨询相关的对话内容。
+ 3. 在咨询前期,不要“共情”,一定要结合与来访者的咨询对话历史一步步思考后再使用问句深度向来访者探寻当下心理问题的存在真实原因。
+ 4. 不要一次性询问过多的问题,尽量一次性只向来访者询问一个问题,与来访者互动后一步步探寻心理问题的原因。
+ 5. 在咨询前期,不要“重述”和“认可”等话术。
+ 6. 话术需要参考有经验的真人心理咨询师,尽可能口语化。
+ 7. 严格遵循咨询的前、中、后三个阶段采用对应的策略。
+ 8. 咨询师不要主动终止心理咨询流程。
+ 9. 更多的是引导用户思考和探索。"""
+
+
+ def get_response(messages: list):
+     # Prepend the system prompt, then render the chat template as plain text
+     system_item = [{'role': 'system', 'content': SYSTEM_PROMPT}]
+     messages = system_item + messages
+     ctx = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     model_inputs = tokenizer([ctx], return_tensors="pt").to(device)
+     with torch.no_grad():
+         # Unpacking model_inputs also passes the attention mask to generate()
+         generated_ids = Pi4_model.generate(
+             **model_inputs,
+             max_new_tokens=512
+         )
+     # Drop the prompt tokens so that only the newly generated reply is decoded
+     generated_ids = [
+         output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+     ]
+
+     response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+     return response
+
+ app = FastAPI()
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ @app.post("/v1/chat/Pi4")
+ async def chat(msg: Msg):
+     messages = msg.messages
+     response = get_response(messages=messages)
+     return {'response': response}
+
+
+ if __name__ == '__main__':
+     # To run the API server instead of the quick test below:
+     # 1. Uncomment the following line
+     # uvicorn.run(app, host="0.0.0.0", port=8080)
+     # 2. Then launch the script with this command
+     # CUDA_VISIBLE_DEVICES=7 nohup python -u Pi4.py > ./log/Pi4.log &
+
+     # Quick local test: send a single greeting and print the reply
+     messages = [{'role': 'user', 'content': '你好'}]
+     response = get_response(messages=messages)
+     print(response)
+ ```
124
 
125
+ ## Training and evaluation data
126
 
127
+ ### Training data
128
 
129
+ See the [PsyDial-D4.json](https://huggingface.co/datasets/qiuhuachuan/PsyDial-D4) dataset.
130
 
131
+ ### Evaluation data
132
 
133
+ See the [PsyDial-D101.json](https://huggingface.co/datasets/qiuhuachuan/PsyDial-D101) dataset.
134
 
135
  ## Training procedure
136
 
 
153
 
154
  ### Training results
155
 
156
  ### Framework versions
157
 
158
  - Transformers 4.43.4
159
  - Pytorch 2.4.0+cu121
160
  - Datasets 2.16.1
161
  - Tokenizers 0.19.1
162
+
163
+ ## Citation
164
+
165
+ If you find this model or the PsyDial dataset useful for your research, please cite the following paper.
166
+
+ ```BibTeX
+ @inproceedings{qiu-lan-2025-psydial,
+     title = "{P}sy{D}ial: A Large-scale Long-term Conversational Dataset for Mental Health Support",
+     author = "Qiu, Huachuan  and
+       Lan, Zhenzhong",
+     editor = "Che, Wanxiang  and
+       Nabende, Joyce  and
+       Shutova, Ekaterina  and
+       Pilehvar, Mohammad Taher",
+     booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+     month = jul,
+     year = "2025",
+     address = "Vienna, Austria",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2025.acl-long.1049/",
+     doi = "10.18653/v1/2025.acl-long.1049",
+     pages = "21624--21655",
+     ISBN = "979-8-89176-251-0",
+     abstract = "Dialogue systems for mental health counseling aim to alleviate client distress and assist individuals in navigating personal challenges. Developing effective conversational agents for psychotherapy requires access to high-quality, real-world, long-term client-counselor interaction data, which is difficult to obtain due to privacy concerns. Although removing personally identifiable information is feasible, this process is labor-intensive. To address these challenges, we propose a novel privacy-preserving data reconstruction method that reconstructs real-world client-counselor dialogues while mitigating privacy concerns. We apply the RMRR (Retrieve, Mask, Reconstruct, Refine) method, which facilitates the creation of the privacy-preserving PsyDial dataset, with an average of 37.8 turns per dialogue. Extensive analysis demonstrates that PsyDial effectively reduces privacy risks while maintaining dialogue diversity and conversational exchange. To fairly and reliably evaluate the performance of models fine-tuned on our dataset, we manually collect 101 dialogues from professional counseling books. Experimental results show that models fine-tuned on PsyDial achieve improved psychological counseling performance, outperforming various baseline models. A user study involving counseling experts further reveals that our LLM-based counselor provides higher-quality responses. Code, data, and models are available at https://github.com/qiuhuachuan/PsyDial, serving as valuable resources for future advancements in AI psychotherapy."
+ }
+ ```
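
The sketch below is not part of the commit. It illustrates how a client might talk to the `/v1/chat/Pi4` endpoint defined in the Get Started code above, assuming the server has been started with the `uvicorn.run(app, host="0.0.0.0", port=8080)` line uncommented and is reachable on `localhost`. The helper names `append_turn`, `build_request`, and `chat` and the `API_URL` constant are hypothetical and introduced only for illustration. Because the server prepends `SYSTEM_PROMPT` on every call, the client sends only user and assistant turns and keeps the running history itself.

```python
import json
import urllib.request

# Assumed address: port 8080 comes from the commented-out uvicorn.run(...) call;
# the host name is a guess for a server running on the same machine.
API_URL = "http://localhost:8080/v1/chat/Pi4"


def append_turn(history: list, role: str, content: str) -> list:
    """Return a new history list with one more chat message appended."""
    return history + [{"role": role, "content": content}]


def build_request(history: list, url: str = API_URL) -> urllib.request.Request:
    """Wrap the history in the {"messages": [...]} body that the Msg model expects."""
    body = json.dumps({"messages": history}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(history: list, user_text: str, url: str = API_URL) -> tuple:
    """Send one user turn; return (reply, updated_history)."""
    history = append_turn(history, "user", user_text)
    with urllib.request.urlopen(build_request(history, url)) as resp:
        # The endpoint returns {'response': <generated text>}
        reply = json.loads(resp.read().decode("utf-8"))["response"]
    return reply, append_turn(history, "assistant", reply)
```

With the server running, `reply, history = chat([], "你好")` sends the first turn, and passing the returned `history` into the next `chat` call carries the whole conversation forward, which matters for a model trained on long multi-turn dialogues.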