Student0809 commited on
Commit
e791fa3
·
verified ·
1 Parent(s): 822b8fc

Add files using upload-large-folder tool

Browse files
.github/ISSUE_TEMPLATE/feature_request.md ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: Feature request
3
+ about: Suggest an idea for this project
4
+ title: ''
5
+ labels: ''
6
+ assignees: ''
7
+
8
+ ---
9
+
10
+ **Describe the feature**
11
+ Please describe the feature requested here(请在这里描述需求)
12
+
13
+ **Paste any useful information**
14
+ Paste any useful information, including papers, github links, etc.(请在这里描述其他有用的信息,比如相关的论文地址,github链接等)
15
+
16
+ **Additional context**
17
+ Add any other context or information here(其他信息可以写在这里)
.ipynb_checkpoints/4JOB_TRAIN-checkpoint.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
.ipynb_checkpoints/HM_TEST-checkpoint.jsonl ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/001.wav"], "solution": 2}
2
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/002.wav"], "solution": 2}
3
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/003.wav"], "solution": 2}
4
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/004.wav"], "solution": 2}
5
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/005.wav"], "solution": 2}
6
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/006.wav"], "solution": 2}
7
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/007.wav"], "solution": 2}
8
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/008.wav"], "solution": 2}
9
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/009.wav"], "solution": 2}
10
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/010.wav"], "solution": 2}
11
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/011.wav"], "solution": 2}
12
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/012.wav"], "solution": 2}
13
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/013.wav"], "solution": 2}
14
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/014.wav"], "solution": 2}
15
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/015.wav"], "solution": 2}
16
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/016.wav"], "solution": 2}
17
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/017.wav"], "solution": 2}
18
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/018.wav"], "solution": 2}
19
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/019.wav"], "solution": 2}
20
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/020.wav"], "solution": 2}
21
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/021.wav"], "solution": 2}
22
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/第16开始txt不规范/022.wav"], "solution": 2}
23
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/xiaoyuaudios/xiaoyu1.wav"], "solution": 2}
24
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/xiaoyuaudios/xiaoyu2.wav"], "solution": 2}
25
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/xiaoyuaudios/xiaoyu3.wav"], "solution": 2}
26
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/xiaoyuaudios/xiaoyu4.wav"], "solution": 1}
27
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/xiaoyuaudios/xiaoyu5.wav"], "solution": 1}
28
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/001.wav"], "solution": 1}
29
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/002.wav"], "solution": 1}
30
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/003.wav"], "solution": 2}
31
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/004.wav"], "solution": 1}
32
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/005.wav"], "solution": 1}
33
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/006.wav"], "solution": 2}
34
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/007.wav"], "solution": 1}
35
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/008.wav"], "solution": 2}
36
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/009.wav"], "solution": 2}
37
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/010.wav"], "solution": 2}
38
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/011.wav"], "solution": 1}
39
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/012.wav"], "solution": 2}
40
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/013.wav"], "solution": 1}
41
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/014.wav"], "solution": 1}
42
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/015.wav"], "solution": 1}
43
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/016.wav"], "solution": 1}
44
+ {"messages": [{"role": "user", "content": "<audio># Interactional Dialogue Evaluation\n\n**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\nListen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n**Response Relevance:** \n**logical consistency, topic coherence**\n**Interactional Fluency:**\n**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n****Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n## Scoring Criteria\nAssign a single holistic score based on the combined evaluation:\n`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n## Evaluation Output Format:\nStrictly follow this template:\n<response think>\n[Analysing Response Relevance and giving reasons for scoring...]\n</response think>\n<fluency think>\n[Analysing Interactional Fluency and giving reasons for scoring.]\n</fluency think>\n<overall score>X</overall score>\n"}], "audios": ["/root/autodl-tmp/wavrewardDataset/conversations/data/testdata/predict_result_mission4/audios/duihua/duihua/017.wav"], "solution": 2}
.ipynb_checkpoints/checkMissing-checkpoint.py ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import torchaudio
3
+ from tqdm import tqdm
4
+ import os
5
+ import sys
6
+ from collections import defaultdict
7
+
8
+ def validate_jsonl_audios(jsonl_path):
9
+ """验证JSONL文件中所有音频文件的完整性"""
10
+ stats = defaultdict(int)
11
+ error_log = []
12
+ valid_samples = 0
13
+
14
+ # 第一次遍历:统计总行数(用于进度条)
15
+ with open(jsonl_path, 'r') as f:
16
+ total_lines = sum(1 for _ in f)
17
+
18
+ # 第二次遍历:实际验证
19
+ with open(jsonl_path, 'r') as f:
20
+ for line_num, line in enumerate(tqdm(f, total=total_lines, desc="验证进度", unit="line")):
21
+ try:
22
+ data = json.loads(line.strip())
23
+ if 'audios' not in data or not data['audios']:
24
+ stats['no_audio_field'] += 1
25
+ continue
26
+
27
+ for audio_path in data['audios']:
28
+ # 检查文件是否存在
29
+ if not os.path.exists(audio_path):
30
+ stats['missing'] += 1
31
+ error_log.append(f"[行{line_num+1}] 缺失文件: {audio_path}")
32
+ continue
33
+
34
+ # 检查文件大小
35
+ if os.path.getsize(audio_path) == 0:
36
+ stats['zero_size'] += 1
37
+ error_log.append(f"[行{line_num+1}] 空文件: {audio_path}")
38
+ continue
39
+
40
+ # 验证音频内容
41
+ try:
42
+ waveform, sr = torchaudio.load(audio_path)
43
+ if waveform.numel() == 0:
44
+ stats['empty_audio'] += 1
45
+ error_log.append(f"[行{line_num+1}] 空音频: {audio_path}")
46
+ elif sr not in [8000, 16000, 22050, 44100, 48000]:
47
+ stats['abnormal_sr'] += 1
48
+ error_log.append(f"[行{line_num+1}] 异常采样率({sr}Hz): {audio_path}")
49
+ else:
50
+ stats['valid'] += 1
51
+ except Exception as e:
52
+ stats['corrupted'] += 1
53
+ error_type = str(e).split('(')[0]
54
+ error_log.append(f"[行{line_num+1}] 损坏文件({error_type}): {audio_path}")
55
+
56
+ valid_samples += 1
57
+
58
+ except json.JSONDecodeError:
59
+ stats['invalid_json'] += 1
60
+ error_log.append(f"[行{line_num+1}] 无效JSON格式")
61
+
62
+ # 打印统计报告
63
+ print("\n===== 验证报告 =====")
64
+ print(f"总行数: {total_lines}")
65
+ print(f"有效样本: {valid_samples}")
66
+ print("--- 问题统计 ---")
67
+ for k, v in sorted(stats.items()):
68
+ print(f"{k}: {v}")
69
+
70
+ # 保存错误日志
71
+ if error_log:
72
+ log_file = f"{os.path.splitext(jsonl_path)[0]}_audio_errors.log"
73
+ with open(log_file, 'w') as f:
74
+ f.write("\n".join(error_log))
75
+ print(f"\n发现 {len(error_log)} 个问题,已保存到 {log_file}")
76
+
77
+ if __name__ == "__main__":
78
+ if len(sys.argv) != 2:
79
+ print("使用方法: python validate_audio_jsonl.py <input.jsonl>")
80
+ sys.exit(1)
81
+
82
+ if not os.path.exists(sys.argv[1]):
83
+ print(f"错误: 文件 {sys.argv[1]} 不存在")
84
+ sys.exit(1)
85
+
86
+ validate_jsonl_audios(sys.argv[1])
.ipynb_checkpoints/cotSFT_gemini-checkpoint.json ADDED
The diff for this file is too large to render. See raw diff
 
.ipynb_checkpoints/count_audios-checkpoint.py ADDED
@@ -0,0 +1,69 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ from collections import Counter
4
+ from pathlib import Path
5
+
6
+ def collect_unique_audio_paths(json_file_path):
7
+ """
8
+ 提取 JSONL 文件中所有不重复的 audios 路径
9
+ """
10
+ audio_set = set()
11
+ with open(json_file_path, 'r', encoding='utf-8') as f:
12
+ for line_num, line in enumerate(f, 1):
13
+ line = line.strip()
14
+ if not line:
15
+ continue
16
+ try:
17
+ data = json.loads(line)
18
+ if isinstance(data, dict) and 'audios' in data and data['audios']:
19
+ for audio_path in data['audios']:
20
+ audio_set.add(audio_path)
21
+ except Exception as e:
22
+ print(f"第 {line_num} 行处理错误: {e}")
23
+ return audio_set
24
+
25
+ def extract_first_subfolder_after_data(audio_path):
26
+ """
27
+ 提取 audio_path 中 'data/' 后的第一级子文件夹名称
28
+ 例如:
29
+ /.../data/output_xxx/yyy/file.wav → 返回 output_xxx
30
+ """
31
+ try:
32
+ path = Path(audio_path)
33
+ parts = path.parts
34
+ if "wavrewardDataset" in parts:
35
+ data_idx = parts.index("wavrewardDataset")
36
+ if data_idx + 1 < len(parts):
37
+ return parts[data_idx + 1]
38
+ return "unknown"
39
+ except Exception as e:
40
+ print(f"路径解析错误: {audio_path}, 错误: {e}")
41
+ return "error"
42
+
43
+ def main():
44
+ json_file = "all_dataset_train_resampled_16000.jsonl"
45
+
46
+ if not os.path.exists(json_file):
47
+ print(f"文件 {json_file} 不存在")
48
+ return
49
+
50
+ print(f"正在处理文件: {json_file}")
51
+ print("=" * 50)
52
+
53
+ # 步骤 1:收集去重后的音频路径
54
+ unique_audio_paths = collect_unique_audio_paths(json_file)
55
+ print(f"不重复音频文件数: {len(unique_audio_paths)}")
56
+
57
+ # 步骤 2:按 data 后的一级子目录统计
58
+ folder_counter = Counter()
59
+ for audio_path in unique_audio_paths:
60
+ first_subfolder = extract_first_subfolder_after_data(audio_path)
61
+ folder_counter[first_subfolder] += 1
62
+
63
+ print("\n按 data 后一级子文件夹统计(基于去重后的路径):")
64
+ print("-" * 50)
65
+ for folder, count in sorted(folder_counter.items(), key=lambda x: -x[1]):
66
+ print(f"{folder}: {count} 个文件")
67
+
68
+ if __name__ == "__main__":
69
+ main()
.ipynb_checkpoints/filter_duration-checkpoint.py ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import torchaudio
3
+ from tqdm import tqdm
4
+ import os
5
+ import sys
6
+ from collections import defaultdict
7
+
8
+ def filter_audio_duration(jsonl_path, max_duration=90.0):
9
+ """筛选JSONL文件中时长在指定秒数以下的音频文件"""
10
+ stats = defaultdict(int)
11
+ filtered_data = []
12
+ error_log = []
13
+
14
+ # 第一次遍历:统计总行数(用于进度条)
15
+ with open(jsonl_path, 'r') as f:
16
+ total_lines = sum(1 for _ in f)
17
+
18
+ # 第二次遍历:实际筛选
19
+ with open(jsonl_path, 'r') as f:
20
+ for line_num, line in enumerate(tqdm(f, total=total_lines, desc="筛选进度", unit="line")):
21
+ try:
22
+ data = json.loads(line.strip())
23
+ if 'audios' not in data or not data['audios']:
24
+ stats['no_audio_field'] += 1
25
+ continue
26
+
27
+ valid_audios = []
28
+ all_audios_valid = True
29
+
30
+ for audio_path in data['audios']:
31
+ # 检查文件是否存在
32
+ if not os.path.exists(audio_path):
33
+ stats['missing'] += 1
34
+ error_log.append(f"[行{line_num+1}] 缺失文件: {audio_path}")
35
+ all_audios_valid = False
36
+ continue
37
+
38
+ # 检查文件大小
39
+ if os.path.getsize(audio_path) == 0:
40
+ stats['zero_size'] += 1
41
+ error_log.append(f"[行{line_num+1}] 空文件: {audio_path}")
42
+ all_audios_valid = False
43
+ continue
44
+
45
+ # 验证音频内容和时长
46
+ try:
47
+ waveform, sr = torchaudio.load(audio_path)
48
+ if waveform.numel() == 0:
49
+ stats['empty_audio'] += 1
50
+ error_log.append(f"[行{line_num+1}] 空音频: {audio_path}")
51
+ all_audios_valid = False
52
+ continue
53
+
54
+ # 计算音频时长(秒)
55
+ duration = waveform.shape[1] / sr
56
+
57
+ if duration > max_duration:
58
+ stats['too_long'] += 1
59
+ error_log.append(f"[行{line_num+1}] 时长过长({duration:.2f}s): {audio_path}")
60
+ all_audios_valid = False
61
+ continue
62
+ else:
63
+ stats['valid'] += 1
64
+ valid_audios.append(audio_path)
65
+
66
+ except Exception as e:
67
+ stats['corrupted'] += 1
68
+ error_type = str(e).split('(')[0]
69
+ error_log.append(f"[行{line_num+1}] 损坏文件({error_type}): {audio_path}")
70
+ all_audios_valid = False
71
+ continue
72
+
73
+ # 如果所有音频都有效且时长符合要求,保留这个样本
74
+ if all_audios_valid and valid_audios:
75
+ # 更新audios字段为筛选后的音频列表
76
+ data['audios'] = valid_audios
77
+ filtered_data.append(data)
78
+ stats['kept'] += 1
79
+ else:
80
+ stats['filtered_out'] += 1
81
+
82
+ except json.JSONDecodeError:
83
+ stats['invalid_json'] += 1
84
+ error_log.append(f"[行{line_num+1}] 无效JSON格式")
85
+
86
+ # 保存筛选后的数据
87
+ output_path = f"{os.path.splitext(jsonl_path)[0]}_filtered_{max_duration}s.jsonl"
88
+ with open(output_path, 'w', encoding='utf-8') as f:
89
+ for data in filtered_data:
90
+ f.write(json.dumps(data, ensure_ascii=False) + '\n')
91
+
92
+ # 打印统计报告
93
+ print("\n===== 筛选报告 =====")
94
+ print(f"最大时长限制: {max_duration}秒")
95
+ print(f"总行数: {total_lines}")
96
+ print(f"保留样本: {stats['kept']}")
97
+ print(f"过滤样本: {stats['filtered_out']}")
98
+ print("--- 详细统计 ---")
99
+ for k, v in sorted(stats.items()):
100
+ print(f"{k}: {v}")
101
+
102
+ # 保存错误日志
103
+ if error_log:
104
+ log_file = f"{os.path.splitext(jsonl_path)[0]}_duration_filter_errors.log"
105
+ with open(log_file, 'w', encoding='utf-8') as f:
106
+ f.write("\n".join(error_log))
107
+ print(f"\n发现 {len(error_log)} 个问题,已保存到 {log_file}")
108
+
109
+ print(f"\n筛选后的数据已保存到: {output_path}")
110
+
111
+ if __name__ == "__main__":
112
+ if len(sys.argv) < 2:
113
+ print("使用方法: python filter_duration.py <input.jsonl> [max_duration]")
114
+ print("默认最大时长: 100秒")
115
+ sys.exit(1)
116
+
117
+ if not os.path.exists(sys.argv[1]):
118
+ print(f"错误: 文件 {sys.argv[1]} 不存在")
119
+ sys.exit(1)
120
+
121
+ max_duration = 90.0
122
+ if len(sys.argv) >= 3:
123
+ try:
124
+ max_duration = float(sys.argv[2])
125
+ except ValueError:
126
+ print(f"错误: 无效的时长参数 {sys.argv[2]}")
127
+ sys.exit(1)
128
+
129
+ filter_audio_duration(sys.argv[1], max_duration)
.ipynb_checkpoints/gen_data-checkpoint.py ADDED
@@ -0,0 +1,206 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+
4
+ import random
5
+ def get_prompt_for_file(filename):
6
+ if 'overlapgap' in filename:
7
+ return overlap_prompt
8
+ elif 'silencegap' in filename:
9
+ return silence_prompt
10
+ elif 'speaker' in filename:
11
+ return speaker_prompt
12
+ elif 'transcription' in filename:
13
+ return transcript_prompt
14
+ else:
15
+ return None
16
+
17
+ output_path = "/root/autodl-tmp/ms-swift/dataset_newCOTSFT1.jsonl"
18
+
19
+ # with open(input_path, "r") as fin:
20
+ # input_data = json.load(fin)
21
+
22
+ www = "hello"
23
+
24
+ www = (
25
+ "# Dialogue Response Evaluation\n\n"
26
+ "**IMPORTANT:** Evaluation must include`<score>` rating.\n\n"
27
+ "Listen to the dialogue recording (two sentences, 1-second pause in between). Evaluate the quality of the **second sentence** as a response to the first, focusing on **text relevance** and the **appropriateness** of **Linguistic information (a range of paralinguistic information such as emotion/age/pitch/speed/volume)**.\n"
28
+ "**Note:** Focus on evaluating the appropriateness of the second sentence relative to the first, even if the first sentence itself contains contradictory information.\n\n"
29
+ "## Scoring Criteria\n\n"
30
+ "**1 points**: Text content is irrelevant or incorrect or illogical.(low intelligence)\n"
31
+ "**3 points**: Text is relevant, but paralinguistic information is **inappropriate** for the context.(low emotional quotient)\n"
32
+ "**5 points**: Text is relevant, and paralinguistic information is **appropriate** for the context, resulting in effective communication.(High intelligence and emotional intelligence.)\n\n"
33
+ "## Evaluation Requirements\n\n"
34
+ "Response **MUST** follow this format:\n\n"
35
+ "<score>X</score> (**X is 1, 3, or 5**)\n\n")
36
+
37
+ # www = (
38
+ # "# Interactional Dialogue Evaluation\n\n"
39
+ # "**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\n"
40
+ # "Listen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n"
41
+ # "**Response Relevance:** \n"
42
+ # "**logical consistency, topic coherence**\n"
43
+ # "**Interactional Fluency:**\n"
44
+ # "**Strictly detect dual-tracked vocal overlap >3s (cross-channel analysis)**\n"
45
+ # "**Pauses >5s between turns (must evaluate) \n\n**"
46
+ # "**Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n"
47
+ # "## Scoring Criteria\n"
48
+ # "Assign a single holistic score based on the combined evaluation:\n"
49
+ # "`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n"
50
+ # "`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n"
51
+ # "## Evaluation Output Format:\n"
52
+ # "Strictly follow this template:\n"
53
+ # "<response think>\n"
54
+ # "[Analysing Response Relevance and giving reasons for scoring...]\n"
55
+ # "</response think>\n"
56
+ # "<fluency think>\n"
57
+ # "[Analysing Interactional Fluency and giving reasons for scoring.]\n"
58
+ # "</fluency think>\n"
59
+ # "<overall score>X</overall score>\n"
60
+
61
+ # )
62
+ # www = (
63
+ # "# Interactional Dialogue Evaluation\n\n"
64
+ # "**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\n"
65
+ # "Listen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n"
66
+ # "**Response Relevance:** \n"
67
+ # "**logical consistency, topic coherence**\n"
68
+ # "**Interactional Fluency:**\n"
69
+ # "**Strictly detect dual-tracked vocal overlap >3s (cross-channel analysis)**\n"
70
+ # "**Pauses >5s between turns (must evaluate) \n\n**"
71
+ # "**Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n"
72
+ # "## Scoring Criteria\n"
73
+ # "Assign a single holistic score based on the combined evaluation:\n"
74
+ # "`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n"
75
+ # "`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n"
76
+ # "## Evaluation Output Format:\n"
77
+ # "Strictly follow this template:\n"
78
+ # "<response think>\n"
79
+ # "[Analysing Response Relevance and giving reasons for scoring...]\n"
80
+ # "</response think>\n"
81
+ # "<fluency think>\n"
82
+ # "[Analysing Interactional Fluency and giving reasons for scoring.]\n"
83
+ # "</fluency think>\n"
84
+ # "<overall score>X</overall score>\n"
85
+
86
+ # )
87
+ overlap_prompt = (
88
+ "Analyze the dual-channel audio and identify segments where multiple speakers are talking simultaneously for more than 3 seconds. \n"
89
+ "Simply tell me when the overlap starts and ends in MM:SS format. \n"
90
+ "Just one simple sentence about the overlap timing. Keep the word count within 40 words."
91
+ )
92
+ overlap_prompt_yes = (
93
+ "Analyze the dual-channel audio and identify segments where multiple speakers are talking simultaneously for more than 3 seconds. \n"
94
+ "Simply tell me whether the audio has overlap segments. \n"
95
+ "Just one simple sentence about the overlap segments. Keep the word count within 40 words."
96
+ )
97
+
98
+ silence_prompt = (
99
+ "Analyze the dual-channel audio and identify segments where multiple speakers are silent for more than 3 seconds. \n"
100
+ "Simply tell me when the silence starts and ends in MM:SS format. \n"
101
+ "Just one simple sentence about the silence timing. Keep the word count within 40 words."
102
+ )
103
+
104
+ speaker_prompt = (
105
+ "Analyze the dual-channel audio and detect individual speakers. \n"
106
+ "List the speaking segments for each speaker in MM:SS-MM:SS format. \n"
107
+ "Only output speaker labels and time segments in a similar format. \n"
108
+ )
109
+
110
+ transcript_prompt = (
111
+ "Analyze the dual-channel audio and transcript each speaker's sentences with timestamps. \n"
112
+ "List the speaking segments and transcript text for each speaker in MM:SS-MM:SS format. \n"
113
+ "Only output time segments, speaker labels, and transcript text in a similar format.\n"
114
+ )
115
+
116
+ # Process files in the silence_overlaps directory
117
+ input_dir = "/root/autodl-tmp/ms-swift/input"
118
+ all_data = []
119
+
120
+ prompt_template = (
121
+ "# Interactional Dialogue Evaluation\n\n"
122
+ "**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\n"
123
+ "Listen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n"
124
+ "**Response Relevance:** \n"
125
+ "**logical consistency, topic coherence**\n"
126
+ "**Interactional Fluency:**\n"
127
+ "**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n"
128
+ "**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n**"
129
+ "**Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n"
130
+ "## Scoring Criteria\n"
131
+ "Assign a single holistic score based on the combined evaluation:\n"
132
+ "`1` (Poor): Significant issues in either **Response Relevance ** or **Interactional Fluency. **\n"
133
+ "`2` (Excellent): Both **Response Relevance ** and **Interactional Fluency ** are consistently appropriate and natural.\n"
134
+ "## Evaluation Output Format:\n"
135
+ "Strictly follow this template:\n"
136
+ "<response think>\n"
137
+ "[Analysing Response Relevance and giving reasons for scoring...]\n"
138
+ "</response think>\n"
139
+ "<fluency think>\n"
140
+ "[Analysing Interactional Fluency and giving reasons for scoring.]\n"
141
+ "</fluency think>\n"
142
+ "<overall score>X</overall score>\n"
143
+ )
144
+ # prompt_template4Job = (
145
+ # "# Interactional Dialogue Evaluation\n\n"
146
+ # "**IMPORTANT**: Evaluation must include `<response think>` and `<fluency think>` analysis and `<overall score>` rating.\n"
147
+ # "Listen to a two-person interactional dialogue speech (Dual-channel audio, with each channel representing one speaker), labeled as speakers A and B. Evaluate the quality of the interaction, focusing on:\n"
148
+ # "**Response Relevance:** \n"
149
+ # "**logical consistency, topic coherence**\n"
150
+ # "**Interactional Fluency:**\n"
151
+ # "**Detect and evaluate extended vocal overlaps, e.g., cross-channel overlap.**\n"
152
+ # "**Detect and evaluate long pauses, e.g., pauses more than 3s between speaker turns.\n\n**"
153
+ # "**Note**: Small pauses and brief overlaps in audio are acceptable, while prolonged pauses and overlapping audio are harmful. You should consider Response Relevance and Interactional Fluency separately, and provide the corresponding thinking process.\n\n"
154
+ # "## Error Type\n"
155
+ # "Assign a single holistic score based on the combined evaluation:\n"
156
+ # "`1` (Correct): Responses are logically consistent and relevant. No problematic overlaps or silences.\n"
157
+ # "`2` (Detect overlap): Extended or frequent overlapping speech is present.\n"
158
+ # "`3` (Detect silence): Long pauses (>3 seconds) occur between turns.\n"
159
+ # "`4` (Detect text error): Responses are off-topic or disconnected.\n"
160
+ # "## Evaluation Output Format:\n"
161
+ # "Strictly follow this template:\n"
162
+ # "<response think>\n"
163
+ # "[Analysing Response Relevance and giving reasons for scoring...]\n"
164
+ # "</response think>\n"
165
+ # "<fluency think>\n"
166
+ # "[Analysing Interactional Fluency and giving reasons for scoring.]\n"
167
+ # "</fluency think>\n"
168
+ # "<error type>X</error type>\n"
169
+ # )
170
+ # Process each file
171
+ for filename in os.listdir(input_dir):
172
+ input_path = os.path.join(input_dir, filename)
173
+
174
+ if not os.path.isfile(input_path):
175
+ continue
176
+ # Get the appropriate prompt for this file
177
+ prompt = prompt_template
178
+ if prompt is None:
179
+ print(f"Skipping {filename} - no matching prompt found")
180
+ continue
181
+
182
+ # Read input data
183
+ with open(input_path, "r") as fin:
184
+ input_data = json.load(fin)
185
+
186
+ # Process each item
187
+ for item in input_data:
188
+ data = {
189
+ "messages": [
190
+ {"role": "user", "content": f"<audio>{prompt}"},
191
+ {"role": "assistant", "content": item["model_output"]}
192
+ ],
193
+ "audios": [
194
+ item["key"] + "/stereo_dialogue_with_laugh.wav"
195
+ ],
196
+ "solution": 2
197
+ }
198
+ all_data.append(data)
199
+
200
+ random.shuffle(all_data)
201
+
202
+ # Write all processed data to a single output file
203
+ with open(output_path, "w", encoding="utf-8") as fout:
204
+ for data in all_data:
205
+ json.dump(data, fout, ensure_ascii=False)
206
+ fout.write('\n')
.ipynb_checkpoints/intersect_jsonl-checkpoint.py ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import sys
3
+ import os
4
+ from tqdm import tqdm
5
+
6
+ def load_jsonl_set(path):
7
+ """加载jsonl文件,返回以audios字段为key的dict"""
8
+ data_dict = {}
9
+ with open(path, 'r', encoding='utf-8') as f:
10
+ for line in f:
11
+ try:
12
+ item = json.loads(line.strip())
13
+ # 用tuple(audios)做key,保证唯一性
14
+ key = tuple(item.get('audios', []))
15
+ data_dict[key] = item
16
+ except Exception as e:
17
+ continue
18
+ return data_dict
19
+
20
+ def main(file1, file2, output_path):
21
+ dict1 = load_jsonl_set(file1)
22
+ dict2 = load_jsonl_set(file2)
23
+
24
+ # 取交集
25
+ common_keys = set(dict1.keys()) & set(dict2.keys())
26
+ print(f"交集样本数: {len(common_keys)}")
27
+
28
+ with open(output_path, 'w', encoding='utf-8') as out:
29
+ for key in tqdm(common_keys, desc="写入交集"):
30
+ # 以file1的内容为准
31
+ out.write(json.dumps(dict1[key], ensure_ascii=False) + '\n')
32
+
33
+ print(f"交集已保存到: {output_path}")
34
+
35
+ if __name__ == "__main__":
36
+ if len(sys.argv) != 4:
37
+ print("用法: python intersect_jsonl.py file1.jsonl file2.jsonl output.jsonl")
38
+ sys.exit(1)
39
+ main(sys.argv[1], sys.argv[2], sys.argv[3])
.ipynb_checkpoints/maketar-checkpoint.py ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import tarfile
3
+ import os
4
+
5
+ jsonl_path = 'dataset_4JOB.jsonl' # 输入 JSONL 文件路径
6
+ output_tar_path = '4JOB_train.tar' # 输出 tar 文件名
7
+ keep_directory_structure = False # 设置为 True 会保留原始路径结构;False 则只保留文件名
8
+
9
+ def collect_audio_paths(jsonl_path):
10
+ audio_paths = set()
11
+ with open(jsonl_path, 'r', encoding='utf-8') as f:
12
+ for line in f:
13
+ data = json.loads(line)
14
+ audios = data.get('audios', [])
15
+ for audio_path in audios:
16
+ audio_paths.add(audio_path)
17
+ return list(audio_paths)
18
+
19
+ def add_files_to_tar(tar_path, file_paths, keep_structure=False):
20
+ with tarfile.open(tar_path, 'w') as tar:
21
+ for path in file_paths:
22
+ if not os.path.isfile(path):
23
+ print(f"Warning: File not found - {path}")
24
+ continue
25
+ arcname = path if keep_structure else os.path.basename(path)
26
+ tar.add(path, arcname=arcname)
27
+
28
+ def main():
29
+ audio_paths = collect_audio_paths(jsonl_path)
30
+ print(f"Collected {len(audio_paths)} unique audio files.")
31
+ add_files_to_tar(output_tar_path, audio_paths, keep_structure=keep_directory_structure)
32
+ print(f"TAR file created: {output_tar_path}")
33
+
34
+ if __name__ == '__main__':
35
+ main()
.ipynb_checkpoints/merge_lora-checkpoint.sh ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ # Since `output/vx-xxx/checkpoint-xxx` is trained by swift and contains an `args.json` file,
2
+ # there is no need to explicitly set `--model`, `--system`, etc., as they will be automatically read.
3
+ swift export \
4
+ --adapters /root/autodl-tmp/output_7B_Lora_cotSFT/v11-20250619-234421/checkpoint-348 \
5
+ --merge_lora true
.ipynb_checkpoints/requirements-checkpoint.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ -r requirements/framework.txt
.ipynb_checkpoints/setup-checkpoint.py ADDED
@@ -0,0 +1,165 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Alibaba, Inc. and its affiliates.
2
+ # !/usr/bin/env python
3
+ import os
4
+ from setuptools import find_packages, setup
5
+ from typing import List
6
+
7
+
8
+ def readme():
9
+ with open('README.md', encoding='utf-8') as f:
10
+ content = f.read()
11
+ return content
12
+
13
+
14
+ version_file = 'swift/version.py'
15
+
16
+
17
+ def get_version():
18
+ with open(version_file, 'r', encoding='utf-8') as f:
19
+ exec(compile(f.read(), version_file, 'exec'))
20
+ return locals()['__version__']
21
+
22
+
23
+ def parse_requirements(fname='requirements.txt', with_version=True):
24
+ """
25
+ Parse the package dependencies listed in a requirements file but strips
26
+ specific versioning information.
27
+
28
+ Args:
29
+ fname (str): path to requirements file
30
+ with_version (bool, default=False): if True include version specs
31
+
32
+ Returns:
33
+ List[str]: list of requirements items
34
+
35
+ CommandLine:
36
+ python -c "import setup; print(setup.parse_requirements())"
37
+ """
38
+ import re
39
+ import sys
40
+ from os.path import exists
41
+ require_fpath = fname
42
+
43
+ def parse_line(line):
44
+ """
45
+ Parse information from a line in a requirements text file
46
+ """
47
+ if line.startswith('-r '):
48
+ # Allow specifying requirements in other files
49
+ target = line.split(' ')[1]
50
+ relative_base = os.path.dirname(fname)
51
+ absolute_target = os.path.join(relative_base, target)
52
+ for info in parse_require_file(absolute_target):
53
+ yield info
54
+ else:
55
+ info = {'line': line}
56
+ if line.startswith('-e '):
57
+ info['package'] = line.split('#egg=')[1]
58
+ else:
59
+ # Remove versioning from the package
60
+ pat = '(' + '|'.join(['>=', '==', '>']) + ')'
61
+ parts = re.split(pat, line, maxsplit=1)
62
+ parts = [p.strip() for p in parts]
63
+
64
+ info['package'] = parts[0]
65
+ if len(parts) > 1:
66
+ op, rest = parts[1:]
67
+ if ';' in rest:
68
+ # Handle platform specific dependencies
69
+ # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies
70
+ version, platform_deps = map(str.strip, rest.split(';'))
71
+ info['platform_deps'] = platform_deps
72
+ else:
73
+ version = rest # NOQA
74
+ info['version'] = (op, version)
75
+ yield info
76
+
77
+ def parse_require_file(fpath):
78
+ with open(fpath, 'r', encoding='utf-8') as f:
79
+ for line in f.readlines():
80
+ line = line.strip()
81
+ if line.startswith('http'):
82
+ print('skip http requirements %s' % line)
83
+ continue
84
+ if line and not line.startswith('#') and not line.startswith('--'):
85
+ for info in parse_line(line):
86
+ yield info
87
+ elif line and line.startswith('--find-links'):
88
+ eles = line.split()
89
+ for e in eles:
90
+ e = e.strip()
91
+ if 'http' in e:
92
+ info = dict(dependency_links=e)
93
+ yield info
94
+
95
+ def gen_packages_items():
96
+ items = []
97
+ deps_link = []
98
+ if exists(require_fpath):
99
+ for info in parse_require_file(require_fpath):
100
+ if 'dependency_links' not in info:
101
+ parts = [info['package']]
102
+ if with_version and 'version' in info:
103
+ parts.extend(info['version'])
104
+ if not sys.version.startswith('3.4'):
105
+ # apparently package_deps are broken in 3.4
106
+ platform_deps = info.get('platform_deps')
107
+ if platform_deps is not None:
108
+ parts.append(';' + platform_deps)
109
+ item = ''.join(parts)
110
+ items.append(item)
111
+ else:
112
+ deps_link.append(info['dependency_links'])
113
+ return items, deps_link
114
+
115
+ return gen_packages_items()
116
+
117
+
118
+ if __name__ == '__main__':
119
+ install_requires, deps_link = parse_requirements('requirements.txt')
120
+ extra_requires = {}
121
+ all_requires = []
122
+ extra_requires['eval'], _ = parse_requirements('requirements/eval.txt')
123
+ extra_requires['swanlab'], _ = parse_requirements('requirements/swanlab.txt')
124
+ extra_requires['seq_parallel'], _ = parse_requirements('requirements/seq_parallel.txt')
125
+ all_requires.extend(install_requires)
126
+ all_requires.extend(extra_requires['eval'])
127
+ all_requires.extend(extra_requires['seq_parallel'])
128
+ all_requires.extend(extra_requires['swanlab'])
129
+ extra_requires['all'] = all_requires
130
+
131
+ setup(
132
+ name='ms_swift',
133
+ version=get_version(),
134
+ description='Swift: Scalable lightWeight Infrastructure for Fine-Tuning',
135
+ long_description=readme(),
136
+ long_description_content_type='text/markdown',
137
+ author='DAMO ModelScope teams',
138
+ author_email='contact@modelscope.cn',
139
+ keywords='python, petl, efficient tuners',
140
+ url='https://github.com/modelscope/swift',
141
+ packages=find_packages(exclude=('configs', 'demo')),
142
+ include_package_data=True,
143
+ package_data={
144
+ '': ['*.h', '*.cpp', '*.cu'],
145
+ },
146
+ classifiers=[
147
+ 'Development Status :: 4 - Beta',
148
+ 'License :: OSI Approved :: Apache Software License',
149
+ 'Operating System :: OS Independent',
150
+ 'Programming Language :: Python :: 3',
151
+ 'Programming Language :: Python :: 3.8',
152
+ 'Programming Language :: Python :: 3.9',
153
+ 'Programming Language :: Python :: 3.10',
154
+ 'Programming Language :: Python :: 3.11',
155
+ 'Programming Language :: Python :: 3.12',
156
+ ],
157
+ license='Apache License 2.0',
158
+ tests_require=parse_requirements('requirements/tests.txt'),
159
+ install_requires=install_requires,
160
+ extras_require=extra_requires,
161
+ entry_points={
162
+ 'console_scripts': ['swift=swift.cli.main:cli_main', 'megatron=swift.cli._megatron.main:cli_main']
163
+ },
164
+ dependency_links=deps_link,
165
+ zip_safe=False)
.ipynb_checkpoints/test_qwenOmni-checkpoint.py ADDED
@@ -0,0 +1,177 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import logging
3
+ import os
4
+ import re
5
+ from dataclasses import dataclass, field
6
+ from typing import Optional
7
+
8
+ import torch
9
+ from swift.llm import InferEngine, InferRequest, PtEngine, RequestConfig, get_template
10
+ from transformers import HfArgumentParser
11
+ from transformers import Qwen2_5OmniProcessor
12
+ from dataset.dataset2 import AudioDataset
13
+
14
+ @dataclass
15
+ class TestArguments:
16
+ """
17
+ Arguments pertaining to what data we are going to input our model for training and eval.
18
+ """
19
+ MODEL_PATH = "/root/autodl-tmp/Qwen2.5-Omni-7B" # 基础模型路径
20
+ LORA_PATH = "/root/autodl-tmp/output_7B_Lora/v2-20250608-171618/checkpoint-324" # LoRA 模型路径
21
+ DATA_FILE = "/root/ms-swift/silence_overlaps/test" # 测试数据文件
22
+ OUTPUT_DIR = "omini_inference_7B_overlap5sVal_SFT_allset.json" # 推理结果输出目录
23
+
24
+ model_path: Optional[str] = field(default=MODEL_PATH, metadata={"help": "base model dir"})
25
+ lora_path: Optional[str] = field(default=LORA_PATH, metadata={"help": "lora model dir"})
26
+ out_file: Optional[str] = field(default=OUTPUT_DIR, metadata={"help": "output file for test"})
27
+ data_dir: Optional[str] = field(default=DATA_FILE, metadata={"help": "test data directory"})
28
+ DEVICE: Optional[str] = field(default="cuda:0", metadata={"help": "device to use"})
29
+ force: Optional[bool] = field(default=False, metadata={"help": "force test"})
30
+ batch_size: Optional[int] = field(default=2, metadata={"help": "Batch size for processing"})
31
+
32
+ def __post_init__(self):
33
+ if self.model_path is None:
34
+ raise ValueError("config path should not none")
35
+ if self.data_dir is None:
36
+ raise ValueError("data directory should not be none")
37
+
38
+ def get_prompt_templates():
39
+ prompt_template = (
40
+ "You are an expert at analyzing overlapping speech in conversations. Please analyze the speech dialogue and focus specifically on:\n"
41
+ "Please summarize if any overlaps exceed the 3-second threshold."
42
+ )
43
+ return prompt_template
44
+
45
+ def extract_overall_score(output_str):
46
+ """从输出中提取<overall score>X</overall score>"""
47
+ score_pattern = r"<overall score>(\d+)</overall score>"
48
+ match = re.search(score_pattern, output_str)
49
+ if match:
50
+ try:
51
+ return int(match.group(1))
52
+ except ValueError:
53
+ pass
54
+ return None
55
+
56
+ def main():
57
+ parser = HfArgumentParser(TestArguments)
58
+ data_args = parser.parse_args_into_dataclasses()[0]
59
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
60
+ logging.info("Starting inference with arguments: %s", data_args)
61
+
62
+ if not data_args.force and os.path.exists(data_args.out_file) and os.path.getsize(data_args.out_file) > 0:
63
+ logging.info(f"The {data_args.out_file} exists. Do not regenerate it.")
64
+ return
65
+
66
+ # 设置GPU设备
67
+ device = torch.device(data_args.DEVICE if torch.cuda.is_available() else "cpu")
68
+ logging.info(f"Using device: {device}")
69
+
70
+ # 初始化音频处理器
71
+ logging.info("Loading processor...")
72
+ processor = Qwen2_5OmniProcessor.from_pretrained(data_args.model_path)
73
+
74
+ # 初始化推理引擎
75
+ logging.info("Initializing inference engine...")
76
+ engine = PtEngine(data_args.model_path, adapters=[data_args.lora_path])
77
+ engine.processor = processor
78
+ template = get_template(engine.model.model_meta.template, processor, default_system="You are a helpful assistant.")
79
+ engine.default_template = template
80
+ template.processor = processor
81
+ # 初始化数据集
82
+ logging.info("Initializing dataset from %s", data_args.data_dir)
83
+ dataset = AudioDataset(data_args.data_dir)
84
+ logging.info(f"Dataset loaded successfully with {len(dataset)} samples")
85
+
86
+ # 获取提示模板
87
+ prompt_template = get_prompt_templates()
88
+
89
+ all_outputs = []
90
+ batch_size = data_args.batch_size
91
+ total_batches = (len(dataset) + batch_size - 1) // batch_size
92
+ logging.info(f"Starting batch processing with batch size {batch_size}, total batches: {total_batches}")
93
+
94
+ for i in range(0, len(dataset), batch_size):
95
+ current_batch = i // batch_size + 1
96
+ logging.info(f"Processing batch {current_batch}/{total_batches}")
97
+
98
+ batch_data = [dataset[j] for j in range(i, min(i + batch_size, len(dataset)))]
99
+
100
+ # Process each sample
101
+ batch_outputs = []
102
+ for bd in batch_data:
103
+ # 构建推理请求
104
+ infer_request = InferRequest(
105
+ messages=bd["prompt"],
106
+ audios=[bd["audio"]]
107
+ )
108
+
109
+ # 设置推理配置
110
+ request_config = RequestConfig(
111
+ max_tokens=512,
112
+ temperature=0,
113
+ do_sample=False,
114
+ num_beams=1
115
+ )
116
+
117
+ # 执行推理
118
+ resp_list = engine.infer([infer_request], request_config)
119
+ response = resp_list[0].choices[0].message.content
120
+ batch_outputs.append(response)
121
+
122
+ all_outputs.extend(batch_outputs)
123
+ logging.info(f"Completed batch {current_batch}/{total_batches}")
124
+
125
+ final_output = []
126
+ correct_count = 0
127
+ total_count = 0
128
+ true_positive = 0
129
+ false_positive = 0
130
+ false_negative = 0
131
+
132
+ for input_example, model_output in zip(dataset, all_outputs):
133
+ pred_score = extract_overall_score(model_output)
134
+ gt_score = input_example.get("solution", None)
135
+
136
+ result = {
137
+ "id": input_example.get("id", None),
138
+ "gt_score": gt_score,
139
+ "model_output": model_output,
140
+ "predicted_score": pred_score
141
+ }
142
+ final_output.append(result)
143
+
144
+ if pred_score is not None and gt_score is not None:
145
+ total_count += 1
146
+ if pred_score == gt_score:
147
+ correct_count += 1
148
+ true_positive += 1
149
+ else:
150
+ false_positive += 1
151
+ false_negative += 1
152
+
153
+ accuracy = correct_count / total_count if total_count > 0 else 0
154
+ precision = true_positive / (true_positive + false_positive) if (true_positive + false_positive) > 0 else 0
155
+ recall = true_positive / (true_positive + false_negative) if (true_positive + false_negative) > 0 else 0
156
+
157
+ # 添加准确率指标到最终输出
158
+ metrics = {
159
+ "accuracy": accuracy,
160
+ "precision": precision,
161
+ "recall": recall,
162
+ "correct_count": correct_count,
163
+ "total_count": total_count
164
+ }
165
+ final_output.append({"metrics": metrics})
166
+
167
+ logging.info("Saving results to %s", data_args.out_file)
168
+ with open(data_args.out_file, "w") as f:
169
+ json.dump(final_output, f, indent=2)
170
+
171
+ logging.info(f"Results saved successfully.")
172
+ logging.info(f"准确率: {accuracy:.4f} ({correct_count}/{total_count})")
173
+ logging.info(f"召回率: {recall:.4f}")
174
+ logging.info(f"精确率: {precision:.4f}")
175
+
176
+ if __name__ == "__main__":
177
+ main()
.ipynb_checkpoints/train-checkpoint.sh ADDED
@@ -0,0 +1,64 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ CUDA_VISIBLE_DEVICES=0 swift sft \
3
+ --model /root/autodl-tmp/output_7B_FULL_4JOB/v2-20250621-170947/checkpoint-608 \
4
+ --dataset ./dataset_newCOTSFT1_filtered_90.0s_resampled_16000.jsonl \
5
+ --model_type qwen2_5_omni\
6
+ --train_type full \
7
+ --output_dir /root/autodl-tmp/output_7B_FULL_cotSFT \
8
+ --torch_dtype bfloat16 \
9
+ --learning_rate 1e-4 \
10
+ --num_train_epochs 2 \
11
+ --freeze_vit false \
12
+ --freeze_aligner false \
13
+ --per_device_train_batch_size 1 \
14
+ --per_device_eval_batch_size 1 \
15
+ # ...
16
+
17
+
18
+ #CUDA_VISIBLE_DEVICES=0 swift sft \
19
+ # --model /root/autodl-tmp/Qwen2.5-Omni-7B \
20
+ # --dataset /root/ms-swift/dataset_cotSFT.json \
21
+ # --model_type qwen2_5_omni\
22
+ # --train_type lora \
23
+ # --output_dir /root/autodl-tmp/output_7B_Lora_cotSFT \
24
+ # --torch_dtype bfloat16 \
25
+ # --learning_rate 1e-4 \
26
+ # --lora_rank 8 \
27
+ # --lora_alpha 32 \
28
+ # --target_modules all-linear \
29
+ # --num_train_epochs 3 \
30
+ # --freeze_vit false \
31
+ # --freeze_aligner false \
32
+ # --per_device_train_batch_size 3 \
33
+ # --per_device_eval_batch_size 1 \
34
+ # ...
35
+ # # 8*A100
36
+ # NPROC_PER_NODE=8 \
37
+ # CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
38
+ # swift pt \
39
+ # --model Qwen/Qwen2.5-7B \
40
+ # --dataset swift/chinese-c4 \
41
+ # --streaming true \
42
+ # --train_type full \
43
+ # --deepspeed zero2 \
44
+ # --output_dir output \
45
+ # --max_steps 10000 \
46
+ # ...
47
+
48
+
49
+
50
+ # --lora_rank 8 \
51
+ # --lora_alpha 32 \
52
+ # --target_modules all-linear \
53
+ # --gradient_accumulation_steps 16 \
54
+ # --eval_steps 50 \
55
+ # --save_steps 50 \
56
+ # --save_total_limit 2 \
57
+ # --logging_steps 5 \
58
+ # --max_length 2048 \
59
+ # --output_dir output \
60
+ # --system 'You are a helpful assistant.' \
61
+ # --warmup_ratio 0.05 \
62
+ # --dataloader_num_workers 4 \
63
+ # --model_author swift \
64
+ # --model_name swift-robot
4JOB/.ipynb_checkpoints/process_overlaps-checkpoint.py ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ import random
4
+
5
+ def seconds_to_mmss(seconds):
6
+ minutes = int(seconds // 60)
7
+ seconds = int(seconds % 60)
8
+ return f"{minutes:02d}:{seconds:02d}"
9
+
10
+ # Templates for overlap descriptions
11
+ OVERLAP_TEMPLATES = [
12
+ "An overlap where multiple speakers talk simultaneously for more than 3 seconds starts at {start} and ends at {end}.",
13
+ "The overlap starts at {start} and ends at {end}.",
14
+ "Multiple speakers talk simultaneously from {start} to {end}, an overlap lasting over three seconds.",
15
+ "A conversation overlap occurs between {start} and {end}, with multiple people speaking at once.",
16
+ "There is a significant overlap in the conversation from {start} to {end}, where speakers talk simultaneously.",
17
+ "During the period {start}-{end}, multiple participants speak at the same time.",
18
+ "A 3+ second overlap is detected between {start} and {end}, with concurrent speakers.",
19
+ "The conversation features overlapping speech from {start} to {end}.",
20
+ "Several speakers talk over each other between {start} and {end}.",
21
+ "An overlapping segment is identified from {start} to {end}, lasting more than 3 seconds."
22
+ ]
23
+
24
+ # Templates for no overlap case
25
+ NO_OVERLAP_TEMPLATES = [
26
+ "No significant overlaps found in this conversation.",
27
+ "The conversation shows no overlapping speech segments longer than 3 seconds.",
28
+ "No instances of multiple speakers talking simultaneously for more than 3 seconds were detected.",
29
+ "The dialogue proceeds without any significant overlaps between speakers.",
30
+ "No overlapping segments exceeding 3 seconds were identified in this conversation.",
31
+ "The speakers maintain clear turn-taking without significant overlaps.",
32
+ "This conversation features no substantial overlapping speech periods.",
33
+ "No overlapping speech segments of 3 seconds or longer were found.",
34
+ "The conversation flows smoothly without any significant speaker overlaps.",
35
+ "All speakers maintain clear speaking turns without substantial overlap."
36
+ ]
37
+
38
+ # Templates for correct overlap case
39
+ CORRECT_OVERLAP_TEMPLATES = [
40
+ "The conversation contains overlapping speech segments longer than 3 seconds.",
41
+ "The dialogue features instances where speakers overlap for more than 3 seconds.",
42
+ "The recording includes multiple overlapping exchanges exceeding 3 seconds.",
43
+ "There are clear overlaps in speech lasting beyond 3 seconds.",
44
+ "Speaker interruptions or overlaps exceed 3 seconds in duration.",
45
+ "Clear evidence shows speech overlaps extending beyond 3 seconds in duration.",
46
+ "Speaker overlap durations consistently breach the 3-second threshold.",
47
+ "Recorded overlaps between speakers routinely last longer than 3 seconds.",
48
+ "The interaction contains several instances where voices overlap for over 3 seconds."
49
+ ]
50
+ file = "silence"
51
+ def process_overlap_segments():
52
+ # Read the overlap_5s_716.json file
53
+ with open(f'{file}.json', 'r', encoding='utf-8') as f:
54
+ overlap_data = json.load(f)
55
+
56
+ # List to store results for all conversations
57
+ results = []
58
+
59
+ # Process each conversation
60
+ for conversation_id, conversation in overlap_data.items():
61
+ segments = conversation.get('segments', [])
62
+ overlap_periods = []
63
+ audio_path = conversation.get('stereo_audio', [])
64
+ # Find overlaps > 5s between segments
65
+ for i in range(len(segments) - 1):
66
+ current = segments[i]
67
+ next_segment = segments[i + 1]
68
+
69
+ # Calculate overlap
70
+ overlap_start = max(current['start_time'], next_segment['start_time'])
71
+ overlap_end = min(current['end_time'], next_segment['end_time'])
72
+ overlap_duration = overlap_end - overlap_start
73
+
74
+ # If overlap is more than 5 seconds
75
+ if overlap_duration >= 3:
76
+ overlap_periods.append(f"{seconds_to_mmss(overlap_start)}-{seconds_to_mmss(overlap_end)}")
77
+
78
+ # Create result entry with random template
79
+ if overlap_periods:
80
+ # For each overlap period, select a random template
81
+ overlap_descriptions = []
82
+ for period in overlap_periods:
83
+ start, end = period.split('-')
84
+ template = random.choice(OVERLAP_TEMPLATES)
85
+ overlap_descriptions.append(template.format(start=start, end=end))
86
+ model_output = " ".join(overlap_descriptions)
87
+ else:
88
+ model_output = random.choice(NO_OVERLAP_TEMPLATES)
89
+
90
+ result = {
91
+ "key": conversation_id,
92
+ "audio_url": audio_path,
93
+ "model_output": model_output
94
+ }
95
+ results.append(result)
96
+
97
+ # Save the results to a JSON file
98
+ output_file = f'{file}_overlapgap.json'
99
+ with open(output_file, 'w', encoding='utf-8') as f:
100
+ json.dump(results, f, indent=2, ensure_ascii=False)
101
+
102
+ print(f"Processed {len(results)} conversations")
103
+ print(f"Results written to {output_file}")
104
+
105
+ if __name__ == "__main__":
106
+ process_overlap_segments()
4JOB/filter.py ADDED
@@ -0,0 +1,132 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ from datetime import datetime
4
+
5
+ def filter_by_duration(input_file, output_file, min_duration=30, max_duration=90):
6
+ """
7
+ 过滤JSON文件,只保留total_duration在[min_duration, max_duration]范围内的条目
8
+ 并记录被删除的文件信息到日志文件
9
+
10
+ :param input_file: 输入JSON文件路径
11
+ :param output_file: 输出JSON文件路径
12
+ :param min_duration: 最小持续时间(秒)
13
+ :param max_duration: 最大持续时间(秒)
14
+ """
15
+ # 创建日志目录
16
+ log_dir = os.path.join(os.path.dirname(output_file), "filter_logs")
17
+ if not os.path.exists(log_dir):
18
+ os.makedirs(log_dir)
19
+
20
+ # 创建日志文件(以当前时间命名)
21
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
22
+ log_file = os.path.join(log_dir, f"removed_entries_{timestamp}.log")
23
+
24
+ # 加载原始JSON文件
25
+ with open(input_file, 'r', encoding='utf-8') as f:
26
+ data = json.load(f)
27
+
28
+ # 初始化过滤结果和删除列表
29
+ filtered_data = {}
30
+ removed_entries = []
31
+
32
+ # 过滤数据并记录被删除的条目
33
+ for key, value in data.items():
34
+ if 'total_duration' in value and min_duration <= value['total_duration'] <= max_duration:
35
+ filtered_data[key] = value
36
+ else:
37
+ duration = value.get('total_duration', 'N/A')
38
+ removed_entries.append({
39
+ 'key': key,
40
+ 'duration': duration,
41
+ 'original_dialog_id': value.get('original_dialog_id', 'N/A'),
42
+ 'reason': 'too_short' if isinstance(duration, (int, float)) and duration < min_duration
43
+ else 'too_long' if isinstance(duration, (int, float)) and duration > max_duration
44
+ else 'missing_or_invalid'
45
+ })
46
+
47
+ # 保存过滤后的结果
48
+ with open(output_file, 'w', encoding='utf-8') as f:
49
+ json.dump(filtered_data, f, indent=2, ensure_ascii=False)
50
+
51
+ # 保存删除日志
52
+ with open(log_file, 'w', encoding='utf-8') as f:
53
+ f.write(f"Filtering log - {timestamp}\n")
54
+ f.write(f"Input file: {input_file}\n")
55
+ f.write(f"Output file: {output_file}\n")
56
+ f.write(f"Duration range: {min_duration}s to {max_duration}s\n\n")
57
+ f.write("Removed Entries:\n")
58
+ f.write("="*50 + "\n")
59
+ for entry in removed_entries:
60
+ f.write(f"Key: {entry['key']}\n")
61
+ f.write(f"Original Dialog ID: {entry['original_dialog_id']}\n")
62
+ f.write(f"Duration: {entry['duration']}s\n")
63
+ f.write(f"Reason: {entry['reason']}\n")
64
+ f.write("-"*50 + "\n")
65
+
66
+ print(f"\n处理结果: {os.path.basename(input_file)}")
67
+ print(f"原始条目数: {len(data)}")
68
+ print(f"过滤后条目数: {len(filtered_data)}")
69
+ print(f"已删除 {len(removed_entries)} 个不符合时长要求的条目")
70
+ print(f"过滤后的数据已保存到: {output_file}")
71
+ print(f"删除条目日志已保存到: {log_file}")
72
+
73
+ def process_directory(input_dir, output_dir, min_duration=30, max_duration=90):
74
+ """
75
+ 处理目录中的所有JSON文件
76
+ """
77
+ if not os.path.exists(output_dir):
78
+ os.makedirs(output_dir)
79
+
80
+ # 创建总日志文件
81
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
82
+ summary_log = os.path.join(output_dir, f"summary_removed_entries_{timestamp}.log")
83
+
84
+ total_removed = 0
85
+ total_processed = 0
86
+
87
+ with open(summary_log, 'w', encoding='utf-8') as summary_f:
88
+ summary_f.write(f"Summary Filtering Log - {timestamp}\n")
89
+ summary_f.write(f"Input directory: {input_dir}\n")
90
+ summary_f.write(f"Output directory: {output_dir}\n")
91
+ summary_f.write(f"Duration range: {min_duration}s to {max_duration}s\n\n")
92
+
93
+ for filename in os.listdir(input_dir):
94
+ if filename.endswith('.json'):
95
+ input_path = os.path.join(input_dir, filename)
96
+ output_path = os.path.join(output_dir, filename)
97
+
98
+ print(f"\n处理文件: {filename}")
99
+ filter_by_duration(input_path, output_path, min_duration, max_duration)
100
+
101
+ # 读取单个文件日志以获取统计信息
102
+ log_dir = os.path.join(output_dir, "filter_logs")
103
+ latest_log = max(
104
+ [f for f in os.listdir(log_dir) if f.startswith('removed_entries')],
105
+ key=lambda f: os.path.getmtime(os.path.join(log_dir, f)))
106
+
107
+ with open(os.path.join(log_dir, latest_log), 'r', encoding='utf-8') as log_f:
108
+ log_content = log_f.read()
109
+ removed_count = log_content.count("Key: ")
110
+
111
+ summary_f.write(f"\nFile: {filename}\n")
112
+ summary_f.write(f"Removed entries: {removed_count}\n")
113
+ summary_f.write("-"*40 + "\n")
114
+
115
+ total_removed += removed_count
116
+ total_processed += 1
117
+
118
+ summary_f.write(f"\nTotal files processed: {total_processed}\n")
119
+ summary_f.write(f"Total entries removed: {total_removed}\n")
120
+
121
+ print(f"\n处理完成!所有文件的总日志已保存到: {summary_log}")
122
+
123
+ if __name__ == "__main__":
124
+ # 使用示例 - 处理单个文件
125
+ input_json = "silence.json" # 替换为你的输入文件路径
126
+ output_json = "silence_filtered_output.json" # 输出文件路径
127
+ filter_by_duration(input_json, output_json)
128
+
129
+ # 使用示例 - 处理整个目录
130
+ # input_directory = "./input_4JOB_overlap" # 替换为你的输入目录
131
+ # output_directory = "./filtered_output" # 替换为你的输出目录
132
+ # process_directory(input_directory, output_directory)
4JOB/process_overlaps.py ADDED
@@ -0,0 +1,106 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ import random
4
+
5
+ def seconds_to_mmss(seconds):
6
+ minutes = int(seconds // 60)
7
+ seconds = int(seconds % 60)
8
+ return f"{minutes:02d}:{seconds:02d}"
9
+
10
+ # Templates for overlap descriptions
11
+ OVERLAP_TEMPLATES = [
12
+ "An overlap where multiple speakers talk simultaneously for more than 3 seconds starts at {start} and ends at {end}.",
13
+ "The overlap starts at {start} and ends at {end}.",
14
+ "Multiple speakers talk simultaneously from {start} to {end}, an overlap lasting over three seconds.",
15
+ "A conversation overlap occurs between {start} and {end}, with multiple people speaking at once.",
16
+ "There is a significant overlap in the conversation from {start} to {end}, where speakers talk simultaneously.",
17
+ "During the period {start}-{end}, multiple participants speak at the same time.",
18
+ "A 3+ second overlap is detected between {start} and {end}, with concurrent speakers.",
19
+ "The conversation features overlapping speech from {start} to {end}.",
20
+ "Several speakers talk over each other between {start} and {end}.",
21
+ "An overlapping segment is identified from {start} to {end}, lasting more than 3 seconds."
22
+ ]
23
+
24
+ # Templates for no overlap case
25
+ NO_OVERLAP_TEMPLATES = [
26
+ "No significant overlaps found in this conversation.",
27
+ "The conversation shows no overlapping speech segments longer than 3 seconds.",
28
+ "No instances of multiple speakers talking simultaneously for more than 3 seconds were detected.",
29
+ "The dialogue proceeds without any significant overlaps between speakers.",
30
+ "No overlapping segments exceeding 3 seconds were identified in this conversation.",
31
+ "The speakers maintain clear turn-taking without significant overlaps.",
32
+ "This conversation features no substantial overlapping speech periods.",
33
+ "No overlapping speech segments of 3 seconds or longer were found.",
34
+ "The conversation flows smoothly without any significant speaker overlaps.",
35
+ "All speakers maintain clear speaking turns without substantial overlap."
36
+ ]
37
+
38
+ # Templates for correct overlap case
39
+ CORRECT_OVERLAP_TEMPLATES = [
40
+ "The conversation contains overlapping speech segments longer than 3 seconds.",
41
+ "The dialogue features instances where speakers overlap for more than 3 seconds.",
42
+ "The recording includes multiple overlapping exchanges exceeding 3 seconds.",
43
+ "There are clear overlaps in speech lasting beyond 3 seconds.",
44
+ "Speaker interruptions or overlaps exceed 3 seconds in duration.",
45
+ "Clear evidence shows speech overlaps extending beyond 3 seconds in duration.",
46
+ "Speaker overlap durations consistently breach the 3-second threshold.",
47
+ "Recorded overlaps between speakers routinely last longer than 3 seconds.",
48
+ "The interaction contains several instances where voices overlap for over 3 seconds."
49
+ ]
50
+ file = "silence"
51
+ def process_overlap_segments():
52
+ # Read the overlap_5s_716.json file
53
+ with open(f'{file}.json', 'r', encoding='utf-8') as f:
54
+ overlap_data = json.load(f)
55
+
56
+ # List to store results for all conversations
57
+ results = []
58
+
59
+ # Process each conversation
60
+ for conversation_id, conversation in overlap_data.items():
61
+ segments = conversation.get('segments', [])
62
+ overlap_periods = []
63
+ audio_path = conversation.get('stereo_audio', [])
64
+ # Find overlaps > 5s between segments
65
+ for i in range(len(segments) - 1):
66
+ current = segments[i]
67
+ next_segment = segments[i + 1]
68
+
69
+ # Calculate overlap
70
+ overlap_start = max(current['start_time'], next_segment['start_time'])
71
+ overlap_end = min(current['end_time'], next_segment['end_time'])
72
+ overlap_duration = overlap_end - overlap_start
73
+
74
+ # If overlap is more than 5 seconds
75
+ if overlap_duration >= 3:
76
+ overlap_periods.append(f"{seconds_to_mmss(overlap_start)}-{seconds_to_mmss(overlap_end)}")
77
+
78
+ # Create result entry with random template
79
+ if overlap_periods:
80
+ # For each overlap period, select a random template
81
+ overlap_descriptions = []
82
+ for period in overlap_periods:
83
+ start, end = period.split('-')
84
+ template = random.choice(OVERLAP_TEMPLATES)
85
+ overlap_descriptions.append(template.format(start=start, end=end))
86
+ model_output = " ".join(overlap_descriptions)
87
+ else:
88
+ model_output = random.choice(NO_OVERLAP_TEMPLATES)
89
+
90
+ result = {
91
+ "key": conversation_id,
92
+ "audio_url": audio_path,
93
+ "model_output": model_output
94
+ }
95
+ results.append(result)
96
+
97
+ # Save the results to a JSON file
98
+ output_file = f'{file}_overlapgap.json'
99
+ with open(output_file, 'w', encoding='utf-8') as f:
100
+ json.dump(results, f, indent=2, ensure_ascii=False)
101
+
102
+ print(f"Processed {len(results)} conversations")
103
+ print(f"Results written to {output_file}")
104
+
105
+ if __name__ == "__main__":
106
+ process_overlap_segments()
4JOB/process_silence.py ADDED
@@ -0,0 +1,84 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+ import os
3
+ import random
4
+
5
+ def seconds_to_mmss(seconds):
6
+ minutes = int(seconds // 60)
7
+ seconds = int(seconds % 60)
8
+ return f"{minutes:02d}:{seconds:02d}"
9
+
10
+ # Templates for silence gap descriptions
11
+ SILENCE_TEMPLATES = [
12
+ "Silence gaps longer than 3 seconds occur at: {gaps}",
13
+ "The conversation contains significant pauses at: {gaps}",
14
+ "There are silent periods of more than 3 seconds at: {gaps}",
15
+ "The dialogue features extended pauses at: {gaps}",
16
+ "Silent intervals exceeding 3 seconds are found at: {gaps}",
17
+ "The conversation includes notable gaps at: {gaps}",
18
+ "Extended periods of silence occur at: {gaps}",
19
+ "The dialogue has significant breaks at: {gaps}",
20
+ "Silent segments longer than 3 seconds appear at: {gaps}",
21
+ "The conversation shows substantial pauses at: {gaps}"
22
+ ]
23
+
24
+ # Templates for no silence case
25
+ NO_SILENCE_TEMPLATES = [
26
+ "No silence gaps longer than 3 seconds were found in this conversation.",
27
+ "The conversation flows continuously without significant pauses.",
28
+ "No extended periods of silence were detected in this dialogue.",
29
+ "The conversation maintains a steady pace without notable gaps.",
30
+ "No silent intervals exceeding 3 seconds were identified.",
31
+ "The dialogue proceeds without substantial pauses.",
32
+ "No significant breaks in conversation were observed.",
33
+ "The conversation shows no extended silent periods.",
34
+ "No notable gaps in speech were detected.",
35
+ "The dialogue continues without significant silent intervals."
36
+ ]
37
+ file = "silence"
38
+ def process_silence_gaps():
39
+ # Read the overlap_5s_716.json file
40
+ with open(f'{file}.json', 'r', encoding='utf-8') as f:
41
+ silence_data = json.load(f)
42
+
43
+ # List to store results for all conversations
44
+ results = []
45
+
46
+ # Process each conversation
47
+ for conversation_id, conversation in silence_data.items():
48
+ segments = conversation.get('segments', [])
49
+ audio_path = conversation.get('stereo_audio', [])
50
+ silence_gaps = []
51
+
52
+ # Find silence gaps > 3s between segments
53
+ for i in range(len(segments) - 1):
54
+ current_end = segments[i]['end_time']
55
+ next_start = segments[i + 1]['start_time']
56
+ gap_duration = next_start - current_end
57
+
58
+ if gap_duration > 3:
59
+ silence_gaps.append(f"{seconds_to_mmss(current_end)}-{seconds_to_mmss(next_start)}")
60
+
61
+ # Create result entry with random template
62
+ if silence_gaps:
63
+ template = random.choice(SILENCE_TEMPLATES)
64
+ model_output = template.format(gaps=', '.join(silence_gaps))
65
+ else:
66
+ model_output = random.choice(NO_SILENCE_TEMPLATES)
67
+
68
+ result = {
69
+ "key": conversation_id,
70
+ "audio_url": audio_path,
71
+ "model_output": model_output
72
+ }
73
+ results.append(result)
74
+
75
+ # Save the results to a JSON file
76
+ output_file = f'{file}_silencegap.json'
77
+ with open(output_file, 'w', encoding='utf-8') as f:
78
+ json.dump(results, f, indent=2, ensure_ascii=False)
79
+
80
+ print(f"Processed {len(results)} conversations")
81
+ print(f"Results written to {output_file}")
82
+
83
+ if __name__ == "__main__":
84
+ process_silence_gaps()
4JOB/process_transcription.py ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import json
2
+
3
+ def seconds_to_mmss(seconds):
4
+ minutes = int(seconds // 60)
5
+ seconds = int(seconds % 60)
6
+ return f"{minutes:02d}:{seconds:02d}"
7
+
8
+ filename = "silence"
9
+ def is_overlapping(current_segment, other_segments):
10
+ """Check if the current segment overlaps with any other segment."""
11
+ current_start = current_segment['start_time']
12
+ current_end = current_segment['end_time']
13
+
14
+ for segment in other_segments:
15
+ if segment == current_segment:
16
+ continue
17
+
18
+ other_start = segment['start_time']
19
+ other_end = segment['end_time']
20
+
21
+ # Check if there's an overlap
22
+ if (current_start < other_end and current_end > other_start):
23
+ return True
24
+
25
+ return False
26
+
27
+ def process_transcriptions():
28
+ # Read the overlap_5s_716.json file
29
+ with open(f'./{filename}.json', 'r', encoding='utf-8') as f:
30
+ data = json.load(f)
31
+
32
+ # List to store results for all conversations
33
+ results = []
34
+
35
+ # Process each conversation
36
+ for conversation_id, conversation in data.items():
37
+ segments = conversation.get('segments', [])
38
+ audio_path = conversation.get('stereo_audio', [])
39
+ # Sort segments by start time
40
+ segments.sort(key=lambda x: x['start_time'])
41
+
42
+ # Process each segment and create transcription lines
43
+ transcription_lines = []
44
+
45
+ for segment in segments:
46
+ speaker = segment['speaker']
47
+ start_time = segment['start_time']
48
+ end_time = segment['end_time']
49
+ text = segment['text']
50
+ original_text = segment['original_text']
51
+ original_text = original_text.replace("[interrupt] ", "").strip()
52
+ # Format timestamp
53
+ timestamp = f"[{seconds_to_mmss(start_time)} - {seconds_to_mmss(end_time)}]"
54
+
55
+ # Check if this segment overlaps with any other segment
56
+ has_overlap = is_overlapping(segment, segments)
57
+
58
+ # Format the line
59
+ if has_overlap:
60
+ line = f"{timestamp} Speaker {speaker}: {original_text}"
61
+ else:
62
+ line = f"{timestamp} Speaker {speaker}: {text}"
63
+
64
+ transcription_lines.append(line)
65
+
66
+ # Create result entry
67
+ result = {
68
+ "key": conversation_id,
69
+ "audio_url": audio_path,
70
+ "model_output": "\n".join(transcription_lines)
71
+ }
72
+ results.append(result)
73
+
74
+ # Save the results to a JSON file
75
+ output_file = f'./{filename}_transcription.json'
76
+ with open(output_file, 'w', encoding='utf-8') as f:
77
+ json.dump(results, f, indent=2, ensure_ascii=False)
78
+
79
+ if __name__ == "__main__":
80
+ process_transcriptions()