zrj619 committed on
Commit
f7ebad3
·
verified ·
1 Parent(s): f50ece4

Update README.md

  1. README.md +142 -3
README.md CHANGED
@@ -1,3 +1,142 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ## TEN Turn Detection
2
+ Turn detection for full-duplex dialogue communication
3
+
4
+ ## Introduction
5
+
6
+ TEN Turn Detection is a turn detection model designed for natural, dynamic communication between humans and AI agents. It addresses one of the hardest parts of human-AI conversation: detecting natural turn-taking cues and enabling context-aware interruptions. The model draws on semantic understanding of conversation context, rhythm, intonation, and linguistic patterns to make dialogue with AI feel more natural.
7
+ <div align="center">
8
+ <img src="images/turn_detection.svg" alt="TEN Turn Detection SVG Diagram" width="800"/>
9
+ </div>
10
+
11
+ TEN Turn Detection classifies the user's text into one of three states:
12
+
13
+ Finished: A finished utterance where the user has expressed a complete thought and expects a response. Example: "Hey there I was wondering can you help me with my order"
14
+
15
+ Wait: An ambiguous utterance where the system cannot confidently determine if more speech will follow. Example: "This conversation needs to end now"
16
+
17
+ Unfinished: A clearly unfinished utterance where the user has momentarily paused but intends to continue speaking. Example: "Hello I have a question about"
18
+
19
+ These three classification states allow the TEN system to create natural conversation dynamics by intelligently managing turn-taking, reducing awkward interruptions while maintaining conversation flow.
20
+
21
+ TEN Turn Detection builds on a transformer-based language model (Qwen2.5-7B) for semantic analysis.
22
+
23
+ ## Key Features
24
+
25
+ - **Context-Aware Turn Management**
26
+ TEN Turn Detection analyzes linguistic patterns and semantic context to accurately identify turn completion points. This capability enables intelligent interruption handling, allowing the system to determine when interruptions are contextually appropriate while maintaining natural conversation flow across various dialogue scenarios.
27
+
28
+ - **Multilingual Turn Detection Support**
29
+ TEN Turn Detection supports both English and Chinese. It is designed to identify turn-taking cues and completion signals in conversations in either language.
30
+
31
+ - **Superior Performance**
32
+ Compared with multiple open-source solutions, TEN achieves superior performance across all metrics on our publicly available test dataset.
33
+
34
+ ## Prepared Dataset
35
+ We have open-sourced the TEN-Turn-Detection TestSet, a bilingual (Chinese and English) collection of conversational inputs specifically designed to evaluate turn detection capabilities in AI dialogue systems. The dataset consists of three distinct components:
36
+
37
+ wait.txt: Contains expressions requesting conversation pauses or termination
38
+
39
+ unfinished.txt: Features incomplete dialogue inputs with truncated utterances
40
+
41
+ finished.txt: Provides complete conversational inputs across multiple domains
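
As a rough illustration of how the three files could be scored, the sketch below computes per-file accuracy for any classifier function. It assumes one utterance per line in each file and lowercase label strings (`finished`, `unfinished`, `wait`); neither assumption is confirmed by the README, and `class_accuracy`, `evaluate`, and `EXPECTED_LABELS` are hypothetical names introduced here.

```python
from typing import Callable, Dict, Iterable

# Expected label for each test-set file (label spellings are an assumption).
EXPECTED_LABELS: Dict[str, str] = {
    "finished.txt": "finished",
    "unfinished.txt": "unfinished",
    "wait.txt": "wait",
}


def class_accuracy(
    utterances: Iterable[str],
    expected: str,
    classify: Callable[[str], str],
) -> float:
    """Return the fraction of non-empty utterances labeled as `expected`."""
    lines = [u.strip() for u in utterances if u.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for u in lines if classify(u).strip().lower() == expected)
    return hits / len(lines)


def evaluate(
    files: Dict[str, Iterable[str]],
    classify: Callable[[str], str],
) -> Dict[str, float]:
    """Score each file against the label it is expected to contain."""
    return {
        name: class_accuracy(utterances, EXPECTED_LABELS[name], classify)
        for name, utterances in files.items()
    }
```

In practice `classify` would wrap the model call, and each value in `files` could be an open file handle, since iterating a text file yields its lines.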
42
+
43
+
44
+ ## Detection Performance
45
+
46
+ We conducted comprehensive evaluations comparing several open-source models for turn detection using our test dataset:
47
+
48
+ <div align="center">
49
+
50
+
51
+ | LANGUAGE | MODEL | FINISHED<br>ACCURACY | UNFINISHED<br>ACCURACY | WAIT<br>ACCURACY |
52
+ |:--------:|:-----:|:--------------------:|:----------------------:|:----------------:|
53
+ | English | Model A | **59.74%** | **86.46%** | *N/A* |
54
+ | English | Model B | **71.61%** | **96.88%** | *N/A* |
55
+ | English | **TEN Turn Detection** | **90.64%** | **98.44%** | **91%** |
56
+
57
+
58
+
59
+
60
+ | LANGUAGE | MODEL | FINISHED<br>ACCURACY | UNFINISHED<br>ACCURACY | WAIT<br>ACCURACY |
61
+ |:--------:|:-----:|:--------------------:|:----------------------:|:----------------:|
62
+ | Chinese | Model B | **74.63%** | **88.89%** | *N/A* |
63
+ | Chinese | **TEN Turn Detection** | **98.90%** | **92.74%** | **92%** |
64
+
65
+
66
+ </div>
67
+
68
+ > **Notes:**
69
+ > 1. Model A doesn't support Chinese language processing
70
+ > 2. Neither Model A nor Model B supports detection of the "WAIT" state
71
+
72
+ ## Quick Start
73
+
74
+
75
+ ### Model Weights
76
+
77
+ The TEN Turn Detection model is available on Hugging Face:
78
+ - Model Repository: [TEN-framework/TEN_Turn_Detection](https://huggingface.co/TEN-framework/TEN_Turn_Detection)
79
+
80
+ ### Inference
81
+
82
+ The following snippet loads the model and classifies a single user input. The `analyze_text` function takes the user text and an optional system prompt:
83
+
84
+ ```python
85
+ from transformers import AutoTokenizer, AutoModelForCausalLM
86
+ import torch
87
+
88
+ # Load model and tokenizer
89
+ model_id = 'TEN-framework/TEN_Turn_Detection'
90
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, low_cpu_mem_usage=True, torch_dtype=torch.bfloat16)
91
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
92
+
93
+ # Move model to GPU
94
+ model = model.cuda()
95
+ model.eval()
96
+
97
+ # Function for inference
98
+ def analyze_text(text, system_prompt=""):
99
+     inf_messages = [{"role":"system", "content":system_prompt}] + [{"role":"user", "content":text}]
100
+     input_ids = tokenizer.apply_chat_template(
101
+         inf_messages,
102
+         add_generation_prompt=True,
103
+         return_tensors="pt"
104
+     ).cuda()
105
+
106
+     with torch.no_grad():
107
+         outputs = model.generate(
108
+             input_ids,
109
+             max_new_tokens=1,
110
+             do_sample=True,
111
+             top_p=0.1,
112
+             temperature=0.1,
113
+             pad_token_id=tokenizer.eos_token_id
114
+         )
115
+
116
+     response = outputs[0][input_ids.shape[-1]:]
117
+     return tokenizer.decode(response, skip_special_tokens=True)
118
+
119
+ # Example usage
120
+ text = "Hello I have a question about"
121
+ result = analyze_text(text)
122
+ print(f"Input: '{text}'")
123
+ print(f"Turn Detection Result: '{result}'")
124
+ ```
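
One possible way to act on the detection result is to map each state to a turn-taking action. The sketch below is illustrative only: it assumes the model returns the state names as lowercase strings (`finished`, `unfinished`, `wait`), and `TurnAction` and `decide_action` are hypothetical names, not part of the TEN codebase.

```python
from enum import Enum


class TurnAction(Enum):
    RESPOND = "respond"                # user finished: the agent may reply
    KEEP_LISTENING = "keep_listening"  # user paused mid-thought: do not interrupt
    HOLD = "hold"                      # wait state: stay silent for now


# Assumed label strings; adjust if the model emits different tokens.
_ACTIONS = {
    "finished": TurnAction.RESPOND,
    "unfinished": TurnAction.KEEP_LISTENING,
    "wait": TurnAction.HOLD,
}


def decide_action(
    label: str,
    default: TurnAction = TurnAction.KEEP_LISTENING,
) -> TurnAction:
    """Map a detection label to an action, falling back to `default`."""
    return _ACTIONS.get(label.strip().lower(), default)
```

Falling back to `KEEP_LISTENING` on an unrecognized label is a conservative choice here: an agent that stays quiet a little too long is usually less disruptive than one that interrupts.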
125
+
126
+ ## Citation
127
+ If you use TEN Turn Detection in your research or applications, please cite:
128
+
129
+ ```
130
+ @misc{TEN_Turn_Detection,
131
+ author = {TEN Team},
132
+ title = {TEN Turn Detection: Turn detection for full-duplex dialogue communication
133
+
134
+ },
135
+ year = {2025},
136
+ url = {https://github.com/TEN-framework/ten-turn-detection},
137
+ }
138
+ ```
139
+ ## License
140
+ This project is Apache 2.0 licensed.
141
+
142
+