---
license: apache-2.0
language:
- zh
- en
base_model:
- google-bert/bert-base-multilingual-cased
tags:
- agent
---

<div align="center">
<h1>FireRedChat-turn-detector</h1>
</div>

<div align="center">
  <a href="https://fireredteam.github.io/demos/firered_chat/">Demo</a> | <a href="https://arxiv.org/pdf/2509.06502">Paper</a> | <a href="https://huggingface.co/FireRedTeam">Huggingface</a>
</div>


## Description

A compact end-of-turn detection model used in FireRedChat. A [LiveKit plugin is available here](https://github.com/fireredchat-submodules/livekit-plugins-fireredchat-turn-detector).
- `chinese_best_model_q8.onnx`: FireRedChat turn-detector model (Chinese only)
- `multilingual_best_model_q8.onnx`: FireRedChat turn-detector model (Chinese and English)
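
Since the two checkpoints differ only in language coverage, model selection can be reduced to a small lookup. A minimal sketch follows; the function name and language codes are illustrative, and only the filenames come from the list above:

```python
def select_checkpoint(languages):
    """Return the ONNX checkpoint covering every requested language.

    Hypothetical helper: "zh"/"en" codes and the function itself are
    assumptions; the filenames are the two checkpoints shipped here.
    """
    langs = set(languages)
    if langs <= {"zh"}:
        return "chinese_best_model_q8.onnx"
    if langs <= {"zh", "en"}:
        return "multilingual_best_model_q8.onnx"
    raise ValueError(f"unsupported languages: {sorted(langs)}")

print(select_checkpoint(["zh"]))        # chinese_best_model_q8.onnx
print(select_checkpoint(["zh", "en"]))  # multilingual_best_model_q8.onnx
```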

## Roadmap

- [x] 2025/09
  - [x] Release the ONNX checkpoints and LiveKit plugin.

## Usage
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Load the quantized turn-detector (use multilingual_best_model_q8.onnx
# for mixed Chinese/English input).
session = ort.InferenceSession(
    "chinese_best_model_q8.onnx", providers=["CPUExecutionProvider"]
)

# Tokenizer shipped with this repo; truncate from the left so the most
# recent tokens are kept.
tokenizer = AutoTokenizer.from_pretrained(
    "./tokenizer",
    local_files_only=True,
    truncation_side="left",
)

text = "这是一句没有标点的文本"  # "This is a sentence without punctuation"
inputs = tokenizer(
    text,
    truncation=True,
    padding="max_length",
    add_special_tokens=False,
    return_tensors="np",
    max_length=128,
)

# Run inference; the model returns per-class logits.
outputs = session.run(
    None,
    {
        "input_ids": inputs["input_ids"].astype("int64"),
        "attention_mask": inputs["attention_mask"].astype("int64"),
    },
)

# The last softmax entry is the end-of-utterance probability.
eou_probability = softmax(outputs[0]).flatten()[-1]
print(eou_probability, eou_probability > 0.5)
```
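
For reference, the post-processing step can be exercised without the model file. Assuming a binary `(1, 2)` logits output as in the snippet above, the last softmax entry is taken as the end-of-utterance probability (the logit values below are made up):

```python
import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Made-up logits standing in for outputs[0]; column 1 is assumed to be
# the end-of-utterance class, matching the flatten()[-1] indexing above.
logits = np.array([[-1.2, 2.3]])
eou_probability = softmax(logits).flatten()[-1]
print(eou_probability, eou_probability > 0.5)
```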

## Acknowledgment
- Base model: google-bert/bert-base-multilingual-cased (license: apache-2.0)