latishab committed on
Commit 1ddc8f6 · verified · 1 Parent(s): c0ed7ed

update: readme.md

Files changed (1)
  1. README.md +39 -39
README.md CHANGED
@@ -7,38 +7,51 @@ base_model:
  pipeline_tag: text-classification
  datasets:
  - latishab/turns-2k
  ---
  # Turnsense: Turn-Detector Model

- A lightweight end-of-utterance (EOU) detection model fine-tuned on SmolLM2-135M, optimized for Raspberry Pi and low-power devices. The model is trained on TURNS-2K, a diverse dataset designed to capture various Speech-to-Text (STT) output patterns, including backchannels, mispronunciations, code-switching, and different text formatting styles. This makes the model robust across different STT systems and their output variations.

- ## 🔑 Key Features

- - **Lightweight Architecture**: Built on SmolLM2-135M (~135M parameters)
- - **High Performance**: 97.50% accuracy (standard) / 93.75% accuracy (quantized)
- - **Resource Efficient**: Optimized for edge devices and low-power hardware
- - **ONNX Support**: Compatible with ONNX Runtime and Hugging Face Transformers

- ## 📊 Performance Metrics

- The model demonstrates robust performance across different configurations:

- - **Standard Model**: 97.50% accuracy
- - **Quantized Model**: 93.75% accuracy
- - **Average Probability Difference**: 0.0323 between versions

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c903f100104ea998d9fccf/UPeoiiuCSunFZhMg-pDu8.png)

- ### Speed Performance

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c903f100104ea998d9fccf/JT8a4i7Pl60_gnna4CYwe.png)

- ## 🔹 Installation
  ```bash
  pip install transformers onnxruntime numpy huggingface_hub
  ```

- ## 🚀 Quick Start

  ```python
  import onnxruntime as ort
@@ -76,43 +89,30 @@ print(f"Text: '{text}'")
  print(f"Prediction (0 or 1): {prediction}")
  ```

- ## 📚 Dataset: TURNS-2K

- The model is trained on TURNS-2K, a comprehensive dataset specifically designed for end-of-utterance detection. It captures diverse speech patterns including:

  - Backchannels and self-corrections
  - Code-switching and language mixing
  - Multiple text formatting styles
- - Speech-to-Text (STT) output variations

- This diverse training data ensures robustness across different:
- - Speech patterns and dialects
- - STT systems and their output formats
- - Use cases and deployment scenarios

- ## 💭 Motivation & Current State

- The inspiration for Turnsense came from a notable gap in the open-source AI landscape - the scarcity of efficient, lightweight turn detection models. While building a local conversational AI agent, I found that most available solutions were either proprietary or too resource-intensive for edge devices. This led to the development of Turnsense, a practical solution designed specifically for real-world deployment on hardware like Raspberry Pi.

- Currently, the model is trained primarily on English speech patterns using a modest dataset of 2,000 samples through LoRA fine-tuning on SmolLM2-135M. While it handles common speech-to-text outputs effectively, there are certainly edge cases and complex conversational patterns yet to be addressed. The choice of ONNX format was deliberate, prioritizing compatibility with low-power devices, though we're exploring potential ports to platforms like Apple MLX.

- The project's success relies heavily on community involvement. Whether it's expanding the dataset, adding multilingual support, or improving pattern recognition for complex conversational scenarios, contributions of all kinds can help evolve Turnsense into a more robust and versatile tool.

- ## 📄 License
- This project is licensed under the Apache 2.0 License.

- ## 🤝 Contributing
-
- Contributions are welcome! Areas where you can help:
- - Dataset expansion
- - Model optimization
- - Documentation improvements
- - Bug reports and fixes
-
- Please feel free to submit a Pull Request or open an Issue.
-
- ## 📚 Citation
- If you use this model in your research, please cite it using:

  ```bibtex
  @software{latishab2025turnsense,
 
  pipeline_tag: text-classification
  datasets:
  - latishab/turns-2k
+ model-index:
+ - name: Turnsense
+   results:
+   - task:
+       type: text-classification
+       name: End-of-Utterance Detection
+     metrics:
+     - name: Accuracy (Standard)
+       type: accuracy
+       value: 97.50
+     - name: Accuracy (Quantized)
+       type: accuracy
+       value: 93.75
  ---
  # Turnsense: Turn-Detector Model

+ A lightweight end-of-utterance (EOU) detection model fine-tuned on SmolLM2-135M, optimized for Raspberry Pi and low-power devices. Trained on TURNS-2K, a dataset designed to cover various STT output patterns including backchannels, mispronunciations, code-switching, and different text formatting styles. This makes the model work well across different STT systems.

+ ## Key Features

+ - **Lightweight**: Built on SmolLM2-135M (~135M parameters)
+ - **High accuracy**: 97.50% (standard) / 93.75% (quantized)
+ - **Edge-ready**: Runs on Raspberry Pi and similar hardware
+ - **ONNX support**: Works with ONNX Runtime and Hugging Face Transformers

+ ## Performance

+ The model holds up well across configurations:

+ - **Standard model**: 97.50% accuracy
+ - **Quantized model**: 93.75% accuracy
+ - **Average probability difference**: 0.0323 between versions
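The average probability difference quoted above compares the standard and quantized models on the same inputs. A minimal sketch of how such a figure can be computed, assuming both variants emit an EOU probability per utterance (the array values here are hypothetical, not the model's actual outputs):

```python
import numpy as np

# Hypothetical EOU probabilities from the two model variants on the same batch
p_standard = np.array([0.91, 0.12, 0.78, 0.05])
p_quantized = np.array([0.88, 0.15, 0.74, 0.07])

# Mean absolute difference between the two versions' probabilities
avg_diff = np.mean(np.abs(p_standard - p_quantized))
print(round(float(avg_diff), 4))  # → 0.03
```

A small value here indicates quantization barely shifts the model's confidence, even though hard 0/1 accuracy drops slightly.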
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c903f100104ea998d9fccf/UPeoiiuCSunFZhMg-pDu8.png)

+ ### Speed

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c903f100104ea998d9fccf/JT8a4i7Pl60_gnna4CYwe.png)

+ ## Installation
  ```bash
  pip install transformers onnxruntime numpy huggingface_hub
  ```

+ ## Quick Start

  ```python
  import onnxruntime as ort
 
  print(f"Prediction (0 or 1): {prediction}")
  ```
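The lines elided by the hunk run the ONNX session and derive `prediction`. A minimal sketch of that final post-processing step, assuming the session returns a pair of [not-EOU, EOU] logits (the function name, example logits, and 0.5 threshold are hypothetical, not taken from the repository):

```python
import numpy as np

def eou_prediction(logits, threshold=0.5):
    """Convert [not-EOU, EOU] logits to an EOU probability and a 0/1 prediction."""
    # Numerically stable softmax over the two classes
    exp = np.exp(logits - np.max(logits))
    probs = exp / exp.sum()
    eou_prob = float(probs[1])
    return eou_prob, int(eou_prob >= threshold)

# Hypothetical logits strongly favoring end-of-utterance
prob, prediction = eou_prediction(np.array([-1.2, 2.3]))
print(f"Prediction (0 or 1): {prediction}")  # prints "Prediction (0 or 1): 1"
```

In a streaming pipeline, the threshold trades off interruption risk against response latency: raising it makes the agent wait longer before assuming the speaker is done.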
+ ## Dataset: TURNS-2K

+ The model is trained on TURNS-2K, a dataset built for end-of-utterance detection. It covers:

  - Backchannels and self-corrections
  - Code-switching and language mixing
  - Multiple text formatting styles
+ - Variations in STT output across different systems

+ ## Motivation and current state

+ I built Turnsense because I couldn't find a good open-source turn detection model for edge devices. Most options were either proprietary or too heavy to run on something like a Raspberry Pi.

+ The model is trained on English speech patterns using 2,000 samples via LoRA fine-tuning on SmolLM2-135M. It handles common STT outputs well, but there are edge cases and complex conversational patterns it doesn't cover yet. ONNX was a deliberate choice for device compatibility, though a port to Apple MLX is on the table.

+ ## License
+ Apache 2.0. See the LICENSE file for details.

+ ## Contributing

+ Contributions are welcome. Some areas that could use help: dataset expansion, model optimization, documentation, and bug reports. Feel free to open a PR or issue.

+ ## Citation
+ If you use this model in your research:

  ```bibtex
  @software{latishab2025turnsense,