update: readme.md
README.md CHANGED
pipeline_tag: text-classification
datasets:
- latishab/turns-2k
model-index:
- name: Turnsense
  results:
  - task:
      type: text-classification
      name: End-of-Utterance Detection
    metrics:
    - name: Accuracy (Standard)
      type: accuracy
      value: 97.50
    - name: Accuracy (Quantized)
      type: accuracy
      value: 93.75
---
# Turnsense: Turn-Detector Model

A lightweight end-of-utterance (EOU) detection model fine-tuned on SmolLM2-135M, optimized for Raspberry Pi and low-power devices. Trained on TURNS-2K, a dataset designed to cover varied STT output patterns including backchannels, mispronunciations, code-switching, and different text formatting styles. This coverage helps the model work well across different STT systems.
## Key Features

- **Lightweight**: Built on SmolLM2-135M (~135M parameters)
- **High accuracy**: 97.50% (standard) / 93.75% (quantized)
- **Edge-ready**: Runs on Raspberry Pi and similar hardware
- **ONNX support**: Works with ONNX Runtime and Hugging Face Transformers
## Performance

The model holds up well across configurations:

- **Standard model**: 97.50% accuracy
- **Quantized model**: 93.75% accuracy
- **Average probability difference**: 0.0323 between versions

*(performance comparison chart)*
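The probability-difference figure above is just the mean absolute gap between the two variants' predicted probabilities on the same inputs. A minimal sketch of how such a number is computed, using hypothetical per-utterance probabilities rather than actual model outputs:

```python
import numpy as np

# Hypothetical end-of-utterance probabilities for the same inputs,
# from the standard and quantized model variants respectively.
p_standard = np.array([0.91, 0.12, 0.85, 0.07, 0.64])
p_quantized = np.array([0.88, 0.15, 0.80, 0.10, 0.70])

# Mean absolute difference in predicted probability between versions.
avg_diff = float(np.mean(np.abs(p_standard - p_quantized)))
print(f"Average probability difference: {avg_diff:.4f}")
```

A small gap here means quantization shifts the scores only slightly, so a fixed decision threshold carries over between the two variants.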
### Speed

*(speed benchmark chart)*
## Installation

```bash
pip install transformers onnxruntime numpy huggingface_hub
```
## Quick Start

```python
import onnxruntime as ort
# …
print(f"Prediction (0 or 1): {prediction}")
```
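The snippet above is truncated in this diff view; the step that turns the model's EOU probability into the printed 0/1 label can be sketched as follows (the function name and the 0.5 threshold are illustrative assumptions, not taken from the original code):

```python
def eou_prediction(prob_eou: float, threshold: float = 0.5) -> int:
    """Map an end-of-utterance probability to a binary label:
    1 = speaker has finished the turn, 0 = still speaking.
    NOTE: the 0.5 threshold is an assumption for illustration."""
    return 1 if prob_eou >= threshold else 0

prediction = eou_prediction(0.87)
print(f"Prediction (0 or 1): {prediction}")
```

In a streaming setup this check would run on each partial transcript, with the threshold tuned to trade off interruptions against response latency.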
## Dataset: TURNS-2K

The model is trained on TURNS-2K, a dataset built for end-of-utterance detection. It covers:

- Backchannels and self-corrections
- Code-switching and language mixing
- Multiple text formatting styles
- Variations in STT output across different systems
## Motivation and Current State

I built Turnsense because I couldn't find a good open-source turn-detection model for edge devices. Most options were either proprietary or too heavy to run on something like a Raspberry Pi.

The model is trained on English speech patterns using 2,000 samples via LoRA fine-tuning on SmolLM2-135M. It handles common STT outputs well, but there are edge cases and complex conversational patterns it doesn't cover yet. ONNX was a deliberate choice for device compatibility, though a port to Apple MLX is on the table.
## License

Apache 2.0. See the LICENSE file for details.
## Contributing

Contributions are welcome. Some areas that could use help: dataset expansion, model optimization, documentation, and bug reports. Feel free to open a PR or issue.
## Citation

If you use this model in your research, please cite:

```bibtex
@software{latishab2025turnsense,