---
license: mit
language:
- en
tags:
- lstm
- text-segmentation
- lightweight
- client-side
- web
- onnxruntime-web
- speech-to-text
- low-memory-footprint
---
This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for the speech-to-text use case, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.

The sentence splitter model is a modification of the LSTM model (around 500 B input size) from the [NNSplit](https://github.com/kornelski/nnsplit) repository.
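
If the ~500 B input size is read as a cap of roughly 500 UTF-8 bytes per inference call (an assumption; this card does not spell it out), longer transcripts would need to be cut into windows before being passed to the model. The sketch below shows one way to do that without splitting a multi-byte character. `chunkByUtf8Bytes` and `MAX_INPUT_BYTES` are hypothetical names for illustration and are not part of the NPM package.

```typescript
// Assumption: "500 B" means at most 500 UTF-8 bytes per model input.
const MAX_INPUT_BYTES = 500;

// Split text into chunks whose UTF-8 encoding fits within maxBytes,
// never cutting a character in half.
function chunkByUtf8Bytes(text: string, maxBytes: number = MAX_INPUT_BYTES): string[] {
  // A single code point can take up to 4 bytes in UTF-8.
  if (maxBytes < 4) {
    throw new RangeError("maxBytes must be at least 4");
  }
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;

  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const size = encoder.encode(ch).length;
    if (currentBytes + size > maxBytes && current.length > 0) {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += size;
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

This cuts purely on byte count, so a window may end in the middle of a word or sentence. A real pipeline would likely prefer to cut at whitespace or to overlap adjacent windows; how the package itself handles long input is not described here.
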
The size of the model used here is **~4 MB**.

|                              | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
|------------------------------|----------|----------------|---------------------|
| Clean                        | 0.754371 | 0.853603       | 0.820934            |
| Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
| Partial case                 | 0.761754 | 0.825119       | 0.819679            |
| Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
| No punctuation and case      | 0.166273 | 0.180859       | 0.00463281          |

### Example
No punctuation and no case (~17% accuracy) <br>
**Input:**
```text
the difference between rest and graphql is explained as follows
rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
```
**Result: 28.90ms ✅**
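
Assuming the figure above is the wall-clock time of a single inference call, a comparable number can be collected with a small timing wrapper like the one sketched here. `timeAsync` and `placeholderSplit` are hypothetical names; `placeholderSplit` only stands in for the real model call and does no splitting.

```typescript
// Run an async function once and report its wall-clock duration in milliseconds.
async function timeAsync<T>(fn: () => Promise<T>): Promise<{ result: T; ms: number }> {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Hypothetical stand-in for the model call: returns the text unsplit.
// Replace it with the actual inference call when measuring the model.
async function placeholderSplit(text: string): Promise<string[]> {
  return [text];
}

// Usage: time one call and print the duration in the same style as above.
timeAsync(() => placeholderSplit("the difference between rest and graphql")).then(
  ({ ms }) => console.log(`Result: ${ms.toFixed(2)}ms`)
);
```

A single run includes any one-off warm-up cost, so averaging over several calls gives a steadier figure.
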
