---
license: mit
language:
- en
tags:
- lstm
- text-segmentation
- lightweight
- client-side
- web
- onnxruntime-web
- speech-to-text
- low-memory-footprint
---

This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for speech-to-text use cases, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.

The sentence splitter is a modified version of the LSTM model from the [NNSplit](https://github.com/kornelski/nnsplit) repository, with an input size of around 500 B.

The model used here is **~4 MB** in size.

| Input condition              | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
|------------------------------|----------|----------------|---------------------|
| Clean                        | 0.754371 | 0.853603       | 0.820934            |
| Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
| Partial case                 | 0.761754 | 0.825119       | 0.819679            |
| Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
| No punctuation and no case   | 0.166273 | 0.180859       | 0.00463281          |

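To make the hardest row of the table concrete, here is a minimal sketch of how clean text could be degraded into the "no punctuation and no case" condition. The helper name `stripPunctuationAndCase` and the exact set of stripped characters are assumptions for illustration; the preprocessing actually used for the benchmark is not stated here and may differ.

```typescript
// Hypothetical helper, not part of NNSplit or the NPM package.
// It lowercases the text and drops common ASCII punctuation, which is an
// assumed approximation of the "no punctuation and no case" condition.
function stripPunctuationAndCase(text: string): string {
  return text
    .toLowerCase()
    // Remove common sentence punctuation (assumed character set).
    .replace(/[.,!?;:"()]/g, "")
    // Collapse whitespace runs left behind by the removed characters.
    .replace(/\s+/g, " ")
    .trim();
}

const clean =
  "REST is an architectural style. GraphQL, on the other hand, lets clients pick fields.";
console.log(stripPunctuationAndCase(clean));
```

Text in this form carries no boundary cues from punctuation or capitalization, so a splitter has to rely on word order alone, which is consistent with the low scores in the last row.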
### Example

No punctuation and no case (~17% accuracy)

**Input:**

```text
the difference between rest and graphql is explained as follows
rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
```

**Result: 28.90ms ✅**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/68bc3a4bea95a6dd7933a7bb/FU-fLgt8drpokPIgSCsrJ.png)