---
license: mit
language:
- en
tags:
- lstm
- text-segmentation
- lightweight
- client-side
- web
- onnxruntime-web
- speech-to-text
- low-memory-footprint
---
Check out this [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for a speech-to-text use case, which implements setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.

This sentence splitter is a modified version of the LSTM model (input size of around 500 B) from the [NNSplit](https://github.com/kornelski/nnsplit) repository.
The model used here is **~4 MB**.
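
A minimal sketch of preparing input for a byte-level model with a fixed input size, assuming that "around 500 B" means the model reads windows of roughly 500 UTF-8 bytes. `toByteWindows`, `WINDOW_BYTES`, and the 50-byte overlap are hypothetical and are not part of the NPM package's API; check the package for the preprocessing it actually performs.

```typescript
// Hypothetical window size: read "around 500 B" as ~500 UTF-8 bytes per model call.
const WINDOW_BYTES = 500;

/** One model-sized slice of the input, with its byte offset in the full text. */
interface ByteWindow {
  start: number; // byte offset of this window within the encoded text
  bytes: Uint8Array; // at most `size` bytes
}

/**
 * Split text into fixed-size byte windows (hypothetical helper).
 * Consecutive windows overlap by `overlap` bytes, so a boundary near the
 * end of one window is seen again, with more context, in the next one.
 */
function toByteWindows(text: string, size = WINDOW_BYTES, overlap = 50): ByteWindow[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const encoded = new TextEncoder().encode(text);
  const windows: ByteWindow[] = [];
  const step = size - overlap;
  for (let start = 0; start < encoded.length; start += step) {
    // subarray clamps at the end, so the last window may be shorter than `size`
    windows.push({ start, bytes: encoded.subarray(start, start + size) });
    if (start + size >= encoded.length) break; // this window already reaches the end
  }
  return windows;
}
```

Windows are cut at byte offsets, so one may begin or end inside a multi-byte character; the `start` offset is kept so per-byte predictions from each window can be mapped back onto the full text.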

The table below compares NNSplit with two spaCy setups on English text with decreasing amounts of punctuation and casing (higher is better):

| Condition | NNSplit | spaCy (Tagger) | spaCy (Sentencizer) |
|-------------------------------|----------|----------------|---------------------|
| Clean | 0.754371 | 0.853603 | 0.820934 |
| Partial punctuation | 0.485907 | 0.517829 | 0.249753 |
| Partial case | 0.761754 | 0.825119 | 0.819679 |
| Partial punctuation and case | 0.443704 | 0.458619 | 0.249873 |
| No punctuation and case | 0.166273 | 0.180859 | 0.00463281 |

### Example
Input with no punctuation and no casing, the hardest setting in the table above (NNSplit: ~17%):<br>
**Input:**
```text
the difference between rest and graphql is explained as follows
rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
```
**Result: 28.90 ms ✅**
![Sentence splitter output for the example input](https://cdn-uploads.huggingface.co/production/uploads/687852a7f13fbe6c3d4c9974/LrASgi-BmWROVZIUK-36-.png)
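
A hedged sketch of the post-processing a splitter like this needs, assuming the model returns one probability per input byte that a sentence ends right after that byte. `splitByProbabilities` and its 0.5 default threshold are hypothetical; the real output layout of this model and the package's `processDirectText` may differ.

```typescript
/**
 * Turn per-byte boundary probabilities into sentences (hypothetical helper).
 * Assumes probs[i] is the probability that a sentence ends after byte i
 * of the UTF-8 encoded text.
 */
function splitByProbabilities(
  text: string,
  probs: ArrayLike<number>,
  threshold = 0.5,
): string[] {
  const bytes = new TextEncoder().encode(text);
  if (probs.length !== bytes.length) {
    throw new Error(`expected ${bytes.length} probabilities, got ${probs.length}`);
  }
  const decoder = new TextDecoder();
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < bytes.length; i++) {
    const isLast = i === bytes.length - 1;
    // UTF-8 continuation bytes look like 0b10xxxxxx; never cut right before one,
    // otherwise a multi-byte character would be split in two.
    const nextIsContinuation = !isLast && (bytes[i + 1] & 0xc0) === 0x80;
    if (isLast || (probs[i] >= threshold && !nextIsContinuation)) {
      // Decode bytes [start, i] back to text and drop surrounding whitespace.
      const sentence = decoder.decode(bytes.subarray(start, i + 1)).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = i + 1;
    }
  }
  return sentences;
}
```

For example, `splitByProbabilities("ab cd", [0, 0, 0.9, 0, 0])` cuts after the space and returns `["ab", "cd"]`. Raising the threshold yields fewer, longer sentences; lowering it splits more aggressively, which matters most for unpunctuated speech-to-text output like the example above.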