the-vedantic-coder/sentence-splitter

	---
	license: mit
	language:
	- en
	tags:
	- lstm
	- text-segmentation
	- lightweight
	- client-side
	- web
	- onnxruntime-web
	- speech-to-text
	- low-memory-footprint
	---

	This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for speech-to-text use cases, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for trying direct inference on text.

	The sentence splitter model is a modification of the LSTM model from the [NNSplit](https://github.com/kornelski/nnsplit) repository, with an input size of around 500 B.
	The model used here is ~4 MB in size.
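
Because the input size is limited, longer transcripts have to be cut into pieces before inference. The following is a minimal sketch of one way to do that, assuming that "500 B" means about 500 bytes of UTF-8 text per call. The name `chunkByBytes` and the constant `MAX_INPUT_BYTES` are hypothetical and are not part of the NPM package.

```typescript
// Assumption: the model accepts roughly 500 bytes of UTF-8 text per call.
const MAX_INPUT_BYTES = 500; // hypothetical limit, taken from the "~500 B" figure

// Split text into chunks whose UTF-8 encoding fits within maxBytes,
// without cutting through a multi-byte character.
function chunkByBytes(text: string, maxBytes: number = MAX_INPUT_BYTES): string[] {
  if (maxBytes < 4) {
    // A single UTF-8 character can take up to 4 bytes.
    throw new RangeError("maxBytes must be at least 4");
  }
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;

  for (const char of text) {
    const size = encoder.encode(char).length;
    // Start a new chunk when adding this character would exceed the budget.
    if (currentBytes + size > maxBytes && current.length > 0) {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += char;
    currentBytes += size;
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

// Usage: a 1200-character ASCII transcript becomes three chunks.
const demoChunks = chunkByBytes("a".repeat(1200));
console.log(demoChunks.map((c) => c.length));
```

This sketch cuts at arbitrary character positions, so a sentence can straddle two chunks. A real pipeline would probably overlap the windows or cut at the last detected boundary instead.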

	| Input text                   | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
	|------------------------------|----------|----------------|---------------------|
	| Clean                        | 0.754371 | 0.853603       | 0.820934            |
	| Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
	| Partial case                 | 0.761754 | 0.825119       | 0.819679            |
	| Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
	| No punctuation and case      | 0.166273 | 0.180859       | 0.00463281          |


	### Example
	No punctuation and no case (~17% accuracy) <br>
	Input:
	```text
	the difference between rest and graphql is explained as follows
	rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
	```
	Result: 28.90 ms ✅
	![image](https://cdn-uploads.huggingface.co/production/uploads/687852a7f13fbe6c3d4c9974/LrASgi-BmWROVZIUK-36-.png)
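
To illustrate how unpunctuated input like the example above can be turned into separate sentences, here is a minimal post-processing sketch. It assumes the model yields one boundary probability per character, aligned with the string's indices, and that a fixed threshold decides where a sentence ends. The function `splitAtBoundaries`, the probability array format, and the default threshold of 0.5 are illustrative assumptions, not the actual output format or API of this model or its NPM package.

```typescript
// Hypothetical model output: boundaryProbs[i] is the probability that a
// sentence ends right after text[i]. Indices are assumed to match the string.
function splitAtBoundaries(
  text: string,
  boundaryProbs: number[],
  threshold: number = 0.5
): string[] {
  if (boundaryProbs.length !== text.length) {
    throw new RangeError("boundaryProbs must have one entry per character");
  }
  const sentences: string[] = [];
  let start = 0;

  for (let i = 0; i < text.length; i++) {
    if (boundaryProbs[i] >= threshold) {
      // Close the current sentence after character i.
      const sentence = text.slice(start, i + 1).trim();
      if (sentence.length > 0) {
        sentences.push(sentence);
      }
      start = i + 1;
    }
  }
  // Keep whatever follows the last detected boundary.
  const tail = text.slice(start).trim();
  if (tail.length > 0) {
    sentences.push(tail);
  }
  return sentences;
}

// Usage with hand-made probabilities: mark a boundary after "style".
const demoText = "rest is an architectural style graphql is a query language";
const demoProbs: number[] = new Array(demoText.length).fill(0);
demoProbs[demoText.indexOf("style") + "style".length - 1] = 0.9;
console.log(splitAtBoundaries(demoText, demoProbs));
```

A lower threshold yields more, shorter sentences; a higher one merges uncertain splits, which may suit noisy speech-to-text input better.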