---
license: mit
language:
- en
tags:
- lstm
- text-segmentation
- lightweight
- client-side
- web
- onnxruntime-web
- speech-to-text
- low-memory-footprint
---

This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for a speech-to-text use case, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.

The sentence splitter model is a modification of the LSTM model from the [NNSplit](https://github.com/kornelski/nnsplit) repository, with an input size of around 500 B.
The model used here is **~4 MB** in size.
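If the model's input is a fixed window of around 500 B, longer text would need to be fed to it in pieces. The following TypeScript is a minimal, hypothetical sketch of that step. It assumes byte-level (UTF-8) input, in the spirit of NNSplit, and zero-padding of the final window. The names `toByteWindows` and `WINDOW_SIZE` are illustrative and are not part of the `stst` package.

```typescript
// Hypothetical helper, not the stst package's API.
// Assumption: the model consumes fixed-size windows of UTF-8 bytes;
// WINDOW_SIZE = 500 is taken from the "around 500 B" input size.
const WINDOW_SIZE = 500;

function toByteWindows(text: string, windowSize: number = WINDOW_SIZE): Uint8Array[] {
  // Encode the text as UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  const windows: Uint8Array[] = [];

  for (let start = 0; start < bytes.length; start += windowSize) {
    // A new Uint8Array is zero-filled, so the last window is padded with 0s.
    const chunk = new Uint8Array(windowSize);
    chunk.set(bytes.subarray(start, start + windowSize));
    windows.push(chunk);
  }
  return windows;
}

// Example: 1,200 ASCII characters produce three 500-byte windows.
const demoWindows = toByteWindows("a".repeat(1200));
console.log(demoWindows.length, demoWindows[2][199], demoWindows[2][200]);
```

This sketch keeps windows disjoint for clarity. A real pipeline might overlap them so that boundaries near a window edge see context on both sides, and it might avoid cutting a multi-byte character across two windows.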

Sentence-splitting scores for NNSplit and spaCy under different input conditions (higher is better):

| Input condition              | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
|------------------------------|----------|----------------|---------------------|
| Clean                        | 0.754371 | 0.853603       | 0.820934            |
| Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
| Partial case                 | 0.761754 | 0.825119       | 0.819679            |
| Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
| No punctuation or case       | 0.166273 | 0.180859       | 0.00463281          |


### Example
No punctuation and no casing (~17% accuracy, per the table above) <br>
   **Input:** 
  ```text
  the difference between rest and graphql is explained as follows 
  rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
  ```
  **Result: 28.90ms ✅**
  ![image](https://cdn-uploads.huggingface.co/production/uploads/687852a7f13fbe6c3d4c9974/LrASgi-BmWROVZIUK-36-.png)
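As a rough illustration of how segmented output could be produced from model predictions, the following TypeScript sketch rests on a few assumptions. It assumes the model emits one boundary probability per input byte, in the spirit of NNSplit's byte-level design, and cuts the text wherever that probability exceeds a threshold. The function name `splitAtBoundaries` and the 0.5 default threshold are hypothetical and do not describe the `stst` package's actual API.

```typescript
// Hypothetical post-processing step, not the stst package's API.
// Assumption: probs[i] is the model's probability that a sentence ends
// right after byte i of the UTF-8 encoding of `text`.
function splitAtBoundaries(
  text: string,
  probs: ArrayLike<number>,
  threshold: number = 0.5,
): string[] {
  const bytes = new TextEncoder().encode(text);
  const decoder = new TextDecoder();
  const sentences: string[] = [];
  let start = 0;

  for (let i = 0; i < bytes.length; i++) {
    const isLast = i === bytes.length - 1;
    // Only cut after the final byte of a character: the next byte must not
    // be a UTF-8 continuation byte (bit pattern 10xxxxxx).
    const endsChar = isLast || (bytes[i + 1] & 0xc0) !== 0x80;
    const isBoundary = i < probs.length && probs[i] > threshold;

    if (isLast || (endsChar && isBoundary)) {
      // Decode the byte span and drop whitespace left around the cut.
      const sentence = decoder.decode(bytes.subarray(start, i + 1)).trim();
      if (sentence.length > 0) {
        sentences.push(sentence);
      }
      start = i + 1;
    }
  }
  return sentences;
}

// Example with made-up probabilities: a boundary after "world" (byte index 10).
const demoProbs = new Array(23).fill(0);
demoProbs[10] = 0.9;
console.log(splitAtBoundaries("hello world how are you", demoProbs));
```

Raising the threshold in this sketch yields fewer, longer segments, while lowering it yields more, shorter ones. On unpunctuated speech-to-text output, the threshold is the main knob for trading missed boundaries against spurious ones.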