Upload 3 files
- README.md +31 -3
- config.json +9 -0
- model.onnx +3 -0
README.md
CHANGED
@@ -1,3 +1,31 @@
----
-license: mit
-
+---
+license: mit
+language:
+- en
+tags:
+- lstm
+- text-segmentation
+---
+
+This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for a speech-to-text use case, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.
+
+The sentence splitter is an LSTM model with an input size of around 500 B, taken from the [NNSplit](https://github.com/kornelski/nnsplit) repository.
+
+|                              | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
+|------------------------------|----------|----------------|---------------------|
+| Clean                        | 0.754371 | 0.853603       | 0.820934            |
+| Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
+| Partial case                 | 0.761754 | 0.825119       | 0.819679            |
+| Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
+| No punctuation and case      | 0.166273 | 0.180859       | 0.00463281          |
+
+
+### Example
+No punctuation and no case (~17% accuracy) <br>
+**Input:**
+```text
+the difference between rest and graphql is explained as follows
+rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
+```
+**Result: 28.90ms ✅**
+
config.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "model_format": "onnx",
+  "model_id": "vinitgore/sentence-splitter",
+  "model_type": "custom",
+  "files": {
+    "onnx": "onnx/model.onnx"
+  },
+  "description": "Sentence splitter ONNX model for browser inference via Transformers.js"
+}
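As a rough illustration of how a loader might consume the config.json above, the sketch below reads the `files.onnx` entry and joins it to a local repository root. The `resolve_onnx_path` helper and the `sentence-splitter` directory name are assumptions made for this example; they are not part of the commit or of the NPM package.

```python
import json
from pathlib import PurePosixPath

# The config.json content added in this commit, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "model_format": "onnx",
  "model_id": "vinitgore/sentence-splitter",
  "model_type": "custom",
  "files": {
    "onnx": "onnx/model.onnx"
  },
  "description": "Sentence splitter ONNX model for browser inference via Transformers.js"
}
"""


def resolve_onnx_path(config_text: str, repo_root: str) -> str:
    """Hypothetical helper: return the path named by `files.onnx`, joined to repo_root."""
    config = json.loads(config_text)
    # Refuse configs that do not declare an ONNX model.
    if config.get("model_format") != "onnx":
        raise ValueError("expected an ONNX model config")
    relative = config["files"]["onnx"]
    # PurePosixPath keeps forward slashes regardless of the host OS.
    return str(PurePosixPath(repo_root) / relative)


print(resolve_onnx_path(CONFIG_TEXT, "sentence-splitter"))
```

The config points at `onnx/model.onnx`, while the file list of this commit shows `model.onnx` at the repository root, so a loader written along these lines would appear to need the file at the configured path.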
model.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b6fa35f12d12b6cfbdd80ab45bb8134ab0bcb60e76ca1aa479ebb74b459b0362
+size 132
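The three lines added for model.onnx are a Git LFS pointer, not the model bytes themselves. A minimal sketch of reading such a pointer follows; the `parse_lfs_pointer` helper is a hypothetical name introduced for this example, and it checks only the `version`, `oid`, and `size` keys shown in the diff.

```python
import re

# The pointer content added in this commit, inlined so the sketch is self-contained.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b6fa35f12d12b6cfbdd80ab45bb8134ab0bcb60e76ca1aa479ebb74b459b0362
size 132
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    match = re.fullmatch(r"sha256:([0-9a-f]{64})", fields.get("oid", ""))
    if match is None:
        raise ValueError("malformed oid")
    return {"sha256": match.group(1), "size": int(fields["size"])}


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["sha256"][:12], pointer["size"])
```

The pointer declares a `size` of 132 bytes, which is small for an ONNX model, so it may be worth confirming that the referenced object is the model itself and not another pointer file.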