vinitgore committed on
Commit b255dcc · verified · 1 Parent(s): 72dea8e

Upload 3 files

Files changed (3)
  1. README.md +31 -3
  2. config.json +9 -0
  3. model.onnx +3 -0
README.md CHANGED
@@ -1,3 +1,31 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - lstm
+ - text-segmentation
+ ---
+
+ This [NPM package](https://github.com/orgs/the-vedantic-coder/packages/npm/stst/580080323), built for a speech-to-text use case, implements the setup and inference for this model. It provides a [React app demo](https://sentence-splitter-poc.vercel.app/) and a `processDirectText` method for running inference directly on text.
+
+ The sentence splitter is an LSTM model with an input size of around 500 B, taken from the [NNSplit](https://github.com/kornelski/nnsplit) repository.
+
+ |                              | NNSplit  | spaCy (Tagger) | spaCy (Sentencizer) |
+ |------------------------------|----------|----------------|---------------------|
+ | Clean                        | 0.754371 | 0.853603       | 0.820934            |
+ | Partial punctuation          | 0.485907 | 0.517829       | 0.249753            |
+ | Partial case                 | 0.761754 | 0.825119       | 0.819679            |
+ | Partial punctuation and case | 0.443704 | 0.458619       | 0.249873            |
+ | No punctuation and case      | 0.166273 | 0.180859       | 0.00463281          |
+
+
+ ### Example
+ No punctuation and no case (~17% accuracy) <br>
+ **Input:**
+ ```text
+ the difference between rest and graphql is explained as follows
+ rest is an architectural style that exposes resources via endpoints typically following crud operations each endpoint returns a fixed data structure graphql on the other hand allows clients to specify exactly what data they need in a single query often reducing overfetching and underfetching issues
+ ```
+ **Result: 28.90 ms ✅**
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/687852a7f13fbe6c3d4c9974/4xF4cDfK2pqJCY2H7ugPY.png)
config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "model_format": "onnx",
+   "model_id": "vinitgore/sentence-splitter",
+   "model_type": "custom",
+   "files": {
+     "onnx": "onnx/model.onnx"
+   },
+   "description": "Sentence splitter ONNX model for browser inference via Transformers.js"
+ }
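As a rough illustration of how a loader might consume this config, the sketch below parses the JSON and resolves the model file path relative to the repository root. The `resolve_model_path` helper is hypothetical; it is not part of Transformers.js or of the NPM package, and the lookup rule (index `files` by `model_format`) is an assumption about how the fields relate.

```python
import json
from pathlib import PurePosixPath

# Contents of config.json as added in this commit.
CONFIG_TEXT = """
{
  "model_format": "onnx",
  "model_id": "vinitgore/sentence-splitter",
  "model_type": "custom",
  "files": {
    "onnx": "onnx/model.onnx"
  },
  "description": "Sentence splitter ONNX model for browser inference via Transformers.js"
}
"""


def resolve_model_path(config: dict) -> PurePosixPath:
    """Hypothetical helper: return the repo-relative path of the model file.

    Assumes the entry under "files" is keyed by the value of "model_format".
    Raises KeyError if either key is missing.
    """
    fmt = config["model_format"]
    return PurePosixPath(config["files"][fmt])


config = json.loads(CONFIG_TEXT)
model_path = resolve_model_path(config)
print(model_path)         # onnx/model.onnx
print(model_path.parent)  # onnx
```

Under this reading, the config points at `onnx/model.onnx`, while the file list for this commit shows `model.onnx` at the repository root, so a loader that follows the config literally would look in an `onnx/` subfolder.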
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6fa35f12d12b6cfbdd80ab45bb8134ab0bcb60e76ca1aa479ebb74b459b0362
+ size 132
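The three added lines are a Git LFS pointer, not the model weights themselves: they record the spec version, the SHA-256 object ID, and the object size in bytes. A minimal sketch of reading such a pointer and checking a downloaded blob against it follows; `parse_lfs_pointer` and `matches_pointer` are illustrative names, not functions from Git LFS or any Hugging Face library.

```python
import hashlib

# Pointer text as added for model.onnx in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b6fa35f12d12b6cfbdd80ab45bb8134ab0bcb60e76ca1aa479ebb74b459b0362
size 132
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each "key value" line of a pointer file into a dict of fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that a blob has the size and SHA-256 digest the pointer records."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 object IDs")
    if len(data) != pointer["size_bytes"]:
        return False
    return hashlib.sha256(data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size_bytes"])
```

After downloading the real object, `matches_pointer(open("model.onnx", "rb").read(), pointer)` would confirm that the file is the 132-byte object this pointer refers to.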