generall committed on
Commit 43972e2 · 1 Parent(s): 523cadc

init minicoil

Files changed (3)
  1. README.md +6 -29
  2. minicoil.triplet.model.vocab +0 -0
  3. stopwords.txt +179 -0
README.md CHANGED
@@ -1,37 +1,14 @@
  ---
- base_model: jinaai/jina-embeddings-v2-small-en
- library_name: transformers.js
- pipeline_tag: feature-extraction
  ---

- https://huggingface.co/jinaai/jina-embeddings-v2-small-en with ONNX weights to be compatible with Transformers.js.
+ # MiniCOIL v1

- ## Usage with 🤗 Transformers.js
+ MiniCOIL - is a sparse contextualized per-token embeddings.
+ Read more about it in [the article](https://qdrant.tech/articles/minicoil).

- If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
- ```bash
- npm i @huggingface/transformers
- ```

- You can then use the model as follows:
- ```js
- import { pipeline, cos_sim } from '@huggingface/transformers';
+ ## Usage

- // Create feature extraction pipeline
- const extractor = await pipeline('feature-extraction', 'Xenova/jina-embeddings-v2-small-en',
- { dtype: "fp32" } // Options: "fp32", "fp16", "q8", "q4"
- );
+ This model is designed to be used with [FastEmbed](https://github.com/qdrant/fastembed) library.

- // Generate embeddings
- const output = await extractor(
- ['How is the weather today?', 'What is the current weather like today?'],
- { pooling: 'mean' }
- );
-
- // Compute cosine similarity
- console.log(cos_sim(output[0].data, output[1].data)); // 0.9399812684139274 (unquantized) vs. 0.9341121503699659 (quantized)
- ```
-
- ---
-
- Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using [🤗 Optimum](https://huggingface.co/docs/optimum/index) and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
+ ToDo
minicoil.triplet.model.vocab ADDED
The diff for this file is too large to render. See raw diff
 
stopwords.txt ADDED
@@ -0,0 +1,179 @@
+ i
+ me
+ my
+ myself
+ we
+ our
+ ours
+ ourselves
+ you
+ you're
+ you've
+ you'll
+ you'd
+ your
+ yours
+ yourself
+ yourselves
+ he
+ him
+ his
+ himself
+ she
+ she's
+ her
+ hers
+ herself
+ it
+ it's
+ its
+ itself
+ they
+ them
+ their
+ theirs
+ themselves
+ what
+ which
+ who
+ whom
+ this
+ that
+ that'll
+ these
+ those
+ am
+ is
+ are
+ was
+ were
+ be
+ been
+ being
+ have
+ has
+ had
+ having
+ do
+ does
+ did
+ doing
+ a
+ an
+ the
+ and
+ but
+ if
+ or
+ because
+ as
+ until
+ while
+ of
+ at
+ by
+ for
+ with
+ about
+ against
+ between
+ into
+ through
+ during
+ before
+ after
+ above
+ below
+ to
+ from
+ up
+ down
+ in
+ out
+ on
+ off
+ over
+ under
+ again
+ further
+ then
+ once
+ here
+ there
+ when
+ where
+ why
+ how
+ all
+ any
+ both
+ each
+ few
+ more
+ most
+ other
+ some
+ such
+ no
+ nor
+ not
+ only
+ own
+ same
+ so
+ than
+ too
+ very
+ s
+ t
+ can
+ will
+ just
+ don
+ don't
+ should
+ should've
+ now
+ d
+ ll
+ m
+ o
+ re
+ ve
+ y
+ ain
+ aren
+ aren't
+ couldn
+ couldn't
+ didn
+ didn't
+ doesn
+ doesn't
+ hadn
+ hadn't
+ hasn
+ hasn't
+ haven
+ haven't
+ isn
+ isn't
+ ma
+ mightn
+ mightn't
+ mustn
+ mustn't
+ needn
+ needn't
+ shan
+ shan't
+ shouldn
+ shouldn't
+ wasn
+ wasn't
+ weren
+ weren't
+ won
+ won't
+ wouldn
+ wouldn't
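
The commit does not say how `stopwords.txt` is consumed. As a rough illustration only, a one-word-per-line file like this could be loaded into a set and used to drop common words before further processing. The function names, the regex tokenizer, and the small sample list below are assumptions made for this sketch; they are not taken from MiniCOIL or FastEmbed.

```python
import re


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Return the lowercased tokens of `text` that are not stopwords."""
    # Hypothetical tokenizer: keeps apostrophes so entries such as "don't"
    # can match; a real pipeline may tokenize differently.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tok for tok in tokens if tok not in stopwords]


# A small sample standing in for the full file; with the real file one could
# pass an open file handle to load_stopwords instead.
sample_lines = ["the", "is", "how", "what", "of", "don't"]
stopwords = load_stopwords(sample_lines)

filtered = remove_stopwords("How is the weather today?", stopwords)
```

A set gives constant-time membership checks, which matters little for 179 entries but keeps the filter simple.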