foochun committed
Commit 180df1d · verified · 1 Parent(s): 3ce497b

256 Dimension updated

Files changed (3):
  1. 2_Dense/model.safetensors +1 -1
  2. README.md +44 -44
  3. model.safetensors +1 -1
2_Dense/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f0cca8263b133c578012248311d13c57bc0c91c801eea2d59b4cbf97564660f8
+oid sha256:8eadfa9595c8f175d2a5113f17d40d956f408b29cd32aa5e6523dc473034ec2f
 size 1049760
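The pointer's size field is unchanged at 1,049,760 bytes, which is consistent with the commit message's 256-dimension dense head: assuming the module is a single fp32 Linear layer mapping bge-large's 1024-dim output to 256 dims with a bias, the tensor data plus a small safetensors JSON header accounts for the file size. A back-of-the-envelope check under those assumptions (the layer shape and header size are inferred, not stated in the commit):

```python
# Consistency check for 2_Dense/model.safetensors, size 1049760.
# Assumption: one fp32 Linear (1024 -> 256) with bias; the remainder
# of the file would be the safetensors JSON header.
BYTES_PER_FP32 = 4
in_dim, out_dim = 1024, 256                        # assumed layer shape

weight_bytes = in_dim * out_dim * BYTES_PER_FP32   # 1_048_576
bias_bytes = out_dim * BYTES_PER_FP32              # 1_024
tensor_bytes = weight_bytes + bias_bytes           # 1_049_600

file_size = 1_049_760                              # from the LFS pointer above
header_bytes = file_size - tensor_bytes            # what's left for the header
print(tensor_bytes, header_bytes)                  # prints: 1049600 160
```

A 160-byte header is plausible for a safetensors file with two tensors, so the unchanged size fits a dense head that was already 256-dimensional (only the weights changed, hence the new oid).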
README.md CHANGED
@@ -4,35 +4,35 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:69043
+- dataset_size:69216
 - loss:MultipleNegativesRankingLoss
 base_model: BAAI/bge-large-en-v1.5
 widget:
-- source_sentence: raja muhammad irfan bin raja ismail
+- source_sentence: ajith s/o sockalingam
   sentences:
-  - loong min seow
-  - raja ismail bin raja yusof
-  - irfan ismail
-- source_sentence: brandon loh liang meng
+  - ajith a/l sockalingam
+  - marcus ping yi ng
+  - ajith a/p sockalingam
+- source_sentence: quinn kwan xin fang
   sentences:
-  - liang loh meng
-  - meng loh liang brandon
-  - tan ee zhen
-- source_sentence: kamariah binti abdullah
+  - ambiga a/p jacob
+  - quinn fang kwan xin
+  - xin kwan fang
+- source_sentence: brandon teh min ling
   sentences:
-  - zulkifli bin hassan
-  - chee sim liang
-  - kamariah binti abdullah
-- source_sentence: hajjah salmah binti ismael
+  - victor bing yong ng
+  - min ling teh brandon
+  - ling min teh brandon
+- source_sentence: carmen ho xin jun
   sentences:
-  - yusof bin ishak
-  - salmah binti ismael
-  - wei kiat ong
-- source_sentence: low kian tian
+  - xin ho jun carmen
+  - pei ho yi grace
+  - xin jun ho carmen
+- source_sentence: alicia lim siu ling
   sentences:
-  - lo kian tian
-  - low kian tian
-  - ee wei ng
+  - lim ling siu alicia
+  - alicia siu ling lim
+  - nadia soh meng jun
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
@@ -87,9 +87,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("foochun/bge-large-finetuned")
 # Run inference
 sentences = [
-    'low kian tian',
-    'low kian tian',
-    'lo kian tian',
+    'alicia lim siu ling',
+    'alicia siu ling lim',
+    'lim ling siu alicia',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -143,19 +143,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size: 69,043 training samples
+* Size: 69,216 training samples
 * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
 * Approximate statistics based on the first 1000 samples:
   | | query | pos | neg |
   |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
   | type | string | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.91 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.19 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.57 tokens</li><li>max: 16 tokens</li></ul> |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 8.96 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.22 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.47 tokens</li><li>max: 16 tokens</li></ul> |
 * Samples:
-  | query | pos | neg |
-  |:-----------------------------------|:-------------------------------|:------------------------------------|
-  | <code>kavita doraisamy</code> | <code>kavita doraisamy</code> | <code>kavita a/l doraisamy</code> |
-  | <code>siva s/o krishnan</code> | <code>siva a/l krishnan</code> | <code>krishnan siva</code> |
-  | <code>wan faiz bin wan azmi</code> | <code>wan faiz wan azmi</code> | <code>wan nabil bin wan azmi</code> |
+  | query | pos | neg |
+  |:-----------------------------------|:-------------------------------|:------------------------------|
+  | <code>abdul karim bin bakar</code> | <code>abdul karim bakar</code> | <code>johan bin hamid</code> |
+  | <code>rupai anak jamit</code> | <code>rupai jamit</code> | <code>rupai anak karim</code> |
+  | <code>sim kim ning</code> | <code>ning sim kim</code> | <code>kim sim ning</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -168,19 +168,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size: 9,863 evaluation samples
+* Size: 9,887 evaluation samples
 * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
 * Approximate statistics based on the first 1000 samples:
   | | query | pos | neg |
   |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
   | type | string | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 7.95 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.45 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.62 tokens</li><li>max: 14 tokens</li></ul> |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 7.86 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.38 tokens</li><li>max: 16 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.65 tokens</li><li>max: 16 tokens</li></ul> |
 * Samples:
-  | query | pos | neg |
-  |:---------------------------------|:-----------------------------|:-----------------------------------|
-  | <code>felix ho ee wei</code> | <code>ee wei ho felix</code> | <code>felix wei ee ho</code> |
-  | <code>lau man yen</code> | <code>man yen lau</code> | <code>lau an yen</code> |
-  | <code>mohd noor bin awang</code> | <code>mohd noor awang</code> | <code>siti noor binti awang</code> |
+  | query | pos | neg |
+  |:------------------------------------|:---------------------------------------|:------------------------------------|
+  | <code>mohd ridzuan bin nasir</code> | <code>mohamad ridzuan bin nasir</code> | <code>mohd ridzuan bin naser</code> |
+  | <code>isabel koh jun liang</code> | <code>isabel koh jun liang</code> | <code>liang jun koh isabel</code> |
+  | <code>neo mei chuan</code> | <code>neo mei chuan</code> | <code>mak mei chuan</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -325,12 +325,12 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss | Validation Loss |
 |:----------:|:--------:|:-------------:|:---------------:|
-| 0.4634 | 500 | 0.126 | 0.0151 |
-| 0.9268 | 1000 | 0.0155 | 0.0084 |
-| 1.3902 | 1500 | 0.0084 | 0.0059 |
-| 1.8536 | 2000 | 0.0064 | 0.0055 |
-| 2.3170 | 2500 | 0.0057 | 0.0045 |
-| **2.7804** | **3000** | **0.0044** | **0.0045** |
+| 0.4621 | 500 | 0.1357 | 0.0127 |
+| 0.9242 | 1000 | 0.0149 | 0.0065 |
+| 1.3863 | 1500 | 0.0079 | 0.0065 |
+| 1.8484 | 2000 | 0.0069 | 0.0043 |
+| 2.3105 | 2500 | 0.0059 | 0.0040 |
+| **2.7726** | **3000** | **0.0052** | **0.0039** |
 
 * The bold row denotes the saved checkpoint.

model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0e58d04da3e441f00a7d1d383258c1bb8d6c8449ce5527bb832ae0aba938b405
+oid sha256:15b52f7abf658111d9430675ac14595f44e24a6d62b078f77ee10351c0ce222f
 size 1340612432
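The README diff trains both splits with MultipleNegativesRankingLoss: each query's matching positive is scored against the other in-batch candidates, and cross-entropy pushes the true pair to the top. A minimal pure-Python sketch of that idea; the toy 2-D vectors and the scale factor are illustrative assumptions, not the model's real embeddings or configuration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mnr_loss(queries, candidates, scale=20.0):
    """In-batch ranking loss: candidate i is the positive for query i,
    every other candidate serves as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, c) for c in candidates]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]   # -log softmax probability of the true pair
    return total / len(queries)

# Toy batch: query i matches candidate i, so the loss should be near zero.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.1, 0.9]]
print(round(mnr_loss(queries, candidates), 6))  # prints 0.0
```

Shuffling the candidates so queries no longer face their own positive drives the loss up sharply, which is the gradient signal that pulls matched name variants together in embedding space.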