foochun committed
Commit 180df1d · verified · 1 Parent(s): 3ce497b

256 Dimension updated

Files changed (3):
  1. 2_Dense/model.safetensors +1 -1
  2. README.md +44 -44
  3. model.safetensors +1 -1
2_Dense/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f0cca8263b133c578012248311d13c57bc0c91c801eea2d59b4cbf97564660f8
+oid sha256:8eadfa9595c8f175d2a5113f17d40d956f408b29cd32aa5e6523dc473034ec2f
 size 1049760
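The pointer's size field is unchanged at 1,049,760 bytes, which is consistent with the commit message's 256-dimension dense head: assuming the module is a single fp32 Linear layer mapping bge-large's 1024-dim output to 256 dims with a bias, the tensor data plus a small safetensors JSON header accounts for the file size. A back-of-the-envelope check under those assumptions (the layer shape and header size are inferred, not stated in the commit):

```python
# Consistency check for 2_Dense/model.safetensors, size 1049760.
# Assumption: one fp32 Linear (1024 -> 256) with bias; the remainder
# of the file would be the safetensors JSON header.
BYTES_PER_FP32 = 4
in_dim, out_dim = 1024, 256                        # assumed layer shape

weight_bytes = in_dim * out_dim * BYTES_PER_FP32   # 1_048_576
bias_bytes = out_dim * BYTES_PER_FP32              # 1_024
tensor_bytes = weight_bytes + bias_bytes           # 1_049_600

file_size = 1_049_760                              # from the LFS pointer above
header_bytes = file_size - tensor_bytes            # what's left for the header
print(tensor_bytes, header_bytes)                  # prints: 1049600 160
```

A 160-byte header is plausible for a safetensors file with two tensors, so the unchanged size fits a dense head that was already 256-dimensional (only the weights changed, hence the new oid).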
README.md CHANGED
@@ -4,35 +4,35 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:69043
+- dataset_size:69216
 - loss:MultipleNegativesRankingLoss
 base_model: BAAI/bge-large-en-v1.5
 widget:
-- source_sentence: raja muhammad irfan bin raja ismail
+- source_sentence: ajith s/o sockalingam
   sentences:
-  - loong min seow
-  - raja ismail bin raja yusof
-  - irfan ismail
-- source_sentence: brandon loh liang meng
+  - ajith a/l sockalingam
+  - marcus ping yi ng
+  - ajith a/p sockalingam
+- source_sentence: quinn kwan xin fang
   sentences:
-  - liang loh meng
-  - meng loh liang brandon
-  - tan ee zhen
-- source_sentence: kamariah binti abdullah
+  - ambiga a/p jacob
+  - quinn fang kwan xin
+  - xin kwan fang
+- source_sentence: brandon teh min ling
   sentences:
-  - zulkifli bin hassan
-  - chee sim liang
-  - kamariah binti abdullah
-- source_sentence: hajjah salmah binti ismael
+  - victor bing yong ng
+  - min ling teh brandon
+  - ling min teh brandon
+- source_sentence: carmen ho xin jun
   sentences:
-  - yusof bin ishak
-  - salmah binti ismael
-  - wei kiat ong
-- source_sentence: low kian tian
+  - xin ho jun carmen
+  - pei ho yi grace
+  - xin jun ho carmen
+- source_sentence: alicia lim siu ling
   sentences:
-  - lo kian tian
-  - low kian tian
-  - ee wei ng
+  - lim ling siu alicia
+  - alicia siu ling lim
+  - nadia soh meng jun
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
@@ -87,9 +87,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("foochun/bge-large-finetuned")
 # Run inference
 sentences = [
-    'low kian tian',
-    'low kian tian',
-    'lo kian tian',
+    'alicia lim siu ling',
+    'alicia siu ling lim',
+    'lim ling siu alicia',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -143,19 +143,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size: 69,043 training samples
+* Size: 69,216 training samples
 * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
 * Approximate statistics based on the first 1000 samples:
   | | query | pos | neg |
   |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
   | type | string | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 8.91 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.19 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.57 tokens</li><li>max: 16 tokens</li></ul> |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 8.96 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.22 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.47 tokens</li><li>max: 16 tokens</li></ul> |
 * Samples:
-  | query | pos | neg |
-  |:-----------------------------------|:-------------------------------|:------------------------------------|
-  | <code>kavita doraisamy</code> | <code>kavita doraisamy</code> | <code>kavita a/l doraisamy</code> |
-  | <code>siva s/o krishnan</code> | <code>siva a/l krishnan</code> | <code>krishnan siva</code> |
-  | <code>wan faiz bin wan azmi</code> | <code>wan faiz wan azmi</code> | <code>wan nabil bin wan azmi</code> |
+  | query | pos | neg |
+  |:-----------------------------------|:-------------------------------|:------------------------------|
+  | <code>abdul karim bin bakar</code> | <code>abdul karim bakar</code> | <code>johan bin hamid</code> |
+  | <code>rupai anak jamit</code> | <code>rupai jamit</code> | <code>rupai anak karim</code> |
+  | <code>sim kim ning</code> | <code>ning sim kim</code> | <code>kim sim ning</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -168,19 +168,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size: 9,863 evaluation samples
+* Size: 9,887 evaluation samples
 * Columns: <code>query</code>, <code>pos</code>, and <code>neg</code>
 * Approximate statistics based on the first 1000 samples:
   | | query | pos | neg |
   |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
   | type | string | string | string |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 7.95 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.45 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.62 tokens</li><li>max: 14 tokens</li></ul> |
+  | details | <ul><li>min: 4 tokens</li><li>mean: 7.86 tokens</li><li>max: 17 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.38 tokens</li><li>max: 16 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.65 tokens</li><li>max: 16 tokens</li></ul> |
 * Samples:
-  | query | pos | neg |
-  |:---------------------------------|:-----------------------------|:-----------------------------------|
-  | <code>felix ho ee wei</code> | <code>ee wei ho felix</code> | <code>felix wei ee ho</code> |
-  | <code>lau man yen</code> | <code>man yen lau</code> | <code>lau an yen</code> |
-  | <code>mohd noor bin awang</code> | <code>mohd noor awang</code> | <code>siti noor binti awang</code> |
+  | query | pos | neg |
+  |:------------------------------------|:---------------------------------------|:------------------------------------|
+  | <code>mohd ridzuan bin nasir</code> | <code>mohamad ridzuan bin nasir</code> | <code>mohd ridzuan bin naser</code> |
+  | <code>isabel koh jun liang</code> | <code>isabel koh jun liang</code> | <code>liang jun koh isabel</code> |
+  | <code>neo mei chuan</code> | <code>neo mei chuan</code> | <code>mak mei chuan</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
   ```json
   {
@@ -325,12 +325,12 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss | Validation Loss |
 |:----------:|:--------:|:-------------:|:---------------:|
-| 0.4634 | 500 | 0.126 | 0.0151 |
-| 0.9268 | 1000 | 0.0155 | 0.0084 |
-| 1.3902 | 1500 | 0.0084 | 0.0059 |
-| 1.8536 | 2000 | 0.0064 | 0.0055 |
-| 2.3170 | 2500 | 0.0057 | 0.0045 |
-| **2.7804** | **3000** | **0.0044** | **0.0045** |
+| 0.4621 | 500 | 0.1357 | 0.0127 |
+| 0.9242 | 1000 | 0.0149 | 0.0065 |
+| 1.3863 | 1500 | 0.0079 | 0.0065 |
+| 1.8484 | 2000 | 0.0069 | 0.0043 |
+| 2.3105 | 2500 | 0.0059 | 0.0040 |
+| **2.7726** | **3000** | **0.0052** | **0.0039** |
 
 * The bold row denotes the saved checkpoint.

model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0e58d04da3e441f00a7d1d383258c1bb8d6c8449ce5527bb832ae0aba938b405
+oid sha256:15b52f7abf658111d9430675ac14595f44e24a6d62b078f77ee10351c0ce222f
 size 1340612432
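The README diff trains both splits with MultipleNegativesRankingLoss: each query's matching positive is scored against the other in-batch candidates, and cross-entropy pushes the true pair to the top. A minimal pure-Python sketch of that idea; the toy 2-D vectors and the scale factor are illustrative assumptions, not the model's real embeddings or configuration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mnr_loss(queries, candidates, scale=20.0):
    """In-batch ranking loss: candidate i is the positive for query i,
    every other candidate serves as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, c) for c in candidates]
        log_z = math.log(sum(math.exp(l) for l in logits))
        total += log_z - logits[i]   # -log softmax probability of the true pair
    return total / len(queries)

# Toy batch: query i matches candidate i, so the loss should be near zero.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [0.1, 0.9]]
print(round(mnr_loss(queries, candidates), 6))  # prints 0.0
```

Shuffling the candidates so queries no longer face their own positive drives the loss up sharply, which is the gradient signal that pulls matched name variants together in embedding space.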