Syldehayem/all-MiniLM-L12-v2_embedder_train

@@ -4,48 +4,52 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:13657
 - loss:TripletLoss
-base_model: sentence-transformers/all-MiniLM-L12-v2
 widget:
-- source_sentence: AN ISLAND DRIFTS | Omeleto Drama
   sentences:
-  - 'The DUST Files: Awesome Aliens Vol. 1'
-  - Doshti Putuler Galpo | দশটি পুতুলের গল্প | Bangla Telefilm | Part - 2
-  - Mon Shunte Ki Chay | Hingsha | Bengali Movie Song | Kumar Sanu, Kavita
-- source_sentence: 'CGI 3D Animated Short: "Knight GYM" - by Alexis Dumortier | TheCGBros'
   sentences:
-  - 'Zoom Cloud Hack #91784 The Tribe Murders | Short Horror Film | Screamfest'
-  - '**Caution** CGI 3D Animated Spot : "#feelingnuts" - by Studio AKA'
-  - 'CGI VFX Breakdowns : "Dust - Creature Shot VFX" by Ember Lab'
-- source_sentence: CGI Animated Short Film HD "Scarlett " by The STUDIO NYC | CGMeetup
   sentences:
-  - 'CGI 3D Animated Short: "The Song of the Rain"  - by Hezmon Animation | TheCGBros'
-  - Ghum Ghum Chand | Sabar Oparey | Bengali Movie Song | Sandhya Mukherjee
-  - 'CGI 3D Showreel :  "Architectural 2012"  by - ALLCGSTUDIO'
-- source_sentence: Chucky | Halloween Horror Nights 2023
   sentences:
-  - Bhalobasa Bhalobasa | ভালবাসা ভালবাসা | Bengali Movie - 3/13
-  - 'CGI & VFX Breakdowns: "After Earth Breakdown" - by Tippett Studio | TheCGBros'
-  - Horror Short Film "The World Over" | ALTER
-- source_sentence: Horror Short Film “The Guest” | ALTER
   sentences:
-  - Sci-Fi Short Film "Who Among Us" | DUST
-  - '"The Amazing SpiderDad Trailer" - Mike Wilson'
-  - Horror Short Film "Peter the Penguin" | ALTER
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
-# SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) <!-- at revision c004d8e3e901237d8fa7e9fff12774962e391ce5 -->
 - **Maximum Sequence Length:** 128 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -87,9 +91,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("Syldehayem/all-MiniLM-L12-v2_embedder_train")
 # Run inference
 sentences = [
-    'Horror Short Film “The Guest” | ALTER',
-    'Sci-Fi Short Film "Who Among Us" | DUST',
-    'Horror Short Film "Peter the Penguin" | ALTER',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -143,19 +147,19 @@ You can finetune this model on your own dataset.
 #### Unnamed Dataset
-* Size: 13,657 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
 * Approximate statistics based on the first 1000 samples:
-  |         | sentence_0                                                                        | sentence_1                                                                        | sentence_2                                                                        |
-  |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
-  | type    | string                                                                            | string                                                                            | string                                                                            |
-  | details | <ul><li>min: 3 tokens</li><li>mean: 19.84 tokens</li><li>max: 69 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 19.98 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 19.89 tokens</li><li>max: 46 tokens</li></ul> |
 * Samples:
-  | sentence_0                                                                                        | sentence_1                                                                               | sentence_2                                                                        |
-  |:--------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
-  | <code>CGI 3D Animated Promo Short : "EFdeN: Where N is Nature" - by UmbrellaFX</code>             | <code>A Sci-Fi Short Film: "The Last Star" by  Dimitri Vallein | TheCGBros</code>        | <code>CGI & VFX Showreels: "Character reel" - by Dario Triglia | TheCGBros</code> |
-  | <code>**Award Winning** Sci-Fi Short Film: "The Developer" - by Robert Odegnal | TheCGBros</code> | <code>Vonnis | Short Horror Film | Screamfest</code>                                     | <code>Adobe and the Frog BTS - Day 3 & 4!</code>                                  |
-  | <code>CGI 3D Animated Short "Olrik" - by Philip Harris-Genois and Marilyn Marcotte</code>         | <code>CGI & VFX Showreels: "VFX Compositing Showreel" - by Ameya More | TheCGBros</code> | <code>CGI 3D Making Of : "Project 4450" - by The Animation Workshop</code>        |
 * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
   ```json
   {
@@ -169,7 +173,7 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 16
 - `per_device_eval_batch_size`: 16
-- `num_train_epochs`: 10
 - `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
@@ -192,7 +196,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
-- `num_train_epochs`: 10
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -293,25 +297,68 @@ You can finetune this model on your own dataset.
 </details>
 ### Training Logs
-| Epoch  | Step | Training Loss |
-|:------:|:----:|:-------------:|
-| 0.5855 | 500  | 4.9981        |
-| 1.1710 | 1000 | 4.997         |
-| 1.7564 | 1500 | 4.9753        |
-| 2.3419 | 2000 | 4.9609        |
-| 2.9274 | 2500 | 4.9416        |
-| 3.5129 | 3000 | 4.8768        |
-| 4.0984 | 3500 | 4.8283        |
-| 4.6838 | 4000 | 4.7853        |
-| 5.2693 | 4500 | 4.7767        |
-| 5.8548 | 5000 | 4.7234        |
-| 6.4403 | 5500 | 4.7153        |
-| 7.0258 | 6000 | 4.6914        |
-| 7.6112 | 6500 | 4.6429        |
-| 8.1967 | 7000 | 4.6607        |
-| 8.7822 | 7500 | 4.6422        |
-| 9.3677 | 8000 | 4.613         |
-| 9.9532 | 8500 | 4.6118        |
 ### Framework Versions
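
Reviewer note: the removed log shows the training loss starting near 5.0 and drifting down to about 4.6. The sketch below is one way to read that number. It is a minimal illustration of a triplet loss, not the library's implementation. The Euclidean distance and the margin of 5.0 are assumptions, since the loss parameters are truncated in this diff, and the vectors are made up.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

# Hypothetical 2-D embeddings: positive and negative are equally far from the anchor.
anchor = [1.0, 0.0]
positive = [0.0, 1.0]
negative = [0.0, -1.0]

# margin=5.0 is an assumed value for illustration only.
undecided = triplet_loss(anchor, positive, negative, margin=5.0)

# A well-separated triplet: the negative is farther away than the margin requires.
separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [10.0, 0.0], margin=5.0)

print(undecided, separated)
```

Under these assumptions, a loss equal to the margin means the positive and the negative sit at the same distance from the anchor, and the loss only reaches zero once the negative is farther away than the positive by at least the margin.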

 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
+- dataset_size:9712
 - loss:TripletLoss
+base_model: Syldehayem/all-MiniLM-L12-v2_embedder_train
 widget:
+- source_sentence: CGI 3D Animated Short "The Scarf" - by Team The Scarf
   sentences:
+  - 'CGI 3D Short: "Lenovo Legion: Turning Point" - by Audis Huang & Moonshine Animation
+    | TheCGBros'
+  - 'CGI Animated Trailers : "Dropzone" - by RealtimeUK'
+  - 'CGI 3D Animated Short: "SOLVIVAL" - by Pixelhunters | TheCGBros'
+- source_sentence: CGI Animated Short Film HD "Terazia's Zoo " by Alison Dulou & Estelle
+    Lefebvre | CGMeetup
   sentences:
+  - A comedian puppet decides to branch out on his own / You're The Puppet
+  - Horror Short Film Series “The Outer Darkness” Part 1 | ALTER
+  - ERNIE | Omeleto
+- source_sentence: Kenneth Branagh in the thriller "Schneider's 2nd Stage" - Short
+    film by Phil Stoole
   sentences:
+  - 'CGI 3D Animated Short Film: "Fish in LOVE" by ISArt Digital |  @CGMeetup'
+  - Cookies By The Fire Short Horror Film | Screamfest | Merry Christmas
+  - 'CGI 3D Animated Spot: "Mantse Palm Wine" - by Arnold Bannerman | TheCGBros'
+- source_sentence: The Portrait
   sentences:
+  - A teenage girl must quickly adapt to a radically different urban environment |
+    Barrio Frontera
+  - Queen of Meatloaf | Short film tease
+  - 'CGI 3D Tutorial : "Using Zapplink in Zbrush" - by 3dmotive'
+- source_sentence: Horror Short Film "Nice to Finally Meet You" | ALTER | Online Premiere
   sentences:
+  - 'Mondays: The Spielberg Challenge Winner!'
+  - 'The Curse of Pandora''s Box Returns to #UniversalHHN 2021'
+  - SONS OF APRIL | Omeleto
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
+# SentenceTransformer based on Syldehayem/all-MiniLM-L12-v2_embedder_train
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Syldehayem/all-MiniLM-L12-v2_embedder_train](https://huggingface.co/Syldehayem/all-MiniLM-L12-v2_embedder_train). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [Syldehayem/all-MiniLM-L12-v2_embedder_train](https://huggingface.co/Syldehayem/all-MiniLM-L12-v2_embedder_train) <!-- at revision 58956428f2d485efdf2697a1a2cc793795e25057 -->
 - **Maximum Sequence Length:** 128 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
 model = SentenceTransformer("Syldehayem/all-MiniLM-L12-v2_embedder_train")
 # Run inference
 sentences = [
+    'Horror Short Film "Nice to Finally Meet You" | ALTER | Online Premiere',
+    "The Curse of Pandora's Box Returns to #UniversalHHN 2021",
+    'Mondays: The Spielberg Challenge Winner!',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
 #### Unnamed Dataset
+* Size: 9,712 training samples
 * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
 * Approximate statistics based on the first 1000 samples:
+  |         | sentence_0                                                                       | sentence_1                                                                        | sentence_2                                                                        |
+  |:--------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+  | type    | string                                                                           | string                                                                            | string                                                                            |
+  | details | <ul><li>min: 3 tokens</li><li>mean: 19.7 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 19.91 tokens</li><li>max: 49 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 20.27 tokens</li><li>max: 50 tokens</li></ul> |
 * Samples:
+  | sentence_0                                                                       | sentence_1                                                                              | sentence_2                                                           |
+  |:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:---------------------------------------------------------------------|
+  | <code>মেয়ে যখন মায়ের মতন | Bidhilipi | #Shorts | Bengali Family Drama</code>     | <code>CGI 3D Animated Shorts: "Rust" - by Matthieu Druaud</code>                        | <code>Mukhyamantri | মুখ্যমন্ত্রী | Bengali Movie Part – 3/12</code> |
+  | <code>A Sci-Fi Short Film: "Voltok" - by Jonathan Vleeschower | TheCGBros</code> | <code>CGI MoCap Demo : "Finger Mocap Without Any Post Animation" by the MocapLab</code> | <code>A MAN DEPARTED | Omeleto Drama</code>                          |
+  | <code>LEAKY PIPES</code>                                                         | <code>Taking care of a baby at 15 | "Fifteen" - Short film by Sameh Alaa</code>         | <code>CGI VFX Spot :  "Black Beetle" by - The MILL</code>            |
 * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
   ```json
   {
 - `per_device_train_batch_size`: 16
 - `per_device_eval_batch_size`: 16
+- `num_train_epochs`: 50
 - `multi_dataset_batch_sampler`: round_robin
 #### All Hyperparameters
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1
+- `num_train_epochs`: 50
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 </details>
 ### Training Logs
+| Epoch   | Step  | Training Loss |
+|:-------:|:-----:|:-------------:|
+| 0.8237  | 500   | 5.0075        |
+| 1.6474  | 1000  | 4.9816        |
+| 2.4712  | 1500  | 5.013         |
+| 3.2949  | 2000  | 4.981         |
+| 4.1186  | 2500  | 4.9981        |
+| 4.9423  | 3000  | 4.9727        |
+| 5.7661  | 3500  | 4.9698        |
+| 6.5898  | 4000  | 4.9839        |
+| 7.4135  | 4500  | 5.0001        |
+| 8.2372  | 5000  | 4.9996        |
+| 9.0610  | 5500  | 4.9993        |
+| 9.8847  | 6000  | 4.9999        |
+| 10.7084 | 6500  | 5.0015        |
+| 11.5321 | 7000  | 4.9934        |
+| 12.3558 | 7500  | 4.9903        |
+| 13.1796 | 8000  | 4.9875        |
+| 14.0033 | 8500  | 5.0018        |
+| 14.8270 | 9000  | 5.0088        |
+| 15.6507 | 9500  | 4.9643        |
+| 16.4745 | 10000 | 4.9447        |
+| 17.2982 | 10500 | 4.8911        |
+| 18.1219 | 11000 | 4.8719        |
+| 18.9456 | 11500 | 4.8671        |
+| 19.7694 | 12000 | 4.8268        |
+| 20.5931 | 12500 | 4.8195        |
+| 21.4168 | 13000 | 4.7726        |
+| 22.2405 | 13500 | 4.7479        |
+| 23.0643 | 14000 | 4.7465        |
+| 23.8880 | 14500 | 4.7776        |
+| 24.7117 | 15000 | 4.7366        |
+| 25.5354 | 15500 | 4.7076        |
+| 26.3591 | 16000 | 4.74          |
+| 27.1829 | 16500 | 4.7118        |
+| 28.0066 | 17000 | 4.6797        |
+| 28.8303 | 17500 | 4.7144        |
+| 29.6540 | 18000 | 4.662         |
+| 30.4778 | 18500 | 4.6849        |
+| 31.3015 | 19000 | 4.6608        |
+| 32.1252 | 19500 | 4.6844        |
+| 32.9489 | 20000 | 4.6561        |
+| 33.7727 | 20500 | 4.6513        |
+| 34.5964 | 21000 | 4.6418        |
+| 35.4201 | 21500 | 4.635         |
+| 36.2438 | 22000 | 4.6418        |
+| 37.0675 | 22500 | 4.62          |
+| 37.8913 | 23000 | 4.615         |
+| 38.7150 | 23500 | 4.6189        |
+| 39.5387 | 24000 | 4.6113        |
+| 40.3624 | 24500 | 4.6054        |
+| 41.1862 | 25000 | 4.5824        |
+| 42.0099 | 25500 | 4.5907        |
+| 42.8336 | 26000 | 4.5949        |
+| 43.6573 | 26500 | 4.5769        |
+| 44.4811 | 27000 | 4.5758        |
+| 45.3048 | 27500 | 4.5613        |
+| 46.1285 | 28000 | 4.5816        |
+| 46.9522 | 28500 | 4.5538        |
+| 47.7759 | 29000 | 4.5645        |
+| 48.5997 | 29500 | 4.5653        |
+| 49.4234 | 30000 | 4.5494        |
 ### Framework Versions
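
Reviewer note: the `Epoch` column in both training-log tables can be cross-checked against the dataset size and batch size in the card. The sketch below assumes one optimizer step per batch of 16 on a single device, with no gradient accumulation and a final partial batch that still counts as a step. The helper name `epoch_at_step` is hypothetical.

```python
import math

def epoch_at_step(step, dataset_size, batch_size):
    """Fractional epoch reached after `step` optimizer steps."""
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    return step / steps_per_epoch

# Removed card: 13,657 triplets. Added card: 9,712 triplets. Batch size 16 in both.
old_first = epoch_at_step(500, 13657, 16)
old_last = epoch_at_step(8500, 13657, 16)
new_first = epoch_at_step(500, 9712, 16)
new_last = epoch_at_step(30000, 9712, 16)

for value in (old_first, old_last, new_first, new_last):
    print(round(value, 4))
```

Under these assumptions the four values match the first and last rows of the two tables (0.5855 and 9.9532 for the removed log, 0.8237 and 49.4234 for the added one), which suggests the logs and the stated dataset sizes are mutually consistent.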

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a918830d1c43472f6d370ac225943262274803f3e69eb4e4035a76cfeb339374
 size 133462128

 version https://git-lfs.github.com/spec/v1
+oid sha256:8a0ef17f513afbe54faae7df152aa8782ec9c31ce60484187db0018d367f169a
 size 133462128
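
Reviewer note: the two pointer files above differ only in their `oid`. A small sketch of how that could be checked programmatically follows. The `parse_lfs_pointer` helper is hypothetical, and it assumes each pointer line is a key followed by a single space and a value.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields

old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:a918830d1c43472f6d370ac225943262274803f3e69eb4e4035a76cfeb339374
size 133462128
""")

new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:8a0ef17f513afbe54faae7df152aa8782ec9c31ce60484187db0018d367f169a
size 133462128
""")

# A different hash with an identical byte size suggests the same architecture
# with updated weight values.
weights_changed = old_pointer["oid"] != new_pointer["oid"]
same_size = int(old_pointer["size"]) == int(new_pointer["size"])
print(weights_changed, same_size)
```

An unchanged size with a changed hash is what one would expect from retraining the same architecture, which fits the README changes in this commit.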