SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model finetuned from google/embeddinggemma-300m on the discord_sum dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: discord_sum
  • Language: en

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
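
The module stack above can be sketched end-to-end on dummy data to show what happens after the transformer: mean pooling over non-padding tokens, two bias-free Dense projections, and L2 normalization. This uses random stand-in weights, not the trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy inputs: batch of 2 sequences, 5 tokens each, 768-dim token embeddings.
# The real model produces these with the Gemma3 text encoder (module 0).
token_embs = rng.standard_normal((2, 5, 768))
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]], dtype=float)

# Module 1: mean pooling over non-padding tokens (pooling_mode_mean_tokens)
summed = (token_embs * attention_mask[:, :, None]).sum(axis=1)
pooled = summed / attention_mask.sum(axis=1, keepdims=True)         # (2, 768)

# Modules 2-3: two bias-free Dense layers (768 -> 3072 -> 768) with
# identity activation. Random stand-in weights, not the trained ones.
w_up = rng.standard_normal((768, 3072)) * 0.02
w_down = rng.standard_normal((3072, 768)) * 0.02
projected = pooled @ w_up @ w_down                                  # (2, 768)

# Module 4: L2 normalization, so cosine similarity reduces to a dot product
embeddings = projected / np.linalg.norm(projected, axis=1, keepdims=True)
print(np.linalg.norm(embeddings, axis=1))   # both norms are 1.0
```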

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Plasmoxy/embeddinggemma-300m-DiscordSum-RC-PD58")
# Run inference
queries = [
    "new large biome map with 15k15k render distance for future event",
]
documents = [
    '[F]: Beep\n[D]: Idk what math that is :a_cute_dead_scared_die_no_help:\n[F]: Geometric sequence\n[D]: Sounds familiar\n[F]: Or geometric progression is also a term apperantly\n[D]: Nvm never heard of it\n[B]: oh these\n[B]: i think i know\n[B]: but since im french\n[B]: i dont want to be wrong so idk\n[E]: Ooh i know that\n[E]: The geometric equation\n[E]: A6= blah blah blah\n[B]: spawn\n[C]: Bootiful\n[B]: the map is in large biome\n[B]: and that the very first biome\n[B]: 15k - 15k render distance\n[C]: I love the colours\n[C]: its so green\n[B]: there is even a good place for a futur event\n[B]: yk the season thingy\n[B]: just scared about how the server will handle the map\n[B]: cuz if i want a good map, gotta be to 10k-10k to 15k-15k\n[B]: the v3 was max 2k-2k\n[B]: so its wayy bigger\n[A]: GODOT HOLY FUC- Been so long since i have used it',
    "[C]: First time making overnight oats. Godspeed 🫡\n[C]: Looks like cappuccino. Tastes like heart attack speedrun.\n[A]: 1.5 tbsp of honey each is quite a bit\n[A]: I use flavoured protein so I usually don't use any sweetener\n[C]: Ahhh thank you, in that case I know better for next time! 😁\n[C]: I used cookie flavoured protein powder it's gonna be so yum heheh\n[A]: Wooo\n[B]: I'm kinda sad that I don't like oatmeal (texture being the main component).\n[D]: Don't be. I actually like oatmeal but it makes me go to the bathroom once every hour afterwards.\n[D]: So don't be sad, you're sparing yourself from that",
    '[F]: Impressive\n[O]: they said "porting", not adding it to the nintendo service\n[U]: wake me up when they do WC09, WC10, WC11 collection\n[O]: "and other titles"\n[O]: forbidden memories remastered\n[O]: which would fit well with the new Millennium archetype, based on the guy you meet at the start of the game explaining the story (and you running away to play card games)\n[O]: simon?\n[D]: I still have my original copy of Forbidden Memories\n[D]: It\'s too bad a compilation of those Yu-Gi-Oh! games won\'t make it outside Japan since some are Japanese exclusive. Unless they make a separate compilation for an international release with what was initially released outside Japan.\n[N]: Falsebound Kingdom Remaster pls\n[T]: should i be corncerned if i.. ||have a friend who commisioned an artist to draw a picture of them doing lewd things to BN characters|| :GeoDoom:\n[N]: I mean\n[N]: Depends on which characters but I\'m gonna go with probably yeah\n[P]: yes\n[S]: yes\n[D]: I bet it\'s Wily\n[N]: If its Wily its ok\n[D]: NO, NOT OK. I WAS JOKING\n[G]: at least hes of age\n[D]: That\'s enough Internet for today\n[T]: indeed\n[N]: Wily origin story\n[N]: That\'s why he wants Net Society ended\n[D]: lol\n[D]: That\'s one way of taking down Rule 34, take the rest of net society with it\n[D]: Everyone else taking him down are the true villains\n[M]: Bring all the Tag Force games to Steam please\n\nAnd translate the last two\n[N]: I love Eternal Duelist Soul which was Expert 5 but I get that 6 was better with everything lol\n[N]: I also liked one of the DS ones\n[N]: I think it was the 2008 one\n[E]: i played the shit out of the duel academy one lol\n[E]: or was that gba? idr\n[C]: The best thing EDS has was the simple opponent selection from the GB DM games. A simple selection like that stopped existing for years until WC2006 for some reason\n[C]: Play the game that inspired mega man battle network\n[J]: forbidden memories is a good game until the devs remember how much they hate your guts\n[V]: this sounds great I will play this despite knowing nothing about yugioh and im sure nothing will go wrong\n[J]: ~~i would want theoretical ygo game remasters to have reprints of the cards that came with them for hard copy editions~~\n[T]: i\'m missing out\n[N]: When the enemy just casually summons Gate Guardian on you\n[S]: what is this japanese mobile phone port ass edit lol\n[Q]: Wair that exists???\n[N]: Interesting game\n[N]: I never owned it myself but my friend did\n[N]: I always thought DarkNite and Nitemare were in the anime until they never showed up lol\n[H]: Finally played through wily wars\n[H]: Cute game\n[H]: Very glad I waited until the nso version that fixes the performance\n[H]: I found out later though that I wasn’t crazy and they removed quick man’s buster weakness in it for some godforsaken reason\n[H]: and his AI was fixed too so he doesn’t run into walls but is also broken bc his pattern randomizes\n[C]: the original gameboy and gbc yugioh games play like sacred cards but no story\n[C]: they were made before the 2nd anime even aired and before the official card rules were finalized lol\n[N]: Yeah its fun seeing the wacky rules\n[H]: Wasn’t one of them the only way to get an exodia piece\n[N]: Forbidden Memories fusion and astrological sign stuff was interesting\n[N]: Its kinda similar to how the manga card game worked at first since they had elemental weakness and stuff there\n[C]: the tokyo dome riot was where the exodia piece was at\n[H]: Oh wait it had harpy’s\n[H]: That’s what I was thinking of\n[H]: That one gbc game casually introducing a very fucked up card even to this day\n[H]: Ygo is so funny as an extremely casual onlooker because I’ll post a card being like “this little guy rules” and my friends will trauma dump on how it burned their water supplies\n[N]: Something Something Yatagarasu\n[B]: Melffys going into Zeus\n[N]: Oh damn Reload isn\'t forgiving on a Game Over lol\n[N]: Reload when you first entered Tartarus or Title Screen :EguchiLOL:\n[N]: And I got one shot from full health, the true P3 experience\n[K]: Or the "normal way"\nAlt + F4 + "F*uck you stupid game, piece of sh1t" :beluga:\n[S]: aww hell yeah they didn\'t bitch out lets goooo\n[N]: holy shit I just realized I got hit with multiple crits lol\n[K]: Why eng dub ._.\n[N]: why not\n[K]: Because japanese dub\n[N]: eh, unless it\'s really bad I prefer being able to understand the game :BarylSmile:\n[S]: the true persona 3 experience has been preserved\n[S]: the world order is correct\n[N]: Speaking of world order\n[A]: I hate that Makoto face in the second pic\n[K]: Where is Victory Cry :pleure:\n[N]: There\'s an edit of it with a bigger chin lol\n[R]: I think these va’s are pretty good\n[R]: I wonder what Takaya sounds like :EguchiApprove:\n[A]: Agree to disagree\n[R]: Okay, yeah, *some* of them aren’t on par with the OG’s.\n[R]: But man, I love Allegra Clark voicing Mitsuru. It’s perfect.\n[R]: Derek Stephen Prince was **marvelous** as Takaya.\nI’ve yet to hear this new VA for him, so I’m a little excited to see what voice he went with.\n[A]: Correction, Derek Stephen Prince is marvelous.\n[R]: Yes. You’re right, how foolish of me.\n[N]: Wasn\'t a fan of new Akihiko but I got used to him\n[A]: See, it\'d be fine if this were much older Akihiko, but it\'s not, it\'s the exact same character\n[R]: Want to have your mind blown?\n[R]: The ENG VA for Ikutsuki voiced Michael MORBIUS in a Marvel game.\n[I]: I-\n[A]: You mean, the *Marvel Legend himself, Morbius* :EguchiTroll:\n[I]: XDDDD I DON\'T KNOW MUCH ON PERSONA, LET ALONE P3 AND JUST- OH MY GOSH\n[A]: Excellent news\n[A]: Guess my favorite part will be when Ikutsuki says "It\'s Iku time!" and he Ikus all over those bitches\n[L]: they should just have Bill Farmer voice all the male characters\n[R]: I unironically love Ikutsuki though\n[R]: Idk why my brain gaslit itself into thinking that her name was Chiori..\n[S]: chihiro is a funny character\n[L]: NovaMan should rename themselves to PersonaMan\n[A]: Would be funny, but it loses the joke behind his name for those who understand',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5380, -0.0133, -0.1024]])
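
Given a score matrix like the one printed above (one row per query, one column per document), retrieval is just an argmax over each row. A minimal sketch with the example scores hard-coded as plain lists:

```python
# Similarity scores from the example above, hard-coded for illustration
similarities = [[0.5380, -0.0133, -0.1024]]

best_per_query = []
for q_idx, row in enumerate(similarities):
    best = max(range(len(row)), key=row.__getitem__)  # index of highest score
    best_per_query.append(best)
    print(f"query {q_idx} -> document {best} (score {row[best]:.4f})")
# query 0 -> document 0 (score 0.5380)
```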

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.164
cosine_accuracy@3 0.479
cosine_accuracy@5 0.77
cosine_accuracy@10 0.931
cosine_precision@1 0.164
cosine_precision@3 0.1597
cosine_precision@5 0.154
cosine_precision@10 0.0931
cosine_recall@1 0.164
cosine_recall@3 0.479
cosine_recall@5 0.77
cosine_recall@10 0.931
cosine_ndcg@10 0.5159
cosine_mrr@10 0.3851
cosine_map@10 0.3851
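
Since each query in this benchmark has exactly one relevant document, cosine_accuracy@k and cosine_recall@k coincide, which is why the table repeats the same values. A toy sketch of how these metrics follow from the rank of each query's relevant document:

```python
def ir_metrics(relevant_ranks, k=10):
    """relevant_ranks[i] is the 0-based rank at which query i's single
    relevant document appears in the retrieved list (None if not retrieved)."""
    n = len(relevant_ranks)
    hits = [r for r in relevant_ranks if r is not None and r < k]
    accuracy_at_k = len(hits) / n        # == recall@k when there is 1 positive
    mrr_at_k = sum(1.0 / (r + 1) for r in hits) / n
    return accuracy_at_k, mrr_at_k

# Toy example: 4 queries whose relevant documents land at ranks 0, 2, 9, 15
acc, mrr = ir_metrics([0, 2, 9, 15], k=10)
print(acc, mrr)   # 0.75 and (1 + 1/3 + 1/10) / 4 ≈ 0.3583
```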

Training Details

Training Dataset

discord_sum

  • Dataset: discord_sum at 3d21c9e
  • Size: 243,787 training samples
  • Columns: query and document
  • Approximate statistics based on the first 1000 samples:
    query:    string, min 5 / mean 14.96 / max 113 tokens
    document: string, min 107 / mean 931.55 / max 2048 tokens
  • Samples:
    Query 1: where XP product key is OEM and already activated but still usable
    Document 1:
    [J]: Nice XP Product Key
    [I]: it's OEM only
    [I]: good luck using it LOL
    [J]: Yes but
    [I]: it's also already activated
    [J]: Shouldn't be revaling it and you can use it even if your not a OEM
    [J]: Nah fr? Did you forget you can reuse keys?
    [I]: not if it's already activated?
    [I]: i really don't give a shit though, use it if you want
    [J]: I've used the same product key for Pro editions of Windows
    [I]: ah
    [J]: And it was already active
    [I]: u can use it if you want, idrc that much lol
    [K]: If I ever need to run XP for some reason I'll recall it and I'll use it without your knowledge 😈
    [I]: go for it
    [J]: Kinda downloaded the image for it :KEKW:
    [I]: haven't touched this computer in years, all yours
    [I]: in fact i'm thinking of getting linux
    [J]: Do it lol
    [J]: With 512mb of RAM
    [I]: lmao you wouldn't run high end demanding linux distros, good fucking luck! i'd run something older and less demanding - tiny linux
    [J]: LInux probably going to run better than Windows
    [I]: by a long shot
    [A]: o...
    Query 2: linux migration discussed with recommendations for void linux debian 11 and tinycore
    Document 2 (the conversation above, repeated):
    [J]: Nice XP Product Key
    [I]: it's OEM only
    [I]: good luck using it LOL
    [J]: Yes but
    [I]: it's also already activated
    [J]: Shouldn't be revaling it and you can use it even if your not a OEM
    [J]: Nah fr? Did you forget you can reuse keys?
    [I]: not if it's already activated?
    [I]: i really don't give a shit though, use it if you want
    [J]: I've used the same product key for Pro editions of Windows
    [I]: ah
    [J]: And it was already active
    [I]: u can use it if you want, idrc that much lol
    [K]: If I ever need to run XP for some reason I'll recall it and I'll use it without your knowledge 😈
    [I]: go for it
    [J]: Kinda downloaded the image for it :KEKW:
    [I]: haven't touched this computer in years, all yours
    [I]: in fact i'm thinking of getting linux
    [J]: Do it lol
    [J]: With 512mb of RAM
    [I]: lmao you wouldn't run high end demanding linux distros, good fucking luck! i'd run something older and less demanding - tiny linux
    [J]: LInux probably going to run better than Windows
    [I]: by a long shot
    [A]: o...
    Query 3: Lenovo Yoga 370 metal hinges and custom server rack with good cable management shared
    Document 3 (the conversation above, repeated):
    [J]: Nice XP Product Key
    [I]: it's OEM only
    [I]: good luck using it LOL
    [J]: Yes but
    [I]: it's also already activated
    [J]: Shouldn't be revaling it and you can use it even if your not a OEM
    [J]: Nah fr? Did you forget you can reuse keys?
    [I]: not if it's already activated?
    [I]: i really don't give a shit though, use it if you want
    [J]: I've used the same product key for Pro editions of Windows
    [I]: ah
    [J]: And it was already active
    [I]: u can use it if you want, idrc that much lol
    [K]: If I ever need to run XP for some reason I'll recall it and I'll use it without your knowledge 😈
    [I]: go for it
    [J]: Kinda downloaded the image for it :KEKW:
    [I]: haven't touched this computer in years, all yours
    [I]: in fact i'm thinking of getting linux
    [J]: Do it lol
    [J]: With 512mb of RAM
    [I]: lmao you wouldn't run high end demanding linux distros, good fucking luck! i'd run something older and less demanding - tiny linux
    [J]: LInux probably going to run better than Windows
    [I]: by a long shot
    [A]: o...
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 48,
        "gather_across_devices": false
    }
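
CachedMultipleNegativesRankingLoss optimizes the standard in-batch contrastive objective, with gradient caching so the effective batch of 384 can be processed in mini-batches of 48. The core objective (without the caching machinery) can be sketched in NumPy with random placeholder embeddings:

```python
import numpy as np

def mnr_loss(q, d, scale=20.0):
    """In-batch multiple negatives ranking loss: for each query i, document i
    is the positive and every other document in the batch is a negative."""
    qn = q / np.linalg.norm(q, axis=1, keepdims=True)
    dn = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = scale * (qn @ dn.T)                  # "cos_sim" scaled by 20
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))         # cross-entropy, diagonal targets

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
loss_random = mnr_loss(q, rng.standard_normal((8, 16)))   # unrelated pairs
loss_aligned = mnr_loss(q, q)                              # perfect pairs
print(loss_random > loss_aligned)   # True: aligned pairs give much lower loss
```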
    

Evaluation Dataset

discord_sum

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 384
  • per_device_eval_batch_size: 384
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • dataloader_drop_last: True
  • dataloader_num_workers: 8
  • dataloader_pin_memory: False
  • prompts: {'query': 'task: search result | query: ', 'document': 'title: none | text: '}
  • batch_sampler: no_duplicates
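
These values map onto SentenceTransformerTrainingArguments roughly as follows. This is a sketch, not the exact training script: output_dir is a hypothetical placeholder, and the argument surface may differ across Sentence Transformers versions.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="embeddinggemma-300m-discordsum",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=384,
    per_device_eval_batch_size=384,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    dataloader_drop_last=True,
    dataloader_num_workers=8,
    dataloader_pin_memory=False,
    # Prompts are prepended per column before encoding
    prompts={"query": "task: search result | query: ",
             "document": "title: none | text: "},
    # Ensure no duplicate texts land in the same batch (they would act
    # as false negatives for the in-batch contrastive loss)
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```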

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 384
  • per_device_eval_batch_size: 384
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 8
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: False
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: {'query': 'task: search result | query: ', 'document': 'title: none | text: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss discord_ir_eval_cosine_ndcg@10
0.0158 10 2.5664 - -
0.0315 20 2.0376 - -
0.0473 30 1.5393 - -
0.0631 40 1.3381 - -
0.0789 50 1.2515 - -
0.0946 60 1.18 - -
0.1104 70 1.0803 - -
0.1262 80 1.0435 - -
0.1420 90 1.0492 - -
0.1577 100 1.0053 0.9939 0.5083
0.1735 110 1.017 - -
0.1893 120 0.9814 - -
0.2050 130 0.9895 - -
0.2208 140 0.9719 - -
0.2366 150 0.9824 - -
0.2524 160 0.9602 - -
0.2681 170 0.9875 - -
0.2839 180 0.9476 - -
0.2997 190 0.9297 - -
0.3155 200 0.9047 0.9026 0.5100
0.3312 210 0.9411 - -
0.3470 220 0.9395 - -
0.3628 230 0.9147 - -
0.3785 240 0.9398 - -
0.3943 250 0.935 - -
0.4101 260 0.8673 - -
0.4259 270 0.8771 - -
0.4416 280 0.9182 - -
0.4574 290 0.8744 - -
0.4732 300 0.8511 0.8605 0.5155
0.4890 310 0.9027 - -
0.5047 320 0.8582 - -
0.5205 330 0.8991 - -
0.5363 340 0.8903 - -
0.5521 350 0.863 - -
0.5678 360 0.8429 - -
0.5836 370 0.8609 - -
0.5994 380 0.8184 - -
0.6151 390 0.9005 - -
0.6309 400 0.853 0.8363 0.5178
0.6467 410 0.8573 - -
0.6625 420 0.8761 - -
0.6782 430 0.9104 - -
0.6940 440 0.8221 - -
0.7098 450 0.8066 - -
0.7256 460 0.8495 - -
0.7413 470 0.8726 - -
0.7571 480 0.8492 - -
0.7729 490 0.8683 - -
0.7886 500 0.817 0.8236 0.5149
0.8044 510 0.8442 - -
0.8202 520 0.8626 - -
0.8360 530 0.821 - -
0.8517 540 0.8213 - -
0.8675 550 0.8556 - -
0.8833 560 0.8303 - -
0.8991 570 0.8374 - -
0.9148 580 0.8545 - -
0.9306 590 0.832 - -
0.9464 600 0.8904 0.8197 0.5159
0.9621 610 0.8581 - -
0.9779 620 0.8152 - -
0.9937 630 0.8459 - -
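
The step-to-epoch mapping in the table is consistent with simple batch arithmetic, assuming a single training device:

```python
# 243,787 training pairs, per-device batch size 384, dataloader_drop_last=True
dataset_size = 243_787
batch_size = 384

steps_per_epoch = dataset_size // batch_size   # drop_last discards the remainder
print(steps_per_epoch)                         # 634

# The logged epoch fractions line up, e.g. the row "0.9937  630":
print(round(630 / steps_per_epoch, 4))         # 0.9937
```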

Framework Versions

  • Python: 3.11.5
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}