ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, and is evaluated at each of those sizes below.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
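
The three modules above run in sequence: the Transformer emits one vector per token, Pooling with pooling_mode_mean_tokens averages them, and Normalize() scales the result to unit length. The following is a minimal sketch of the last two steps in plain Python. It uses hypothetical 3-dimensional toy vectors rather than the model's 768 dimensions, and the function names are illustrative, not library API.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: two real tokens and one padding token (hypothetical values).
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # [2.0, 1.0, 2.0]
sentence_embedding = l2_normalize(pooled)   # unit-length vector
print(pooled, sentence_embedding)
```

Because the output is unit-length, the dot product of two embeddings equals their cosine similarity.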

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("PremkumarHF1/modernbert-embed-base-legal-matryoshka-2")
# Run inference
sentences = [
    'What is appropriate if the entire substance of Document 3 is reflected in publicly available meeting minutes?',
    '72 Portions of Document 3 may not have been disclosed in the meeting minutes submitted by the plaintiff and thus \nneed not be disclosed to the plaintiff. On the other hand, disclosure of Document 3 in its entirety is appropriate if the \nentire substance of which is reflected in those publicly available meeting minutes. \n142',
    'KLAN202300916 \n \n \n \n \n9\nLos derechos morales, a su vez, están fundamentalmente \nprotegidos por la legislación estatal. Esta reconoce los derechos de \nlos autores como exclusivos de estos y los protege no solo en \nbeneficio propio, sino también de la sociedad por la contribución \nsocial y cultural que históricamente se le ha reconocido a la',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7694, 0.0882],
#         [0.7694, 1.0000, 0.0689],
#         [0.0882, 0.0689, 1.0000]])
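
Since the model was trained with MatryoshkaLoss, an embedding can be shortened by keeping only its leading dimensions. The sketch below shows the idea in plain Python: slice the vector, re-normalize it, then compare with cosine similarity. The 8-dimensional vectors are hypothetical stand-ins for real 768-dimensional embeddings, and the helper names are illustrative only.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for a query embedding and a passage embedding.
query = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
passage = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

With the real model, the same slicing can be applied to the array returned by model.encode. Sentence Transformers also accepts a truncate_dim argument when constructing SentenceTransformer, which is one way to have encode return shortened vectors directly.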

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.5487
cosine_accuracy@3 0.5935
cosine_accuracy@5 0.6909
cosine_accuracy@10 0.7573
cosine_precision@1 0.5487
cosine_precision@3 0.509
cosine_precision@5 0.3954
cosine_precision@10 0.2325
cosine_recall@1 0.204
cosine_recall@3 0.5089
cosine_recall@5 0.6376
cosine_recall@10 0.7424
cosine_ndcg@10 0.6524
cosine_mrr@10 0.5959
cosine_map@100 0.6368

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5394
cosine_accuracy@3 0.5811
cosine_accuracy@5 0.6708
cosine_accuracy@10 0.7573
cosine_precision@1 0.5394
cosine_precision@3 0.5008
cosine_precision@5 0.3858
cosine_precision@10 0.2318
cosine_recall@1 0.1998
cosine_recall@3 0.5006
cosine_recall@5 0.623
cosine_recall@10 0.7414
cosine_ndcg@10 0.6451
cosine_mrr@10 0.5865
cosine_map@100 0.6264

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.5039
cosine_accuracy@3 0.5379
cosine_accuracy@5 0.6414
cosine_accuracy@10 0.7172
cosine_precision@1 0.5039
cosine_precision@3 0.4683
cosine_precision@5 0.3641
cosine_precision@10 0.2202
cosine_recall@1 0.1856
cosine_recall@3 0.4674
cosine_recall@5 0.5894
cosine_recall@10 0.7072
cosine_ndcg@10 0.6103
cosine_mrr@10 0.5513
cosine_map@100 0.5938

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4374
cosine_accuracy@3 0.4791
cosine_accuracy@5 0.5688
cosine_accuracy@10 0.6538
cosine_precision@1 0.4374
cosine_precision@3 0.4091
cosine_precision@5 0.3221
cosine_precision@10 0.1964
cosine_recall@1 0.1625
cosine_recall@3 0.4116
cosine_recall@5 0.5259
cosine_recall@10 0.6346
cosine_ndcg@10 0.5405
cosine_mrr@10 0.4846
cosine_map@100 0.529

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.3292
cosine_accuracy@3 0.357
cosine_accuracy@5 0.4328
cosine_accuracy@10 0.51
cosine_precision@1 0.3292
cosine_precision@3 0.3019
cosine_precision@5 0.2386
cosine_precision@10 0.1549
cosine_recall@1 0.1262
cosine_recall@3 0.3105
cosine_recall@5 0.3936
cosine_recall@10 0.4985
cosine_ndcg@10 0.418
cosine_mrr@10 0.3681
cosine_map@100 0.4148
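
For readers unfamiliar with the metric names in the tables above, here is a minimal sketch of how accuracy@k, precision@k, recall@k, MRR@k, and NDCG@k can be computed for a single query with binary relevance. The reported values are averages over all evaluation queries. The document IDs and the function name are hypothetical, and this is an illustration of the standard definitions, not the evaluator's actual code.

```python
import math

def retrieval_metrics(ranked_ids, relevant_ids, k):
    """Binary-relevance retrieval metrics for one query, cut off at rank k."""
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]

    # accuracy@k: did at least one relevant document appear in the top k?
    accuracy = 1.0 if any(hits) else 0.0
    # precision@k: fraction of the k slots filled by relevant documents.
    precision = sum(hits) / k
    # recall@k: fraction of all relevant documents found in the top k.
    recall = sum(hits) / len(relevant_ids)

    # MRR@k: reciprocal rank of the first relevant document, 0 if none.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # NDCG@k: discounted gain divided by the gain of an ideal ranking.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mrr": mrr, "ndcg": ndcg}

# Hypothetical ranking: relevant documents appear at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
metrics = retrieval_metrics(ranked, relevant, k=5)
print(metrics)
```

Under these definitions, a recall@1 well below accuracy@1, as in the tables above, is what one would expect when queries have several relevant passages each.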

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,822 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (type: string): min 7 tokens, mean 16.7 tokens, max 38 tokens
    • positive (type: string): min 26 tokens, mean 96.94 tokens, max 153 tokens
  • Samples:
    • anchor: What does EPIC agree about its FOIA request?
      positive: other documents which were made available to or prepared for or by” the Commission, a direct
      quotation from section 10(b) of FACA. Pl.’s Mot. Exs. at 21. EPIC agrees that its FOIA request
      “exactly track[s] the language of FACA § 10(b)”—i.e., that its FOIA request is meant to be
      coterminous with FACA’s parameters. Pl.’s Mem. at 24; Pl.’s Reply at 9.
      25
    • anchor: What specific finding does Sussman, 494 F.3d at 1116 emphasize that the district court must make?
      positive: 79 The Court need not assess the segregability efforts of the NSA because the plaintiff does not challenge any
      withholding decisions made by the NSA, and thus the Court need not review any such withholding decisions. See,
      e.g., Sussman, 494 F.3d at 1116 (holding that “the district court must make specific findings of segregability
    • anchor: Who expressed a viewpoint on the application of the reasonable juror test?
      positive: test used by United States Courts of Appeals for authentication of social media evidence
      because it was consistent with Maryland Rule 5-901. See id. at 366, 19 A.3d at 429
      (Harrell, J., dissenting). Judge Harrell explained that, in his view, applying the reasonable
      juror test would have led to the conclusion that the social media evidence at issue was
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
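
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss: the inner loss is evaluated once per listed dimension on truncated embeddings, and the results are combined using the listed weights. The plain-Python sketch below illustrates that structure. It assumes cosine similarity with a scale factor of 20 for the inner loss (an assumption here, not stated in this card), uses hypothetical 4-dimensional toy vectors, and is not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positive i is the target class for anchor i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])  # cross-entropy for row i
    return sum(losses) / len(losses)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [vec[:dim] for vec in anchors]
        truncated_positives = [vec[:dim] for vec in positives]
        total += weight * mnr_loss(truncated_anchors, truncated_positives)
    return total

# Hypothetical batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, swapped)
```

The aligned batch yields a loss near zero, while swapping the positives yields a large one, because each anchor is then closer to the other pair's passage than to its own.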
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • bf16: True
  • tf32: False
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
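
The batch settings above can be related to the step counts in the training logs with a little arithmetic. The sketch below assumes a single training device and that the sampler yields every sample once per epoch. Neither is stated in this card, although the resulting numbers agree with the logged steps and epoch fractions.

```python
import math

train_samples = 5822          # from the Training Dataset section
per_device_batch = 32         # per_device_train_batch_size
grad_accum = 16               # gradient_accumulation_steps
epochs = 4                    # num_train_epochs

# Forward/backward batches per epoch, then optimizer steps per epoch.
batches_per_epoch = math.ceil(train_samples / per_device_batch)   # 182
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)       # 12
total_steps = steps_per_epoch * epochs                            # 48

# Samples contributing to one full optimizer step (single device assumed).
effective_batch = per_device_batch * grad_accum                   # 512

# Fraction of an epoch covered by one full optimizer step.
epoch_per_step = grad_accum / batches_per_epoch

print(batches_per_epoch, steps_per_epoch, total_steps, effective_batch)
print(round(epoch_per_step, 4))  # 0.0879
```

Note that MultipleNegativesRankingLoss draws its in-batch negatives from each batch of 32, so gradient accumulation enlarges the optimizer step but not the pool of negatives per anchor.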

All Hyperparameters

  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: False
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0879 1 5.7621 - - - - -
0.1758 2 5.9431 - - - - -
0.2637 3 5.9474 - - - - -
0.3516 4 5.4979 - - - - -
0.4396 5 4.7991 - - - - -
0.5275 6 4.7927 - - - - -
0.6154 7 3.4607 - - - - -
0.7033 8 2.9653 - - - - -
0.7912 9 3.3088 - - - - -
0.8791 10 2.7624 - - - - -
0.9670 11 3.0379 - - - - -
1.0 12 2.5283 0.5986 0.5935 0.5643 0.4801 0.3721
1.0879 13 2.4313 - - - - -
1.1758 14 2.4523 - - - - -
1.2637 15 2.2690 - - - - -
1.3516 16 1.7914 - - - - -
1.4396 17 2.1696 - - - - -
1.5275 18 1.8344 - - - - -
1.6154 19 1.9749 - - - - -
1.7033 20 2.0728 - - - - -
1.7912 21 1.8242 - - - - -
1.8791 22 1.9102 - - - - -
1.9670 23 1.8151 - - - - -
2.0 24 1.8869 0.6457 0.6404 0.6092 0.5305 0.4135
2.0879 25 1.5929 - - - - -
2.1758 26 1.5348 - - - - -
2.2637 27 1.6101 - - - - -
2.3516 28 1.5381 - - - - -
2.4396 29 1.5966 - - - - -
2.5275 30 1.8647 - - - - -
2.6154 31 1.6108 - - - - -
2.7033 32 1.3501 - - - - -
2.7912 33 1.4097 - - - - -
2.8791 34 1.4909 - - - - -
2.9670 35 1.6101 - - - - -
3.0 36 1.9478 0.6506 0.6433 0.6093 0.5422 0.4167
3.0879 37 1.5579 - - - - -
3.1758 38 1.4603 - - - - -
3.2637 39 1.5181 - - - - -
3.3516 40 1.4586 - - - - -
3.4396 41 1.2483 - - - - -
3.5275 42 1.3902 - - - - -
3.6154 43 1.2197 - - - - -
3.7033 44 1.4976 - - - - -
3.7912 45 1.3860 - - - - -
3.8791 46 1.4929 - - - - -
3.9670 47 1.3975 - - - - -
4.0 48 1.4246 0.6524 0.6451 0.6103 0.5405 0.4180
  • The saved checkpoint is the final row (epoch 4.0, step 48); its NDCG@10 scores match the evaluation metrics reported above.
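
To put the final row of the table in perspective, the short sketch below expresses each truncated dimension's NDCG@10 as a fraction of the 768-dimensional score. The numbers are copied from the step 48 row; the variable names are illustrative.

```python
# NDCG@10 at step 48, copied from the training log table above.
final_ndcg = {768: 0.6524, 512: 0.6451, 256: 0.6103, 128: 0.5405, 64: 0.4180}

baseline = final_ndcg[768]
retention = {dim: score / baseline for dim, score in final_ndcg.items()}

for dim, ratio in retention.items():
    print(f"dim {dim}: {ratio:.1%} of the 768-dimensional NDCG@10")
```

On this evaluation set, 512 dimensions retain about 99% of the full-size score and 256 dimensions about 94%, while 64 dimensions fall to roughly 64%.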

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.2
  • Transformers: 5.0.0
  • PyTorch: 2.9.0+cu128
  • Accelerate: 1.12.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 0.1B params (Safetensors, tensor type F32)