
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
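The Pooling (mean over tokens) and Normalize modules listed in the architecture can be illustrated with a small sketch. This is a plain-Python approximation under stated assumptions, not the library's implementation: token vectors are averaged over non-padding positions, and the result is scaled to unit length so that a dot product between two outputs equals their cosine similarity. The helper names `mean_pool` and `l2_normalize` and the 3-dimensional toy vectors are hypothetical; the real model produces 384 dimensions.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy "token embeddings" for one input: 3 real tokens + 1 padding token.
tokens = [[1.0, 2.0, 0.0], [3.0, 0.0, 0.0], [2.0, 1.0, 3.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]

pooled = mean_pool(tokens, mask)   # the padding vector is ignored
embedding = l2_normalize(pooled)

# After normalization the vector has length 1, so dot products are cosines.
length = math.sqrt(sum(x * x for x in embedding))
print(pooled, round(length, 6))
```

Because every output vector has unit length, the cosine similarity listed under Model Description reduces to a plain dot product between embeddings.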

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("RikoteMaster/retriever_pdf_and_books")
# Run inference
sentences = [
    'chapter drugs of 571 genetic by comparing relatively modest for very high of the relative liability of a drug – its heritability, that basis of common all is being genomic indicates that only perhaps even allele in combination phenotype. involved remains elusive. some substance - have identified ( dehydrogenase ), future will also focus on the mechanisms all drugs of abuse some not for substances without reward such the and the dissocia anesthetics ( drugs, primarily',
    'chapter 32 drugs of abuse 571 of environmental and genetic factors. heritability of addiction, as determined by comparing monozygotic with dizygotic twins, is relatively modest for cannabinoids but very high for cocaine. it is of interest that the relative risk for addiction ( addiction liability ) of a drug ( table 32 – 1 ) correlates with its heritability, suggesting that the neurobiologic basis of addiction common to all drugs is what is being inherited. further genomic analysis indicates that only a few alleles ( or perhaps even a single recessive allele ) need to function in combination to produce the phenotype. however, identification of the genes involved remains elusive. although some substance - specific candidate genes have been identified ( eg, alcohol dehydrogenase ), future research will also focus on genes implicated in the neurobiologic mechanisms common to all addictive drugs. nonaddictive drugs of abuse some drugs of abuse do not lead to addiction. this is the case for substances that alter perception without causing sensations of reward and euphoria, such as the hallucinogens and the dissocia - tive anesthetics ( table 32 – 1 ). unlike addictive drugs, which primarily',
    '602 section vi drugs used to treat diseases of the blood, inflammation, & gout amputation or organ failure. venous clots tend to be more fibrin - rich, contain large numbers of trapped red blood cells, and are recognized pathologically as red thrombi. venous thrombi can cause severe swelling and pain of the affected extremity, but the most feared consequence is pulmonary embolism. this occurs when part or all of the clot breaks off from its location in the deep venous system and travels as an embolus through the right side of the heart and into the pulmonary arterial circulation. sudden occlusion of a large pulmonary artery can cause acute right heart failure and sudden death. in addition lung ischemia or infarction will occur distal to the occluded pulmonary arterial segment. such emboli usually arise from the deep venous system of the proximal lower extremities or pelvis. although all thrombi are mixed, the platelet nidus dominates the arterial thrombus and the fibrin tail dominates the venous thrombus. blood coagulation cascade blood coagulates due to the transformation of soluble fibrinogen into insoluble fi',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
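
Since the model is described as usable for semantic search, similarity scores can also be used to rank passages against a query. The sketch below shows only the ranking step, on hand-written unit-length 2-dimensional stand-in vectors, so it runs without downloading the model. `top_k` is a hypothetical helper, not part of Sentence Transformers; with the real model the vectors would come from `model.encode(...)`.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, passage_vecs, k=2):
    """Return (index, score) pairs for the k highest-scoring passages."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Stand-in unit vectors; real embeddings would have 384 dimensions.
query = [1.0, 0.0]
passages = [
    [0.6, 0.8],   # moderately similar to the query
    [0.0, 1.0],   # orthogonal to the query
    [0.8, 0.6],   # most similar to the query
]

ranking = top_k(query, passages, k=2)
print(ranking)  # best two passages, highest score first
```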

Training Details

Training Dataset

Unnamed Dataset

Size: 57,126 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 3 tokens, mean: 58.68 tokens, max: 133 tokens	min: 11 tokens, mean: 143.97 tokens, max: 256 tokens

Samples:

anchor	positive
`advanced march lecture : lps weights ola svensson1 this lecture do the : we ( actually hedge method. solve lps. fast very solving lps approximately. version 11 of topics in tcs, ” were simon rodriguez the by that used in the last lecture. recall last the lecture, saw how fairly follow the of recall that game t and n experts was : for : i ∈ n gives up or ) based the expert, up 3. with the expert advice ’ decides the up / down`	advanced algorithms march 22, 2022 lecture 9 : solving lps using multiplicative weights notes by ola svensson1 in this lecture we do the following : • we describe the multiplicative weight update ( actually hedge ) method. • we then use this method to solve covering lps. • this is a very fast and simple ( i. e., very attractive ) method for solving these lps approximately. these lecture notes are partly based on an updated version of “ lecture 11 of topics in tcs, 2015 ” that were written by vincent eggerling and simon rodriguez and on the lecture notes by shiva kaul that we used in the last lecture. 1 recall last lecture in the previous lecture, we saw how to use the weighted majority method in order to fairly smartly follow the advice of experts. recall that the general game - setting with t days and n experts was as follows : for t = 1,..., t : 1. each expert i ∈ [ n ] gives some advice : up or down 2. aggregator ( you ) predicts, based on the advice of the expert, up or down. 3. ad...
`or down predicts, up down. adversary, of expert the the / down 4. aggregator the parameterized > 0 “ rate now as : • expert i ( 1. ( experts are in begin - at each t • predict / based weighted vote w = w t w n observing the set ( t i = w ( i · ( 1−ε i ] was ( trustworthiness experts. lecture the when / 2. the sequence of outcomes, t, and expert ∈ [ n ], of wm mistakes ≤2`	up or down 2. aggregator ( you ) predicts, based on the advice of the expert, up or down. 3. adversary, with knowledge of the expert advice and the aggregator ’ s decision, decides the up / down outcome. 4. aggregator observes the outcome and [UNK] if his prediction was incorrect. the weighted majority algorithm, parameterized by [UNK] > 0 ( the “ learning rate ” ), now works as follows : • assign each expert i a weight w ( 1 ) i initialized to 1. ( all experts are equally trustworthy in the begin - ning. ) at each time t : • predict up / down based on a weighted majority vote per w ( t ) = ( w ( t ) 1,..., w ( t ) n ). • after observing the cost vector, set w ( t + 1 ) i = w ( t ) i · ( 1−ε ) for every expert i ∈ [ n ] whose prediction was wrong. ( discount the trustworthiness of erroneous experts. ) last lecture we analyzed the case when [UNK] = 1 / 2. the same proof gives the following theorem 1 for any sequence of outcomes, duration t, and expert i ∈ [ n ], # of wm mistakes ≤2
`) last lecture analyzed [UNK] = 1 / 2. the following theorem sequence outcomes, duration t, and n wm ≤2 [UNK] ( ’ + o ( ( ). notes as for lecturer. have not inconsistent omit citations 1`	##roneous experts. ) last lecture we analyzed the case when [UNK] = 1 / 2. the same proof gives the following theorem 1 for any sequence of outcomes, duration t, and expert i ∈ [ n ], # of wm mistakes ≤2 ( 1 + [UNK] ) · ( # of i ’ s mistakes ) + o ( log ( n ) / [UNK] ). 1disclaimer : these notes were written as notes for the lecturer. they have not been peer - reviewed and may contain inconsistent notation, typos, and omit citations of relevant works. 1

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
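
To make the two loss parameters concrete, here is a minimal plain-Python sketch of the idea behind MultipleNegativesRankingLoss. It is an illustration, not the library's code: for each anchor, the cosine similarity to every positive in the batch is multiplied by `scale`, and a cross-entropy loss treats the anchor's own positive as the correct class while the other positives act as in-batch negatives. The function name `mnr_loss` and the toy 2-dimensional vectors are hypothetical.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong anchor

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The correctly paired batch yields a much smaller loss than the swapped one. A larger `scale` sharpens the softmax over in-batch candidates, which is why it is applied to the bounded cosine scores.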

Evaluation Dataset

Unnamed Dataset

Size: 6,348 evaluation samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 14 tokens, mean: 97.31 tokens, max: 142 tokens	min: 53 tokens, mean: 238.84 tokens, max: 256 tokens

Samples:

anchor	positive
`more at gesic doses. be predominantly however, it also the μ agonist weak or partial nist it mixed available. it used orally however, its injection miscellaneous tramadol is on blockade has been norepinephrine function. because it partially is μ agonist. the recommended mg orally times daily. association with ; drug contraindicated in history of epilepsy with that lower the serious risk the development sero toni`	##phine but appears to produce more sedation at equianal - gesic doses. butorphanol is considered to be predominantly a κ agonist. however, it may also act as a partial agonist or antagonist at the μ receptor. benzomorphans pentazocine is a κ agonist with weak μ - antagonist or partial ago - nist properties. it is the oldest mixed agent available. it may be used orally or parenterally. however, because of its irritant properties, the injection of pentazocine subcutaneously is not recommended. miscellaneous tramadol is a centrally acting analgesic whose mechanism of action is predominantly based on blockade of serotonin reuptake. tramadol has also been found to inhibit norepinephrine transporter function. because it is only partially antagonized by naloxone, it is believed to be only a weak μ - receptor agonist. the recommended dosage is 50 – 100 mg orally four times daily. toxicity includes association with seizures ; the drug is relatively contraindicated in patients with a history of...
`##ly four times daily. toxicity includes relatively in a of the serious is the of - inhibitor ( ). typically abate after several days of is no clinically respiration or tem thus far given that action of tramadol largely - serve as an adjunct pure opioid treatment of chronic is newer with modest μ significant norepinephrine - inhibiting models, its effects moderately by naloxone but reduced adrenoceptor antagonist. furthermore, norepinephrine`	##ly four times daily. toxicity includes association with seizures ; the drug is relatively contraindicated in patients with a history of epilepsy and for use with other drugs that lower the seizure threshold. another serious risk is the development of sero - tonin syndrome, especially if selective serotonin reuptake inhibitor ( ssri ) antidepressants are being administered ( see chapter 16 ). other side effects include nausea and dizziness, but these symptoms typically abate after several days of therapy. it is surprising that no clinically significant effects on respiration or the cardiovascular sys - tem have thus far been reported. given the fact that the analgesic action of tramadol is largely independent of μ - receptor action, tra - madol may serve as an adjunct with pure opioid agonists in the treatment of chronic neuropathic pain. tapentadol is a newer analgesic with modest μ - opioid receptor affinity and significant norepinephrine reuptake - inhibiting action. in animal mode...
`- action. in its analgesic effects were moderately by strongly adrenoceptor antagonist. porter ( 6 was than of its the transporter ( of tapentadol 2008 been shown to as oxycodone the to gastrointesti complaints nausea. carries risk for for is how in cal to tramadol mechanism based opioid antitussives the effective drugs suppression of this is analgesia. in effect`	- inhibiting action. in animal models, its analgesic effects were only moderately reduced by naloxone but strongly reduced by an α 2 - adrenoceptor antagonist. furthermore, its binding to the norepinephrine trans - porter ( net, see chapter 6 ) was stronger than that of tramadol, whereas its binding to the serotonin transporter ( sert ) was less than that of tramadol. tapentadol was approved in 2008 and has been shown to be as effective as oxycodone in the treatment of moderate to severe pain but with a reduced profile of gastrointesti - nal complaints such as nausea. tapentadol carries risk for seizures in patients with seizure disorders and for the development of sero - tonin syndrome. it is unknown how tapentadol compares in clini - cal utility to tramadol or other analgesics whose mechanism of action is not based primarily on opioid receptor pharmacology. antitussives the opioid analgesics are among the most effective drugs available for the suppression of cough. this effect is oft...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
learning_rate: 2e-05
num_train_epochs: 5
warmup_ratio: 0.1
fp16: True
dataloader_drop_last: True
dataloader_num_workers: 2
load_best_model_at_end: True
push_to_hub: True
hub_model_id: RikoteMaster/retriever_pdf_and_books
hub_strategy: end
hub_private_repo: False
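
The hyperparameters above determine the length of the run and the shape of the learning-rate curve. The following sketch works through that arithmetic. It assumes training on a single device, which is not stated in the card but is consistent with the epoch and step columns of the training logs, and it uses the common linear warmup and decay formula as an approximation of the `linear` scheduler. `linear_lr` is a hypothetical helper.

```python
import math

train_samples = 57_126   # training set size stated in this card
batch_size = 128         # per_device_train_batch_size
epochs = 5               # num_train_epochs
warmup_ratio = 0.1
peak_lr = 2e-05          # learning_rate

# dataloader_drop_last=True discards the final incomplete batch.
# Assumption: one device, so the per-device batch is the whole batch.
steps_per_epoch = train_samples // batch_size
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
print(round(50 / steps_per_epoch, 4))  # epoch fraction reached at step 50
```

Under these assumptions the epoch fraction at step 50 agrees with the first row of the training logs.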

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 2
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: True
resume_from_checkpoint: None
hub_model_id: RikoteMaster/retriever_pdf_and_books
hub_strategy: end
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
0.1121	50	0.0343	-
0.2242	100	0.0199	-
0.3363	150	0.0184	-
0.4484	200	0.0188	0.0069
0.5605	250	0.019	-
0.6726	300	0.0155	-
0.7848	350	0.0128	-
0.8969	400	0.0139	0.0048
1.0090	450	0.0151	-
1.1211	500	0.012	-
1.2332	550	0.0144	-
1.3453	600	0.0117	0.0037
1.4574	650	0.0164	-
1.5695	700	0.0099	-
1.6816	750	0.0128	-
1.7937	800	0.0076	0.0035
1.9058	850	0.0098	-
2.0179	900	0.0147	-
2.1300	950	0.0087	-
2.2422	1000	0.012	0.0033
2.3543	1050	0.0106	-
2.4664	1100	0.0176	-
2.5785	1150	0.0123	-
2.6906	1200	0.0122	0.0032
2.8027	1250	0.0126	-
2.9148	1300	0.013	-
3.0269	1350	0.011	-
3.1390	1400	0.0139	0.0031
3.2511	1450	0.01	-
3.3632	1500	0.0122	-
3.4753	1550	0.0094	-
3.5874	1600	0.0122	0.0030
3.6996	1650	0.0147	-
3.8117	1700	0.0126	-
3.9238	1750	0.0125	-
4.0359	1800	0.0138	0.0030
4.1480	1850	0.0105	-
4.2601	1900	0.0107	-
4.3722	1950	0.0179	-
4.4843	2000	0.011	0.0029
4.5964	2050	0.0126	-
4.7085	2100	0.0137	-
4.8206	2150	0.0084	-
4.9327	2200	0.012	0.0029

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.10.17
Sentence Transformers: 4.1.0
Transformers: 4.52.3
PyTorch: 2.7.0+cu126
Accelerate: 1.7.0
Datasets: 3.6.0
Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}