How to use anasse15/MNLP_M3_document_encoder with sentence-transformers:
from sentence_transformers import SentenceTransformer

# Load the encoder from the Hugging Face Hub (downloaded on first use)
model = SentenceTransformer("anasse15/MNLP_M3_document_encoder")
sentences = [
"Which of the following statements is true regarding the properties of zinc-activated ion channels and quaternary carbon atoms?\nA. Quaternary carbon atoms are primarily involved in the activation of zinc-activated ion channels.\nB. Both zinc-activated ion channels and quaternary carbon atoms are unique to the rat genome.\nC. Zinc-activated ion channels are cation-permeable and can activate spontaneously, while quaternary carbon atoms are found in hydrocarbons with at least five carbon atoms.\nD. Zinc-activated ion channels are exclusively found in the human genome, while quaternary carbon atoms can only exist in linear alkanes.",
"A quaternary carbon is a carbon atom bound to four other carbon atoms. For this reason, quaternary carbon atoms are found only in hydrocarbons having at least five carbon atoms. Quaternary carbon atoms can occur in branched alkanes, but not in linear alkanes.\n\nSynthesis \nThe formation of chiral quaternary carbon centers has been a synthetic challenge. Chemists have developed asymmetric Diels–Alder reactions, Heck reaction, Enyne cyclization, cycloaddition reactions, C–H activation, Allylic substitution, Pauson–Khand reaction, etc. to construct asymmetric quaternary carbons.\n\nReferences \n\nChemical nomenclature\nOrganic chemistry",
"Severe fever with thrombocytopenia syndrome (SFTS) is an emerging infectious disease caused by Dabie bandavirus also known as the SFTS virus, first reported between late March and mid-July 2009 in rural areas of Hubei and Henan provinces in Central China. SFTS has fatality rates ranging from 12% to as high as 30% in some areas. The major clinical symptoms of SFTS are fever, vomiting, diarrhea, multiple organ failure, thrombocytopenia (low platelet count), leucopenia (low white blood cell count), and elevated liver enzyme levels.\n\nVirology\nSFTS virus (SFTSV) is a virus in the order Bunyavirales. Person-to-person transmission was not noted in early reports but has since been documented.\n\nThe life cycle of the SFTSV most likely involves arthropod vectors and animal hosts. Humans appear to be largely accidental hosts. SFTSV has been detected in Haemaphysalis longicornis ticks.\n\nEpidemiology\nSFTS occurs in China's rural areas from March to November with the majority of cases from April to July. In 2013, Japan and Korea also reported several cases with deaths.\n\nIn July 2013, South Korea reported a total of eight deaths since August 2012.\n\nIn July 2017, Japanese doctors reported that a woman had died of SFTS after being bitten by a cat that may have itself infected by a tick. The woman had no visible tick bites, leading doctors to believe that the cat — which died as well — was the transmission vector.\n\nIn early 2020 an outbreak occurred in East China, more than 37 people were found with SFTS in Jiangsu province, while 23 more were found infected in Anhui province in August 2020. Seven people have died.\n\nEvolution\nThe virus originated 50–150 years ago and has undergone a recent population expansion.\n\nHistory\nIn 2009 Xue-jie Yu and colleagues isolated the SFTS virus (SFTSV) from SFTS patients’ blood.\n\nReferences\n\nExternal links \n\nArthropod-borne viral fevers and viral haemorrhagic fevers\nInsect-borne diseases\nZoonoses",
"Lecticans, also known as hyalectans, are a family of proteoglycans (a type protein that is attached to chains of negatively charged polysaccharides) that are components of the extracellular matrix. There are four members of the lectican family: aggrecan, brevican, neurocan, and versican. Lecticans interact with hyaluronic acid and tenascin-R to form a ternary complex.\n\nTissue distribution \n\nAggrecan is a major component of extracellular matrix in cartilage whereas versican is widely expressed in a number of connective tissues including those in vascular smooth muscle, skin epithelial cells, and the cells of central and peripheral nervous system. The expression of neurocan and brevican is largely restricted to neural tissues.\n\nStructure \n\nAll four lecticans contain an N-terminal globular domain (G1 domain) that in turn contains an immunoglobulin V-set domain and a Link domain that binds hyaluronic acid; a long extended central domain (CS) that is modified with covalently attached sulfated glycosaminoglycan chains, and a C-terminal globular domain (G3 domain) containing of one or more EGF repeats, a C-type lectin domain and a CRP-like domain. Aggrecan has in addition a globular domain (G2 domain) that is situated between the G1 and CS domains.\n\nSee also \nHyaladherin\n\nReferences \n\nProtein families"
]
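# Encode each text into one embedding vector
embeddings = model.encode(sentences)
# Pairwise similarity scores between all four texts
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])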
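
In the example above, the first entry of `sentences` is a multiple-choice question and the remaining three are candidate passages, so a natural next step is to rank the passages against the question. The following is a minimal sketch of that ranking in plain Python. It assumes cosine similarity is the scoring function (the sentence-transformers default unless the model configuration overrides it), and it uses small made-up vectors in place of real `model.encode` output so that it runs without downloading the model. The helper names `cosine_similarity` and `rank_documents` are hypothetical and are not part of the library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_embedding, document_embeddings):
    """Return (document_index, score) pairs, best match first."""
    scores = [cosine_similarity(query_embedding, d) for d in document_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy stand-ins for embeddings (assumption: real vectors come from model.encode).
toy_embeddings = [
    [1.0, 0.0, 0.0],  # plays the role of the question, sentences[0]
    [0.9, 0.1, 0.0],  # a passage close to the question
    [0.0, 1.0, 0.0],  # an unrelated passage
    [0.0, 0.0, 1.0],  # another unrelated passage
]

ranking = rank_documents(toy_embeddings[0], toy_embeddings[1:])
for index, score in ranking:
    print(index, round(score, 3))
```

With real embeddings, the same call would be `rank_documents(embeddings[0], embeddings[1:])`, where the returned indices refer to positions in `sentences[1:]`.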