---
pipeline_tag: sentence-similarity
language: en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# hku-nlp/instructor-large
This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, a document) to a fixed-length vector at test time, **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science, finance, etc.) and **task-aware** (e.g., customized for classification, information retrieval, etc.).

The model is easy to use with the `sentence-transformers` library.

## Installation
```bash
git clone https://github.com/HKUNLP/instructor-embedding
cd instructor-embedding
pip install -e .
```

## Compute your customized embeddings
You can then use the model to compute domain-specific, task-aware embeddings:
```python
from sentence_transformers import SentenceTransformer

sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title; Input:"
model = SentenceTransformer('hku-nlp/instructor-large')
# Each input is an [instruction, sentence, 0] triple.
embeddings = model.encode([[instruction, sentence, 0]])
print(embeddings)
```

## Calculate sentence similarities
You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.
```python
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('hku-nlp/instructor-large')
sentences_a = [['Represent the Science sentence; Input: ', 'Parton energy loss in QCD matter', 0],
               ['Represent the Financial statement; Input: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.', 0]]
sentences_b = [['Represent the Science sentence; Input: ', 'The Chiral Phase Transition in Dissipative Dynamics', 0],
               ['Represent the Financial statement; Input: ', 'The funds rose less than 0.5 per cent on Friday', 0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
```
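`cosine_similarity` returns an m×n matrix with one score per pair of rows from the two embedding matrices. As a minimal sketch of that computation, the `pairwise_cosine` helper and the toy 3-dimensional vectors below are illustrative stand-ins (not outputs of the model), assuming only NumPy:

```python
import numpy as np

def pairwise_cosine(a, b):
    """Cosine similarity between every row of `a` and every row of `b`,
    equivalent to sklearn's cosine_similarity for 2-D float inputs."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # dot products of unit vectors = cosine similarities

# Toy 3-dimensional "embeddings" standing in for the model's output.
emb_a = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
emb_b = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

sim = pairwise_cosine(emb_a, emb_b)
print(sim.shape)  # (2, 2): one score per (row of emb_a, row of emb_b) pair
```

Here `sim[i, j]` compares sentence `i` of the first group with sentence `j` of the second, which is why the printed matrix in the example above pairs the science sentences with each other and the financial statements with each other.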