demdecuong/stroke_simcse

Feature Extraction


demdecuong committed on May 31, 2021

Commit

ab1128c

·

1 Parent(s): 8800bcc

update readme

Files changed (1)

README.md +39 -1


	@@ -1 +1,39 @@
-hello

+This is a fine-tuned version of [SimCSE: Simple Contrastive Learning of Sentence Embeddings](https://arxiv.org/abs/2104.08821),
+trained in an unsupervised manner on 570K stroke-related sentences drawn from stroke books, Quora medical questions, Quora stroke questions, and human annotations.
+### Extract sentence representation
+```
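+import torch
+from transformers import AutoTokenizer, AutoModel
+# Load the tokenizer and encoder from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("demdecuong/stroke_simcse")
+model = AutoModel.from_pretrained("demdecuong/stroke_simcse")
+text = "What are disease related to red stroke's causes?"
+inputs = tokenizer(text, return_tensors='pt')
+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    # Index 1 of the output is the pooled sentence representation
+    outputs = model(**inputs)[1]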
+```
+### Build embeddings for a database
+```
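+import torch
+# Reuses `tokenizer` and `model` from the previous snippet
+database = [
+    'What is the daily checklist for stroke returning home',
+    'What are some tips for stroke adapt new life',
+    'What should I consider when using nursing-home care'
+]
+# One row per sentence; 768 is assumed to match the encoder's hidden size
+embedding = torch.zeros((len(database), 768))
+with torch.no_grad():
+    for i, sentence in enumerate(database):
+        inputs = tokenizer(sentence, return_tensors="pt")
+        # Pooled output has shape (1, 768); take its single row
+        embedding[i] = model(**inputs)[1][0]
+print(embedding.shape)  # torch.Size([3, 768])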
+```
+### Result
+On our PoC test set, which contains human-generated pairs of matching stroke-related questions:
+| Model  | Top-1 Accuracy |
+| ------------- | ------------- |
+| SimCSE (supervised)  | 75.83  |
+| SimCSE (ours)  | 76.66  |
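
The README does not say how a query is matched against the database or how Top-1 accuracy is scored. The sketch below shows one plausible setup, assuming cosine similarity between sentence embeddings and a known correct database index per query. The function names (`cosine_similarity`, `top1_match`, `top1_accuracy`) and the toy 3-dimensional vectors are illustrative only and are not part of the model or its evaluation code. It is kept dependency-free so it runs anywhere; real embeddings from the snippets above could be passed in via `embedding.tolist()`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top1_match(query: Sequence[float],
               database: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, score) of the database row most similar to the query."""
    scores = [cosine_similarity(query, row) for row in database]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def top1_accuracy(queries: List[Sequence[float]],
                  database: List[Sequence[float]],
                  gold: List[int]) -> float:
    """Percentage of queries whose best match is the expected database index."""
    hits = sum(top1_match(q, database)[0] == g for q, g in zip(queries, gold))
    return 100.0 * hits / len(gold)


# Toy stand-ins for real sentence embeddings (hypothetical values).
toy_database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
toy_gold = [0, 1, 2]  # the third query is deliberately mismatched

index, score = top1_match(toy_queries[0], toy_database)
print(index, round(score, 4))
print(top1_accuracy(toy_queries, toy_database, toy_gold))
```

For a large database, normalizing the embeddings once and using a matrix product would be faster than this per-pair loop; the loop is used here only for clarity.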