Dingyun-Huang committed · verified
Commit ca4c85f · 1 Parent(s): 02bd72c

Update README.md

Files changed (1): README.md (+57 -8)
README.md CHANGED

@@ -6,15 +6,44 @@ tags:
  - feature-extraction
  - sentence-similarity
  - transformers
-
+ - optoelectronics
+ license: mit
+ datasets:
+ - CambridgeMolecularEngineering/oe-ttl-abs-303k
+ language:
+ - en
+ base_model:
+ - bert-base-uncased
  ---

- # Dingyun-Huang/oe-sbert-raw-mean
+ # Dingyun-Huang/oe-sroberta-embedding

  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

  <!--- Describe your model here -->

+
+ **The OE-BERT model is domain adapted from bert-base-uncased over research literature in optoelectronics. The adapted model is then fine-tuned on abstracts and titles of optoelectronics research articles for embedding capabilities.**
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Language(s) (NLP):** English
+ - **Adapted from model:** bert-base-uncased
+
+
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [OptoelectronicsLM-codebase (GitHub)](https://github.com/Dingyun-Huang/OptoelectronicsLM-codebase)
+ - **Paper:** [
+ Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications](https://pubs.acs.org/doi/10.1021/acs.jcim.4c02029)
+
  ## Usage (Sentence-Transformers)

  Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

@@ -29,7 +58,7 @@ Then you can use the model like this:
  from sentence_transformers import SentenceTransformer
  sentences = ["This is an example sentence", "Each sentence is converted"]

- model = SentenceTransformer('Dingyun-Huang/oe-sbert-raw-mean')
+ model = SentenceTransformer('Dingyun-Huang/oe-sroberta-embedding')
  embeddings = model.encode(sentences)
  print(embeddings)
  ```

@@ -55,8 +84,8 @@ def mean_pooling(model_output, attention_mask):
  sentences = ['This is an example sentence', 'Each sentence is converted']

  # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('Dingyun-Huang/oe-sbert-raw-mean')
- model = AutoModel.from_pretrained('Dingyun-Huang/oe-sbert-raw-mean')
+ tokenizer = AutoTokenizer.from_pretrained('Dingyun-Huang/oe-sroberta-embedding')
+ model = AutoModel.from_pretrained('Dingyun-Huang/oe-sroberta-embedding')

  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

@@ -78,18 +107,38 @@ print(sentence_embeddings)

  <!--- Describe how your model was evaluated -->

- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=Dingyun-Huang/oe-sbert-raw-mean)
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=Dingyun-Huang/oe-sroberta-embedding)



  ## Full Model Architecture
  ```
  SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
    (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  )
  ```

  ## Citing & Authors

- <!--- Describe where people can find more information -->
+ <!--- Describe where people can find more information -->
+ **BibTeX:**
+ ```bibtex
+ @article{doi:10.1021/acs.jcim.4c02029,
+   author = {Huang, Dingyun and Cole, Jacqueline M.},
+   title = {Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications},
+   journal = {Journal of Chemical Information and Modeling},
+   volume = {65},
+   number = {5},
+   pages = {2476-2486},
+   year = {2025},
+   doi = {10.1021/acs.jcim.4c02029},
+   note = {PMID: 39933074},
+   URL = {
+     https://doi.org/10.1021/acs.jcim.4c02029
+   },
+   eprint = {
+     https://doi.org/10.1021/acs.jcim.4c02029
+   }
+ }
+ ```
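
The third hunk's header references the card's `mean_pooling` helper (`def mean_pooling(model_output, attention_mask):`) without showing its body. Below is a minimal sketch of the standard sentence-transformers mean-pooling recipe that such cards follow, assuming the usual pattern of masking padding tokens before averaging; treat it as illustrative rather than a copy of the card's exact code.

```python
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions via the attention mask.
    token_embeddings = model_output[0]  # (batch, seq_len, 768) last hidden states
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub, as in the card's Usage (HuggingFace Transformers) section
tokenizer = AutoTokenizer.from_pretrained('Dingyun-Huang/oe-sroberta-embedding')
model = AutoModel.from_pretrained('Dingyun-Huang/oe-sroberta-embedding')

# Tokenize sentences and run the encoder without gradients
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool token embeddings into one 768-dimensional vector per sentence
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings.shape)  # torch.Size([2, 768])
```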
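The updated card describes the embeddings as suitable for clustering or semantic search but its examples stop at printing raw vectors. A brief semantic-search sketch built on the card's SentenceTransformer usage follows; the corpus and query strings are invented for illustration, and `util.cos_sim` is the generic sentence-transformers similarity helper rather than anything specific to this model.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('Dingyun-Huang/oe-sroberta-embedding')

# Hypothetical optoelectronics corpus and query, for illustration only.
corpus = [
    "Perovskite solar cells with improved power conversion efficiency",
    "Organic light-emitting diodes based on thermally activated delayed fluorescence",
]
query = "efficiency of perovskite photovoltaics"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query embedding.
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())
```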