Pablo Uribe committed on
Commit
1bc56e3
·
1 Parent(s): 1cdff2a

Update README.md

Files changed (1)
  1. README.md +60 -1
README.md CHANGED
@@ -1 +1,60 @@
- BERT model fine-tuned on Chilean Science Technology Engineering and Mathematics (STEM) lessons.
+ # BERT-STEM
+
+ BERT model fine-tuned on Science, Technology, Engineering, and Mathematics (STEM) lessons.
+
+ ## Install
+
+ To install from pip:
+
+ ```
+ pip install bertstem
+ ```
+
+ ## Quickstart
+
+ To encode sentences and get the embedding matrix for embedding layers:
+
+ ```python
+ import pandas as pd
+ from BERT_STEM.BertSTEM import *
+
+ bert = BertSTEM()
+
+ # Example dataframe with text in Spanish
+ data = {'col_1': [3, 2, 1],
+         'col_2': ['hola como estan', 'alumnos queridos', 'vamos a hablar de matematicas']}
+
+ df = pd.DataFrame.from_dict(data)
+
+ # Encode sentences using BertSTEM:
+ bert._encode_df(df, column='col_2', encoding='sum')
+
+ # Get embedding matrix:
+ embedding_matrix = bert.get_embedding_matrix()
+ ```
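
As a rough illustration of what an embedding matrix is used for, the sketch below treats an embedding layer as a plain row lookup. It uses a small random NumPy array as a stand-in for the matrix returned by `get_embedding_matrix()`. The vocabulary size, hidden size, token ids, and the `embed` helper are all assumptions made for this example, not values or functions from BertSTEM.

```python
import numpy as np

# Hypothetical stand-in for the matrix returned by bert.get_embedding_matrix();
# the real vocabulary size and hidden size come from the model itself.
rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 4
embedding_matrix = rng.normal(size=(vocab_size, hidden_size))


def embed(token_ids, matrix):
    """Return one embedding row per token id (a plain embedding lookup)."""
    return matrix[np.asarray(token_ids)]


# Made-up token ids: a batch of 2 sequences with 3 tokens each.
token_ids = [[1, 5, 2], [7, 0, 0]]
vectors = embed(token_ids, embedding_matrix)
print(vectors.shape)  # (2, 3, 4): batch, sequence length, hidden size
```

In a deep learning framework, the same matrix would typically be passed as the initial weights of an embedding layer, which performs this lookup internally.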
+
+ To use it from HuggingFace:
+
+ ```python
+ from BERT_STEM.Encode import *
+ import pandas as pd
+ import transformers
+
+ # Download Spanish BERTSTEM:
+ model = transformers.BertModel.from_pretrained("pablouribe/bertstem")
+
+ # Download Spanish tokenizer:
+ tokenizer = transformers.BertTokenizerFast.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased",
+                                                           do_lower_case=True,
+                                                           add_special_tokens=False)
+
+ # Example dataframe with text in Spanish
+ data = {'col_1': [3, 2, 1],
+         'col_2': ['hola como estan', 'alumnos queridos', 'vamos a hablar de matematicas']}
+
+ df = pd.DataFrame.from_dict(data)
+
+ # Encode sentences using BertSTEM:
+ sentence_encoder(df, model, tokenizer, column='col_2', encoding='sum')
+ ```
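
The `encoding='sum'` option is not explained in this README. One plausible reading, assumed here purely for illustration, is that the per-token vectors of each sentence are summed into a single sentence vector. The sketch below shows that idea with NumPy on made-up data. `sum_pool` is a hypothetical helper, not part of `BERT_STEM`, and the actual `sentence_encoder` may work differently.

```python
import numpy as np


def sum_pool(token_embeddings, attention_mask):
    """Sum token vectors per sentence, ignoring padded positions.

    token_embeddings: float array of shape (batch, seq_len, hidden)
    attention_mask:   0/1 array of shape (batch, seq_len); 0 marks padding
    """
    mask = np.asarray(attention_mask)[..., np.newaxis]  # (batch, seq_len, 1)
    return (np.asarray(token_embeddings) * mask).sum(axis=1)


# Made-up token embeddings: 2 sentences, 3 token slots, hidden size 2.
token_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # last slot is padding
    [[0.5, 0.5], [1.5, 1.5], [2.0, 2.0]],
])
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_vectors = sum_pool(token_embeddings, attention_mask)
print(sentence_vectors)  # one vector of size `hidden` per sentence
```

Masking before summing keeps padded positions from leaking into the sentence vector, which matters when sentences in a batch have different lengths.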