| # BERT-STEM | |
| BERT model fine-tuned on Science Technology Engineering and Mathematics (STEM) lessons. | |
| ## Install: | |
| To install from pip: | |
| ``` | |
| pip install bertstem | |
| ``` | |
| ## Quickstart | |
| To encode sentences and get embedding matrix for embedding layers: | |
| ```python | |
| from BERT_STEM.BertSTEM import * | |
| bert = BertSTEM() | |
| # Example dataframe with text in spanish | |
| data = {'col_1': [3, 2, 1], | |
| 'col_2': ['hola como estan', 'alumnos queridos', 'vamos a hablar de matematicas']} | |
| df = pd.DataFrame.from_dict(data) | |
| # Encode sentences using BertSTEM: | |
| bert._encode_df(df, column='col_2', encoding='sum') | |
| # Get embedding matrix: | |
| embedding_matrix = bert.get_embedding_matrix() | |
| ``` | |
| To use it from HuggingFace: | |
| ```python | |
| from BERT_STEM.Encode import * | |
| import pandas as pd | |
| import transformers | |
| # Download spanish BERTSTEM: | |
| model = transformers.BertModel.from_pretrained("pablouribe/bertstem") | |
| # Download spanish tokenizer: | |
| tokenizer = transformers.BertTokenizerFast.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased", | |
| do_lower_case=True, | |
| add_special_tokens = False) | |
| # Example dataframe with text in spanish | |
| data = {'col_1': [3, 2, 1], | |
| 'col_2': ['hola como estan', 'alumnos queridos', 'vamos a hablar de matematicas']} | |
| df = pd.DataFrame.from_dict(data) | |
| # Encode sentences using BertSTEM: | |
| sentence_encoder(df, model, tokenizer, column = 'col_2', encoding = 'sum') | |
| ``` | |