# kwang2049/TSDAE-twitterpara2nli_stsb
This is a model from the paper ["TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning"](https://arxiv.org/abs/2104.06979). It was first trained on twitterpara with the TSDAE objective in an unsupervised manner and then fine-tuned on the supervised NLI and STSb data, as the suffix `2nli_stsb` indicates. Training procedure of this model:
1. Initialized with [bert-base-uncased](https://huggingface.co/bert-base-uncased);
2. Unsupervised training on twitterpara with the TSDAE objective;
3. Supervised fine-tuning on the NLI and STSb data.

The pooling method is CLS-pooling.
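
For reference, CLS-pooling simply takes the hidden state of the `[CLS]` token as the sentence embedding. Below is a minimal sketch with plain `transformers`, assuming the repository's transformer weights load directly with `AutoModel`, as is typical for sentence-transformers checkpoints:
```python
import torch
from transformers import AutoModel, AutoTokenizer

name = 'kwang2049/TSDAE-twitterpara2nli_stsb'
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

batch = tokenizer(['This is the first sentence.'], padding=True, return_tensors='pt')
with torch.no_grad():
    hidden_states = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)
cls_embeddings = hidden_states[:, 0]  # CLS-pooling: keep only the [CLS] token
```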
## Usage
A convenient way to use this model is through [SentenceTransformers](https://github.com/UKPLab/sentence-transformers). Please install it via:
```bash
pip install sentence-transformers
```
Then load the model and use it to encode sentences:
```python
from sentence_transformers import SentenceTransformer, models

dataset = 'twitterpara'
model_name_or_path = f'kwang2049/TSDAE-{dataset}2nli_stsb'
model = SentenceTransformer(model_name_or_path)
model[1] = models.Pooling(model[0].get_word_embedding_dimension(), pooling_mode='cls')  # Note: this model uses CLS-pooling
sentence_embeddings = model.encode(['This is the first sentence.', 'This is the second one.'])
```
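`model.encode` returns a NumPy array with one row per input sentence. For example, to compare the two sentences with cosine similarity:
```python
import numpy as np

a, b = sentence_embeddings
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f'Cosine similarity: {cosine:.4f}')
```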
## Evaluation
To evaluate the model against the datasets used in the paper, please install our evaluation toolkit [USEB](https://github.com/UKPLab/useb):
```bash
pip install useb  # Or: git clone the repository and pip install .
python -m useb.downloading all  # Download both the training and evaluation data (the evaluation data lands in ./data-eval)
```
Then run the evaluation:
```python
from sentence_transformers import SentenceTransformer, models
import torch
from useb import run_on

dataset = 'twitterpara'
model_name_or_path = f'kwang2049/TSDAE-{dataset}2nli_stsb'
model = SentenceTransformer(model_name_or_path)
model[1] = models.Pooling(model[0].get_word_embedding_dimension(), pooling_mode='cls')  # Note: this model uses CLS-pooling

@torch.no_grad()
def semb_fn(sentences) -> torch.Tensor:
    # Encode a batch of sentences and return the embeddings as a torch tensor
    return torch.Tensor(model.encode(sentences, show_progress_bar=False))

result = run_on(
    dataset,
    semb_fn=semb_fn,
    eval_type='test',
    data_eval_path='data-eval'
)
```
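The returned `result` holds the scores on the twitterpara test set (the paper reports average precision for the paraphrase datasets; see the USEB repository for the exact output format). A quick way to inspect it:
```python
print(result)  # evaluation scores as reported by USEB
```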

## Training
Please refer to [the page of TSDAE training](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/TSDAE) in SentenceTransformers.
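
For orientation, here is a condensed sketch of that example; the training sentences are placeholders, and the hyperparameters follow the linked page:
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Placeholder: in practice, load many unlabeled sentences from the target domain
train_sentences = ['An unlabeled in-domain sentence.', 'Another unlabeled sentence.']

word_embedding_model = models.Transformer('bert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), 'cls')
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE: corrupt each sentence (token deletion) and train to reconstruct it
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path='bert-base-uncased', tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler='constantlr',
    optimizer_params={'lr': 3e-5},
    show_progress_bar=True,
)
```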

## Cite & Authors
If you use the code for evaluation, feel free to cite our publication [TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning](https://arxiv.org/abs/2104.06979):
```bibtex
@article{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    journal = "arXiv preprint arXiv:2104.06979",
    month = "4",
    year = "2021",
    url = "https://arxiv.org/abs/2104.06979",
}
```