---
language:
- "en"
license: mit
datasets:
- glue
metrics:
- Classification accuracy
---

# Model Card for WeightWatcher/albert-large-v2-mnli

This model was fine-tuned on the GLUE/mnli task, starting from the pretrained
albert-large-v2 model. Hyperparameters were largely taken from the following
publication, with some minor exceptions:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

## Model Details

### Model Description
- **Developed by:** https://huggingface.co/cdhinrichs
- **Model type:** Text Sequence Classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** https://huggingface.co/albert-large-v2

## Uses
Text classification, research and development.

### Out-of-Scope Use
Not intended for production use.
See the base model card at https://huggingface.co/albert-large-v2

## Bias, Risks, and Limitations
See the base model card at https://huggingface.co/albert-large-v2

### Recommendations
See the base model card at https://huggingface.co/albert-large-v2

## How to Get Started with the Model

Use the code below to load the fine-tuned model.

```python
from transformers import AlbertForSequenceClassification

# Download (or load from the local cache) the fine-tuned MNLI checkpoint.
model = AlbertForSequenceClassification.from_pretrained("WeightWatcher/albert-large-v2-mnli")
```

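To go from a loaded model to a prediction, a premise/hypothesis pair has to be tokenized and the logits mapped to a label. The following is a minimal sketch, not the author's evaluation code. It assumes the repository also ships tokenizer files (if not, load the tokenizer from `albert-large-v2` instead) and that the label names stored in the checkpoint config are meaningful. `pick_label` and `predict_nli` are illustrative helper names.

```python
def pick_label(logits, id2label):
    """Return the label name of the highest-scoring class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


def predict_nli(premise, hypothesis,
                model_name="WeightWatcher/albert-large-v2-mnli"):
    """Classify one premise/hypothesis pair with the fine-tuned model."""
    # Heavy dependencies are imported here so pick_label stays usable alone.
    import torch
    from transformers import AlbertForSequenceClassification, AutoTokenizer

    # Assumption: tokenizer files are available under the same repo name.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AlbertForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # The pair is encoded as one sequence; 128 matches the max sequence
    # length stated in this card.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # id2label comes from the checkpoint config; the names may be generic
    # (for example "LABEL_0") if they were not set during fine-tuning.
    return pick_label(logits, model.config.id2label)
```

Calling `predict_nli("A man is playing a guitar.", "A person is making music.")` returns whichever label name the checkpoint assigns to the top-scoring class. Check that name against the MNLI label order before relying on it.
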
## Training Details

### Training Data
See https://huggingface.co/datasets/glue#mnli

MNLI is a three-way natural language inference classification task and is part
of the GLUE benchmark.

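For readers who want to inspect the data, here is a hedged sketch of loading MNLI through the `datasets` library. The label mapping follows the Hugging Face `glue` dataset card. `load_mnli` and `describe` are illustrative helpers that are not part of this model's training code, and the default split is only an example.

```python
# Label ids as published for the "mnli" config of the Hugging Face glue dataset.
MNLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}


def load_mnli(split="validation_matched"):
    """Load one MNLI split, e.g. "train" or "validation_mismatched"."""
    from datasets import load_dataset  # imported lazily; needs `datasets`
    return load_dataset("glue", "mnli", split=split)


def describe(example):
    """Format one example as 'premise => hypothesis [label]'."""
    # The test splits carry label -1, which is reported as "unlabeled".
    label = MNLI_LABELS.get(example["label"], "unlabeled")
    return f'{example["premise"]} => {example["hypothesis"]} [{label}]'
```

For example, `describe(load_mnli()[0])` prints the first matched-validation pair together with its gold label.
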
### Training Procedure
Adam optimization was used to fine-tune the pretrained ALBERT model at
https://huggingface.co/albert-large-v2.

#### Training Hyperparameters
The training hyperparameters (learning rate, batch size, ALBERT dropout rate,
classifier dropout rate, warmup steps, and training steps) were taken from
Table A.4 of:

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942

The max sequence length (MSL) was set to 128, which differs from the value in
that table.

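The hyperparameters named above fit together roughly as in the sketch below. It is an illustration only. The card does not state the shape of the learning-rate schedule, so linear warmup followed by linear decay to zero is an assumption, and `FinetuneConfig` and `lr_at_step` are hypothetical names. No values are filled in except the max sequence length of 128 stated in this card. The rest should be read from Table A.4 of the paper.

```python
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    """Container for the hyperparameters listed in this card."""
    learning_rate: float       # peak learning rate
    batch_size: int
    albert_dropout: float      # dropout inside the ALBERT encoder
    classifier_dropout: float  # dropout before the classification head
    warmup_steps: int
    training_steps: int
    max_seq_length: int = 128  # stated in this card


def lr_at_step(cfg, step):
    """Learning rate at a given optimizer step.

    Assumed schedule: linear warmup to the peak rate over warmup_steps,
    then linear decay to zero at training_steps.
    """
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / max(1, cfg.warmup_steps)
    remaining = max(0, cfg.training_steps - step)
    decay_steps = max(1, cfg.training_steps - cfg.warmup_steps)
    return cfg.learning_rate * remaining / decay_steps
```

With a config built from the table's values, `lr_at_step(cfg, cfg.warmup_steps)` returns the peak rate and `lr_at_step(cfg, cfg.training_steps)` returns zero.
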
## Evaluation
Classification accuracy is used to evaluate model performance.

### Testing Data, Factors & Metrics

#### Testing Data
See https://huggingface.co/datasets/glue#mnli

#### Metrics
Classification accuracy

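Classification accuracy is the fraction of examples whose predicted label equals the gold label. A minimal sketch of that computation follows. `classification_accuracy` is an illustrative helper, not the script that produced the results reported in this card.

```python
def classification_accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For instance, three correct predictions out of four give an accuracy of 0.75.
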
### Results
Training classification accuracy: 0.9567916639080015

Evaluation classification accuracy: 0.86571574121243

## Environmental Impact
The model was fine-tuned on a single-user workstation with a single GPU. The
CO2 impact is expected to be minimal.