---
license: mit
language:
- en
---
# MacBERTh

This model is a historical language model for English, developed as part of the [MacBERTh project](https://macberth.netlify.app/).

The architecture is based on BERT base uncased, and the model was pre-trained with the original BERT pre-training codebase.
The training material comes from several sources, including:

- EEBO
- ECCO
- COHA
- CLMET3.1
- EVANS
- Hansard Corpus

with a total of approximately 3.9 billion tokens.

Details and evaluation can be found in the accompanying publications:

- [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/)
- [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152)
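
Because the model follows the standard BERT base uncased architecture, it can be used with the Transformers library like any other BERT checkpoint. The sketch below assumes the Hugging Face hub id `emanjavacas/MacBERTh`; check the model page for the exact identifier.

```python
# Minimal usage sketch: masked-token prediction with MacBERTh.
# NOTE: the hub id "emanjavacas/MacBERTh" is an assumption; verify it on the hub.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emanjavacas/MacBERTh")

# BERT-style models use the [MASK] placeholder; try a historical English phrase.
for pred in fill_mask("Thou shalt not [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```

The same checkpoint can also be loaded with `AutoTokenizer` and `AutoModel` to extract contextual embeddings for historical text.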