---
license: mit
language:
- en
---
# MacBERTh

This model is a historical language model for English, developed as part of the [MacBERTh project](https://macberth.netlify.app/).

The architecture is based on BERT base uncased, and the model was pre-trained with the original BERT pre-training codebase.
The training material comes from several sources, including:

- EEBO
- ECCO
- COHA
- CLMET3.1
- EVANS
- Hansard Corpus

with a total of approximately 3.9 billion tokens.

Details and evaluation can be found in the accompanying publications:

- [MacBERTh: Development and Evaluation of a Historically Pre-trained Language Model for English (1450-1950)](https://aclanthology.org/2021.nlp4dh-1.4/)
- [Adapting vs. Pre-training Language Models for Historical Languages](https://doi.org/10.46298/jdmdh.9152)
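
Because the model follows the standard BERT base uncased architecture, it can be used with the Transformers library like any other BERT checkpoint. The sketch below assumes the Hugging Face hub id `emanjavacas/MacBERTh`; check the model page for the exact identifier.

```python
# Minimal usage sketch: masked-token prediction with MacBERTh.
# NOTE: the hub id "emanjavacas/MacBERTh" is an assumption; verify it on the hub.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="emanjavacas/MacBERTh")

# BERT-style models use the [MASK] placeholder; try a historical English phrase.
for pred in fill_mask("Thou shalt not [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```

The same checkpoint can also be loaded with `AutoTokenizer` and `AutoModel` to extract contextual embeddings for historical text.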