---
language:
- en
- hi
- multilingual
license: cc-by-sa-4.0
---
# en-hi-codemixed
This is a masked language model based on the CamemBERT model architecture.

The en-hi-codemixed model was trained from scratch on English, Hindi, and codemixed English-Hindi corpora for 40 epochs.

The corpora consist primarily of web-crawled data, including codemixed tweets, and focus on conversational language and the COVID-19 pandemic.
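
Since this is a masked language model, one way to try it is the `fill-mask` pipeline from the Hugging Face `transformers` library. The following is a minimal sketch, not an official usage recipe: `MODEL_ID` is a hypothetical placeholder (this card does not state the repository ID or checkpoint path), and `mask_word` and `top_predictions` are illustrative helper names. The mask token is read from the loaded tokenizer rather than assumed.

```python
from typing import List

# Hypothetical placeholder: this card does not state the Hub repository ID
# or a local path, so replace it with the real location of the checkpoint.
MODEL_ID = "path/to/en-hi-codemixed"


def mask_word(sentence: str, index: int, mask_token: str) -> str:
    """Replace the whitespace-separated word at `index` with `mask_token`."""
    words = sentence.split()
    words[index] = mask_token
    return " ".join(words)


def top_predictions(
    sentence: str, index: int, model_id: str = MODEL_ID, k: int = 5
) -> List[str]:
    """Return the k highest-scoring fillers for the masked word."""
    # Imported lazily so mask_word stays usable without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Read the mask token from the tokenizer instead of hard-coding it.
    masked = mask_word(sentence, index, fill_mask.tokenizer.mask_token)
    return [result["token_str"] for result in fill_mask(masked, top_k=k)]
```

With a valid `MODEL_ID`, a call such as `top_predictions("yeh movie bahut acchi hai", 3)` would mask the fourth word and return the model's candidate replacements. The example sentence is illustrative only and is not taken from the training data.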