Training scripts?
Hi,
We used this model to train a classifier for Icelandic homographs, and the results were quite good. See https://github.com/grammatek/IceHoc.
I'd be interested in the training scripts for this LM, especially the parts that deal with dataset preparation and cleaning. Would you be willing to share them?
Regards,
Daniel.
Hi Daniel,
Happy to hear that the model performed so well on homograph classification. When pre-training the model, I followed Stefan Schweter's instructions:
https://github.com/stefan-it/turkish-bert/blob/master/convbert/CHEATSHEET.md
https://github.com/stefan-it/turkish-bert/blob/master/electra/CHEATSHEET.md
I used the pre-training script from the ConvBERT repository. Since the pre-training corpus (i.e., the Icelandic Gigaword Corpus) doesn't contain any web-crawled or noisy documents, I didn't perform any filtering or cleaning beforehand.
Best regards,
Jón