---
title: README
emoji: 📊
colorFrom: green
colorTo: red
sdk: static
pinned: false
license: afl-3.0
---


Useful Hugging Face resources and fantastic contributors for Dutch NLP:

## Individuals
* [Pieter Delobelle](https://huggingface.co/pdelobelle), [homepage](https://pieter.ai/) and [git](https://github.com/ipieter)
* [Bram Vanroy](https://huggingface.co/BramVanroy) and [homepage](https://bramvanroy.github.io/)
* [Robin Smits](https://huggingface.co/robinsmits) and [git](https://github.com/robinsmits)
* [Janneke van der Zwaan](https://huggingface.co/jvdzwaan/ocrpostcorrection-task-1) and [git](https://github.com/jvdzwaan)
* [Yeb Havinga](https://huggingface.co/yhavinga) and [git](https://github.com/yhavinga)
* [Wietse de Vries](https://huggingface.co/wietsedv) and [git](https://github.com/wietsedv)
* [François Remy](https://huggingface.co/FremyCompany), [homepage](http://fremycompany.com) and [git](https://github.com/FremyCompany)
* [Maarten Grootendorst](https://huggingface.co/MaartenGr), [homepage](https://www.maartengrootendorst.com/) and [git](https://github.com/MaartenGr) 
* [Piek Vossen](https://vossen.info/) and [git](https://github.com/piekvossen)
* [Eva Rombouts](https://huggingface.co/ekrombouts) and [git](https://github.com/ekrombouts)
* [Joeran Bosma](https://huggingface.co/joeranbosma/) and [git](https://github.com/joeranbosma)

## Organisations
* [University Medical Center Utrecht](https://github.com/umcu)
* [NLPtown](https://huggingface.co/nlptown) and [homepage](http://nlp.town/)
* [doc2query](https://huggingface.co/doc2query)
* [LT3, Language and Translation Technology Team, Ghent University](https://huggingface.co/LT3) and [homepage](https://lt3.ugent.be/)
* [Textgain](https://huggingface.co/textgain) and [homepage](https://www.textgain.com/)
* [ML6](https://huggingface.co/ml6team), [homepage](https://ml6.eu/) and [git](https://github.com/ml6team)
* [CLiPS](https://huggingface.co/clips), [homepage](https://www.uantwerpen.be/en/research-groups/clips/) and [git](https://github.com/clips)
* [DTAI Research Group, KU Leuven](https://huggingface.co/DTAI-KULeuven), [homepage](https://dtai.cs.kuleuven.be/) and [git](https://github.com/ML-KULeuven)
* [GroNLP](https://huggingface.co/GroNLP) and [homepage](https://www.rug.nl/research/clcg/research/cl/)
* [CLTL](https://huggingface.co/CLTL), [homepage](http://cltl.nl) and [git](https://github.com/CLTL)
* [Netherlands Forensic Institute](https://huggingface.co/NetherlandsForensicInstitute), [homepage](https://forensicinstitute.nl/) and [git](https://github.com/NetherlandsForensicInstitute)
* [Integraal Kankercentrum Nederland (IKNL)](https://github.com/iknl)
* [Erasmus Medical Informatics](https://github.com/mi-erasmusmc)

## NLP libraries relevant for (Dutch) clinical NLP
* [clinlp](https://github.com/umcu/clinlp)

## Encoder models
* [*RobBERT 2023*](https://huggingface.co/DTAI-KULeuven/robbert-2023-dutch-base)
* [*BERTje*](https://huggingface.co/GroNLP/bert-base-dutch-cased)
* [*BelabBERT*](https://huggingface.co/jwouts/belabBERT_115k)
* [**MedRoBERTa.nl**](https://huggingface.co/CLTL/MedRoBERTa.nl)
* [**CardioBERTa.nl**](https://huggingface.co/UMCU/CardioBERTa.nl_clinical)
* [**CardioDeBERTa.nl**](https://huggingface.co/UMCU/CardioDeBERTa.nl)
* [**DRAGON-longformer-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-large-domain-specific)
* [**DRAGON-longformer-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-longformer-base-domain-specific)
* [**DRAGON-roberta-large-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-large-domain-specific)
* [**DRAGON-roberta-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-roberta-base-domain-specific)
* [**DRAGON-bert-base-domain-specific**](https://huggingface.co/joeranbosma/dragon-bert-base-domain-specific)

## Contrastive encoder models
* [BioLORD 2023-M Dutch](https://huggingface.co/FremyCompany/BioLORD-2023-M-Dutch-InContext-v1)
 
## Decoder models
* [*GPT-2 on mC4*](https://huggingface.co/yhavinga/gpt2-large-dutch), [GPT-2 finetuned on Dutch](https://huggingface.co/GroNLP/gpt2-medium-dutch-embeddings)
* [*GPT-neo on mC4*](https://huggingface.co/yhavinga/gpt-neo-1.3B-dutch)
* [*GEITje (based on Mistral)*](https://github.com/Rijgersberg/GEITje)
* [*Fietje (based on Phi-2)*](https://huggingface.co/BramVanroy/fietje-2), [**Zuster Fietje**](https://huggingface.co/ekrombouts/zuster_fietje)
* [**J1**](https://huggingface.co/Juvoly/J1-Llama-8B-exp)

## Neural machine translation (NMT) models
* [NLLB200](https://huggingface.co/facebook/nllb-200-3.3B)
* [UL2, en-nl](https://huggingface.co/yhavinga/ul2-large-en-nl), [UL2, nl-en](https://huggingface.co/yhavinga/ul2-large-dutch-english)
* [OPUS MT, en-nl](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl), [OPUS MT, nl-en](https://huggingface.co/Helsinki-NLP/opus-mt-nl-en), [OPUS MT Healthcare, nl-en](https://huggingface.co/FremyCompany/opus-mt-nl-en-healthcare)
* [Llama 2 MT, nl-en](https://huggingface.co/kaitchup/Llama-2-7b-mt-Dutch-to-English)

## Datasets

* [SoNaR](https://taalmaterialen.ivdnt.org/download/tstc-sonar-corpus/)
* [COW](https://rolandschaefer.net/archives/142)
* [mC4 cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* [TwNC](https://research.utwente.nl/en/publications/twnc-a-multifaceted-dutch-news-corpus)
* [Gigacorpus](http://gigacorpus.nl/)
* [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX)
* [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
* [FineWeb 2](https://github.com/huggingface/fineweb-2)