vpelloin committed
Commit 44fc0e2 · verified · 1 parent: a436b23

Update README.md

Files changed (1): README.md (+12 −7)
README.md CHANGED
@@ -3,7 +3,12 @@ license: cc-by-nc-sa-2.0
 language:
 - fr
 pipeline_tag: feature-extraction
-library_name: fairseq
+library_name: transformers
+tags:
+- data2vec2
+- JEPA
+- text
+- fairseq
 ---
 
 # Pantagruel: Unified Self-Supervised Encoders for French Text and Speech
@@ -27,14 +32,14 @@ Pantagruel text encoders are trained on large-scale French text corpora, includi
 
 The table below presents the accuracy of the natural language inference task on the French XNLI dataset.
 
-| **HuggingFace name**| **Model name (paper)** | **Arch/ Params** | **Pretrained dataset** | **Accuracy on XNLI (FR) (dev / test)** |
+| **HuggingFace name**| **Model name (paper)** | **Arch/ Params** | **Pretrained dataset** | **Accuracy on XNLI (FR) (dev / test)** |
 |----------|------------------------|-----------------|----------------------|---------------------------------------|
-| text-base-camtok-wiki | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
+| [text-base-camtok-wiki](https://huggingface.co/PantagrueLLM/text-base-camtok-wiki) | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
 | text-base-wiki | Pantagruel-B-Wk | Base / 125M | French Wikipedia 2019 (4GB) | 77.40% / 78.41% |
-| text-base-wiki-mlm | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
-| text-base-camtok-oscar | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
-| text-base-oscar-mlm | Pantagruel-B-Osc-MLM | Base / 125M | OSCAR 2019 (138GB) | 81.11% / 81.52% |
-| text-base-croissant-mlm | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
+| [text-base-wiki-mlm](https://huggingface.co/PantagrueLLM/text-base-wiki-mlm) | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
+| [text-base-camtok-oscar](https://huggingface.co/PantagrueLLM/text-base-camtok-oscar) | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
+| [text-base-oscar-mlm](https://huggingface.co/PantagrueLLM/text-base-oscar-mlm) | Pantagruel-B-Osc-MLM | Base / 125M | OSCAR 2019 (138GB) | 81.11% / 81.52% |
+| [text-base-croissant-mlm](https://huggingface.co/PantagrueLLM/text-base-croissant-mlm) | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
 
 For more downstream tasks and evaluation datasets, please refer to [our paper](https://arxiv.org/abs/2601.05911).
 
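
The net effect of the commit on the model card's YAML front-matter can be sketched as below. This is a reconstruction from the hunks: the surrounding `---` delimiters and the position of the `license` line are inferred from the hunk headers rather than shown verbatim in the diff.

```yaml
---
license: cc-by-nc-sa-2.0
language:
- fr
pipeline_tag: feature-extraction
library_name: transformers  # was: fairseq before this commit
tags:
- data2vec2
- JEPA
- text
- fairseq
---
```

On the Hub, `library_name` determines which library's loading snippet the model page offers, so switching it from `fairseq` to `transformers` changes the suggested usage, while keeping `fairseq` as a tag preserves discoverability under the original training framework.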