HippolyteP committed on
Commit
bf3c992
·
1 Parent(s): e1e2cee

Update README.md

Files changed (1)
  1. README.md +80 -9
README.md CHANGED
@@ -11,28 +11,99 @@ license: cc-by-sa-4.0
11
 
12
  This page houses `Helium 6B` models trained using either a sequential pretraining on temporally ordered data or using a standard pretraining on shuffled data. The architecture is derived from [Helium 2B](https://huggingface.co/kyutai/helium-1-2b).
13
 
14
- ## Models Details
15
 
 
 
16
 
 
17
 
18
- ### Uses
19
 
20
- As described in the [paper](),
21
- ### Licensing
22
 
23
- Helium 6B models are licensed under the CC-BY-SA 4.0 license.
24
-
25
 
26
- ## Usage
27
 
 
 
 
28
 
 
29
 
30
- ```python
31
 
32
  ```
33
 
34
-
35
 
36
  ## Citations
37
 
38
  If you use one of these models, please cite:
 
11
 
12
  This page houses `Helium 6B` models trained using either a sequential pretraining on temporally ordered data or using a standard pretraining on shuffled data. The architecture is derived from [Helium 2B](https://huggingface.co/kyutai/helium-1-2b).
13
 
 
14
 
15
+ - **Developed by:** Kyutai
16
+ - **Model type:** Large Language Model
17
 
18
+ ## Uses
19
 
20
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
21
 
22
+ ### Direct Use
 
23
 
24
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
25
+
26
+ The intended use of the Helium model is research and development of natural language processing systems, including but not limited to language generation and understanding.
27
+ For most downstream use cases, the model should be aligned with supervised fine-tuning, RLHF or related methods.
28
+
29
+ ### Out-of-Scope Use
30
+
31
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
32
+
33
+ The model should not be used in languages other than those on which it was trained.
34
+ The model is not intended to be used for any malicious or illegal activities.
35
+ The model was not fine-tuned to follow instructions and should therefore not be used as an instruction-following model.
36
 
37
+ ## Bias, Risks, and Limitations
38
 
39
+ Helium-1 is a base language model that has not been aligned with human preferences.
40
+ As such, the model can generate incorrect, biased, harmful, or generally unhelpful content.
41
+ The model should therefore not be used for downstream applications without further alignment, evaluation, and risk mitigation.
42
 
43
+ ## How to Get Started with the Model
44
 
45
+ Use the code below to get started with the model.
46
 
47
+ ```python
48
+ import torch
49
+ from transformers import AutoModelForCausalLM, AutoTokenizer
50
+
51
+ model_id = "kyutai/Sequential_Helium_6B"
52
+
53
+ model = AutoModelForCausalLM.from_pretrained(model_id).cuda()
54
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
55
  ```
56
 
57
+ To load a specific checkpoint, e.g. the last sequential-pretraining checkpoint (after cooldown) trained before any 2025 data:
58
+
59
+ ```python
60
+ model = AutoModelForCausalLM.from_pretrained(model_id, subfolder='sequential_2024').cuda()
61
+ ```
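
Once a model and tokenizer have been loaded as in the snippets above, text can be generated through the standard `generate` method of `transformers`. The helper below is a minimal sketch and not part of the original README: the function name `complete`, the example prompt, and the `max_new_tokens` default are illustrative assumptions, and greedy decoding is used only to keep the output reproducible.

```python
def complete(model, tokenizer, prompt, max_new_tokens=32):
    """Greedy continuation of `prompt` (hypothetical helper, not from the README)."""
    # Tokenize the prompt and move the tensors to the device the model lives on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=False selects greedy decoding; generate() disables gradients itself.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # The returned sequence holds the prompt tokens followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call, assuming `model` and `tokenizer` were loaded as shown above:
# print(complete(model, tokenizer, "The capital of France is"))
```

Because these are base models without instruction tuning, prompts are best phrased as text to be continued rather than as questions or commands.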
62
+ ## Training Details
63
+
64
+ ### Training Data
65
+
66
+ Helium-6B checkpoints were trained on data from Common Crawl, which was preprocessed with the [dactory](https://github.com/kyutai-labs/dactory) library.
67
+
68
+
69
 
70
+ ## Evaluation
71
+
72
+
73
+
74
+
75
+ #### Testing Data
76
+
77
+ The model was evaluated using [OLMES](https://arxiv.org/abs/2406.08446), an LLM evaluation benchmark based on MMLU, ARC Easy & Challenge, Open Book QA, Common Sense QA, Physical Interaction QA, Social Interaction QA, HellaSwag, WinoGrande and BoolQA.
78
+
79
+
80
+
81
+ #### English Results
82
+
83
+ | Benchmark | Sequential-Helium-6B | Shuffled-Helium-6B (2.5T tokens) |
84
+ |--------------|:------:|:------:|
85
+ | | | |
86
+ | MMLU | 58.8 | 56.4 |
87
+ | ARC E | 87.6 | 86.7 |
88
+ | ARC C | 74.5 | 72.1 |
89
+ | OBQA | 72.8 | 73.2 |
90
+ | CSQA | 73.1 | 74.3 |
91
+ | PIQA | 80.3 | 80.2 |
92
+ | SIQA | 67.0 | 66.2 |
93
+ | HS | 79.1 | 81.3 |
94
+ | WG | 73.0 | 73.1 |
95
+ | BoolQA | 83.9 | 83.9 |
96
+ | | | |
97
+ | OLMES | 75.0 | 74.7 |
98
+
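
The `OLMES` row in the table above is consistent with an unweighted mean of the ten per-task scores. The short check below is a sketch under that assumption (the README does not state how the aggregate is computed); the scores are copied from the table and the variable names are illustrative.

```python
from statistics import mean

# Per-task scores copied from the English results table, in row order:
# MMLU, ARC E, ARC C, OBQA, CSQA, PIQA, SIQA, HS, WG, BoolQA.
scores = {
    "Sequential-Helium-6B": [58.8, 87.6, 74.5, 72.8, 73.1, 80.3, 67.0, 79.1, 73.0, 83.9],
    "Shuffled-Helium-6B": [56.4, 86.7, 72.1, 73.2, 74.3, 80.2, 66.2, 81.3, 73.1, 83.9],
}

# Assumption: the aggregate is the plain average over tasks, rounded to one decimal.
olmes = {name: round(mean(values), 1) for name, values in scores.items()}
print(olmes)
```

Under this assumption the averages come out to 75.0 and 74.7, matching the reported row.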
99
+
100
+ ### Uses
101
+
102
+ As described in the [paper](),
103
+ ### Licensing
104
+
105
+ Helium 6B models are licensed under the CC-BY-SA 4.0 license.
106
+
107
  ## Citations
108
 
109
  If you use one of these models, please cite: