Update README.md

README.md CHANGED
````diff
@@ -43,7 +43,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem
 * [Neural Network Architecture](#neural-network-architecture)
 * [Training Hyperparameters](#training-hyperparameters)
   1. [Main Pre-training](#1-main-pre-training)
-  2. [Context Extension](#2-context-extension)
+  2. [Context Length Extension](#2-context-extension)
   3. [Annealing](#3-annealing)
 * [Training Logs and Learning Curves](#training-logs-and-learning-curves)
 <!-- * [Evaluation](#evaluation) -->
@@ -135,8 +135,8 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,
 where `revision` can be one of:
 * "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": every 5000 steps for the first pre-training steps (with a context length of 4096).
 * "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": every 25000 steps from 25k to 750k steps.
-* "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context extension and annealing.
-* "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context extension (with a context length of 32000).
+* "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context length extension and annealing.
+* "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context length extension (with a context length of 32000).
 
 ## Training Details
 
@@ -218,7 +218,7 @@ Training hyperparameters in torch/Megatron-DeepSpeed were as follows:
 | Pipeline Parallelism (with 512 GPUs) | 4 |
 | Data Parallelism (with 512 GPUs) | 32 |
 
-#### 2. Context Extension
+#### 2. Context Length Extension
 
 Training hyperparameters are the same as above, with the following changes:
 | **Hyperparameter** | **Value** |
@@ -229,13 +229,21 @@ Training hyperparameters are the same as above, with the following changes:
 | Context length | 32 000 |
 | Batch size | 128 |
 | Learning rate | 2e-5 |
+| Learning rate schedule | constant |
 | Tensor Parallelism (with 128 GPUs) | 4 |
 | Pipeline Parallelism (with 128 GPUs) | 4 |
 | Data Parallelism (with 128 GPUs) | 8 |
 
 #### 3. Annealing
 
-
+Training hyperparameters are the same as for context length extension, with the following changes:
+| **Hyperparameter** | **Value** |
+|------------------------|------------|
+| Total \# samples | 156 250 (5B tokens) |
+| Total \# steps | 1 220 |
+| Learning rate schedule | linear annealing |
+| Maximum Learning rate | 3e-5 |
+| Final Learning rate | 0 |
 
 ### Training Logs and Learning Curves
 
@@ -283,7 +291,7 @@ Main results are summarized in the following figures:
 #### Pretraining
 
 
-#### Context Extension
+#### Context Length Extension
 
 
 #### Annealing
@@ -296,19 +304,37 @@ Lucie-7B is a language model trained solely to predict the most probable next wo
 
 ## Citation
 
-
+When using the Lucie-7B model, please cite the following paper:
+
+✍ Olivier Gouvert, Julie Hunter, Jérôme Louradour,
+Evan Dufraisse, Yaya Sy, Pierre-Carl Langlais, Anastasia Stasenko,
+Laura Rivière, Christophe Cerisara, Jean-Pierre Lorré (2025)
+Lucie-7B LLM and its training dataset
+```bibtex
+@misc{openllm2023claire,
+      title={Lucie-7B LLM and its training dataset:
+             open resources for multilingual language generation},
+      author={Olivier Gouvert and Julie Hunter and Jérôme Louradour and Evan Dufraisse and Yaya Sy and Pierre-Carl Langlais and Anastasia Stasenko and Laura Rivière and Christophe Cerisara and Jean-Pierre Lorré},
+      year={2025},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
 
 
 ## Acknowledgements
 
 This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
 
-Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and OpenLLM-France community, including in alphabetical order:
+Lucie-7B was created by members of [LINAGORA](https://labs.linagora.com/) and the [OpenLLM-France](https://www.openllm-france.fr/) community, including in alphabetical order:
+Agustin Martin Picard (IRT),
+Thibaut Boissin (IRT),
 Christophe Cerisara (LORIA),
 Evan Dufraisse (CEA),
 Julie Hunter (LINAGORA),
 Jean-Pierre Lorré (LINAGORA),
 Jérôme Louradour (LINAGORA),
+Lucas Hervier (IRT),
 Michel-Marie Maudet (LINAGORA),
 Olivier Gouvert (LINAGORA), and
 Yaya Sy (LORIA).
@@ -329,4 +355,3 @@ for their helpful input.
 ## Contact
 
 contact@openllm-france.fr
-
````
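The `revision` values in the `@@ -135,8` hunk follow a regular naming scheme: a `step` or `extension_step` prefix plus a zero-padded seven-digit step number. A minimal sketch of that scheme, assuming the checkpoint list in the README is complete; the helper names (`step_revision`, `list_revisions`) are hypothetical, not part of the repository:

```python
def step_revision(step: int, extension: bool = False) -> str:
    """Format a training step as a Lucie-7B revision tag, e.g. 5000 -> 'step0005000'."""
    prefix = "extension_step" if extension else "step"
    return f"{prefix}{step:07d}"  # seven digits, zero-padded

def list_revisions() -> list[str]:
    """Enumerate the intermediate-checkpoint revisions the README describes."""
    revs = [step_revision(s) for s in range(5000, 25000, 5000)]      # every 5000 steps up to 20k
    revs += [step_revision(s) for s in range(25000, 750001, 25000)]  # every 25000 steps, 25k to 750k
    revs.append(step_revision(753851))                               # last main pre-training step
    revs += [step_revision(s, extension=True)                        # context-extension checkpoints
             for s in (250, 500, 750, 1000, 1220)]
    return revs
```

Any of these tags would then be passed as the `revision` argument of the `transformers.AutoModelForCausalLM.from_pretrained` call shown in the hunk's context line.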
|
|
|
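The annealing table added in the `@@ -229,13` hunk specifies a linear learning-rate schedule from a maximum of 3e-5 down to a final 0 over 1 220 steps. As an illustrative sketch of what such a schedule computes, not the actual Megatron-DeepSpeed scheduler:

```python
MAX_LR = 3e-5       # "Maximum Learning rate" from the annealing table
FINAL_LR = 0.0      # "Final Learning rate"
TOTAL_STEPS = 1220  # "Total # steps"

def annealed_lr(step: int) -> float:
    """Linearly interpolate from MAX_LR at step 0 to FINAL_LR at TOTAL_STEPS."""
    frac = min(max(step / TOTAL_STEPS, 0.0), 1.0)  # clamp to [0, 1]
    return MAX_LR + (FINAL_LR - MAX_LR) * frac
```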