---
library_name: transformers
tags:
- generated_from_trainer
datasets:
- AlexHung29629/nllb_processed
model-index:
- name: out_nllb
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.12.0.dev0`
```yaml
base_model: out_khanacademy
remove_unused_columns: true
auto_resume_from_checkpoints: true
plugins:
- axolotl.integrations.liger.LigerPlugin
#- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

unfrozen_parameters:
- ^\S+layers\S+$
- ^\S+norm\S+$

datasets:
- path: AlexHung29629/nllb_processed
  split: train[:1_000_000]
  type: chat_template
  chat_template: jinja
  chat_template_jinja: "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'system') %}{{message['content'] + '\n'}}{% elif (message['role'] == 'user') %}{{'Source: ' + '\n' + message['content'] + '\n' + '\nTarget:\n'}}{% elif message['role'] == 'assistant' %}{{message['content'] + '</s>' + '\n'}}{% endif %}{% endfor %}"
  roles_to_train: ['user', 'assistant']

#test_datasets:
#- path: HuggingFaceTB/cosmopedia
#  name: khanacademy
#  split: train[-100:]
#  type:
#    system_prompt: ""
#    field_system:
#    field_instruction: prompt
#    field_output: text
#    format: "User: {instruction}\n\nAssistant: "
#    no_input_format: "User: {instruction}\n\nAssistant: "

sample_packing_bin_size: 500
dataset_prepared_path: data_prep_nllb
output_dir: ./out_nllb
dataloader_num_workers: 1
dataloader_pin_memory: true
shuffle_merged_datasets: false

sequence_len: 8192
eval_sequence_len: 2048
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

use_tensorboard: true
use_wandb: true
# Set the name of your wandb run
wandb_name: nllb
# Your wandb project name
wandb_project: Draft_Tiny

gradient_accumulation_steps: 1

micro_batch_size: 1
num_epochs: 1
#eval_steps: 500
save_steps: 1000
save_total_limit: 1
save_only_model: false
optimizer: adamw_8bit
adam_beta1: 0.9
adam_beta2: 0.95
adam_epsilon: 1e-6
lr_scheduler: constant_with_warmup
learning_rate: 0.0003
max_grad_norm: 1.0

bf16: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

torch_compile: true
torch_compile_backend: inductor
torch_compile_mode: default
#flash_attention: true
#sdp_attention: true
#xformers_attention: true
flex_attention: true
flex_attn_compile_kwargs:
  dynamic: false
  mode: max-autotune-no-cudagraphs

warmup_steps: 1
logging_steps: 1
weight_decay: 0.001

special_tokens:
  bos_token: <s>
  eos_token: </s>
  pad_token: <pad>
  unk_token: <unk>

```

</details><br>
# out_nllb

This model was trained from the local `out_khanacademy` checkpoint (see `base_model` in the Axolotl config above) on the AlexHung29629/nllb_processed dataset.
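For inference, prompts must match the Source/Target format defined by `chat_template_jinja` in the config above. A minimal sketch, assuming a causal LM architecture and a hypothetical checkpoint location (replace `./out_nllb` with wherever the weights actually live):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "./out_nllb"  # hypothetical path; point at the real checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Mirror the training template: "<s>Source: \n{source}\n\nTarget:\n",
# with generation stopping at the </s> EOS token.
source = "Je n'ai pas compris la question."
prompt = tokenizer.bos_token + "Source: \n" + source + "\n\nTarget:\n"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens after the prompt.
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(translation)
```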

## Model description

Per the config above, training continued from the local `out_khanacademy` checkpoint on translation pairs rendered with the Source/Target chat template. Architecture details beyond the config are not recorded here.

## Intended uses & limitations

More information needed

## Training and evaluation data

Training used the first 1,000,000 examples of the `train` split of AlexHung29629/nllb_processed (`split: train[:1_000_000]` in the config). No evaluation set was configured; the `test_datasets` block is commented out.
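A sketch of loading the same slice with the `datasets` library (written with plain digits, the safer spelling for the split-slice syntax):

```python
from datasets import load_dataset

# First 1,000,000 rows of the train split, matching split: train[:1_000_000]
train_ds = load_dataset("AlexHung29629/nllb_processed", split="train[:1000000]")
print(train_ds)
```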

## Training procedure
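The `unfrozen_parameters` patterns in the config restrict training to parameters whose names match `^\S+layers\S+$` or `^\S+norm\S+$`; everything else stays frozen. A minimal sketch of that selection logic (not Axolotl's internal implementation):

```python
import re

UNFROZEN_PATTERNS = [r"^\S+layers\S+$", r"^\S+norm\S+$"]

def apply_unfrozen_parameters(model, patterns=UNFROZEN_PATTERNS):
    """Freeze every parameter, then re-enable gradients on pattern matches."""
    compiled = [re.compile(p) for p in patterns]
    for name, param in model.named_parameters():
        param.requires_grad = any(rx.match(name) for rx in compiled)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")
```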

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: 8-bit AdamW with betas=(0.9, 0.95), epsilon=1e-06, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 2
- training_steps: 13786
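
A sketch of how these settings map onto code, assuming the 8-bit AdamW comes from `bitsandbytes` (as `optimizer: adamw_8bit` suggests) and the schedule from `transformers`; `model` stands in for whatever module is being trained:

```python
import bitsandbytes as bnb
from transformers import get_constant_schedule_with_warmup

# Optimize only the parameters left trainable by unfrozen_parameters.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = bnb.optim.AdamW8bit(
    params,
    lr=3e-4,             # learning_rate: 0.0003
    betas=(0.9, 0.95),   # adam_beta1 / adam_beta2
    eps=1e-6,            # adam_epsilon
    weight_decay=0.001,  # weight_decay
)
# constant_with_warmup: linear warmup, then a flat learning rate
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=2)
```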

### Training results


### Framework versions

- Transformers 4.54.1
- Pytorch 2.7.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4