This is the model card for EuroLLM-9B-Instruct-2512, an improved version of utter-project/EuroLLM-9B-Instruct. Compared with the previous version, this version includes the long-context extension phase and the revamped post-training recipe used for utter-project/EuroLLM-22B-Instruct.
The model was fine-tuned with axolotl (version 0.12.2) using the following configuration:
auto_resume_from_checkpoints: true
use_tensorboard: true
base_model: utter-project/EuroLLM-9B-2512
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
dataset_processes: 64
datasets:
  - path: utter-project/EuroBlocks-SFT-2512
    type: chat_template
    split: train
    conversation: chatml
    field_messages: conversations
    message_field_role: role
    message_field_content: content
    roles_to_train: ["assistant"]
    train_on_eos: all
chat_template_jinja: "{% for message in messages %}{% if message['role'] == 'assistant' %}{% set role = 'assistant' %}{% else %}{% set role = message['role'] %}{% endif %}<|im_start|>{{ role }}\n{{ message['content'] | trim }}<|im_end|>\n{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"
output_dir: checkpoints
val_set_size: 0
sequence_len: 32768
sample_packing: true
pad_to_sequence_len: true
# sequence_parallel_degree: 4
# heads_k_stride: 1
# ring_attn_func:
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
# Tokens per optimizer step = N_GPUS * GRAD_ACC_STEPS * MICRO_BATCH_SIZE * SEQ_LEN
# Assuming 32 GPUs: 32 * 2 * 2 * 32768 = 4,194,304 tokens/step (~4.2M)
gradient_accumulation_steps: 2
micro_batch_size: 2
eval_batch_size: 1
num_epochs: 5
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 1e-5
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 1
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: false
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: false
warmup_steps: 125
eval_sample_packing: False
save_steps: 500
save_total_limit: 2
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
special_tokens:
  eos_token: "<|im_end|>"
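The batch and scheduler settings in this config imply two quantities that are easy to check by hand. The sketch below assumes 32 data-parallel GPUs, as the config comment does, and computes the tokens per optimizer step (with sequence_len: 32768 and pad_to_sequence_len: true this counts token slots, padding included) and the learning rate under linear warmup followed by cosine decay. It assumes that axolotl's lr_scheduler: cosine behaves like the standard warmup-then-half-cosine schedule of transformers' get_cosine_schedule_with_warmup; TOTAL_STEPS is a hypothetical value, because the card does not state the number of optimizer steps.

```python
import math

# Values copied from the axolotl config.
SEQUENCE_LEN = 32768
MICRO_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
LEARNING_RATE = 1e-5
WARMUP_STEPS = 125

# Assumption taken from the config comment: 32 GPUs in data parallel.
N_GPUS = 32


def tokens_per_step(n_gpus, grad_acc, micro_batch, seq_len):
    """Token slots per optimizer step when every packed sequence is seq_len long."""
    return n_gpus * grad_acc * micro_batch * seq_len


def cosine_lr(step, total_steps, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(tokens_per_step(N_GPUS, GRADIENT_ACCUMULATION_STEPS, MICRO_BATCH_SIZE, SEQUENCE_LEN))
# 4194304

# HYPOTHETICAL total step count, for illustration only (not from the model card).
TOTAL_STEPS = 1000
for step in (0, 62, 125, 500, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under the 32-GPU assumption, the peak learning rate of 1e-5 is reached after 125 steps (about 524M token slots) and then decays toward zero by the final step.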
Run the model with Hugging Face transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct-2512"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user", "content": "What is the capital of Portugal? How would you describe it?"
    },
]

# Render the conversation with the model's chat template and open an assistant turn.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
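The chat_template_jinja string in the training config defines the ChatML prompt format: each message becomes <|im_start|>{role}, a newline, the trimmed content, then <|im_end|> and a newline, and add_generation_prompt appends <|im_start|>assistant plus a newline. The pure-Python sketch below mirrors that template so the prompt string can be inspected without loading the tokenizer. render_chatml is a hypothetical helper, and it assumes the released tokenizer ships the same template as the training config; the actual tokenizer template may differ in details (for example a leading BOS token) that this sketch does not model.

```python
def render_chatml(messages, add_generation_prompt=False):
    # Hypothetical helper mirroring chat_template_jinja from the axolotl config.
    parts = []
    for message in messages:
        # The template keeps the role name unchanged.
        role = message["role"]
        # Jinja's `trim` filter strips leading and trailing whitespace.
        content = message["content"].strip()
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are EuroLLM."},
    {"role": "user", "content": "  What is the capital of Portugal?  "},
]
print(render_chatml(messages, add_generation_prompt=True))
```

Because <|im_end|> is also configured as the EOS token, generation normally stops when the model closes its assistant turn.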
EuroLLM-9B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).
Base model: utter-project/EuroLLM-9B-2512