Can't reproduce the model

by mobicham - opened Jan 29, 2024

Jan 29, 2024

•

edited Jan 29, 2024

Hello, thank you for making this model available!

I tried to reproduce the model using the settings from the model card. While the model works fine (follows instructions and properly stops at eos_token, etc.) I don't get the same results on the arc_challenge@25:

-phi-2   : 0.6109
-phi-sft : 0.6280
-attempt : 0.616

In terms of training settings, the only difference is that I use gradient accumulation to mimic the batch size of 64 on a single GPU:

n_epochs             = 2
batch_size           = 1
effective_batch_size = 64 
grad_acc             = max(1, int(effective_batch_size/batch_size))
training_arguments = TrainingArguments(
    output_dir=".",
    num_train_epochs=n_epochs,
    per_device_train_batch_size=batch_size, 
    gradient_accumulation_steps=grad_acc,
    logging_steps=1,
    learning_rate=2e-5, 
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.001,
    max_steps=-1,
    save_steps=10000000,
    bf16=True
)

I am using the model from refs/pr/23 revision, bfloat16, training only attention and mlp layers (weights + bias). Training the whole model or with LoRA gives worse.

model = transformers.AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, 
                                                           flash_attn=True, flash_rotary=True, fused_dense=True, 
                                                           device_map='cuda', 
                                                           revision="refs/pr/23")

Any tips are highly appreciated, thank you in advance!

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment