Setting max_layers_to_freeze overrun in MTLClassifier

#581
by MajorasMeow - opened

Hi,

I noticed that MTLClassifier seems to ignore max_layers_to_freeze. I set min = 0 and max = 2, and it still runs some trials with, e.g., max_layers_to_freeze = 11.
Or maybe I got something wrong?
Also, is there a way to fix only certain hyperparameters?

Thanks so far!

Settings:

hyperparameters = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-3, "log": True},
    "warmup_ratio": {"type": "float", "low": 0.01, "high": 0.02},
    "weight_decay": {"type": "float", "low": 0.01, "high": 0.1},
    "dropout_rate": {"type": "float", "low": 0.0, "high": 0.7},
    "lr_scheduler_type": {"type": "categorical", "choices": ["cosine"]},
    "task_weights": {"type": "float", "low": 0.1, "high": 2.0},
}

mc = MTLClassifier(
    task_columns=task_columns,
    study_name=study_name,
    pretrained_path=pretrained_path,
    train_path=train_path,
    val_path=val_path,
    test_path=test_path,
    model_save_path=model_save_path,
    results_dir=results_dir,
    tensorboard_log_dir=tensorboard_log_dir,
    hyperparameters=hyperparameters,
    manual_hyperparameters=manual_hyperparameters,
    use_manual_hyperparameters=False,
    max_layers_to_freeze={"min": 0, "max": 2},
    distributed_training=use_distributed,
    master_addr=socket.gethostname() if use_distributed else None,
    master_port=port if use_distributed else None,
    n_trials=n_trials,
    epochs=epochs,
    batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_clipping=True,
    max_grad_norm=1.0,
    seed=seed,
)

Thank you for your question. It appears you are setting max_layers_to_freeze correctly; could you provide the information that indicated 11 layers were frozen in one of the trials?

For the specific hyperparameters to fix, you could submit a range that only includes one value for the hyperparameters you'd like to fix, or set manual hyperparameters if you would like to fix them all.
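As an illustration of the first suggestion, a search space like the following should pin individual hyperparameters by collapsing their ranges (the parameter names follow the dict above; the exact values here are hypothetical, and behavior ultimately depends on the sampler):

```python
# Sketch: pin a hyperparameter by collapsing its search range.
# For a float, set low == high; for a categorical, supply a single choice.
fixed_hyperparameters = {
    "learning_rate": {"type": "float", "low": 1e-4, "high": 1e-4, "log": True},  # fixed at 1e-4
    "lr_scheduler_type": {"type": "categorical", "choices": ["cosine"]},  # only one option
    "dropout_rate": {"type": "float", "low": 0.0, "high": 0.7},  # still searched
}
```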

To be sure, I ran the script again via a Slurm job and checked the .err file, which prints:

[I 2026-02-04 11:49:04,839] Trial 0 finished with value: 1.0696380138397217 and parameters: {'learning_rate': 2.3581745784331423e-05, 'warmup_ratio': 0.017128341224997498, 'weight_decay': 0.052142819635632824, 'dropout_rate': 0.5468565833396168, 'lr_scheduler_type': 'cosine', 'task_weight_0': 0.725170815253848, 'task_weight_1': 0.17016888533474578, 'max_layers_to_freeze': 10}. Best is trial 0 with value: 1.0696380138397217.

I think I found the bug in train.py:

max_layers_to_freeze ends up in trial_config, which is a copy of config. But the block around line 478 only checks whether the key is present and then always overwrites the user-supplied min/max with the range derived from the pretrained model.

Line 450:

trial_config = config.copy()

From line 478:

# Set appropriate max layers to freeze based on pretrained model
if "max_layers_to_freeze" in trial_config:
    freeze_range = get_layer_freeze_range(trial_config["pretrained_path"])
    trial_config["max_layers_to_freeze"] = int(trial.suggest_int(
        "max_layers_to_freeze",
        freeze_range["min"],
        freeze_range["max"]
    ))

Fix, which worked for me:
I changed the block above to:

if "max_layers_to_freeze" in trial_config:
    if trial_config["max_layers_to_freeze"] is None:
        # infer range from pretrained model
        freeze_range = get_layer_freeze_range(trial_config["pretrained_path"])
        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int(
                "max_layers_to_freeze",
                freeze_range["min"],
                freeze_range["max"],
            )
        )
    else:
        # user-specified range
        min_freeze = trial_config["max_layers_to_freeze"]["min"]
        max_freeze = trial_config["max_layers_to_freeze"]["max"]

        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int("max_layers_to_freeze", min_freeze, max_freeze)
        )

output:
[I 2026-02-04 16:52:24,314] Trial 0 finished with value: 1.4905354976654053 and parameters: {'learning_rate': 0.00018664834802988663, 'warmup_ratio': 0.011826725807678032, 'weight_decay': 0.05924153264075307, 'dropout_rate': 0.506554993838568, 'lr_scheduler_type': 'cosine', 'task_weight_0': 0.3662899464984072, 'task_weight_1': 1.4941439968992316, 'max_layers_to_freeze': 2}. Best is trial 0 with value: 1.4905354976654053.
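For anyone who wants to check the branch logic without launching a full run, here is a minimal standalone reproduction; the StubTrial class and the stubbed get_layer_freeze_range are my own stand-ins, not part of the library:

```python
import random

def get_layer_freeze_range(pretrained_path):
    # Stub standing in for the library helper: pretend the model has 12 layers.
    return {"min": 0, "max": 12}

class StubTrial:
    """Minimal stand-in for an Optuna trial."""
    def suggest_int(self, name, low, high):
        return random.randint(low, high)

def suggest_max_layers_to_freeze(trial, trial_config):
    # Mirrors the fixed block: honor a user-specified {"min", "max"} range,
    # fall back to the pretrained model's range only when the value is None.
    if "max_layers_to_freeze" in trial_config:
        if trial_config["max_layers_to_freeze"] is None:
            freeze_range = get_layer_freeze_range(trial_config["pretrained_path"])
        else:
            freeze_range = trial_config["max_layers_to_freeze"]
        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int("max_layers_to_freeze", freeze_range["min"], freeze_range["max"])
        )
    return trial_config

config = {"pretrained_path": "model/", "max_layers_to_freeze": {"min": 0, "max": 2}}
for _ in range(100):
    out = suggest_max_layers_to_freeze(StubTrial(), config.copy())
    assert 0 <= out["max_layers_to_freeze"] <= 2  # stays within the user range
```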

Thank you so much for identifying this. Since you found the fix, it would be great if you could submit a pull request to make this change.

Will do!
Thanks

ctheodoris changed discussion status to closed
