max_layers_to_freeze setting overridden in MTLClassifier
Hi,
I noticed that MTLClassifier seems to ignore max_layers_to_freeze. I set min = 0 and max = 2, and it still runs some trials with, e.g., max_layers_to_freeze = 11. Or maybe I got something wrong?
Also, is there a way to fix only certain hyperparameters?
Thanks so far!
Settings:
hyperparameters = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-3, "log": True},
    "warmup_ratio": {"type": "float", "low": 0.01, "high": 0.02},
    "weight_decay": {"type": "float", "low": 0.01, "high": 0.1},
    "dropout_rate": {"type": "float", "low": 0.0, "high": 0.7},
    "lr_scheduler_type": {"type": "categorical", "choices": ["cosine"]},
    "task_weights": {"type": "float", "low": 0.1, "high": 2.0},
}
mc = MTLClassifier(
    task_columns=task_columns,
    study_name=study_name,
    pretrained_path=pretrained_path,
    train_path=train_path,
    val_path=val_path,
    test_path=test_path,
    model_save_path=model_save_path,
    results_dir=results_dir,
    tensorboard_log_dir=tensorboard_log_dir,
    hyperparameters=hyperparameters,
    manual_hyperparameters=manual_hyperparameters,
    use_manual_hyperparameters=False,
    max_layers_to_freeze={"min": 0, "max": 2},
    distributed_training=use_distributed,
    master_addr=socket.gethostname() if use_distributed else None,
    master_port=port if use_distributed else None,
    n_trials=n_trials,
    epochs=epochs,
    batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_clipping=True,
    max_grad_norm=1.0,
    seed=seed,
)
Thank you for your question. It appears you are setting max_layers_to_freeze correctly; could you share the output that indicated 11 layers were frozen in one of the trials?
To fix specific hyperparameters, you can submit a range that contains only a single value for each hyperparameter you'd like to fix, or set manual hyperparameters if you would like to fix all of them.
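For example, a search space along these lines would effectively pin weight_decay and lr_scheduler_type while still tuning learning_rate (the is_fixed helper below is purely illustrative, not part of the library):

```python
# Hypothetical sketch: pinning hyperparameters by collapsing their ranges.
# Any entry whose low == high (or a categorical with one choice) can only
# ever yield a single value, so the tuner cannot vary it.
hyperparameters = {
    "learning_rate": {"type": "float", "low": 1e-5, "high": 1e-3, "log": True},
    # pinned: the range contains exactly one value
    "weight_decay": {"type": "float", "low": 0.05, "high": 0.05},
    # pinned: a single categorical choice
    "lr_scheduler_type": {"type": "categorical", "choices": ["cosine"]},
}

def is_fixed(spec):
    """Return True if a search-space entry can only yield one value."""
    if spec["type"] == "categorical":
        return len(spec["choices"]) == 1
    return spec["low"] == spec["high"]
```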
To be sure, I ran the script again via a Slurm job and checked the .err log, which prints:
[I 2026-02-04 11:49:04,839] Trial 0 finished with value: 1.0696380138397217 and parameters: {'learning_rate': 2.3581745784331423e-05, 'warmup_ratio': 0.017128341224997498, 'weight_decay': 0.052142819635632824, 'dropout_rate': 0.5468565833396168, 'lr_scheduler_type': 'cosine', 'task_weight_0': 0.725170815253848, 'task_weight_1': 0.17016888533474578, 'max_layers_to_freeze': 10}. Best is trial 0 with value: 1.0696380138397217.
I think I found the bug in train.py:
max_layers_to_freeze ends up in trial_config, which is a copy of config. But later, at line 478, the code only checks whether the key is present and then overrides the min/max with the range derived from the pretrained model, ignoring the user-specified values:
Line 450:
trial_config = config.copy()
From line 478:
# Set appropriate max layers to freeze based on pretrained model
if "max_layers_to_freeze" in trial_config:
    freeze_range = get_layer_freeze_range(trial_config["pretrained_path"])
    trial_config["max_layers_to_freeze"] = int(trial.suggest_int(
        "max_layers_to_freeze",
        freeze_range["min"],
        freeze_range["max"]
    ))
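For illustration, the behaviour of that block can be reproduced in isolation (StubTrial, the stand-in helper, and the path are hypothetical, not the actual Geneformer code):

```python
# Minimal sketch of the buggy branch with a stub in place of an Optuna trial.
class StubTrial:
    def suggest_int(self, name, low, high):
        # A real trial samples from [low, high]; the stub returns the upper
        # bound so we can see which range was actually used.
        return high

def get_layer_freeze_range_stub(pretrained_path):
    # Stand-in for the real helper, which reads the pretrained model's
    # config; assume a 12-layer model here, giving {"min": 0, "max": 11}.
    return {"min": 0, "max": 11}

trial_config = {
    "pretrained_path": "path/to/pretrained_model",  # hypothetical path
    "max_layers_to_freeze": {"min": 0, "max": 2},   # user-specified range
}
if "max_layers_to_freeze" in trial_config:
    freeze_range = get_layer_freeze_range_stub(trial_config["pretrained_path"])
    trial_config["max_layers_to_freeze"] = int(StubTrial().suggest_int(
        "max_layers_to_freeze", freeze_range["min"], freeze_range["max"]
    ))
# The user's {"min": 0, "max": 2} was silently replaced by the model-derived
# range, so values up to 11 can be suggested.
```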
Fix, which worked for me:
I changed the block above to:
if "max_layers_to_freeze" in trial_config:
    if trial_config["max_layers_to_freeze"] is None:
        # infer range from pretrained model
        freeze_range = get_layer_freeze_range(trial_config["pretrained_path"])
        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int(
                "max_layers_to_freeze",
                freeze_range["min"],
                freeze_range["max"],
            )
        )
    else:
        # user-specified range
        min_freeze = trial_config["max_layers_to_freeze"]["min"]
        max_freeze = trial_config["max_layers_to_freeze"]["max"]
        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int("max_layers_to_freeze", min_freeze, max_freeze)
        )
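As a self-contained check of the fixed logic (StubTrial is a hypothetical stand-in for an Optuna trial, and resolve_max_layers_to_freeze is simply the branch above wrapped in a function):

```python
# Sketch verifying that the fixed branch respects a user-specified range.
class StubTrial:
    def suggest_int(self, name, low, high):
        # A real trial samples from [low, high]; returning the upper bound
        # reveals which range was used.
        return high

def resolve_max_layers_to_freeze(trial_config, trial, model_freeze_range):
    if "max_layers_to_freeze" in trial_config:
        if trial_config["max_layers_to_freeze"] is None:
            # no user range: infer it from the pretrained model
            lo = model_freeze_range["min"]
            hi = model_freeze_range["max"]
        else:
            # user-specified range takes precedence
            lo = trial_config["max_layers_to_freeze"]["min"]
            hi = trial_config["max_layers_to_freeze"]["max"]
        trial_config["max_layers_to_freeze"] = int(
            trial.suggest_int("max_layers_to_freeze", lo, hi)
        )
    return trial_config

cfg = resolve_max_layers_to_freeze(
    {"max_layers_to_freeze": {"min": 0, "max": 2}},
    StubTrial(),
    {"min": 0, "max": 11},  # model-derived range, correctly ignored here
)
# cfg["max_layers_to_freeze"] is now capped by the user's max of 2, not 11.
```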
Output:
[I 2026-02-04 16:52:24,314] Trial 0 finished with value: 1.4905354976654053 and parameters: {'learning_rate': 0.00018664834802988663, 'warmup_ratio': 0.011826725807678032, 'weight_decay': 0.05924153264075307, 'dropout_rate': 0.506554993838568, 'lr_scheduler_type': 'cosine', 'task_weight_0': 0.3662899464984072, 'task_weight_1': 1.4941439968992316, 'max_layers_to_freeze': 2}. Best is trial 0 with value: 1.4905354976654053.
Thank you so much for identifying this. Since you found the fix, it would be great if you could submit a pull request to make this change.
Will do!
Thanks