
Hyperparameter search using the Trainer API [[hyperparameter-search-using-trainer-api]]

🤗 Transformers provides a [Trainer] class optimized for training 🤗 Transformers models, so you can train a model without writing your own training loop. [Trainer] also provides an API for hyperparameter search. This document shows how to use that API, with examples.

Hyperparameter search backend [[hyperparameter-search-backend]]

[Trainer] currently supports the following three hyperparameter search backends: optuna, raytune, and wandb.

Before using one of them as the hyperparameter search backend, install the libraries with the command below.

pip install optuna wandb "ray[tune]"

How to enable hyperparameter search in the example [[how-to-enable-hyperparameter-search-in-example]]

Define the hyperparameter search space. Each hyperparameter search backend requires a different format.

For optuna, refer to the optuna object_parameter documentation and write the search space as follows:

>>> def optuna_hp_space(trial):
...     return {
...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
...     }

For raytune, refer to the raytune object_parameter documentation and write the search space as follows:

>>> from ray import tune

>>> def ray_hp_space(trial):
...     return {
...         "learning_rate": tune.loguniform(1e-6, 1e-4),
...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
...     }

For wandb, refer to the wandb object_parameter documentation and write the search space as follows:

>>> def wandb_hp_space(trial):
...     return {
...         "method": "random",
...         "metric": {"name": "objective", "goal": "minimize"},
...         "parameters": {
...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
...         },
...     }
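
The optuna and raytune spaces above sample the learning rate on a log scale (`log=True`, `tune.loguniform`), while the wandb space uses a `"uniform"` distribution over the same range. The following is a minimal standard-library sketch of what that difference means in practice. The helper names (`sample_uniform`, `sample_log_uniform`, `share_below`) are hypothetical and are not part of any backend; the sketch only illustrates the two sampling schemes.

```python
import math
import random


def sample_uniform(rng, low, high):
    """Draw a value uniformly on the linear scale."""
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    """Draw a value uniformly on the log scale, then map it back."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(samples, threshold):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
low, high = 1e-6, 1e-4  # same learning-rate range as the spaces above

uniform_draws = [sample_uniform(rng, low, high) for _ in range(10_000)]
log_draws = [sample_log_uniform(rng, low, high) for _ in range(10_000)]

# Share of draws that land in the lower decade, [1e-6, 1e-5)
print(f"uniform:     {share_below(uniform_draws, 1e-5):.2f}")
print(f"log-uniform: {share_below(log_draws, 1e-5):.2f}")
```

With linear uniform sampling only about one draw in eleven falls below 1e-5, whereas log-uniform sampling splits the draws roughly evenly between the two decades. If small learning rates matter for your model, a log-scale distribution is usually the more even way to explore this range.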

Define a model_init function and pass it to [Trainer]. Here is an example.

>>> def model_init(trial):
...     return AutoModelForSequenceClassification.from_pretrained(
...         model_args.model_name_or_path,
...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
...         config=config,
...         cache_dir=model_args.cache_dir,
...         revision=model_args.model_revision,
...     )

Create a [Trainer] with your model_init function, training arguments, training and test datasets, and evaluation function, as shown below:

>>> trainer = Trainer(
...     model=None,
...     args=training_args,
...     train_dataset=small_train_dataset,
...     eval_dataset=small_eval_dataset,
...     compute_metrics=compute_metrics,
...     processing_class=tokenizer,
...     model_init=model_init,
...     data_collator=data_collator,
... )

Call the hyperparameter search and retrieve the parameters of the best trial. The backend can be "optuna", "wandb", or "ray". The direction can be "minimize" or "maximize", and it determines whether the objective is minimized or maximized.

You can define your own compute_objective function. If you do not define one, the default compute_objective is called, and the sum of the evaluation metrics, such as f1, is returned as the objective value (or the evaluation loss when no other metrics are reported).
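
As an illustration, a custom compute_objective can be as small as the sketch below. It assumes that the function receives the dictionary of evaluation metrics and that your compute_metrics reports an F1 score under the key `eval_f1`; that key name and the sample values are assumptions for this example, so adjust them to the metrics you actually log.

```python
def compute_objective(metrics):
    """Return the single number the search should optimize.

    `metrics` is the dictionary of evaluation metrics. The key `eval_f1`
    is an assumption; it depends on what `compute_metrics` reports.
    """
    return metrics["eval_f1"]


# Hypothetical metrics from one evaluation run, for illustration only
example_metrics = {"eval_loss": 0.41, "eval_f1": 0.87, "eval_accuracy": 0.9}
print(compute_objective(example_metrics))  # prints 0.87
```

An objective like this one should be paired with direction="maximize"; an objective based on the loss would use "minimize" instead.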

>>> best_trial = trainer.hyperparameter_search(
...     direction="maximize",
...     backend="optuna",
...     hp_space=optuna_hp_space,
...     n_trials=20,
...     compute_objective=compute_objective,
... )
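
The returned best_trial carries the run id, the objective value, and the hyperparameters of the winning trial. The sketch below shows, under stated assumptions, how the direction argument decides which trial counts as best. `TrialResult` and `pick_best` are hypothetical stand-ins written for this example, not the library's implementation, and the trial values are made up.

```python
from typing import Any, NamedTuple


class TrialResult(NamedTuple):
    """Hypothetical record of one finished trial."""

    run_id: str
    objective: float
    hyperparameters: dict


def pick_best(results, direction):
    """Return the trial with the highest or lowest objective."""
    if direction not in ("minimize", "maximize"):
        raise ValueError(f"unknown direction: {direction!r}")
    chooser = max if direction == "maximize" else min
    return chooser(results, key=lambda result: result.objective)


# Made-up results, for illustration only
results = [
    TrialResult("0", 0.81, {"learning_rate": 3e-5, "per_device_train_batch_size": 32}),
    TrialResult("1", 0.87, {"learning_rate": 1e-5, "per_device_train_batch_size": 16}),
    TrialResult("2", 0.79, {"learning_rate": 8e-5, "per_device_train_batch_size": 64}),
]

best = pick_best(results, "maximize")
print(best.run_id, best.hyperparameters)
```

With an F1-style objective, "maximize" selects trial "1" here; with a loss-style objective and "minimize", the same data would select trial "2".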

Hyperparameter search for DDP fine-tuning [[hyperparameter-search-for-ddp-finetune]]

Currently, hyperparameter search for DDP (Distributed Data Parallelism) is available with optuna. The rank-zero process starts the hyperparameter search, generates each trial, and passes the result to the other processes.