Upload 7 files

- README.md (+6 -1)
- optimizer/Use_Kohya-sd-script.txt (+10 -7)
- optimizer/emoclan.py (+252 -0)
README.md
CHANGED

@@ -34,6 +34,11 @@ I showed it to Gemini and asked her a few questions.
 |★| EmoLYNX 公開(250718) 探索範囲を広く持ちます 感情機構は同じです
 |★| EmoLYNX Released (250718): It offers a wide exploration range, while its Emotion Mechanism remains the same.
 
+|★| EmoCLAN 公開(250720) Navi、Fact、Lynx、役割分担の統合 感情機構は同じです
+(Lynx:序盤と過学習傾向時、Navi:中盤と健全時、Fact:終盤と発散傾向時、を担当します)
+|★| EmoCLAN Released (250720): Integrates Navi, Fact, and Lynx under a division of roles, while the Emotion Mechanism remains the same.
+(Lynx handles the early stage and overfitting tendencies; Navi the middle stage and healthy training; Fact the final stage and divergence tendencies.)
+
 # 主題:新世代optimizer、EmoNAVIによる変革と感情学習の成果
 ## Title: A New Generation Optimizer — The Innovations and Outcomes of Emotional Learning with EmoNAVI
 ## 副題:過去値不要で現在値から再開できる自動収束・自己制御・自律型軽量最適器の解説

@@ -270,7 +275,7 @@ This proposal represents one such result born from that aspiration.
 Emoシリーズは、Adam、Adafactor、Lion、Tiger、等から多くを学びました。
 これらの後継ではなく独自の思想や設計による"感情機構"というアプローチにより構築されています。
 汎用性・自律性・適応性を重視し新たな最適化や効率化や簡易化を追求しています。
-
+この開発において先人たちの知見に深く感謝しつつ今後も新しい可能性を探究します。
 The Emo series has learned much from Adam, Adafactor, Lion, and Tiger.
 Rather than being their successors, it is built upon a unique philosophy and design approach centered on "emotional mechanisms".
 It prioritizes generality, autonomy, and adaptability in pursuit of new paths for optimization, efficiency, and simplicity.
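The role division announced above mirrors the phase rule in the `optimizer/emoclan.py` file added by this commit: an emotion scalar is derived from short- and long-term loss EMAs, and its sign and magnitude select the active member. A minimal plain-Python sketch of that rule, with the thresholds and tanh scaling copied from `emoclan.py` (function names here are illustrative, not part of the library):

```python
import math

def emotion_scalar(short_ema: float, long_ema: float) -> float:
    # tanh-squashed gap between short- and long-term loss EMAs, in (-1, 1)
    return math.tanh(5 * (short_ema - long_ema))

def pick_member(scalar: float) -> str:
    # Same thresholds as the phase selection in emoclan.py
    if scalar > 0.3:
        return "Lynx"   # early stage / overfitting tendency
    if scalar < -0.3:
        return "Fact"   # final stage / divergence tendency
    return "Navi"       # middle stage / healthy training

print(pick_member(emotion_scalar(1.0, 0.2)))  # recent loss well above trend -> "Lynx"
```

A positive scalar means the short-term loss EMA sits above the long-term one, so the optimizer treats the run as being in its early or overfitting-prone phase.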
optimizer/Use_Kohya-sd-script.txt
CHANGED

@@ -10,6 +10,7 @@ sd-script/optimizer
 --optimizer_type=optimizer.emonavi.EmoNavi
 --optimizer_type=optimizer.emofact.EmoFact
 --optimizer_type=optimizer.emolynx.EmoLynx
+--optimizer_type=optimizer.emoclan.EmoClan
 
 このように指定するだけで各Optimizerを利用できます(いずれかひとつを指定してください)
 ---

@@ -17,10 +18,10 @@ Kohya-sd-script の柔軟な構成により、これらをすぐ試せます
 Kohya-sd-script の開発者と協力者の皆さまに深く感謝します
 Kohya-sd-script: https://github.com/kohya-ss/sd-scripts
 
-
-
-
-
+Emoシリーズは、Adam、Adafactor、Lion、Tiger、等から多くを学びました。
+これらの後継ではなく独自の思想や設計による"感情機構"というアプローチにより構築されています。
+汎用性・自律性・適応性を重視し新たな最適化や効率化や簡易化を追求しています。
+この開発において先人たちの知見に深く感謝しつつ今後も新しい可能性を探究します。
 
 
 Usage with Kohya-sd-script

@@ -35,12 +36,14 @@ With this setup,
 --optimizer_type=optimizer.emonavi.EmoNavi
 --optimizer_type=optimizer.emofact.EmoFact
 --optimizer_type=optimizer.emolynx.EmoLynx
+--optimizer_type=optimizer.emoclan.EmoClan
 
 You can utilize each optimizer by simply specifying one of the above.
 
 Thanks to the flexible configuration of Kohya-sd-script, you can try these out right away. We extend our deepest gratitude to the developers and contributors of Kohya-sd-script:
 Kohya-sd-script: https://github.com/kohya-ss/sd-scripts
 
-
-
-
+The Emo series has learned much from Adam, Adafactor, Lion, and Tiger.
+Rather than being their successors, it is built upon a unique philosophy and design approach centered on "emotional mechanisms".
+It prioritizes generality, autonomy, and adaptability in pursuit of new paths for optimization, efficiency, and simplicity.
+In its development, we deeply appreciate the insights of those who came before us, and we will continue to explore new possibilities beyond them.

(Note: the original patch wrote `optimizer.emolynx.EmoClan`; since this commit adds the `EmoClan` class in `optimizer/emoclan.py`, the dotted path is corrected to `optimizer.emoclan.EmoClan` to match the other entries.)
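The `--optimizer_type` value is a dotted Python path that the training script resolves to a class at startup. A minimal sketch of how such a dotted path maps to a class; kohya's actual loader is more elaborate, and `load_optimizer_class` is a hypothetical helper introduced only for illustration:

```python
import importlib

def load_optimizer_class(dotted_path: str):
    # "optimizer.emoclan.EmoClan" -> module "optimizer.emoclan", attribute "EmoClan"
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

This is why the `optimizer/` directory must be importable (i.e. on `sys.path`) from wherever the training script is launched.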
optimizer/emoclan.py
ADDED
@@ -0,0 +1,252 @@
import torch
from torch.optim import Optimizer
import math
from typing import Any, Callable, Dict, Optional, Tuple, Union

# Helper function
def exists(val):
    return val is not None

class EmoClan(Optimizer):
    def __init__(self, params: Union[list, torch.nn.Module],
                 lr: float = 1e-3,
                 betas: Tuple[float, float] = (0.9, 0.999),
                 eps: float = 1e-8,
                 weight_decay: float = 0.01,
                 lynx_betas: Tuple[float, float] = (0.9, 0.99),  # betas specific to Lynx
                 decoupled_weight_decay: bool = False
                 ):

        if not 0.0 <= lr:
            raise ValueError(f"Invalid learning rate: {lr}")
        if not 0.0 <= eps:
            raise ValueError(f"Invalid epsilon value: {eps}")
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 0: {betas[0]}")
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 1: {betas[1]}")

        # Validate the Lynx betas as well
        if not 0.0 <= lynx_betas[0] < 1.0:
            raise ValueError(f"Invalid lynx_beta parameter at index 0: {lynx_betas[0]}")
        if not 0.0 <= lynx_betas[1] < 1.0:
            raise ValueError(f"Invalid lynx_beta parameter at index 1: {lynx_betas[1]}")

        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        lynx_betas=lynx_betas, decoupled_weight_decay=decoupled_weight_decay)
        super().__init__(params, defaults)

        self._init_lr = lr        # saved for decoupled weight decay (used by Lynx)
        self.should_stop = False  # global early-stop flag

    # --- Emotion Mechanism ---
    def _update_ema(self, param_state: Dict[str, Any], loss_val: float) -> Dict[str, float]:
        """Update the short- and long-term EMAs from the current loss value."""
        # param_state holds each parameter's state['ema']
        ema = param_state.setdefault('ema', {'short': loss_val, 'long': loss_val})
        ema['short'] = 0.3 * loss_val + 0.7 * ema['short']
        ema['long'] = 0.01 * loss_val + 0.99 * ema['long']
        return ema

    def _compute_scalar(self, ema: Dict[str, float]) -> float:
        """Produce the emotion scalar from the gap between the two EMAs."""
        diff = ema['short'] - ema['long']
        return math.tanh(5 * diff)

    def _decide_ratio(self, scalar: float) -> float:
        """Decide the shadow blend ratio from the emotion scalar."""
        if scalar > 0.6:
            return 0.7 + 0.2 * scalar  # 0.7 to 0.9
        elif scalar < -0.6:
            return 0.1
        elif abs(scalar) > 0.3:  # 0.3 < |scalar| <= 0.6
            return 0.3
        return 0.0

    # --- Core gradient-update rules of the member optimizers (as private methods) ---

    def _lynx_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        beta1: float,
        beta2: float,
        wd_actual: float
    ):
        """Core EmoLynx gradient-update rule."""
        # Stepweight decay: p.data = p.data * (1 - lr * wd)
        p.data.mul_(1. - lr * wd_actual)

        # Lynx keeps its own EMA state in param_state
        if 'exp_avg_lynx' not in param_state:
            param_state['exp_avg_lynx'] = torch.zeros_like(p)
        exp_avg = param_state['exp_avg_lynx']

        # Blend the raw gradient with the EMA
        blended_grad = grad.mul(1. - beta1).add_(exp_avg, alpha=beta1)

        # Sign-based update
        p.data.add_(blended_grad.sign_(), alpha=-lr)

        # Update exp_avg
        exp_avg.mul_(beta2).add_(grad, alpha=1. - beta2)

    def _navi_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        betas: Tuple[float, float],
        eps: float,
        weight_decay: float
    ):
        """Core EmoNavi gradient-update rule."""
        beta1, beta2 = betas

        exp_avg = param_state.setdefault('exp_avg_navi', torch.zeros_like(p.data))
        exp_avg_sq = param_state.setdefault('exp_avg_sq_navi', torch.zeros_like(p.data))

        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        denom = exp_avg_sq.sqrt().add_(eps)

        # Weight decay (standard approach)
        if weight_decay:
            p.data.add_(p.data, alpha=-weight_decay * lr)

        p.data.addcdiv_(exp_avg, denom, value=-lr)

    def _fact_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        betas: Tuple[float, float],  # beta2 is unused for >=2-D gradients but kept for compatibility (used in the 1-D path)
        eps: float,
        weight_decay: float
    ):
        """Core EmoFact gradient-update rule (Adafactor-like)."""
        beta1, beta2 = betas

        if grad.dim() >= 2:
            # Row and column mean squares (a lightweight approximation of the variance)
            r_sq = torch.mean(grad * grad, dim=tuple(range(1, grad.dim())), keepdim=True).add_(eps)
            c_sq = torch.mean(grad * grad, dim=0, keepdim=True).add_(eps)

            param_state.setdefault('exp_avg_r_fact', torch.zeros_like(r_sq)).mul_(beta1).add_(torch.sqrt(r_sq), alpha=1 - beta1)
            param_state.setdefault('exp_avg_c_fact', torch.zeros_like(c_sq)).mul_(beta1).add_(torch.sqrt(c_sq), alpha=1 - beta1)

            # Normalize by the product of the reconstructed factor estimates
            denom = torch.sqrt(param_state['exp_avg_r_fact'] * param_state['exp_avg_c_fact']).add_(eps)
            update_term = grad / denom

        else:  # 1-D (vector) gradient correction
            exp_avg = param_state.setdefault('exp_avg_fact', torch.zeros_like(p.data))
            exp_avg_sq = param_state.setdefault('exp_avg_sq_fact', torch.zeros_like(p.data))

            exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
            exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # beta2 is used here
            denom = exp_avg_sq.sqrt().add_(eps)
            update_term = exp_avg / denom

        # Final parameter update (weight decay applied as well)
        p.data.add_(p.data, alpha=-weight_decay * lr)
        p.data.add_(update_term, alpha=-lr)


    @torch.no_grad()
    def step(self, closure: Optional[Callable] = None):
        loss = None
        if exists(closure):
            with torch.enable_grad():
                loss = closure()
        loss_val = loss.item() if loss is not None else 0.0

        # The global scalar history is managed on the EmoClan instance
        global_scalar_hist = self.state.setdefault('global_scalar_hist', [])

        # Keep the global emotion EMA state in self.state and compute the current scalar
        global_ema_state = self.state.setdefault('global_ema', {'short': loss_val, 'long': loss_val})
        global_ema_state['short'] = 0.3 * loss_val + 0.7 * global_ema_state['short']
        global_ema_state['long'] = 0.01 * loss_val + 0.99 * global_ema_state['long']
        current_global_scalar = self._compute_scalar(global_ema_state)

        # Append the current emotion scalar to the history (bounded to 32 entries)
        global_scalar_hist.append(current_global_scalar)
        if len(global_scalar_hist) > 32:
            global_scalar_hist.pop(0)

        for group in self.param_groups:
            lr = group['lr']
            wd = group['weight_decay']
            eps = group['eps']
            decoupled_wd = group['decoupled_weight_decay']

            lynx_beta1, lynx_beta2 = group['lynx_betas']
            navi_fact_betas = group['betas']  # Navi and Fact share the default betas

            # Effective weight decay for Lynx under decoupled_weight_decay
            _wd_actual_lynx = wd
            if decoupled_wd:
                _wd_actual_lynx /= self._init_lr

            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad.data
                param_state = self.state[p]  # per-parameter state

                # --- Per-parameter emotion update and shadow handling ---
                # Each parameter's state['ema'] is updated from loss_val, which is
                # shared by all parameters. Since the closure yields a single loss
                # value, this effectively tracks the global emotion rather than a
                # truly parameter-specific one.
                param_ema = self._update_ema(param_state, loss_val)
                param_scalar = self._compute_scalar(param_ema)  # per-parameter scalar

                ratio = self._decide_ratio(param_scalar)  # per-parameter blend ratio

                if ratio > 0:
                    if 'shadow' not in param_state:
                        param_state['shadow'] = p.data.clone()
                    else:
                        # Blend the shadow into the current value
                        p.data.mul_(1 - ratio).add_(param_state['shadow'], alpha=ratio)
                        # Let the shadow track the current value
                        param_state['shadow'].lerp_(p.data, 0.05)

                # --- Optimizer selection and gradient update ---
                # The phase is decided from the global emotion scalar recorded above:
                #   scalar in [-0.3, 0.3] -> Navi
                #   scalar >  0.3         -> Lynx
                #   scalar < -0.3         -> Fact
                if current_global_scalar > 0.3:  # early stage / overfitting tendency
                    self._lynx_update(p, grad, param_state, lr, lynx_beta1, lynx_beta2, _wd_actual_lynx)
                elif current_global_scalar < -0.3:  # final stage / divergence tendency
                    self._fact_update(p, grad, param_state, lr, navi_fact_betas, eps, wd)
                else:  # middle stage, -0.3 <= current_global_scalar <= 0.3
                    self._navi_update(p, grad, param_state, lr, navi_fact_betas, eps, wd)

        # Early-stop check: evaluate the recent global scalar history
        if len(global_scalar_hist) >= 32:
            buf = global_scalar_hist
            mean = sum(buf) / len(buf)
            avg_abs = sum(abs(s) for s in buf) / len(buf)
            var = sum((s - mean) ** 2 for s in buf) / len(buf)
            if avg_abs < 0.05 and var < 0.005:
                self.should_stop = True  # callers can poll this flag externally

        return loss

"""
The Emo series has learned much from Adam, Adafactor, Lion, and Tiger.
Rather than being their successors, it is built upon a unique philosophy and design
approach centered on "emotional mechanisms".
In its development, we deeply appreciate the insights of those who came before us,
and we will continue to explore new possibilities beyond them.
"""
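The shadow mechanism in the file above can be illustrated with plain floats: `_decide_ratio` turns the emotion scalar into a blend ratio, the parameter is pulled toward its shadow copy, and the shadow then lerps toward the result by 0.05. A small worked sketch with the same constants (`blend_with_shadow` is a name introduced here for illustration, not part of the file):

```python
def decide_ratio(scalar: float) -> float:
    # Same thresholds as EmoClan._decide_ratio
    if scalar > 0.6:
        return 0.7 + 0.2 * scalar
    if scalar < -0.6:
        return 0.1
    if abs(scalar) > 0.3:
        return 0.3
    return 0.0

def blend_with_shadow(param: float, shadow: float, ratio: float):
    # p = (1 - ratio) * p + ratio * shadow, then the shadow lerps toward p by 0.05
    blended = (1 - ratio) * param + ratio * shadow
    new_shadow = shadow + 0.05 * (blended - shadow)
    return blended, new_shadow

ratio = decide_ratio(0.8)                  # 0.7 + 0.2 * 0.8 = 0.86
p, s = blend_with_shadow(2.0, 1.0, ratio)  # p pulled strongly toward the shadow
```

A strongly positive scalar (loss rising above trend) thus snaps the parameter most of the way back to its shadow, while the shadow itself only drifts slowly, which is what gives the mechanism its damping effect.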