Delete optimizer

- optimizer/How-to-Use-EmoNAVI(ENG).txt +0 -59
- optimizer/Use_Kohya-sd-script.txt +0 -49
- optimizer/__init__.py +0 -0
- optimizer/emoclan.py +0 -252
- optimizer/emofact.py +0 -117
- optimizer/emolynx.py +0 -129
- optimizer/学習の進め方(日本語).txt +0 -52
optimizer/How-to-Use-EmoNAVI(ENG).txt
DELETED
@@ -1,59 +0,0 @@

How to Use EmoNAVI, EmoFact, EmoLynx, and EmoClan

The EmoNavi series is designed to be scheduler-independent. You don't necessarily need a scheduler, and even if you use one, that is generally fine, because the optimizer automatically adjusts its own behavior to manage the learning process.

However, if your goal is to grasp fine details quickly, we recommend using your preferred scheduler, such as Cosine-Restart.

Understanding the Learning Rate: It's Not Just Intensity

Many people think of the learning rate setting as "learning intensity." It is actually more like a filter: it dictates how the VAE's latent space is perceived.

Imagine a translucent plastic plate. When the learning rate is high, the image seen through the plate is a blurry "overview": a rough distribution of light and large masses. When the learning rate is low, the image is "detailed": transparency increases and the representation becomes clearer. In essence, the learning rate can be thought of as "resolution"; it is like adjusting the degree of blur by controlling the plate's transparency.

This explains why a high learning rate is better and faster for grasping overviews. With less information to process (because the details are blurred out), the model learns basic patterns quickly. Conversely, a low learning rate involves more information and thus requires more time to absorb everything. If the learning rate is too low, there is an overwhelming amount of information, and training may end before enough has been learned, yielding subpar results.

Note that this view of the learning rate is not exclusive to the EmoNavi series; it applies to other optimizers as well. Please keep it in mind for your future training runs.

EmoNavi Series: Smart Learning With or Without Schedulers

As mentioned earlier, using a scheduler with the EmoNavi series may let you capture details earlier than a constant learning rate would.

Alternatively, you can skip the scheduler entirely and run "additional training" sessions at a lower learning rate. This is easy to do because no transfer parameters need to be managed, a simplification not commonly found in other optimizers.

The EmoNavi series is designed to prevent overfitting even at a constant learning rate. It automatically adjusts so as not to exceed a certain threshold, so it won't learn more than necessary. If it detects that it is nearing the overfitting zone, it adjusts. After the general outline has been learned, it does not stop learning; rather, it learns only what is necessary, which can make progress seem slower than during the initial overview-learning phase.

If training stops progressing beyond a certain point, try an additional session at a lower learning rate. This often lets the model rapidly absorb the finer details.

Conclusion

We hope this explanation gives you useful know-how for setting up your training, not just with the EmoNavi series. Thank you for reading to the end.

Postscript

I'd like to explain the learning rate in an easy-to-understand way, so you can truly grasp the concept.

You can think of the learning rate like reading speed.
Imagine this: a high learning rate is like skim reading (speed reading), while a low learning rate is like perusing (close reading).

The scheduler manages this, much like a study schedule.
EmoNAVI has a "shadow" function that encourages the model to review and reflect on its own learning progress.
With EmoNAVI you have a choice: let external guidance (a scheduler) determine the learning path and let the model's autonomy supplement it, or rely on its autonomy alone.

Here are some other analogies:
High learning rate: like shooting a photo from a distance; you get an overview, but the details are fuzzy.
Low learning rate: like shooting up close; accurate details are captured.
Think of autofocus as being handled by the scheduler and the "shadow" function.

From another perspective: when aiming for detailed expressions, you can also increase the amount of training data or the number of iterations.
As the number of iterations increases, detailed features are gradually accumulated.

However, color representation depends largely on the performance of the VAE (Variational Autoencoder).
To reflect colors accurately, the only options are to improve the VAE itself or to use training data that reflects colors correctly.

Furthermore, the "shadow" function also acts like an autofocus system.
It is a mechanism that lets the model review and reflect on its own learning, essentially learning from its own experience.
It captures one feature, learns from it, then identifies another, and the process repeats.
Consequently, its "focus" (its understanding) continuously evolves and adapts.

That concludes the additional explanation. Thank you for reading to the end!
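The "shadow" review mechanism described above is driven, in the code files deleted below, by two exponential moving averages of the loss. A minimal pure-Python sketch of that signal, using the coefficients from the deleted emonavi.py (the loss values here are made up):

```python
import math

def update_ema(ema, loss_val):
    # Short-term EMA reacts quickly ("tension"); long-term EMA reacts slowly ("calm").
    ema['short'] = 0.3 * loss_val + 0.7 * ema.get('short', loss_val)
    ema['long'] = 0.01 * loss_val + 0.99 * ema.get('long', loss_val)
    return ema

def compute_scalar(ema):
    # tanh(5 * diff) maps the short/long gap to a smooth value in (-1, 1).
    return math.tanh(5 * (ema['short'] - ema['long']))

ema = {}
for loss in [1.0, 0.9, 0.8, 1.2]:  # loss falls, then spikes
    update_ema(ema, loss)
scalar = compute_scalar(ema)  # positive here: the spike registers as "tension"
```

A positive scalar signals rising loss (tension) and triggers a stronger blend toward the shadow weights; a value near zero leaves the parameters untouched.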
optimizer/Use_Kohya-sd-script.txt
DELETED
@@ -1,49 +0,0 @@

Usage with Kohya-sd-script

To easily use the Emo series with Kohya-sd-script,
simply place this folder as-is into the "sd-scripts" folder within your Kohya-sd-script installation:

sd-scripts/optimizer

With this setup,

--optimizer_type=optimizer.emonavi.EmoNavi
--optimizer_type=optimizer.emofact.EmoFact
--optimizer_type=optimizer.emolynx.EmoLynx
--optimizer_type=optimizer.emoclan.EmoClan

You can use each optimizer simply by specifying one of the above (specify exactly one).

---

Thanks to the flexible configuration of Kohya-sd-script, you can try these out right away. We extend our deepest gratitude to the developers and contributors of Kohya-sd-script:
Kohya-sd-script: https://github.com/kohya-ss/sd-scripts

The Emo series has learned much from Adam, Adafactor, Lion, and Tiger.
Rather than being their successors, it is built upon a unique philosophy and design approach centered on "emotional mechanisms".
It prioritizes generality, autonomy, and adaptability in pursuit of new paths for optimization, efficiency, and simplicity.
In its development, we deeply appreciate the insights of those who came before us, and we continue to explore new possibilities beyond them.
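For illustration, such a flag might be passed in a LoRA training command like the following; everything except --optimizer_type is a placeholder and is not specified by this document:

```shell
# Hypothetical sd-scripts invocation: paths, script name, and the other flags
# are placeholders; only the --optimizer_type value comes from this file.
accelerate launch train_network.py \
  --pretrained_model_name_or_path=base_model.safetensors \
  --train_data_dir=./dataset \
  --output_dir=./output \
  --learning_rate=1e-3 \
  --optimizer_type=optimizer.emonavi.EmoNavi
```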
optimizer/__init__.py
DELETED
File without changes
optimizer/emoclan.py
DELETED
@@ -1,252 +0,0 @@

import torch
from torch.optim import Optimizer
import math
from typing import Callable, Union, Dict, Any, Tuple

# Helper function
def exists(val):
    return val is not None

class EmoClan(Optimizer):
    def __init__(self, params: Union[list, torch.nn.Module],
                 lr: float = 1e-3,
                 betas: Tuple[float, float] = (0.9, 0.999),
                 eps: float = 1e-8,
                 weight_decay: float = 0.01,
                 lynx_betas: Tuple[float, float] = (0.9, 0.99),  # Lynx-specific betas
                 decoupled_weight_decay: bool = False
                 ):

        if not 0.0 <= lr:
            raise ValueError(f"Invalid learning rate: {lr}")
        if not 0.0 <= eps:
            raise ValueError(f"Invalid epsilon value: {eps}")
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 0: {betas[0]}")
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError(f"Invalid beta parameter at index 1: {betas[1]}")

        # Validate the Lynx betas as well
        if not 0.0 <= lynx_betas[0] < 1.0:
            raise ValueError(f"Invalid lynx_beta parameter at index 0: {lynx_betas[0]}")
        if not 0.0 <= lynx_betas[1] < 1.0:
            raise ValueError(f"Invalid lynx_beta parameter at index 1: {lynx_betas[1]}")

        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,
                        lynx_betas=lynx_betas, decoupled_weight_decay=decoupled_weight_decay)
        super().__init__(params, defaults)

        self._init_lr = lr  # Saved for decoupled weight decay (used by Lynx)
        self.should_stop = False  # Global stop flag

    # --- Emotion Mechanism ---
    def _update_ema(self, param_state: Dict[str, Any], loss_val: float) -> Dict[str, float]:
        """Update the short- and long-term EMAs from the loss value."""
        # param_state holds each parameter's state['ema']
        ema = param_state.setdefault('ema', {'short': loss_val, 'long': loss_val})
        ema['short'] = 0.3 * loss_val + 0.7 * ema['short']
        ema['long'] = 0.01 * loss_val + 0.99 * ema['long']
        return ema

    def _compute_scalar(self, ema: Dict[str, float]) -> float:
        """Derive the emotion scalar from the EMA difference."""
        diff = ema['short'] - ema['long']
        return math.tanh(5 * diff)

    def _decide_ratio(self, scalar: float) -> float:
        """Decide the shadow blend ratio from the emotion scalar."""
        if scalar > 0.6:
            return 0.7 + 0.2 * scalar  # 0.7 to 0.9
        elif scalar < -0.6:
            return 0.1
        elif abs(scalar) > 0.3:  # > 0.3 and <= 0.6
            return 0.3
        return 0.0

    # --- Core gradient-update logic of each optimizer (merged as private methods) ---

    def _lynx_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        beta1: float,
        beta2: float,
        wd_actual: float
    ):
        """Core gradient update of EmoLynx."""
        # Stepweight decay: p.data = p.data * (1 - lr * wd)
        p.data.mul_(1. - lr * wd_actual)

        # Lynx-specific EMA state is kept in param_state
        if 'exp_avg_lynx' not in param_state:
            param_state['exp_avg_lynx'] = torch.zeros_like(p)
        exp_avg = param_state['exp_avg_lynx']

        # Gradient blending
        blended_grad = grad.mul(1. - beta1).add_(exp_avg, alpha=beta1)

        # Sign-based update
        p.data.add_(blended_grad.sign_(), alpha=-lr)

        # exp_avg update
        exp_avg.mul_(beta2).add_(grad, alpha=1. - beta2)

    def _navi_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        betas: Tuple[float, float],
        eps: float,
        weight_decay: float
    ):
        """Core gradient update of EmoNavi."""
        beta1, beta2 = betas

        exp_avg = param_state.setdefault('exp_avg_navi', torch.zeros_like(p.data))
        exp_avg_sq = param_state.setdefault('exp_avg_sq_navi', torch.zeros_like(p.data))

        exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
        exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        denom = exp_avg_sq.sqrt().add_(eps)

        # Weight decay (standard approach)
        if weight_decay:
            p.data.add_(p.data, alpha=-weight_decay * lr)

        p.data.addcdiv_(exp_avg, denom, value=-lr)

    def _fact_update(
        self,
        p: torch.Tensor,
        grad: torch.Tensor,
        param_state: Dict[str, Any],
        lr: float,
        betas: Tuple[float, float],  # beta2 is unused for 2-D but kept for compatibility (used for 1-D gradients)
        eps: float,
        weight_decay: float
    ):
        """Core gradient update of EmoFact (Adafactor-like)."""
        beta1, beta2 = betas

        if grad.dim() >= 2:
            # Row- and column-wise mean squares (a lightweight approximation of the variance)
            r_sq = torch.mean(grad * grad, dim=tuple(range(1, grad.dim())), keepdim=True).add_(eps)
            c_sq = torch.mean(grad * grad, dim=0, keepdim=True).add_(eps)

            param_state.setdefault('exp_avg_r_fact', torch.zeros_like(r_sq)).mul_(beta1).add_(torch.sqrt(r_sq), alpha=1 - beta1)
            param_state.setdefault('exp_avg_c_fact', torch.zeros_like(c_sq)).mul_(beta1).add_(torch.sqrt(c_sq), alpha=1 - beta1)

            # Normalize by the square root of the product of the reconstructed approximations
            denom = torch.sqrt(param_state['exp_avg_r_fact'] * param_state['exp_avg_c_fact']).add_(eps)
            update_term = grad / denom

        else:  # Gradient correction for 1-D (vector) parameters
            exp_avg = param_state.setdefault('exp_avg_fact', torch.zeros_like(p.data))
            exp_avg_sq = param_state.setdefault('exp_avg_sq_fact', torch.zeros_like(p.data))

            exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
            exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # beta2 is used here
            denom = exp_avg_sq.sqrt().add_(eps)
            update_term = exp_avg / denom

        # Final parameter update (decoupled weight decay is also applied)
        p.data.add_(p.data, alpha=-weight_decay * lr)
        p.data.add_(update_term, alpha=-lr)

    @torch.no_grad()
    def step(self, closure: Callable | None = None):
        loss = None
        if exists(closure):
            with torch.enable_grad():
                loss = closure()
        loss_val = loss.item() if loss is not None else 0.0

        # The global scalar_hist is managed on the EmoClan instance
        global_scalar_hist = self.state.setdefault('global_scalar_hist', [])

        # Keep the global emotion EMA state in self.state and compute the current emotion scalar
        global_ema_state = self.state.setdefault('global_ema', {'short': loss_val, 'long': loss_val})
        global_ema_state['short'] = 0.3 * loss_val + 0.7 * global_ema_state['short']
        global_ema_state['long'] = 0.01 * loss_val + 0.99 * global_ema_state['long']
        current_global_scalar = self._compute_scalar(global_ema_state)

        # Append the current emotion scalar to global_scalar_hist
        global_scalar_hist.append(current_global_scalar)
        if len(global_scalar_hist) > 32:
            global_scalar_hist.pop(0)

        for group in self.param_groups:
            lr = group['lr']
            wd = group['weight_decay']
            eps = group['eps']
            decoupled_wd = group['decoupled_weight_decay']

            lynx_beta1, lynx_beta2 = group['lynx_betas']
            navi_fact_betas = group['betas']  # Navi/Fact share the default betas

            # Compute _wd_actual for Lynx's decoupled weight decay
            _wd_actual_lynx = wd
            if decoupled_wd:
                _wd_actual_lynx /= self._init_lr

            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad.data
                param_state = self.state[p]  # Per-parameter state

                # --- Per-parameter emotion-mechanism update and shadow handling ---
                # Each parameter's state['ema'] is updated from loss_val, which is shared.
                # Since loss_val is a single value from the closure, this effectively
                # tracks the global emotion rather than a truly per-parameter one.
                param_ema = self._update_ema(param_state, loss_val)
                param_scalar = self._compute_scalar(param_ema)  # Per-parameter scalar

                ratio = self._decide_ratio(param_scalar)  # Per-parameter ratio

                if ratio > 0:
                    if 'shadow' not in param_state:
                        param_state['shadow'] = p.data.clone()
                    else:
                        # Blend the shadow into the current value
                        p.data.mul_(1 - ratio).add_(param_state['shadow'], alpha=ratio)
                        # Let the shadow trail the current value
                        param_state['shadow'].lerp_(p.data, 0.05)

                # --- Optimizer selection and gradient update ---
                # Decide the phase from the current global emotion scalar:
                #   -0.3 <= scalar <= 0.3 -> Navi
                #   scalar > 0.3          -> Lynx
                #   scalar < -0.3         -> Fact
                if current_global_scalar > 0.3:  # Early phase / leaning toward overfitting
                    self._lynx_update(p, grad, param_state, lr, lynx_beta1, lynx_beta2, _wd_actual_lynx)
                elif current_global_scalar < -0.3:  # Late phase / leaning toward divergence
                    self._fact_update(p, grad, param_state, lr, navi_fact_betas, eps, wd)
                else:  # Middle phase, -0.3 <= scalar <= 0.3
                    self._navi_update(p, grad, param_state, lr, navi_fact_betas, eps, wd)

        # Early-stop decision: evaluate global_scalar_hist
        if len(global_scalar_hist) >= 32:
            buf = global_scalar_hist
            avg_abs = sum(abs(s) for s in buf) / len(buf)
            std = sum((s - sum(buf) / len(buf)) ** 2 for s in buf) / len(buf)
            if avg_abs < 0.05 and std < 0.005:
                self.should_stop = True  # External code can check this flag

        return loss

"""
The Emo series has learned much from Adam, Adafactor, Lion, and Tiger.
Rather than being their successors, it is built on a unique philosophy and design.
In its development, we deeply appreciate the insights of those who came before us, and we continue to explore new possibilities beyond them.
"""
optimizer/emofact.py
DELETED
@@ -1,117 +0,0 @@

import torch
from torch.optim import Optimizer
import math

class EmoFact(Optimizer):
    # Class definition & initialization
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, weight_decay=0.01):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)
        self.should_stop = False  # Fix: initialize the stop flag so external checks never hit an AttributeError

    # Emotion EMA update (tension and calm)
    def _update_ema(self, state, loss_val):
        ema = state.setdefault('ema', {})
        ema['short'] = 0.3 * loss_val + 0.7 * ema.get('short', loss_val)
        ema['long'] = 0.01 * loss_val + 0.99 * ema.get('long', loss_val)
        return ema

    # Emotion scalar generation (EMA difference; smooth nonlinear scalar; tanh(5 * diff) sharpens sensitivity)
    def _compute_scalar(self, ema):
        diff = ema['short'] - ema['long']
        return math.tanh(5 * diff)

    # Shadow blend ratio (scalar > 0.6: 70-90%; scalar < -0.6: 10%; |scalar| > 0.3: 30%; otherwise: 0%)
    def _decide_ratio(self, scalar):
        if scalar > 0.6:
            return 0.7 + 0.2 * scalar
        elif scalar < -0.6:
            return 0.1
        elif abs(scalar) > 0.3:
            return 0.3
        return 0.0

    # Loss capture (loss_val is used for the emotion decision; parameters without gradients are skipped)
    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        loss_val = loss.item() if loss is not None else 0.0

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad.data
                state = self.state[p]

                # Emotion EMA update and scalar generation (existing logic kept)
                ema = self._update_ema(state, loss_val)
                scalar = self._compute_scalar(ema)
                ratio = self._decide_ratio(scalar)

                # shadow_param: updated only when needed (existing logic kept)
                if ratio > 0:
                    if 'shadow' not in state:
                        state['shadow'] = p.data.clone()
                    else:
                        p.data.mul_(1 - ratio).add_(state['shadow'], alpha=ratio)
                        state['shadow'].lerp_(p.data, 0.05)

                # --- New gradient-correction logic ---
                # For matrices (2-D or higher), use a variance-based AB approximation
                if grad.dim() >= 2:
                    # Row- and column-wise mean squares (a lightweight variance approximation)
                    r_sq = torch.mean(grad * grad, dim=tuple(range(1, grad.dim())), keepdim=True).add_(group['eps'])
                    c_sq = torch.mean(grad * grad, dim=0, keepdim=True).add_(group['eps'])

                    # Build an approximate gradient matrix from the variance information:
                    # taking A = sqrt(r_sq), B = sqrt(c_sq) reproduces an AB-matrix
                    # approximation, which is then smoothed with an EMA
                    beta1, beta2 = group['betas']

                    state.setdefault('exp_avg_r', torch.zeros_like(r_sq)).mul_(beta1).add_(torch.sqrt(r_sq), alpha=1 - beta1)
                    state.setdefault('exp_avg_c', torch.zeros_like(c_sq)).mul_(beta1).add_(torch.sqrt(c_sq), alpha=1 - beta1)

                    # Normalize by the square root of the product of the reconstructed
                    # approximations, which plays the role of a second moment
                    denom = torch.sqrt(state['exp_avg_r'] * state['exp_avg_c']).add_(group['eps'])

                    # Compute the final update term
                    update_term = grad / denom

                # Gradient correction for 1-D (vector) parameters (close to a decoupled weight-decay structure)
                else:
                    exp_avg = state.setdefault('exp_avg', torch.zeros_like(p.data))
                    exp_avg_sq = state.setdefault('exp_avg_sq', torch.zeros_like(p.data))
                    beta1, beta2 = group['betas']
                    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                    update_term = exp_avg / denom

                # Final parameter update (decoupled weight decay is also applied)
                p.data.add_(p.data, alpha=-group['weight_decay'] * group['lr'])
                p.data.add_(update_term, alpha=-group['lr'])

                # --- Early-stop logic (existing logic kept) ---
                hist = self.state.setdefault('scalar_hist', [])
                hist.append(scalar)
                if len(hist) > 32:
                    hist.pop(0)

                # Early-stop decision
                if len(self.state['scalar_hist']) >= 32:
                    buf = self.state['scalar_hist']
                    avg_abs = sum(abs(s) for s in buf) / len(buf)
                    std = sum((s - sum(buf) / len(buf)) ** 2 for s in buf) / len(buf)
                    if avg_abs < 0.05 and std < 0.005:
                        self.should_stop = True

        return loss

"""
Fact is inspired by Adafactor,
and its VRAM-friendly design is something everyone loves.
"""
optimizer/emolynx.py
DELETED
@@ -1,129 +0,0 @@

import torch
from torch.optim import Optimizer
import math
from typing import Tuple, Callable, Union

# Helper function (Lynx)
def exists(val):
    return val is not None

class EmoLynx(Optimizer):
    # Class definition & initialization
    def __init__(self, params: Union[list, torch.nn.Module], lr=1e-3, betas=(0.9, 0.99),
                 # Lynx-style betas and compatibility additions (Lynx beta1 / beta2)
                 eps=1e-8, weight_decay=0.01, decoupled_weight_decay: bool = False):

        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

        # Saved for Lynx-style weight decay
        self._init_lr = lr
        self.decoupled_wd = decoupled_weight_decay
        self.should_stop = False  # Initialize the stop flag

    # Emotion EMA update (tension and calm)
    def _update_ema(self, state, loss_val):
        ema = state.setdefault('ema', {})
        ema['short'] = 0.3 * loss_val + 0.7 * ema.get('short', loss_val)
        ema['long'] = 0.01 * loss_val + 0.99 * ema.get('long', loss_val)
        return ema

    # Emotion scalar generation (EMA difference; smooth nonlinear scalar; tanh(5 * diff) sharpens sensitivity)
    def _compute_scalar(self, ema):
        diff = ema['short'] - ema['long']
        return math.tanh(5 * diff)

    # Shadow blend ratio (scalar > 0.6: 70-90%; scalar < -0.6: 10%; |scalar| > 0.3: 30%; otherwise: 0%)
    def _decide_ratio(self, scalar):
        if scalar > 0.6:
            return 0.7 + 0.2 * scalar
        elif scalar < -0.6:
            return 0.1
        elif abs(scalar) > 0.3:
            return 0.3
        return 0.0

    # Loss capture (loss_val is used for the emotion decision; parameters without gradients are skipped)
    @torch.no_grad()
    def step(self, closure: Callable | None = None):  # Type hint added for the closure
        loss = None
        if exists(closure):  # Use the exists helper for consistency
            with torch.enable_grad():
                loss = closure()
        loss_val = loss.item() if loss is not None else 0.0

        for group in self.param_groups:
            # Extract the common Lynx parameters
            lr, wd, beta1, beta2 = group['lr'], group['weight_decay'], *group['betas']

            # Separate handling of weight decay (from Lynx)
            _wd_actual = wd
            if self.decoupled_wd:
                _wd_actual /= self._init_lr  # Decoupled weight-decay adjustment

            for p in filter(lambda p: exists(p.grad), group['params']):  # Filter out params without gradients

                grad = p.grad  # Use the gradient directly (".data" is not needed for the computation)
                state = self.state[p]

                # EMA update and scalar generation (derive the scalar from the EMA difference and decide the spike ratio)
                ema = self._update_ema(state, loss_val)
                scalar = self._compute_scalar(ema)
                ratio = self._decide_ratio(scalar)

                # shadow_param: updated only when needed (a dynamic history that trails the current value by 5% on spikes)
                if ratio > 0:
                    if 'shadow' not in state:
                        state['shadow'] = p.data.clone()
                    else:
                        p.data.mul_(1 - ratio).add_(state['shadow'], alpha=ratio)
                        state['shadow'].lerp_(p.data, 0.05)
                        # The shadow is updated from p.data before the Lynx update (trailing the current value by 5%)
                        # p.data.mul_(1 - ratio).add_(state['shadow'], alpha=ratio)
                        # EmoNavi: p.data = p.data * (1 - ratio) + shadow * ratio

                # --- Start Lynx gradient-update logic ---

                # Lynx initialization (exp_avg)
                if 'exp_avg' not in state:
                    state['exp_avg'] = torch.zeros_like(p)
                exp_avg = state['exp_avg']

                # Stepweight decay (from Lynx): p.data = p.data * (1 - lr * wd)
                # Uses _wd_actual to honor decoupled_wd (EmoNavi applies wd at the end instead)
                p.data.mul_(1. - lr * _wd_actual)

                # Gradient blending
                # m_t = beta1 * exp_avg_prev + (1 - beta1) * grad
                blended_grad = grad.mul(1. - beta1).add_(exp_avg, alpha=beta1)

                # p: p.data = p.data - lr * sign(blended_grad)
                p.data.add_(blended_grad.sign_(), alpha=-lr)

                # exp_avg = beta2 * exp_avg + (1 - beta2) * grad
                exp_avg.mul_(beta2).add_(grad, alpha=1. - beta2)

                # --- End Lynx gradient-update logic ---

                # Record the scalar for early stop (shared buffer / at most 32 entries / activity evaluation)
                # Note: this is accessed via self.state, not the per-parameter state
                hist = self.state.setdefault('scalar_hist', [])
                hist.append(scalar)
                if len(hist) > 32:
                    hist.pop(0)

            # Early-stop decision (a signal of stillness) - this part is outside the inner loop
            if len(self.state.setdefault('scalar_hist', [])) >= 32:
                buf = self.state['scalar_hist']
                avg_abs = sum(abs(s) for s in buf) / len(buf)
                std = sum((s - sum(buf) / len(buf)) ** 2 for s in buf) / len(buf)
                if avg_abs < 0.05 and std < 0.005:
                    self.should_stop = True  # 💡 External code can check this flag

        return loss

"""
Lynx was developed with inspiration from Lion and Tiger,
which we deeply respect for their lightweight and intelligent design.
Lynx also integrates EmoNAVI to enhance its capabilities.
"""
optimizer/emonavi.py
DELETED
@@ -1,96 +0,0 @@
import torch
from torch.optim import Optimizer
import math

class EmoNavi(Optimizer):
    # Class definition & initialization
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, weight_decay=0.01):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

    # Emotion EMA update (tension and calm)
    def _update_ema(self, state, loss_val):
        ema = state.setdefault('ema', {})
        ema['short'] = 0.3 * loss_val + 0.7 * ema.get('short', loss_val)
        ema['long'] = 0.01 * loss_val + 0.99 * ema.get('long', loss_val)
        return ema

    # Emotion scalar (EMA difference; smooth nonlinear scalar; tanh(5 * diff) sharpens sensitivity)
    def _compute_scalar(self, ema):
        diff = ema['short'] - ema['long']
        return math.tanh(5 * diff)

    # Shadow blend ratio (> 0.6: 70-90%, < -0.6: 10%, |scalar| > 0.3: 30%, otherwise: 0%)
    def _decide_ratio(self, scalar):
        if scalar > 0.6:
            return 0.7 + 0.2 * scalar
        elif scalar < -0.6:
            return 0.1
        elif abs(scalar) > 0.3:
            return 0.3
        return 0.0

    # Loss capture (loss_val is numericized and used for the emotion decision;
    # parameters with no gradient (no update needed) are skipped)
    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        loss_val = loss.item() if loss is not None else 0.0

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad.data
                state = self.state[p]

                # EMA update / scalar generation (the spike ratio is derived from the EMA difference)
                ema = self._update_ema(state, loss_val)
                scalar = self._compute_scalar(ema)
                ratio = self._decide_ratio(scalar)

                # shadow param: updated only when needed (a dynamic history that tracks the current value 5% at a time during spikes)
                if ratio > 0:
                    if 'shadow' not in state:
                        state['shadow'] = p.data.clone()
                    else:
                        p.data.mul_(1 - ratio).add_(state['shadow'], alpha=ratio)
                        state['shadow'].lerp_(p.data, 0.05)

                # Scalar: a signal from the short/long EMA difference (strength of the "excitement")
                # Blend ratio: computed only when the scalar exceeds a threshold (a screen for trustworthy emotion signals)
                # → small scalar values give ratio = 0, so no shadow blending happens
                # → the emotion mechanism fires only on strong, reliable differences (an implicit confidence gate)

                # Gradient correction with first/second moments (close to a decoupled weight decay structure)
                exp_avg = state.setdefault('exp_avg', torch.zeros_like(p.data))
                exp_avg_sq = state.setdefault('exp_avg_sq', torch.zeros_like(p.data))
                beta1, beta2 = group['betas']
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                denom = exp_avg_sq.sqrt().add_(group['eps'])

                step_size = group['lr']
                if group['weight_decay']:
                    p.data.add_(p.data, alpha=-group['weight_decay'] * step_size)
                p.data.addcdiv_(exp_avg, denom, value=-step_size)

                # Tells external code that the emotion mechanism has settled and training is "stable enough" (not an automatic stop)
                # Scalar history for early stopping (shared buffer / at most 32 entries / activity check)
                hist = self.state.setdefault('scalar_hist', [])
                hist.append(scalar)
                if len(hist) > 32:
                    hist.pop(0)

        # Early-stop check (the "signal of quietness")
        if len(self.state['scalar_hist']) >= 32:
            buf = self.state['scalar_hist']
            avg_abs = sum(abs(s) for s in buf) / len(buf)
            std = sum((s - sum(buf) / len(buf)) ** 2 for s in buf) / len(buf)  # note: this is the variance, not the std
            if avg_abs < 0.05 and std < 0.005:
                self.should_stop = True  # 💡 external code can inspect this flag

        # When 32 steps of scalar values satisfy the "quiet" condition, only the flag should_stop = True is set

        return loss

# https://github.com/muooon/EmoNavi
# An emotion-driven optimizer that feels loss and navigates accordingly.
# Don't think. Feel. Don't stop. Keep running. Believe in what's beyond.
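The emotion pathway above (loss → short/long EMA → tanh scalar → blend ratio) can be traced standalone. This sketch copies those exact formulas outside the optimizer class and feeds them an illustrative loss spike:

```python
import math

# Standalone trace of EmoNavi's emotion signal, using the same formulas
# as _update_ema / _compute_scalar / _decide_ratio (loss values are illustrative).
def update_ema(ema, loss_val):
    ema['short'] = 0.3 * loss_val + 0.7 * ema.get('short', loss_val)
    ema['long'] = 0.01 * loss_val + 0.99 * ema.get('long', loss_val)
    return ema

def decide_ratio(scalar):
    if scalar > 0.6:
        return 0.7 + 0.2 * scalar
    elif scalar < -0.6:
        return 0.1
    elif abs(scalar) > 0.3:
        return 0.3
    return 0.0

ema = {}
for loss_val in [1.0, 1.0, 2.0]:  # a sudden loss spike on the third step
    update_ema(ema, loss_val)

# The short EMA reacts to the spike (1.3) while the long EMA barely moves (1.01),
# so the tanh-squashed difference crosses the 0.6 threshold and shadow blending fires.
scalar = math.tanh(5 * (ema['short'] - ema['long']))
ratio = decide_ratio(scalar)
```

The asymmetry between the 0.3 and 0.01 EMA coefficients is the whole trick: only a sharp recent change in the loss can open a gap between the two averages large enough to trigger the shadow mechanism.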
optimizer/学習の進め方(日本語).txt
DELETED
@@ -1,52 +0,0 @@
How to use EmoNAVI, Fact, and Lynx / Clan

The emonavi family is designed to be scheduler-independent: you don't need a scheduler, and it's fine if you have one, because the optimizer automatically adjusts itself to more or less any settings.

However, if you want the model to pick up fine details thoroughly in a short time, use your usual scheduler, such as Cosine-Restart.

<What is the learning rate? Not intensity, but a "filter">

Many people probably think of the learning-rate setting as learning intensity, but the learning rate is really a filter: it determines how the VAE's latents are seen.

Imagine a translucent plastic plate. When the learning rate is high, the image showing through the plate is an "overview" — picture a rough distribution of light, or large masses. When the learning rate is low, it is "detail": transparency increases and the representation becomes clear. In short, the learning rate can be called "resolution"; it is like adjusting the degree of blur through the plate's transparency.

This is why a high learning rate is good at learning the overview, and fast at it: there is little information, so it is acquired early. Conversely, at a low learning rate there is more information, so acquiring it all takes longer. If the rate is too low, there is too much information; training ends before it can be absorbed, and the result is weak.

So this point about the learning rate is not specific to the emonavi family — the same holds for other optimizers — so keep it in mind in your future training as well.

<The EmoNavi series, schedulers, and additional training>

Now, about the emonavi family: as noted at the start, setting a scheduler may let you acquire details earlier than a constant schedule would.
You can also skip the scheduler and do "additional training", running the second and later passes at a low learning rate. This needs no carried-over parameters and is easy to do — a simplification not found in other optimizers.

The emonavi family does not overfit even if you keep running it on a constant schedule, because it is tuned not to cross that line. So even at the same learning rate it will not learn more than necessary; past that point it judges the region to be close to overfitting. After the overview is acquired it does not stop learning — it learns only what is still needed, so training merely looks stalled because the amount learned is smaller than before the overview was acquired.

If learning seems stuck at a certain point, switch to additional training at a lower learning rate. The model will then start absorbing details all at once.

<Acknowledgements>
Beyond the emonavi family itself, we hope this explanation helps you build know-how for your training settings. We would be glad if it is useful to you. Thank you for reading to the end.

<Postscript>
We want to convey the learning rate in a way you can really feel.

The learning rate can also be likened to reading speed:
High learning rate: skimming (speed reading);
Low learning rate: careful reading (close reading).
A scheduler manages this as a study plan.

Via its shadow mechanism, emonavi encourages the model to review and look back over its own learning as training proceeds.
The choice is between letting something external set the pace (with autonomy filling in) and leaving it to autonomy alone.

Another possible analogy:
High learning rate: shooting from far away = overview (details are vague)
Low learning rate: shooting up close = detail (details are accurate)
Autofocus: the scheduler and shadow.

From another angle: if you want to capture fine detail, increasing the training data also works;
as the number of repetitions grows, fine features accumulate bit by bit.
Color, however, depends largely on the VAE's capability; the only remedies are training data that the VAE can reflect correctly, or a better VAE.

Also, while shadow acts as an autofocus, it is a mechanism for looking back over and reviewing the learning — learning from its own experience.
It captures a feature, learns it, finds another feature, and repeats; as a result, the focus and the point of focus keep changing.

That is all. Thank you for reading the postscript to the end as well.
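The "additional training" pattern described above (an overview pass at a high learning rate, then a detail pass at a low one, with no optimizer state handed between passes) can be sketched as follows; the phase names and values here are hypothetical, not real kohya-sd-script settings:

```python
# Hypothetical two-pass schedule: each pass is an independent training run,
# so nothing needs to be carried over except the model weights themselves.
phases = [
    {"name": "overview", "lr": 1e-3, "steps": 2000},  # high LR: coarse features, fast
    {"name": "detail",   "lr": 1e-4, "steps": 2000},  # low LR: fine features, slower
]

def run_phase(phase):
    # placeholder for a real training run at phase["lr"]
    return f"trained {phase['steps']} steps at lr={phase['lr']}"

logs = [run_phase(ph) for ph in phases]
```

If the first pass plateaus, the text above suggests simply launching the second, lower-LR pass on the resulting weights rather than tuning a scheduler.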