---
tags:
- pytorch
- safetensors
license: mit
---

# dm_qwen4b_noise_emulator

Laplacian kernel regression noise model for `std_math` prediction in data mixture optimization.

Predicts the per-configuration `std_math` (standard deviation across seeds) from the data mixture proportions. The prediction serves as a heteroscedastic noise model in Bayesian optimization.

## Architecture

- Kernel: Laplacian `K(x, x') = exp(-γ · ||x - x'||₁)`
- Support points: 50 training configs
- Input features (3): `[if_prop1, math_prop1, math_prop2]` (values in [0, 1])
- Output: predicted `std_math` (scalar)

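To make the kernel formula concrete, the following sketch evaluates it by hand in plain Python for a single query point. The two support points and their dual coefficients are made up for illustration (the released model has 50 support points), and `gamma = 0.1` is assumed from the default in the Usage code.

```python
import math


def laplacian_kernel(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||_1) for two equal-length sequences."""
    l1 = sum(abs(a - b) for a, b in zip(x, x_prime))
    return math.exp(-gamma * l1)


gamma = 0.1  # assumed: the default used in the Usage code
# Hypothetical support points and dual coefficients (not the released weights).
support = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.2]]
dual_coef = [0.02, 0.01]

x = [0.3, 0.4, 0.2]  # [if_prop1, math_prop1, math_prop2]
k_values = [laplacian_kernel(x, s, gamma) for s in support]
# Prediction is the kernel-weighted sum of the dual coefficients.
prediction = sum(a * k for a, k in zip(dual_coef, k_values))
```

Here the L1 distance to the first support point is 0.3, so its kernel value is exp(-0.03) ≈ 0.970. The second support point coincides with the query, giving a kernel value of exactly 1.
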
## Usage

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


class KernelRegressionModel(torch.nn.Module):
    """Kernel regression with a Laplacian (L1) kernel over fixed support points."""

    def __init__(self, dual_coef, X_fit, gamma=0.1):
        super().__init__()
        self.gamma = gamma
        # Buffers are stored with the module but are not trainable parameters.
        self.register_buffer("dual_coef", dual_coef)
        self.register_buffer("X_fit", X_fit)

    def forward(self, x):
        # Pairwise L1 distances between queries and support points: (batch, n_support)
        dist = torch.cdist(x, self.X_fit, p=1)
        K = torch.exp(-self.gamma * dist)
        # Weighted sum of kernel values using the dual coefficients
        return K @ self.dual_coef


path = hf_hub_download("chewwt/dm_qwen4b_noise_emulator", "noise_model.safetensors")
tensors = load_file(path)
model = KernelRegressionModel(tensors["dual_coef"], tensors["X_fit"])
model.eval()

# x: (batch, 3) float64 tensor, features in [0, 1]
x = torch.tensor([[0.3, 0.4, 0.2]], dtype=torch.float64)
with torch.no_grad():
    sigma = model(x)  # predicted std_math
```

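Kernel regression does not constrain its output to be positive, and a surrogate model typically expects noise variances rather than standard deviations. The sketch below shows one possible way to post-process predictions before they enter a Bayesian optimization loop. The helper name `to_noise_variance` and the floor value `1e-6` are illustrative assumptions, not part of this repository, and the stand-in tensor replaces real model output so the snippet runs offline.

```python
import torch


def to_noise_variance(sigma: torch.Tensor, floor: float = 1e-6) -> torch.Tensor:
    """Clamp predicted std to a small positive floor, then square it."""
    return sigma.clamp_min(floor) ** 2


# Stand-in predictions (hypothetical values; in practice use the model output `sigma`).
sigma = torch.tensor([0.012, -0.003, 0.0], dtype=torch.float64)
noise_var = to_noise_variance(sigma)  # strictly positive, same shape as sigma
```

The resulting `noise_var` can then be supplied wherever the surrogate accepts per-observation noise variances.
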