# MelodyDeterminism Patch (GPU + determinism + benchmark)
## What's included
- NumPy/CuPy backend with automatic selection and a deterministic RNG (Philox/PCG64).
- Deterministic reductions (TreeFixed, KahanFixed), robust softmax, canonical sampling.
- Tolerance metadata (max_abs_err / max_rel_err).
- Overhead benchmark across batch size `n` and vocabulary size `v` on CPU/GPU.
- Edge-case tests (extreme logits, masks, dtypes, invariants).
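To make the reduction and tolerance items above concrete, here is a minimal NumPy-only sketch. The function names (`sum_kahan`, `sum_tree_fixed`, `tolerance`) are hypothetical and are not the patch's own `core.deterministic` or `core.metrics` API; they only illustrate the general techniques under the assumption that "fixed" means the order of additions depends on the input length alone.

```python
import math
import numpy as np


def sum_kahan(x):
    """Compensated (Kahan) summation in a fixed left-to-right order."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for value in np.asarray(x, dtype=np.float64).ravel():
        y = float(value) - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def sum_tree_fixed(x):
    """Pairwise reduction whose tree shape depends only on the input length."""
    buf = np.asarray(x, dtype=np.float64).ravel().copy()
    if buf.size == 0:
        return 0.0
    while buf.size > 1:
        if buf.size % 2:  # odd length: pad with 0.0 so the pairs line up
            buf = np.append(buf, 0.0)
        buf = buf[0::2] + buf[1::2]
    return float(buf[0])


def tolerance(approx, exact):
    """Max absolute and max relative error between two arrays."""
    approx = np.atleast_1d(np.asarray(approx, dtype=np.float64))
    exact = np.atleast_1d(np.asarray(exact, dtype=np.float64))
    abs_err = np.abs(approx - exact)
    rel_err = abs_err / np.maximum(np.abs(exact), np.finfo(np.float64).tiny)
    return {"max_abs_err": float(abs_err.max()),
            "max_rel_err": float(rel_err.max())}


rng = np.random.Generator(np.random.PCG64(42))
data = rng.standard_normal(10_000).astype(np.float32)
exact = math.fsum(float(v) for v in data)  # correctly rounded reference sum
print(tolerance([sum_kahan(data), sum_tree_fixed(data)], [exact, exact]))
```

Both reductions accumulate in float64 and always add in the same order, so repeated runs on the same input give bit-identical results; the tolerance dictionary then reports how far each one is from the reference.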
## Quick setup
1. Copy `core/` and `tests/` into your Space or repo.
2. Merge `requirements.txt` into your own (add CuPy if the Space runs on a GPU).
3. In `app.py`, import the functions and use them in your Gradio UI.
4. **Space hardware**: select a GPU (e.g. T4/A10) on Hugging Face.
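Since CuPy is only present when the Space has a GPU, the backend has to fall back to NumPy when it is missing. The sketch below shows one way such automatic selection could work. `select_backend` and `make_rng` are hypothetical names, not the functions exported by `core.backend`, and the choice of `Philox4x3210` on CuPy is an assumption based on the Philox/PCG64 pairing mentioned above.

```python
import importlib.util

import numpy as np


def select_backend(prefer_gpu=True):
    """Return (module, name): CuPy if usable, otherwise NumPy."""
    if prefer_gpu and importlib.util.find_spec("cupy") is not None:
        try:
            import cupy as cp
            # Importing CuPy is not enough: a CUDA device must be visible too.
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp, "cupy"
        except Exception:
            pass  # broken install or no driver: fall back to NumPy
    return np, "numpy"


def make_rng(backend_name, seed):
    """Seeded generator: Philox on CuPy, PCG64 on NumPy."""
    if backend_name == "cupy":
        import cupy as cp
        return cp.random.Generator(cp.random.Philox4x3210(seed))
    return np.random.Generator(np.random.PCG64(seed))


xp, name = select_backend()
rng = make_rng(name, seed=42)
print(name, rng.standard_normal(3))
```

Note that the two generators produce different streams, so a given seed reproduces results only within one backend, not across CPU and GPU.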
## Gradio (snippet)
```python
import gradio as gr

from core.backend import set_seed, backend_name
from core.bench import bench_suite
from core.softmax import softmax_canonical
from core.sampling import sample_canonical
from core.metrics import tol_stats
from core.deterministic import reduce_tree_fixed, sum_kahan_fixed


def run_suite(seed, n, v, dtype):
    seed = int(seed)  # gr.Number may deliver a float
    set_seed(seed)
    # Synthetic input, generated on the active backend (NumPy or CuPy)
    from core.backend import xp
    x = xp.random.standard_normal((int(n), int(v))).astype(getattr(xp, dtype))
    # A single example row (for the tolerances)
    p = softmax_canonical(x[0])
    idx = sample_canonical(p, seed=seed, token_idx=0)
    stats = {"backend": backend_name(), "token0": int(idx)}
    return stats


with gr.Blocks(theme=gr.themes.Soft()) as demo:
    with gr.Tab("Deterministic"):
        seed = gr.Number(42, precision=0, label="Seed")
        n = gr.Slider(1, 64, 8, step=1, label="Batch n")
        v = gr.Dropdown([1024, 8192, 32768], value=8192, label="Vocab v")
        dtype = gr.Radio(["float32", "float64"], value="float32", label="dtype")
        run = gr.Button("Run suite")
        out = gr.JSON(label="Output + metadata")
        run.click(run_suite, [seed, n, v, dtype], [out])
    with gr.Tab("Benchmark"):
        runb = gr.Button("Benchmark")
        table = gr.Dataframe(headers=["n", "v", "t_std_ms", "t_can_ms", "overhead_pct"], label="Latencies (ms)")

        def _bench():
            return bench_suite()

        runb.click(_bench, outputs=[table])

# demo.queue(concurrency_count=2, max_size=8).launch()
```
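The benchmark table columns (`t_std_ms`, `t_can_ms`, `overhead_pct`) suggest that each row compares a standard and a canonical code path for one `(n, v)` pair. The following is a rough CPU-only sketch of how such a row could be produced. It is not the patch's `bench_suite`: `time_ms`, `overhead_row`, and both softmax variants are hypothetical stand-ins, and the overhead formula `100 * (t_can - t_std) / t_std` is an assumption about what the column means.

```python
import time

import numpy as np


def time_ms(fn, *args, repeats=20):
    """Best-of-N wall-clock time of fn(*args), in milliseconds."""
    fn(*args)  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - t0) * 1e3)
    return best


def softmax_std(x):
    """Plain row-wise softmax in the input dtype."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def softmax_f64_acc(x):
    """Stand-in for a 'canonical' path: compute in float64, cast back."""
    return softmax_std(x.astype(np.float64)).astype(x.dtype)


def overhead_row(n, v, std_fn, can_fn, seed=0):
    """One table row: [n, v, t_std_ms, t_can_ms, overhead_pct]."""
    rng = np.random.Generator(np.random.PCG64(seed))
    x = rng.standard_normal((n, v)).astype(np.float32)
    t_std = time_ms(std_fn, x)
    t_can = time_ms(can_fn, x)
    return [n, v, t_std, t_can, 100.0 * (t_can - t_std) / t_std]


rows = [overhead_row(n, v, softmax_std, softmax_f64_acc)
        for n, v in [(8, 1024), (8, 8192)]]
for row in rows:
    print(row)
```

On a GPU backend the timed region would also need an explicit device synchronization (for example `cp.cuda.Stream.null.synchronize()` in CuPy) before reading the clock, because kernel launches are asynchronous.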
## Determinism notes
- RNG: Philox (GPU) / PCG64 (CPU), with the mapping u → `searchsorted(CDF, side='left')`.
- BLAS: in the tests we force OMP/MKL threads = 1 to reduce run-to-run variability.
- Dtypes: prefer float32; softmax and reductions accumulate in float64.
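The notes above can be sketched in a few lines of NumPy: a softmax that accumulates in float64, and inverse-CDF sampling where a uniform draw `u` is mapped to an index with `searchsorted(..., side='left')`. This is an illustration only, not the code in `core.softmax` or `core.sampling`. All function names are hypothetical, and seeding one PCG64 stream per `(seed, token_idx)` pair is an assumption about how `token_idx` might be used.

```python
import numpy as np


def softmax_f64(logits):
    """Softmax with float64 accumulation and max-shift for stability."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - np.max(z)  # largest logit becomes 0, so exp() cannot overflow
    e = np.exp(z)
    return e / e.sum()


def sample_from_cdf(probs, u):
    """Map a uniform draw u in [0, 1) to an index via the CDF."""
    cdf = np.cumsum(np.asarray(probs, dtype=np.float64))
    cdf /= cdf[-1]  # force the last entry to exactly 1.0
    idx = int(np.searchsorted(cdf, u, side="left"))
    return min(idx, cdf.size - 1)  # defensive clamp


def sample_token(logits, seed, token_idx):
    """Deterministic draw: one PCG64 stream per (seed, token_idx)."""
    rng = np.random.Generator(np.random.PCG64([seed, token_idx]))
    return sample_from_cdf(softmax_f64(logits), rng.random())


# Extreme logits stay finite thanks to the max-shift.
print(softmax_f64([-1e4, 0.0, 1e4]))
# Same seed and token index give the same token every time.
print(sample_token([0.1, 2.0, -1.0, 0.5], seed=42, token_idx=0))
# CDF is [0.5, 0.5, 1.0]; u = 0.5 ties on the first two entries,
# and side='left' resolves the tie to the lowest index.
print(sample_from_cdf([0.5, 0.0, 0.5], 0.5))
```

With `side='right'` the same draw `u = 0.5` would skip both tied entries and land on index 2, which is why fixing the side is part of making sampling reproducible.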
## Tie-break policy
`searchsorted(..., side='left')` ⇒ when CDF values are tied, the tie breaks toward the lowest id.
## Running the tests
```
pytest -q
```