MelodyDeterminism Patch (GPU + determinism + benchmark)
What's included
- NumPy/CuPy backend with automatic selection and a deterministic RNG (Philox/PCG64).
- Deterministic reductions (TreeFixed, KahanFixed), robust softmax, canonical sampling.
- Tolerance metadata (max_abs_err / max_rel_err).
- Overhead benchmark across batch size n and vocab size v on CPU/GPU.
- Edge-case tests (extreme logits, masks, dtypes, invariants).
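The two deterministic reductions named in the list can be illustrated with a minimal NumPy sketch. The function names `kahan_sum_fixed` and `tree_sum_fixed` are hypothetical and are not the package's own `sum_kahan_fixed` / `reduce_tree_fixed`, whose implementation is not shown here; the sketch only assumes that "fixed" means the order of additions depends on the input length alone.

```python
import numpy as np

def kahan_sum_fixed(values):
    """Compensated (Kahan) sum in a fixed left-to-right order, float64 accumulator."""
    total = np.float64(0.0)
    comp = np.float64(0.0)  # running estimate of the low-order bits lost so far
    for v in np.asarray(values, dtype=np.float64):
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return float(total)

def tree_sum_fixed(values):
    """Pairwise sum whose reduction tree depends only on len(values)."""
    buf = np.asarray(values, dtype=np.float64)
    if buf.size == 0:
        return 0.0
    width = 1 << (buf.size - 1).bit_length()  # next power of two >= size
    padded = np.zeros(width, dtype=np.float64)  # zero padding does not change the sum
    padded[:buf.size] = buf
    while width > 1:
        width //= 2
        padded = padded[:width] + padded[width:2 * width]
    return float(padded[0])

small_terms = [1.0] + [1e-16] * 10
print(kahan_sum_fixed(small_terms), tree_sum_fixed([1, 2, 3, 4, 5]))
```

Because both functions fix the order of additions, repeated calls on the same input return bit-identical results, which is the property a deterministic reduction is after.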
Quick setup
- Copy core/ and tests/ into your Space/repo.
- Merge requirements.txt (add CuPy if you use a GPU in the Space).
- In app.py, import and use the functions for your Gradio UI.
- Space hardware: select a GPU (e.g. T4/A10) on Hugging Face.
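Since CuPy is optional, automatic backend selection usually means falling back to NumPy when CuPy is missing or no GPU is visible. The following is a rough sketch of that idea only; `select_backend` is a hypothetical helper, and the actual logic in `core.backend` may differ.

```python
import importlib

def select_backend(prefer_gpu=True):
    """Return (array_module, name): CuPy if usable, otherwise NumPy."""
    if prefer_gpu:
        try:
            cp = importlib.import_module("cupy")
            # getDeviceCount raises if no CUDA driver/device is available
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp, "cupy"
        except Exception:
            pass  # CuPy not installed or no usable GPU: fall back to CPU
    import numpy as np
    return np, "numpy"

xp, name = select_backend()
print(name, xp.zeros(3).shape)
```

On a CPU-only Space this prints `numpy`; on GPU hardware with CuPy installed it should select `cupy` instead.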
Gradio (snippet)
import gradio as gr
from core.backend import set_seed, backend_name
from core.bench import bench_suite
from core.softmax import softmax_canonical
from core.sampling import sample_canonical
from core.metrics import tol_stats
from core.deterministic import reduce_tree_fixed, sum_kahan_fixed

def run_suite(seed, n, v, dtype):
    set_seed(int(seed))
    # Synthetic input
    from core.backend import xp
    x = xp.random.standard_normal((int(n), int(v))).astype(getattr(xp, dtype))
    # One example row for the tolerances
    p = softmax_canonical(x[0])
    idx = sample_canonical(p, seed=int(seed), token_idx=0)
    stats = {"backend": backend_name(), "token0": int(idx)}
    return stats

with gr.Blocks(theme=gr.themes.Soft()) as demo:
    with gr.Tab("Deterministic"):
        seed = gr.Number(42, precision=0, label="Seed")
        n = gr.Slider(1, 64, 8, step=1, label="Batch n")
        v = gr.Dropdown([1024, 8192, 32768], value=8192, label="Vocab v")
        dtype = gr.Radio(["float32", "float64"], value="float32", label="dtype")
        run = gr.Button("Run suite")
        out = gr.JSON(label="Output + metadata")
        run.click(run_suite, [seed, n, v, dtype], [out])
    with gr.Tab("Benchmark"):
        runb = gr.Button("Benchmark")
        table = gr.Dataframe(headers=["n","v","t_std_ms","t_can_ms","overhead_pct"], label="Latencies (ms)")
        def _bench():
            return bench_suite()
        runb.click(_bench, outputs=[table])

# concurrency_count is a Gradio 3.x argument; Gradio 4+ uses default_concurrency_limit instead.
# demo.queue(concurrency_count=2, max_size=8).launch()
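The Benchmark tab expects rows matching the headers n, v, t_std_ms, t_can_ms, overhead_pct. What `bench_suite` actually measures is not shown, so the sketch below is only an assumption about the shape of such rows: it times a plain softmax against a float64-accumulating variant (both hypothetical stand-ins) and reports the relative overhead.

```python
import time
import numpy as np

def _time_ms(fn, x, repeats=5):
    """Best-of-N wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(x)
        best = min(best, (time.perf_counter() - t0) * 1e3)
    return best

def softmax_std(x):
    """Row-wise softmax in the input dtype."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_f64(x):
    """Stand-in for a 'canonical' path: compute in float64, cast back."""
    return softmax_std(x.astype(np.float64)).astype(x.dtype)

def bench_rows(ns=(1, 8), vs=(1024, 8192), seed=0):
    """Rows of [n, v, t_std_ms, t_can_ms, overhead_pct]."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in ns:
        for v in vs:
            x = rng.standard_normal((n, v)).astype(np.float32)
            t_std = _time_ms(softmax_std, x)
            t_can = _time_ms(softmax_f64, x)
            overhead = 100.0 * (t_can - t_std) / t_std if t_std > 0 else float("nan")
            rows.append([n, v, round(t_std, 4), round(t_can, 4), round(overhead, 1)])
    return rows

for row in bench_rows():
    print(row)
```

A list of lists in this shape can be returned directly to a `gr.Dataframe` with matching headers. Timings vary between runs and machines, so only the row layout is stable.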
Determinism notes
- RNG: Philox (GPU) / PCG64 (CPU), with each uniform draw u mapped to an index via searchsorted(CDF, u, side='left').
- BLAS: for the tests we force OMP/MKL threads = 1 to reduce variability.
- Dtypes: prefer float32; softmax and reductions accumulate in float64.
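The thread pinning and the float64 accumulation described in these notes could look roughly like the sketch below. `softmax_f64_accum` is a hypothetical name, not the package's `softmax_canonical`, and the sketch assumes at least one finite logit per input. The environment variables only take effect if they are set before NumPy (and its BLAS) is first imported.

```python
import os

# Pin OpenMP/MKL to one thread; must happen before NumPy is first imported.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")

import numpy as np

def softmax_f64_accum(logits):
    """Softmax over a 1-D array: compute in float64, return in the input float dtype."""
    x = np.asarray(logits)
    x64 = x.astype(np.float64)
    shifted = x64 - np.max(x64)   # max-shift avoids overflow for large logits
    e = np.exp(shifted)           # masked entries (-inf) become exactly 0
    p = e / np.sum(e)
    out_dtype = x.dtype if x.dtype.kind == "f" else np.float64
    return p.astype(out_dtype)

logits = np.array([1000.0, 1000.0, -np.inf], dtype=np.float32)
print(softmax_f64_accum(logits))
```

The max-shift keeps extreme logits from overflowing, and a `-inf` mask entry ends up with probability exactly zero.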
Tie-break policy
searchsorted(..., side='left') ⇒ when CDF entries are tied, the tie-break goes to the lowest id.
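A small NumPy sketch of this policy follows. Both helpers are hypothetical: `uniform_for_token` assumes that a draw is keyed by (seed, token_idx), as the `sample_canonical(p, seed=..., token_idx=...)` call suggests, and uses PCG64 seeded through `SeedSequence`; the package's actual keying scheme is not shown.

```python
import numpy as np

def uniform_for_token(seed, token_idx):
    """Reproducible u in [0, 1) keyed by (seed, token_idx). Hypothetical helper."""
    ss = np.random.SeedSequence([int(seed), int(token_idx)])
    return float(np.random.Generator(np.random.PCG64(ss)).random())

def sample_left(probs, u):
    """Map u to a token id with searchsorted(..., side='left')."""
    cdf = np.cumsum(np.asarray(probs, dtype=np.float64))
    cdf /= cdf[-1]  # normalise so the last entry is exactly 1.0
    idx = int(np.searchsorted(cdf, u, side="left"))
    return min(idx, cdf.size - 1)  # guard in case u exceeds the last entry

# Tokens 1 and 2 have zero probability, so the CDF is tied at 0.25 for ids 0..2.
probs = [0.25, 0.0, 0.0, 0.75]
print(sample_left(probs, 0.25))                              # lowest tied id
print(sample_left(probs, uniform_for_token(seed=42, token_idx=0)))
```

With `side='left'`, a draw that lands exactly on a tied CDF value resolves to the first id in the tie, whereas `side='right'` would skip past the whole tie. One caveat of this sketch: a draw of exactly u = 0 selects id 0 even if that token has zero probability.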
Running the tests
pytest -q
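For orientation, a test in the spirit of the tolerance metadata (max_abs_err / max_rel_err) might look like the sketch below. It is self-contained and does not use the package: `tol_stats_sketch` is a hypothetical stand-in for `core.metrics.tol_stats`, and the thresholds are illustrative assumptions rather than the project's actual tolerances.

```python
import numpy as np

def tol_stats_sketch(actual, reference):
    """Max absolute and relative error of `actual` against `reference`."""
    a = np.asarray(actual, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    abs_err = np.abs(a - r)
    denom = np.maximum(np.abs(r), np.finfo(np.float64).tiny)  # avoid division by zero
    return {
        "max_abs_err": float(abs_err.max()),
        "max_rel_err": float((abs_err / denom).max()),
    }

def _softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def test_softmax_float32_close_to_float64():
    rng = np.random.default_rng(0)
    logits = rng.standard_normal(1024)
    ref = _softmax(logits.astype(np.float64))
    got = _softmax(logits.astype(np.float32))
    stats = tol_stats_sketch(got, ref)
    assert got.dtype == np.float32
    assert stats["max_abs_err"] < 1e-6          # illustrative threshold
    assert abs(float(got.sum(dtype=np.float64)) - 1.0) < 1e-5  # invariant: sums to 1
```

Saved as a file under tests/ with a `test_` prefix, a function like this would be collected by `pytest -q`.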