# MelodyDeterminism Patch (GPU + determinism + benchmark)
## What's included
- NumPy/CuPy backend with automatic selection and a deterministic RNG (Philox/PCG64).
- Deterministic reductions (TreeFixed, KahanFixed), robust softmax, canonical sampling.
- Tolerance metadata (max_abs_err / max_rel_err).
- Overhead benchmark across batch size `n` and vocabulary size `v` on CPU/GPU.
- Edge-case tests (extreme logits, masks, dtypes, invariants).
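To make the "deterministic reductions" item concrete, here is a minimal sketch of the two ideas the names TreeFixed and KahanFixed suggest: a pairwise sum whose tree shape depends only on the input length, and a compensated sum evaluated in a fixed order. The functions `reduce_tree` and `sum_kahan` are hypothetical illustrations written for this README, not the code in `core/deterministic`.

```python
import numpy as np


def sum_kahan(values):
    """Compensated (Kahan) summation in a fixed left-to-right order."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = float(v) - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


def reduce_tree(values):
    """Pairwise sum; the tree shape depends only on len(values)."""
    buf = np.asarray(values, dtype=np.float64).copy()
    if buf.size == 0:
        return 0.0
    while buf.size > 1:
        if buf.size % 2:  # pad odd lengths with a zero so pairs line up
            buf = np.append(buf, 0.0)
        buf = buf[0::2] + buf[1::2]  # add neighbours: (0,1), (2,3), ...
    return float(buf[0])
```

Because the order of additions is fixed by the input length alone, repeated calls on the same data should return bit-identical results, which is the property the tests can then assert.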
## Quick setup
1. Copy `core/` and `tests/` into your Space/repo.
2. Merge `requirements.txt` (add CuPy if you use a GPU in the Space).
3. In `app.py`, import the functions and use them in your Gradio UI.
4. **Space hardware**: select a GPU (e.g. T4/A10) on Hugging Face.
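Since CuPy is optional in `requirements.txt`, automatic backend selection presumably has to fall back to NumPy when CuPy or a GPU is missing. The sketch below shows one way that could look; `select_backend` is a hypothetical helper, not the actual `core.backend` API.

```python
import numpy as np


def select_backend(prefer_gpu=True):
    """Return (array_module, name): CuPy if usable, otherwise NumPy."""
    if prefer_gpu:
        try:
            import cupy as cp  # optional dependency

            # Raises (or returns 0) when no CUDA device is visible.
            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp, "cupy"
        except Exception:
            pass  # CuPy missing or no usable device: fall back to CPU
    return np, "numpy"


xp, name = select_backend()
```

On a CPU-only Space this resolves to NumPy, so the same `app.py` can run before and after a GPU is assigned.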
## Gradio (snippet)
```python
import gradio as gr

from core.backend import set_seed, backend_name
from core.bench import bench_suite
from core.softmax import softmax_canonical
from core.sampling import sample_canonical
# Imported for extending the suite; not used in this minimal snippet.
from core.metrics import tol_stats
from core.deterministic import reduce_tree_fixed, sum_kahan_fixed


def run_suite(seed, n, v, dtype):
    # Array module (NumPy or CuPy) exposed by the backend
    from core.backend import xp

    seed = int(seed)
    set_seed(seed)
    # Synthetic input: n rows of v logits
    x = xp.random.standard_normal((int(n), int(v))).astype(getattr(xp, dtype))
    # One example row (for the tolerance stats)
    p = softmax_canonical(x[0])
    idx = sample_canonical(p, seed=seed, token_idx=0)
    return {"backend": backend_name(), "token0": int(idx)}


with gr.Blocks(theme=gr.themes.Soft()) as demo:
    with gr.Tab("Deterministic"):
        seed = gr.Number(42, precision=0, label="Seed")
        n = gr.Slider(1, 64, 8, step=1, label="Batch n")
        v = gr.Dropdown([1024, 8192, 32768], value=8192, label="Vocab v")
        dtype = gr.Radio(["float32", "float64"], value="float32", label="dtype")
        run = gr.Button("Run suite")
        out = gr.JSON(label="Output + metadata")
        run.click(run_suite, [seed, n, v, dtype], [out])
    with gr.Tab("Benchmark"):
        runb = gr.Button("Benchmark")
        table = gr.Dataframe(
            headers=["n", "v", "t_std_ms", "t_can_ms", "overhead_pct"],
            label="Latencies (ms)",
        )

        def _bench():
            return bench_suite()

        runb.click(_bench, outputs=[table])

# `concurrency_count` is a Gradio 3.x argument; newer releases use
# `default_concurrency_limit` instead.
# demo.queue(concurrency_count=2, max_size=8).launch()
```
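The Benchmark tab expects rows with the columns `n`, `v`, `t_std_ms`, `t_can_ms`, `overhead_pct`. As a rough illustration of how such rows could be produced on CPU, the sketch below times a plain softmax against a float64-accumulating one. Everything here is an assumption for illustration: `bench_rows` and its helpers are hypothetical, and the overhead formula `100 * (t_can - t_std) / t_std` is a guess at what the column means, not the definition used by `core.bench`.

```python
import time

import numpy as np


def _softmax_std(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def _softmax_f64(x):
    # Same computation, but carried out in float64 and cast back.
    return _softmax_std(x.astype(np.float64)).astype(x.dtype)


def _best_ms(fn, x, repeats=5):
    """Best-of-N wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(x)
        best = min(best, (time.perf_counter() - t0) * 1e3)
    return best


def bench_rows(ns=(1, 8), vs=(1024, 8192), seed=0):
    rng = np.random.Generator(np.random.PCG64(seed))
    rows = []
    for n in ns:
        for v in vs:
            x = rng.standard_normal((n, v)).astype(np.float32)
            t_std = max(_best_ms(_softmax_std, x), 1e-9)  # avoid divide-by-zero
            t_can = _best_ms(_softmax_f64, x)
            rows.append([n, v, t_std, t_can, 100.0 * (t_can - t_std) / t_std])
    return rows
```

On a GPU backend, kernel launches are asynchronous, so a real benchmark would need to synchronize the device before reading the clock; this CPU sketch skips that.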
## Determinism notes
- RNG: Philox (GPU) / PCG64 (CPU), with a uniform draw u mapped to a token via searchsorted(CDF, side='left').
- BLAS: for tests we force OMP/MKL threads = 1 to reduce variability.
- Dtypes: prefer float32; softmax and reductions accumulate in float64.
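The dtype note can be illustrated with a small sketch: a max-shifted softmax that takes float32 logits, does the exponentials and the normalizing sum in float64, and casts back, plus a helper that reports `max_abs_err` / `max_rel_err` against a float64 reference. Both functions are hypothetical stand-ins, not `softmax_canonical` or `tol_stats` themselves.

```python
import numpy as np


def softmax_f64_accum(logits):
    """Max-shifted softmax computed in float64, returned in the input dtype."""
    x = np.asarray(logits)
    z = x.astype(np.float64)
    z = z - np.max(z)  # largest exponent becomes 0, so exp cannot overflow
    e = np.exp(z)
    p = e / e.sum()
    return p.astype(x.dtype) if np.issubdtype(x.dtype, np.floating) else p


def tolerance(p, ref):
    """Max absolute and relative error of p against a float64 reference."""
    ref = np.asarray(ref, dtype=np.float64)
    abs_err = np.abs(np.asarray(p, dtype=np.float64) - ref)
    # Guard the division where the reference probability is exactly zero.
    rel_err = abs_err / np.maximum(np.abs(ref), np.finfo(np.float64).tiny)
    return {"max_abs_err": float(abs_err.max()), "max_rel_err": float(rel_err.max())}


logits32 = np.array([0.5, 1.5, -2.0], dtype=np.float32)
stats = tolerance(softmax_f64_accum(logits32),
                  softmax_f64_accum(logits32.astype(np.float64)))
```

With this layout the only float32 rounding happens in the final cast, so the reported errors should stay near float32 machine precision.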
## Tie-break policy
`searchsorted(..., side='left')` ⇒ when CDF entries are tied, the tie breaks toward the lowest token id (min-id).
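A short sketch of the inverse-CDF mapping and its tie-break on CPU follows. The function `sample_inverse_cdf` and the way it keys the PCG64 stream on `(seed, token_idx)` are assumptions made for illustration; the actual `sample_canonical` may derive its uniform differently.

```python
import numpy as np


def sample_inverse_cdf(probs, seed, token_idx):
    """Draw one index; the uniform depends only on (seed, token_idx)."""
    p = np.asarray(probs, dtype=np.float64)
    cdf = np.cumsum(p)
    cdf /= cdf[-1]  # force the last entry to exactly 1.0
    # Hypothetical keying scheme: one stream per (seed, token_idx) pair.
    rng = np.random.Generator(np.random.PCG64([int(seed), int(token_idx)]))
    u = rng.random()  # uniform in [0, 1)
    return int(np.searchsorted(cdf, u, side="left"))


# Tie-break: token 1 has zero probability, so cdf[0] == cdf[1].
cdf = np.array([0.5, 0.5, 1.0])
left = int(np.searchsorted(cdf, 0.5, side="left"))
right = int(np.searchsorted(cdf, 0.5, side="right"))
```

Here `left` lands on the first of the tied entries while `right` skips past both, which is why `side='left'` gives the min-id behaviour described above.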
## Running the tests
```
pytest -q
```
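One possible way to apply the single-thread policy mentioned in the notes is a `conftest.py` that pins the thread counts before NumPy is imported, alongside a basic reproducibility check. The file below is a hypothetical sketch, not a file shipped in `tests/`.

```python
# conftest.py (hypothetical): pin BLAS/OpenMP threads before NumPy loads
import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # imported after the environment is set on purpose


def test_same_seed_same_draws():
    # Two generators built from the same seed must agree bit for bit.
    a = np.random.Generator(np.random.PCG64(42)).random(8)
    b = np.random.Generator(np.random.PCG64(42)).random(8)
    assert np.array_equal(a, b)
```

The environment variables only take effect if they are set before the numerical libraries initialize, which is why the assignments come before the import.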