Add continuous evaluation + regression tests

#47
by jbakerx - opened

Run the same 50–100 prompt suite against each new version and track perplexity, repetition, toxicity/anachronism rate, and a human preference sample.

This would catch silent quality regressions before they ship.
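
Here is a rough sketch of what the automated part of such a check could look like. Everything in it is an assumption for illustration: the function names `repetition_rate` and `find_regressions`, the repeated-trigram definition of repetition, the 5% tolerance, and the convention that every tracked metric is lower-is-better (a human preference score would need to be inverted or handled separately). Perplexity and toxicity/anachronism scoring are left out because they depend on the model and classifier in use.

```python
def repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate another n-gram in the same text.

    Hypothetical metric definition: 0.0 means no repeated n-grams.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return metrics that got worse than baseline by more than `tolerance`.

    Assumes every metric is lower-is-better and that `current` has a value
    for each metric named in `baseline`. The tolerance is relative (5% by
    default), which is an arbitrary choice for this sketch.
    """
    regressions = {}
    for name, base in baseline.items():
        cur = current[name]
        if cur > base * (1 + tolerance):
            regressions[name] = (base, cur)
    return regressions
```

In use, the stored baseline would come from the last accepted run of the fixed prompt suite, and a non-empty result from `find_regressions` would fail the check.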
