Run the same 50–100-prompt suite and track perplexity, repetition, toxicity/anachronism rate, and a human preference sample. This helps catch silent quality regressions.
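A minimal sketch of that check, under several assumptions. Per-token log-probabilities are assumed to come from whatever model you are scoring. Toxicity/anachronism flags are assumed to come from a separate classifier or keyword check that you supply. The human preference sample is assumed to be pairwise "new vs. old" judgments. The names (SuiteResult, regressions) and the tolerance values are hypothetical and illustrative, not a standard tool or recommended thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class SuiteResult:
    """Aggregate metrics for one model version on the fixed prompt suite (hypothetical container)."""
    perplexity: float    # lower is better
    repetition: float    # lower is better
    flagged_rate: float  # toxicity/anachronism rate, lower is better


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood over all scored tokens (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def repetition_rate(text, n=3):
    """Fraction of word n-grams that duplicate an earlier n-gram in the same text."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


def flagged_rate(flags):
    """Share of outputs flagged by an external toxicity/anachronism check (assumed booleans)."""
    return sum(flags) / len(flags)


def win_rate(judgments):
    """Pairwise human judgments: 'new', 'old' or 'tie'; a tie counts as half a win."""
    score = sum(1.0 if j == "new" else 0.5 if j == "tie" else 0.0 for j in judgments)
    return score / len(judgments)


def regressions(base, new, pref, ppl_tol=0.05, rate_tol=0.02, pref_tol=0.05):
    """Return the names of metrics on which `new` looks worse than `base`.

    ppl_tol is relative; rate_tol and pref_tol are absolute. All values are
    illustrative assumptions, not recommended thresholds.
    """
    bad = []
    if new.perplexity > base.perplexity * (1 + ppl_tol):
        bad.append("perplexity")
    if new.repetition > base.repetition + rate_tol:
        bad.append("repetition")
    if new.flagged_rate > base.flagged_rate + rate_tol:
        bad.append("flagged_rate")
    # pref is the new model's win rate against the old one; 0.5 means no preference.
    if pref < 0.5 - pref_tol:
        bad.append("preference")
    return bad


if __name__ == "__main__":
    base = SuiteResult(perplexity=12.0, repetition=0.04, flagged_rate=0.01)
    new = SuiteResult(perplexity=12.3, repetition=0.09, flagged_rate=0.01)
    pref = win_rate(["new", "old", "tie", "old"])
    print(regressions(base, new, pref))
```

Running the example prints ['repetition', 'preference']: perplexity rose by less than 5% and the flagged rate did not move, but repetition jumped and raters preferred the old outputs. Keeping the prompt suite fixed is what makes these numbers comparable between runs.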