Negative sampling: teach what not to do

#44
by jbakerx - opened

Add preference pairs where the “bad” (rejected) output contains:

modern slang
excessive repetition
shallow clichés
Then run preference training (DPO) to push the model away from those outputs.
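
A rough sketch of what this could look like, in case it helps the discussion. Everything here is an assumption rather than anything from this repo: the marker lists `SLANG` and `CLICHES`, the helpers `flaw_tags` and `make_pair`, and the repetition threshold are all hypothetical and would need tuning for the actual data. The `prompt` / `chosen` / `rejected` field names follow a common convention for preference datasets, and `dpo_loss` is the standard per-pair DPO objective written out on plain log-probabilities, just to show how training pushes away from the rejected side.

```python
import math

# Hypothetical marker lists; replace with ones suited to your corpus.
SLANG = {"lol", "vibes", "gonna", "ngl"}
CLICHES = ("at the end of the day", "time heals all wounds")


def flaw_tags(text, max_repeat=3):
    """Return the flaw categories detected in `text` (crude heuristics)."""
    lowered = text.lower()
    words = [w.strip(".,;:!?\"'") for w in lowered.split()]
    tags = []
    if any(w in SLANG for w in words):
        tags.append("slang")
    # Count only words longer than 3 characters so "the", "and" etc. are ignored.
    counts = {}
    for w in words:
        if len(w) > 3:
            counts[w] = counts.get(w, 0) + 1
    if counts and max(counts.values()) >= max_repeat:
        tags.append("repetition")
    if any(c in lowered for c in CLICHES):
        tags.append("cliche")
    return tags


def make_pair(prompt, good, bad):
    """Build one preference record, or None if the pair is not a clean contrast."""
    tags = flaw_tags(bad)
    # Skip pairs where the "bad" side has no detected flaw or the "good" side has one.
    if not tags or flaw_tags(good):
        return None
    return {"prompt": prompt, "chosen": good, "rejected": bad, "flaws": tags}


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


pair = make_pair(
    "Describe the sea at dawn.",
    "Grey light spread over the still water.",
    "The sea was nice, nice vibes, nice and nice lol.",
)
```

The extra `flaws` field is only bookkeeping, so the mix of slang, repetition, and cliché negatives can be balanced; it would likely need to be dropped before handing the records to a trainer. The loss falls below `log(2)` once the policy favors the chosen output more than the reference model does, which is the "push away" effect.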
