
Negative sampling: teach what not to do

#44
by jbakerx - opened

Add preference pairs where the “bad” (rejected) output contains:

modern slang
excessive repetition
shallow clichés
Then run preference training (DPO) to push the model away from them.
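
A rough sketch of what building such pairs could look like, in case it helps. Everything here is an assumption on my part: the example texts, the `flaw` labels, and the helper names are made up for illustration. The `prompt` / `chosen` / `rejected` field names follow the layout commonly used for DPO preference datasets (for example by TRL's `DPOTrainer`), but check the docs for whichever trainer you use.

```python
import json

# Hypothetical raw examples: each has one acceptable output and one "bad"
# output showing a flaw we want the model to move away from.
RAW_EXAMPLES = [
    {
        "prompt": "Describe the harbor at dawn.",
        "good": "Mist lay upon the water, and the masts stood grey against the first light.",
        "bad": "The harbor was lowkey stunning, no cap, total vibes.",
        "flaw": "modern_slang",
    },
    {
        "prompt": "Describe the harbor at dawn.",
        "good": "Gulls wheeled above the quay while the fishermen mended their nets.",
        "bad": "The harbor was quiet. The harbor was quiet. The harbor was very quiet.",
        "flaw": "excessive_repetition",
    },
    {
        "prompt": "Describe the harbor at dawn.",
        "good": "The tide had gone out, leaving the boats tilted on the mud.",
        "bad": "It was a sight for sore eyes, calm before the storm.",
        "flaw": "shallow_cliche",
    },
]


def to_dpo_pair(example):
    """Map one raw example to prompt/chosen/rejected fields."""
    return {
        "prompt": example["prompt"],
        "chosen": example["good"],
        "rejected": example["bad"],
    }


def build_pairs(examples):
    """Convert raw examples to pairs, skipping ones with no preference signal."""
    pairs = []
    for ex in examples:
        if ex["good"].strip() == ex["bad"].strip():
            continue  # identical outputs teach the model nothing
        pairs.append(to_dpo_pair(ex))
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one pair per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


PAIRS = build_pairs(RAW_EXAMPLES)

if __name__ == "__main__":
    print(to_jsonl(PAIRS))
```

Keeping the prompt identical across the chosen and rejected sides means the only difference the preference loss sees is the flaw itself, which is the point of the negative examples.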
