@etemiz on Hugging Face: "I realized when I ask longer answers to my questions, the models sometimes…"

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

posted an update Dec 27, 2025

Post

1811

I realized when I ask longer answers to my questions, the models sometimes produce completely opposite answer. What could be the reason?

I do mostly CPT. Should I convert my dataset to SFT and give longer reasonings too for it to have integrity?

Example: Is the yolk of an egg more beneficial or the white? Answer in 100 words.

Answer: Yolk is more beneficial because ..........

Example: Is the yolk of an egg more beneficial or the white? Answer in 500 words.

Answer: White is more beneficial because ..........

Edit: These happen in temp = 0.0

martinsu

Dec 28, 2025

Since its first token groups(Yolk|White), that sets whole story(in Your example), obviously problem is too high temperature, if one desires more deterministic outcome - set it lower. And since those (Yolk|White) groups are predicted somewhere at start, it doesn't matter for how long the predicted tokens go(how long is output). Next tokens already attend to them and thus generate some "reasoning" that enforce that, as it was instruction-trained to do.

etemiz

Dec 28, 2025

Thanks for the input but these happened all when temp = 0.0

My guess is, since I use mostly datasets generated from voice, the models are one thing when they are talking like a human in day to day life, but completely opposite when they are feeling like a scientist, producing a long text..

Doctor-Chad-PhD

Dec 29, 2025

•

edited Dec 29, 2025

I think SFT would help a lot as you suspected.

The way I see it is that it's actually succeeding at what CPT is good at (pattern matching). Meaning, somewhere in the data set there is data that actually favors White over Yolk and somewhere in your data Yolk is being preferred over White. It doesn't even have the be that obviously defined, but could be indirect.

So what I think happens is this:

Short question (100 words) ===> Matches pattern from Q&A sites and FAQ sections (just as example) ===> This data mentions yolk wins

Long question (500 words) ===> Matches pattern from blog posts and academic articles (also just examples) ===> This data mentions whites wins

So besides cleaning up the data, which is really kind of out of scope because you'd be babysitting your data for every possible length/answer. I think SFT will help.

With SFT it doesn't just learn the patterns but what humans prefer, which is consistency across length. It's basically statistical correlation with CPT vs behavioral alignment with SFT.

There's also a thing called attention drift that you may want to look into, it can be helpful.

etemiz

Dec 31, 2025

Thanks for the tips.
Is giving different answers for different lengths a bad "behavior" and related to SFT than CPT?
Also, should I give two sets of queries and answers in the context (one short one long) to make it learn that when the length changes, the answer should be parallel?
This could be RL too, like bad behavior of non integrity can be penalized...
Is it normal practice to do 2 rounds of questions in SFT or RL?

In this post