etemiz 
posted an update 4 days ago
How to expand your dataset (of articles) without changing the ideas in it?

I was doing CPT (continued pre-training) for a while and got decent results. But what if I want to go for perfection and cover all the areas of misalignment using limited datasets? I have to find a way to multiply the material to successfully counter the material of the rest of the internet.

I want to generate SFT datasets, but only on controversial topics, because I have to be efficient with limited resources. First I give a smart LLM a 'ground truth' text, then I give it the following prompts:

- You are a highly skilled academic analyst.

- Analyze this text and find 3 bold claims that could cause controversy and division in public. List the claims and also state why they are debatable. Give numbers to the claims.

- Convert these claims into binary questions (that could be answered by yes/no or this/that).

- Now put these questions in JSON format. For each question, include its number and which of the answers concurs with the original text.

- Write some supporting arguments for the 1st question, concurring with and confirming the original text.
Write about 300 words. Do not mention the text; write as if you are the one answering the question.

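The prompt sequence above can be sketched as a small pipeline. This is a minimal illustration, not the author's actual code: the prompt wordings mirror the list above, the message schema follows the common chat-completion convention, and how you actually call the model is left out (any chat API would do). `parse_questions` assumes the step-4 JSON comes back as either a bare list or a `{"questions": [...]}` wrapper.

```python
import json

# The five prompts from the post, lightly normalized.
PROMPTS = [
    "You are a highly skilled academic analyst.",
    "Analyze this text and find 3 bold claims that could cause controversy "
    "and division in public. List the claims, state why they are debatable, "
    "and number them.",
    "Convert these claims into binary questions (answerable by yes/no or this/that).",
    "Now put these questions in JSON format. For each question, include its "
    "number and which answer concurs with the original text.",
    "Write about 300 words of supporting arguments for the 1st question, "
    "concurring with the original text. Do not mention the text; write as if "
    "you are the one answering the question.",
]

def build_conversation(ground_truth: str) -> list[dict]:
    """Assemble the system + user turns for one ground-truth document."""
    messages = [{"role": "system", "content": PROMPTS[0]},
                {"role": "user", "content": ground_truth}]
    for prompt in PROMPTS[1:]:
        messages.append({"role": "user", "content": prompt})
    return messages

def parse_questions(model_json: str) -> list[dict]:
    """Parse the JSON the model returns in step 4; tolerate a bare list
    or a {"questions": [...]} wrapper."""
    data = json.loads(model_json)
    return data if isinstance(data, list) else data.get("questions", [])
```

In practice you would send `build_conversation(...)` turn by turn, appending each model reply before the next prompt, so the model sees its own claims when converting them to questions.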

The result is questions and answers that express the same ideas in more words: a few sentences of opinion at the beginning are expanded into a long answer. Using this method I can probably multiply billions of tokens into tens of billions and get more effective training.

Next I should maybe do RL. LLMs seem to have all kinds of ideas already installed, yet they don't have the intuition to know which one is true; they can give you a ton of reasons to support anything. Given the proper incentives, LLMs should then evolve toward supporting aligned ideas more. The rewards will act as guidance that kicks an LLM toward better answers.

Hi there,
you can expand it, but the “turn 3 sentences into 300 words” trick mostly gives you more tokens, not more learning. It teaches the model to waffle confidently.

If you want to multiply data without changing the ideas, do this instead (simple + practical):

Don’t just expand words — expand coverage. For each “ground truth” text, generate lots of different tasks:

yes/no Qs + the correct label

“is this supported by the text? (yes/no/not enough info)”

extract the exact line that proves it (evidence span)

rewrite the claim in 5 different ways (hard rephrasing)

“spot what’s wrong with this answer” (hallucination / overclaim / missing evidence)
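The task types above can be sketched as one expansion function: one labelled claim fans out into several distinct training records instead of one long essay. The record schema (`task`, `input`, `target`) and placeholder targets are illustrative only.

```python
def expand_claim(text_id: str, claim: str, label: str) -> list[dict]:
    """Turn one (claim, label) pair into several task records.
    `label` is "yes", "no", or "not_enough_info"."""
    tasks = [
        # yes/no question with the correct label
        {"task": "binary_qa",
         "input": f"Answer yes or no: {claim}",
         "target": label},
        # supported / not supported / not enough info
        {"task": "support_check",
         "input": f"Is this supported by the text? {claim}",
         "target": "not enough info" if label == "not_enough_info" else label},
        # evidence span -- filled in later from the source text
        {"task": "evidence_span",
         "input": f"Quote the exact line of the text that settles: {claim}",
         "target": "<fill from source text>"},
        # hard rephrasing -- model-generated, then checked
        {"task": "paraphrase",
         "input": f"Rewrite this claim in 5 different ways: {claim}",
         "target": "<model-generated, then checked>"},
    ]
    return [{"text_id": text_id, **t} for t in tasks]
```

The point is that each record tests a different skill against the same idea, so coverage grows without the prose growing.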

For controversial stuff, generate both sides. If you only generate supporting arguments, you’re basically training it to rationalise. Better format:
Question → Answer → Evidence → Strongest counterpoint → What we don’t know
That keeps the “idea” but trains honesty and restraint.
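That five-part format could be encoded as a training record like this; the field names are illustrative, and the validator simply enforces that the honesty-training fields are never skipped.

```python
EXAMPLE_RECORD = {
    "question": "Does the text claim that X causes Y?",
    "answer": "yes",
    "evidence": "Exact sentence from the source that supports the answer.",
    "strongest_counterpoint": "The best-faith argument against the answer.",
    "unknowns": "What the source does not establish either way.",
}

REQUIRED_FIELDS = ("question", "answer", "evidence",
                   "strongest_counterpoint", "unknowns")

def is_complete(record: dict) -> bool:
    """Reject records that skip the counterpoint or the unknowns --
    those are the fields that train honesty and restraint."""
    return all(record.get(field, "").strip() for field in REQUIRED_FIELDS)
```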

Add “not enough info” examples on purpose. This is the big one. Most models fail by answering anyway.
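One cheap way to manufacture those on purpose (a sketch, not the only way): pair each question with a source text drawn from a *different* document, so the gold label is genuinely "not enough info".

```python
import random

def make_unanswerable(qa_pairs: list[tuple[str, str]],
                      seed: int = 0) -> list[dict]:
    """qa_pairs: (question, source_text) where the text answers the
    question. Returns records whose text does NOT answer the question."""
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for i, (question, _) in enumerate(qa_pairs):
        # pick a source text from any other pair
        j = rng.choice([k for k in range(len(qa_pairs)) if k != i])
        out.append({"question": question,
                    "text": qa_pairs[j][1],
                    "label": "not enough info"})
    return out
```

The mismatched pairs teach the model that a fluent question plus a plausible-looking text does not license an answer.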

Dedup hard. Synthetic data repeats itself fast. If your outputs look similar, the model just memorises the style.
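A rough near-duplicate filter, for illustration: word 3-gram Jaccard similarity with a guessed threshold (tune it on your data, and switch to MinHash/LSH once the set is large, since this is O(n²)).

```python
def ngrams(text: str, n: int = 3) -> set:
    """Word n-grams of a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def dedup(samples: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a sample only if it is not too similar to any already-kept one."""
    kept, kept_grams = [], []
    for s in samples:
        g = ngrams(s)
        if all(jaccard(g, kg) < threshold for kg in kept_grams):
            kept.append(s)
            kept_grams.append(g)
    return kept
```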

RL / rewards: don’t reward “aligned ideas”. Reward good behaviour: cites evidence, admits uncertainty, doesn’t invent facts, fairly summarises the other side. Otherwise RL just makes it more stubborn.
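A behaviour-based reward could be sketched like this. These are crude string heuristics standing in for a learned reward model or an LLM judge, and the weights are arbitrary; the point is only that the signal checks *conduct* (evidence, uncertainty, fairness), never the conclusion itself.

```python
UNCERTAINTY_MARKERS = ("not enough info", "i'm not sure", "it is unclear",
                       "the text does not say")

def behaviour_reward(answer: str, source_text: str) -> float:
    """Score an answer on behaviour, not on which side it takes."""
    score = 0.0
    # Cites evidence: quotes at least one verbatim span from the source.
    quoted_spans = answer.split('"')[1::2]
    if any(span and span in source_text for span in quoted_spans):
        score += 1.0
    # Admits uncertainty somewhere instead of answering anyway.
    if any(m in answer.lower() for m in UNCERTAINTY_MARKERS):
        score += 0.5
    # At least gestures at the other side.
    if "however" in answer.lower() or "on the other hand" in answer.lower():
        score += 0.5
    return score
```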

That’s basically the efficient path: more labels + constraints + adversarial cases, not longer essays.

hope this is helpful,
RFTSystems, Liam.


Thanks, this is insightful.

I liked the "rewrite the claim in 5 different ways" task. It can be really useful for RAG scenarios.

I liked the idea of detecting hallucination using another aligned LLM, though I don't know how effective it will be.

"not enough info" is probably the hardest. Most LLMs today are trained to say anything rather than being humble, as you said.