instruction-pretrain

https://huggingface.co/papers/2406.14491

DaixuanC45443

AI & ML interests

Synthetic Instructions for Pre-Training

Recent Activity

updated a dataset 11 days ago

instruction-pretrain/medicine-instruction-augmented-corpora

updated a dataset 11 days ago

instruction-pretrain/general-instruction-augmented-corpora

updated a model 11 days ago

instruction-pretrain/medicine-Llama3-8B

Organizations

None yet

updated 2 datasets 11 days ago

instruction-pretrain/medicine-instruction-augmented-corpora

Preview • Updated 11 days ago • 145 • 13

instruction-pretrain/general-instruction-augmented-corpora

Preview • Updated 11 days ago • 30.2k • 20

updated 5 models 11 days ago

updated a dataset 19 days ago

instruction-pretrain/ft-instruction-synthesizer-collection

Viewer • Updated 19 days ago • 249k • 225 • 63

New activity in instruction-pretrain/general-instruction-augmented-corpora 19 days ago

Cannot download ALL data files

#11 opened over 1 year ago by amezasor

upvoted a paper about 2 months ago

LLM-in-Sandbox Elicits General Agentic Intelligence

Paper • 2601.16206 • Published Jan 22 • 85

upvoted a paper 6 months ago

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 117

upvoted a paper 9 months ago

Reasoning with Exploration: An Entropy Perspective

Paper • 2506.14758 • Published Jun 17, 2025 • 30

New activity in instruction-pretrain/finance-Llama3-8B 10 months ago

How large is the corpus size used for pretraining the finance LLaMA?

#2 opened over 1 year ago by dhkong

updated a dataset about 1 year ago

AdaptLLM/food-visual-instructions

Viewer • Updated Aug 21, 2025 • 301k • 65 • 3

liked 2 datasets about 1 year ago

tttx/r1-arcagi-successful-trajectories

Viewer • Updated Feb 2, 2025 • 1.46k • 8 • 3

INK-USC/riddle_sense

Updated Jan 18, 2024 • 294 • 26

upvoted a paper about 1 year ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 96

New activity in instruction-pretrain/ft-instruction-synthesizer-collection about 1 year ago

Query about the size of dataset

#4 opened over 1 year ago by Applauz

New activity in instruction-pretrain/instruction-synthesizer about 1 year ago

Continued pre-training with replay?

#4 opened about 1 year ago by ostapeno

upvoted a paper about 1 year ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
