How was the 50k high-quality dataset created?

by Yhyu13 - opened Oct 7, 2023


Could you elaborate on how the 50k high-quality documentation question-answering dataset was created?

Were the samples bootstrapped from handcrafted questions commonly asked in DocsGPT and then paired with answers generated by, e.g., GPT-4 or Claude 2 to form Q&A pairs, or were the answers written by humans?

I am a bit astonished by the 50k quantity; it is usually hard to find that much domain-specific data for LoRA fine-tuning.
