Instructions for using nvidia/NV-Embed-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use nvidia/NV-Embed-v1 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nvidia/NV-Embed-v1", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
- Notebooks
- Google Colab
- Kaggle
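For intuition, `model.similarity` in the snippet above computes pairwise cosine similarity between embedding rows. A minimal NumPy sketch of that computation, using a toy 3×4 matrix as a stand-in for real `model.encode` output (the values are illustrative, not actual NV-Embed-v1 embeddings):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # L2-normalize each row, then a matrix product of the normalized
    # rows yields the pairwise cosine-similarity matrix
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy stand-in for model.encode() output: 3 sentences, 4-dim embeddings
embeddings = np.array([
    [0.1, 0.3, 0.5, 0.7],
    [0.2, 0.4, 0.4, 0.6],
    [0.9, 0.1, 0.0, 0.2],
])
similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3)
```

Each diagonal entry is 1.0 (every sentence is maximally similar to itself), and the matrix is symmetric, matching the `[3, 3]` shape printed in the sentence-transformers example.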
Sharing training data & reproducing training
#4
opened by xhluca
Congratulations on the paper and score! Since this was trained on public data, would it be possible for you to release the dataset you used to train on Hugging Face? It would also be great to have a training script to reproduce the training, similar to the training script recently released by LLM2Vec.
xhluca changed discussion title from Training data & running the training to Sharing training data & reproducing training
It would be great to also have access to the unidirectional models listed in the paper for research purposes. Unidirectional models are not far behind bi-directional ones so it would be great to explore them side-by-side.
Also interested in it!
