Collection of models and datasets related to MixtureVitae, an open and fully reproducible pretraining dataset built from permissive sources.