Parallel synthetic data available?

by xezpeleta - opened Jun 13, 2024

Hi!

In the description I saw the following:

"This model was trained from scratch using Marian NMT on a combination of English-Basque datasets totalling 20,523,431 sentence pairs. 9,033,998 sentence pairs were parallel data collected from the web while the remaining 11,489,433 sentence pairs were parallel synthetic data created using the Google Translate translator."

Is the parallel synthetic data (created using Google Translate) available on HF datasets?

Thanks!!
