LangTech-MT/salamandraTA-7b-instruct-WMT25

@@ -331,8 +331,8 @@ Click the expand button below to see the full list of language pairs and the dat
 This model has been fine-tuned on ~51k instructions, primarily targeting machine translation performance for the languages pairs featured in the WMT 2025 shared task.
 The corpus used for the instruction tuning round was built to focus on paragraph-level translation, context-aware machine translation, and sentence-level translation.
-To construct paragraph-level data, we source from [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus), [NTREX](https://github.com/MicrosoftTranslator/NTREX), and [News-Commentary](https://opus.nlpl.eu/News-Commentary/corpus/version/News-Commentary).
-The paragraph-level data were constructed by concatenating adjacent sentences (randomly grouping 2, 3, or 4) from the same article or document in FLORES-dev, NTREX, and News Commentary.
 Serbian Cyrillic data from FLORES-dev was transliterated into Serbian Latin.
 In addition, we included data from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2) that we considered relevant to our tasks.
@@ -374,20 +374,20 @@ Click the expand button below to see the full list of tasks included in the fine
 | en → cs           | 58     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
 | cs → de           | 58     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
 | en → ru           | 50     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
-| en → ar           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → bho          | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → ja           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → uk           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| cs → uk           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| ja → zh           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → zh           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → ko           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → et           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → is           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → sh           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → cs           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| cs → de           | 30     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
-| en → ru           | 21     | Paragraph-level             | [FLORES-200-dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
 | **Total**         | **50,841** |                             |                   |

 This model has been fine-tuned on ~51k instructions, primarily targeting machine translation performance for the languages pairs featured in the WMT 2025 shared task.
 The corpus used for the instruction tuning round was built to focus on paragraph-level translation, context-aware machine translation, and sentence-level translation.
+To construct paragraph-level data, we source from [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus), [NTREX](https://github.com/MicrosoftTranslator/NTREX), and [News-Commentary](https://opus.nlpl.eu/News-Commentary/corpus/version/News-Commentary).
+The paragraph-level data were constructed by concatenating adjacent sentences (randomly grouping 2, 3, or 4) from the same article or document in Flores+200 dev, NTREX, and News Commentary.
 Serbian Cyrillic data from FLORES-dev was transliterated into Serbian Latin.
 In addition, we included data from [TowerBlocks](https://huggingface.co/datasets/Unbabel/TowerBlocks-v0.2) that we considered relevant to our tasks.
 | en → cs           | 58     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
 | cs → de           | 58     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
 | en → ru           | 50     | Paragraph-level             | [NTREX](https://github.com/MicrosoftTranslator/NTREX) |
+| en → ar           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → bho          | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → ja           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → uk           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| cs → uk           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| ja → zh           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → zh           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → ko           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → et           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → is           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → sh           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → cs           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| cs → de           | 30     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
+| en → ru           | 21     | Paragraph-level             | [Flores+200 dev](https://huggingface.co/datasets/openlanguagedata/flores_plus) |
 | **Total**         | **50,841** |                             |                   |
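
The paragraph-level construction described in the model card (concatenating 2, 3, or 4 adjacent sentences from the same document) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `build_paragraph_pairs`, the `joiner` and `seed` parameters, and the handling of a short final group are all assumptions made for illustration. It expects one document at a time, given as sentence-aligned source and target lists.

```python
import random
from typing import List, Sequence, Tuple


def build_paragraph_pairs(
    src_sents: Sequence[str],
    tgt_sents: Sequence[str],
    group_sizes: Sequence[int] = (2, 3, 4),
    joiner: str = " ",
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Group adjacent aligned sentences of ONE document into paragraph pairs.

    Hypothetical helper: the group size is drawn uniformly from `group_sizes`
    for every paragraph. The last group may be shorter than the drawn size
    when the document runs out of sentences (an assumption; the model card
    does not say how leftovers are handled).
    """
    if len(src_sents) != len(tgt_sents):
        raise ValueError("source and target must be sentence-aligned")

    rng = random.Random(seed)  # local RNG so results are reproducible
    pairs: List[Tuple[str, str]] = []
    start = 0
    while start < len(src_sents):
        size = rng.choice(list(group_sizes))
        end = min(start + size, len(src_sents))
        # The same slice is taken on both sides, so alignment is preserved.
        pairs.append(
            (joiner.join(src_sents[start:end]), joiner.join(tgt_sents[start:end]))
        )
        start = end
    return pairs
```

Calling the function once per article keeps groups from crossing document boundaries. Joining with a single space is also an assumption; for languages written without spaces between sentences, such as Japanese or Chinese, an empty `joiner` may be more appropriate.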