Commit 85bdb11 (parent 8b525b0): Update README.md

---
license: apache-2.0
---

# MiniLingua-1b-Instruct

**MiniLingua-1b-Instruct** is an instruction-tuned multilingual model built on the [MiniLingua-1b](https://huggingface.co/minilingua-ai/MiniLingua-1b) base model. It supports a diverse set of European languages as well as programming code, making it suitable for instruction following, multilingual generation, and downstream tasks such as question answering and summarisation.

## Supported Languages

- Bulgarian
- Czech
- Dutch
- English
- Finnish
- French
- German
- Greek
- Italian
- Polish
- Portuguese
- Spanish
- Swedish
- Programming code
## Instruction Tuning

This preview instruction-tuned version of MiniLingua-1b was trained for one epoch on 1.2 million instructions drawn from the following high-quality datasets:

- [CohereLabs/aya_collection_language_split](https://huggingface.co/datasets/CohereLabs/aya_collection_language_split)
- [MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X)
- [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
- [bigcode/self-oss-instruct-sc2-exec-filter-50k](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
- [minilingua-ai/mcqa-minilingua-sft](https://huggingface.co/datasets/minilingua-ai/mcqa-minilingua-sft)
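
The card does not state how these datasets were combined during fine-tuning. As an illustration of one common recipe, the sketch below pools examples from several instruction sources and shuffles them into a single epoch with a fixed seed; the source names and example counts are placeholders, not the real split.

```python
import random

# Hypothetical per-source example lists (placeholders, not the real data).
sources = {
    "aya_collection": ["aya-ex-%d" % i for i in range(6)],
    "bactrian_x": ["bx-ex-%d" % i for i in range(3)],
    "lima": ["lima-ex-%d" % i for i in range(1)],
}

def mix_instructions(sources, seed=0):
    """Pool all sources and shuffle them into one epoch-sized training list."""
    rng = random.Random(seed)
    mixed = [ex for examples in sources.values() for ex in examples]
    rng.shuffle(mixed)
    return mixed

epoch = mix_instructions(sources)
print(len(epoch))  # total number of instructions in the mixed epoch
```

A fixed seed keeps the mixing reproducible across runs, which matters when comparing checkpoints trained on the same data order.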
The supervised fine-tuning (SFT) was performed on the [Triton Aalto cluster](https://scicomp.aalto.fi/triton/) using 4 H200 GPUs.
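
A minimal usage sketch with the standard transformers causal-LM API is shown below. The repository id and the plain-text prompt template are assumptions made for illustration; they are not confirmed by this card.

```python
# Illustrative only: the repo id and prompt template below are assumptions.
MODEL_ID = "minilingua-ai/MiniLingua-1b-Instruct"  # assumed repository id

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in a minimal instruction/response template."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def generate(instruction: str, max_new_tokens: int = 128) -> str:
    """Generate a response using the standard transformers causal-LM API."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

print(build_prompt("Translate 'good morning' to Finnish."))
```

If the model ships a tokenizer chat template, `tokenizer.apply_chat_template` would be the preferred way to format prompts instead of the hand-rolled template above.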
## Intended Use

This model is a **preview release** intended for:

- Multilingual instruction following
- Evaluation and benchmarking
- Research in low- and high-resource European languages
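
For the evaluation use case above, a minimal sketch of exact-match scoring is given below; the question-answer pairs are invented for illustration and the model call is omitted, so only the metric itself is shown.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal the reference after normalisation."""
    def norm(s):
        # Lowercase and collapse whitespace before comparing.
        return " ".join(s.lower().strip().split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Invented toy data standing in for model outputs and gold answers.
preds = ["Paris", " berlin ", "Madrid"]
golds = ["Paris", "Berlin", "Rome"]
print(exact_match_accuracy(preds, golds))  # 2 of 3 match
```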

## Limitations

- This version is a first-stage SFT release; alignment steps have not been applied.
- Some languages may show uneven instruction-following ability depending on resource availability and instruction diversity.

---

**License**: Apache-2.0