aaaksenova committed on
Commit
85bdb11
·
1 Parent(s): 8b525b0

Update README.md

Files changed (1)
  1. README.md +52 -3
README.md CHANGED
@@ -1,3 +1,52 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # MiniLingua-1b-Instruct
+
+ **MiniLingua-1b-Instruct** is an instruction-tuned multilingual model based on the [MiniLingua-1b](https://huggingface.co/minilingua-ai/MiniLingua-1b) base model. It supports a diverse set of European languages and programming code, making it suitable for instruction following, multilingual generation, and downstream tasks such as question answering and summarisation.
+
+ ## Supported Languages
+
+ - Bulgarian
+ - Czech
+ - Dutch
+ - English
+ - Finnish
+ - French
+ - German
+ - Greek
+ - Italian
+ - Polish
+ - Portuguese
+ - Spanish
+ - Swedish
+ - Programming code
+
+ ## Instruction Tuning
+
+ This preview instruction-tuned version of MiniLingua-1b was trained for one epoch on 1.2 million instructions drawn from the following high-quality datasets:
+
+ - [CohereLabs/aya_collection_language_split](https://huggingface.co/datasets/CohereLabs/aya_collection_language_split)
+ - [MBZUAI/Bactrian-X](https://huggingface.co/datasets/MBZUAI/Bactrian-X)
+ - [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
+ - [bigcode/self-oss-instruct-sc2-exec-filter-50k](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
+ - [minilingua-ai/mcqa-minilingua-sft](https://huggingface.co/datasets/minilingua-ai/mcqa-minilingua-sft)
+
+ Supervised fine-tuning (SFT) was performed on the [Triton Aalto cluster](https://scicomp.aalto.fi/triton/) using 4 H200 GPUs.
+
+ ## Intended Use
+
+ This model is a **preview release** intended for:
+
+ - Multilingual instruction following
+ - Evaluation and benchmarking
+ - Research in low- and high-resource European languages
+
+ ## Limitations
+
+ - This version is a first-stage SFT release; no alignment steps have been applied.
+ - Instruction-following ability may be uneven across languages, depending on resource availability and instruction diversity.
+
+ ---
+
+ **License**: Apache-2.0
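
The README above describes the model as suitable for multilingual instruction following but gives no usage snippet. The following is a minimal sketch of how it could be prompted with the Hugging Face `transformers` library. Two things here are assumptions, not statements from the README: the repository id `minilingua-ai/MiniLingua-1b-Instruct` (inferred from the base-model link) and the plain-text prompt built by the hypothetical helper `build_prompt` (the README documents no prompt or chat format).

```python
"""Minimal sketch: prompting MiniLingua-1b-Instruct with transformers."""

# Assumption: repository id inferred from the base-model link in the README.
MODEL_ID = "minilingua-ai/MiniLingua-1b-Instruct"


def build_prompt(instruction: str) -> str:
    """Hypothetical plain-text prompt; the README specifies no prompt format."""
    return instruction.strip() + "\n"


def generate_reply(instruction: str, max_new_tokens: int = 128) -> str:
    """Download the model (on first call) and generate a reply to one instruction."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply("Translate to Finnish: Good morning!")` would download the weights on first use. Because the README notes that no alignment steps have been applied, outputs from a sketch like this should be treated as research-preview quality.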