Pclanglais committed · Commit 61be651 · verified · 1 Parent(s): 7fdc93d

Create README.md
Files changed (1): README.md (+72 -0)

README.md ADDED
---
language:
- en
- fr
- it
- de
- es
- pl
license: apache-2.0
pipeline_tag: text-generation
tags:
- transformers
library_name: transformers
---

# ⚛️ Monad

<div align="center">
  <img src="figures/pleias.jpg" width="60%" alt="Pleias" />
</div>

<p align="center">
  <a href="https://pleias.fr/blog/blogsynth-the-new-data-frontier"><b>Blog announcement</b></a>
</p>

**Monad** is a 56 million parameter generalist Small Reasoning Model, trained on 200 billion tokens from <a href="https://huggingface.co/PleIAs/Baguettotron">SYNTH</a>, a fully open generalist dataset.

As of 2025, Monad is the best contender for the smallest viable language model. Despite being less than half the size of GPT-2, Monad not only answers in consistent English but performs significantly beyond chance on MMLU and other major industry benchmarks.

<p align="center">
  <img width="80%" src="figures/training_efficiency.jpeg">
</p>

Monad's name is a reference to Leibniz's concept, the general idea of the smallest possible unit of intelligence.

## Features
Monad has been natively trained for instructions with thinking traces. We implemented a series of dedicated pipelines for:
* Memorization of encyclopedic knowledge (50,000 vital articles from Wikipedia), though hallucinations have to be expected in this size range.
* Retrieval-Augmented Generation with grounding (following up on our initial experiments with the Pleias-RAG series)
* Arithmetic and simple math problem solving
* Editing tasks
* Information extraction
* Creative writing, including unusual synthetic exercises like lipograms or layout poems.

Monad is strictly monolingual in English. We trained a new custom tokenizer (likely one of the smallest tokenizers to date, with fewer than 8,000 individual tokens), trained exclusively on SYNTH so as to maintain a relatively good compression ratio.
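
A minimal sketch of how the tokenizer could be inspected with the transformers library; the `PleIAs/Monad` repo id is an assumption for illustration, not confirmed by this card:

```python
from transformers import AutoTokenizer

# Hypothetical repo id, used here only for illustration.
tokenizer = AutoTokenizer.from_pretrained("PleIAs/Monad")

# The card reports fewer than 8,000 individual tokens in the vocabulary.
print(len(tokenizer))

# Rough check of the compression ratio on a short English sample.
sample = "Monad is a 56 million parameter Small Reasoning Model trained on SYNTH."
token_ids = tokenizer.encode(sample)
print(len(sample) / len(token_ids))  # characters per token
```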

## Model design and training
Monad is a 56M parameter decoder with a standard Qwen/Llama-like design, apart from its extremely compact size and an architecture deliberately opinionated toward depth (64 layers).
<p align="center">
  <img width="80%" src="figures/baguettotron_structure.png">
</p>

Monad was trained on 16 H100 GPUs from Jean Zay (compute plan n°A0191016886). Full pre-training took a bit less than 6 hours.
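
For readers who want to verify the depth-oriented design, a short sketch with transformers; the repo id is hypothetical and the field names assume a standard Llama/Qwen-style configuration:

```python
from transformers import AutoConfig

# Hypothetical repo id, used here only for illustration.
config = AutoConfig.from_pretrained("PleIAs/Monad")

# Standard Llama/Qwen-style configs expose these fields; the card describes 64 layers.
print(config.num_hidden_layers)
print(config.hidden_size, config.num_attention_heads)
```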

## Evaluation
Monad attains performance on MMLU significantly beyond chance, with close to 30% accuracy. We also find non-random results on GSM8K (8%) and HotpotQA (8%).

To our knowledge, there is no model remotely close to this size range available for evaluation comparison. Spiritually and practically, Monad remains unique.

## Use and deployment
Monad has been trained on the standard instruction style from Qwen.

```xml
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
```
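
A minimal generation sketch with the transformers library, assuming a hypothetical `PleIAs/Monad` repo id and the single-turn prompt format shown above; generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, used here only for illustration.
model_id = "PleIAs/Monad"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Single-turn prompt in the Qwen-style instruction format, ending with the
# opening <think> tag so the model produces its reasoning trace first.
prompt = (
    "<|im_start|>user\n"
    "Who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```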

Monad has no support yet for multi-turn conversation.

A major envisioned use case for Monad is explainability, as the model provides a unique trade-off between observability and actual reasoning performance.