---
license: apache-2.0
---

# Tele-FLM-1T
Tele-FLM-1T (aka FLM-2-1T) is a 1T open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgement capabilities.
Built upon the decoder-only transformer architecture, it has been trained on approximately 2.3T tokens.
Tele-FLM-1T, currently the largest model in the Tele-FLM series, is built upon Tele-FLM (52B), which delivers superior performance at its scale, and is in all likelihood capable of handling even harder tasks with better results.
For now, it is still under evaluation due to limited computing resources.
In addition to sharing the model weights, we provide the core designs, engineering practices, and training details, anticipating their benefits for both academic and industrial communities.
## Model Details
- Input and output multiplier
Consequently, Tele-FLM-1T is largely compatible with Llama architecturally.
To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM-1T and released it as open source.
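Given that compatibility, the model should be loadable through the standard Hugging Face transformers API. The sketch below is hedged: the repo id `CofeAI/Tele-FLM-1T` and the `trust_remote_code=True` requirement are assumptions mirroring the Tele-FLM (52B) repo, not details confirmed by this card.

```python
def load_tele_flm_1t(repo_id: str = "CofeAI/Tele-FLM-1T"):
    """Load Tele-FLM-1T via transformers (repo id and flags are assumptions)."""
    # Imported lazily so the sketch can be read without the package installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,
        device_map="auto",  # requires accelerate; shards a 1T model across devices
    )
    return tokenizer, model
```

Loading a 1T-parameter checkpoint requires a multi-GPU (or offloaded) setup; `device_map="auto"` is one common way to shard it.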
| Models | layer<br>number | attention<br>heads | hidden<br>size | ffn hidden<br>size | vocab<br>size | context<br>length | params<br>count |
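For a back-of-the-envelope check of the parameter counts in this table, the following is a rough generic estimate for a Llama-style decoder-only model with a gated FFN. It ignores norms and biases, the untied-embedding default is an assumption, and it is not the official accounting:

```python
def estimate_params(layers: int, hidden: int, ffn_hidden: int, vocab: int,
                    tied_embeddings: bool = False) -> int:
    """Rough Llama-style decoder-only parameter count (norms/biases ignored)."""
    # Input embedding, plus a separate output projection if embeddings are untied.
    embed = vocab * hidden * (1 if tied_embeddings else 2)
    # Attention: Q, K, V, and output projections, each hidden x hidden.
    attn = 4 * hidden * hidden
    # Gated FFN (e.g. SwiGLU): up, gate, and down projections.
    ffn = 3 * hidden * ffn_hidden
    return embed + layers * (attn + ffn)

# Tiny illustrative values (hypothetical, not Tele-FLM-1T's real configuration):
print(estimate_params(layers=2, hidden=4, ffn_hidden=8, vocab=10))  # 400
```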
All nodes are interconnected via InfiniBand (IB).

### Software
Tele-FLM-1T utilizes 3D parallel training, combining the prevailing methodologies: data parallelism, tensor parallelism, and pipeline parallelism.
The parallel training setup for Tele-FLM-1T is configured as follows: tensor parallel=32, pipeline parallel=28, and data parallel=1.
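As a quick sanity check on that configuration, the three parallelism degrees multiply to give the total number of GPU ranks. A minimal sketch of the arithmetic (not the actual training launcher):

```python
# GPU world size implied by Tele-FLM-1T's reported 3D parallel layout.
tensor_parallel = 32    # ways each layer's weight matrices are sharded
pipeline_parallel = 28  # sequential pipeline stages
data_parallel = 1       # full-model replicas
world_size = tensor_parallel * pipeline_parallel * data_parallel
print(world_size)  # 896
```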
### Related Work
[Tele-FLM (52B)](https://huggingface.co/CofeAI/Tele-FLM)