caijie12138 committed
Commit 23aaeac (verified) · 1 Parent(s): 75933c6

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ This repository contains a series of reference models of varying sizes, released
 
 ## 💡 Overview
 
-The core contribution of our paper is the concept of **LLM Density** \\(\rho\\), defined as the ratio of a model's *effective* parameter size \\(\ghat N\\) to its *actual* parameter size \\(N\\). To accurately determine a model's effective size, we must first establish a reliable "ruler"—a scaling law that maps training compute to performance on downstream tasks.
+The core contribution of our paper is the concept of **LLM Density** \\(\rho\\), defined as the ratio of a model's *effective* parameter size \\(\hat{N}\\) to its *actual* parameter size \\(N\\). To accurately determine a model's effective size, we must first establish a reliable "ruler"—a scaling law that maps training compute to performance on downstream tasks.
 
 The models in this repository serve as that "ruler". We trained a series of six models, ranging from **5 million to 800 million parameters**, on a consistent dataset. By measuring their loss on various benchmarks, we fitted a precise scaling function. This function allows us to take any other LLM, measure its performance, and infer its effective parameter size by seeing where it lands on our reference scale.
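
To make the "ruler" idea concrete, here is a minimal sketch of the density computation the README describes, assuming a simple power-law form for the fitted scaling function. The functional form, the reference losses, and the target model below are placeholders for illustration, not values from the paper.

```python
# A minimal, hypothetical sketch of the density computation described above.
# The power-law form, the reference losses, and the target model are all
# placeholder assumptions, not values from the paper.
import numpy as np
from scipy.optimize import curve_fit

# Six reference models (actual parameter count N, measured benchmark loss).
# Parameter counts span the 5M-800M range mentioned in the README; the
# losses are made up for illustration.
ref_params = np.array([5e6, 2e7, 8e7, 2e8, 4e8, 8e8])
ref_loss = np.array([3.9, 3.5, 3.1, 2.8, 2.6, 2.45])

def scaling_law(N, a, alpha, c):
    """Assumed form of the fitted 'ruler': loss decays as a power of N."""
    return a * N ** (-alpha) + c

# Fit the ruler on the reference models.
(a, alpha, c), _ = curve_fit(scaling_law, ref_params, ref_loss,
                             p0=(50.0, 0.2, 2.0), maxfev=10000)

def effective_size(loss):
    """Invert the fitted law: the reference-scale N that would yield `loss`."""
    return ((loss - c) / a) ** (-1.0 / alpha)

# Density rho = effective size / actual size for a hypothetical target model.
target_N, target_loss = 1.0e9, 2.6
rho = effective_size(target_loss) / target_N
print(f"effective N ≈ {effective_size(target_loss):.3e}, density rho ≈ {rho:.2f}")
```

The key step is the inversion: once the scaling function is fitted on the reference models, any measured loss can be mapped back to the parameter count a reference-scale model would need to match it, and the ratio of that effective size to the model's actual size gives its density \\(\rho\\).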