Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ This repository contains a series of reference models of varying sizes, released
## 💡 Overview

-The core contribution of our paper is the concept of **LLM Density** \\(\rho\\), defined as the ratio of a model's *effective* parameter size \\(\
+The core contribution of our paper is the concept of **LLM Density** \\(\rho\\), defined as the ratio of a model's *effective* parameter size \\(\hat{N}\\) to its *actual* parameter size \\(N\\). To accurately determine a model's effective size, we must first establish a reliable "ruler"—a scaling law that maps training compute to performance on downstream tasks.

The models in this repository serve as that "ruler". We trained a series of six models, ranging from **5 million to 800 million parameters**, on a consistent dataset. By measuring their loss on various benchmarks, we fitted a precise scaling function. This function allows us to take any other LLM, measure its performance, and infer its effective parameter size by seeing where it lands on our reference scale.
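
For illustration, here is a minimal sketch of how a fitted scaling "ruler" can be inverted to estimate density. It assumes a plain power law \\(L(N) = (N / N_c)^{-\alpha}\\) and uses placeholder loss values; the paper's actual scaling function, fitting procedure, and measurements are not reproduced here.

```python
import numpy as np

# A minimal sketch assuming a simple power law L(N) = (N / N_c) ** (-alpha).
# The paper's actual scaling function may differ; this only illustrates how a
# fitted "ruler" is inverted to estimate effective size and density.

# Parameter counts of the six reference models (5M to 800M, per the README).
ref_sizes = np.array([5e6, 2e7, 5e7, 1.5e8, 4e8, 8e8])

# Placeholder benchmark losses -- illustrative stand-ins, NOT measured numbers.
ref_losses = np.array([3.80, 3.35, 3.05, 2.78, 2.55, 2.42])

# A power law is a line in log-log space:
#   log L = -alpha * log N + alpha * log N_c,
# so a degree-1 polynomial fit recovers both constants.
slope, intercept = np.polyfit(np.log(ref_sizes), np.log(ref_losses), deg=1)
alpha = -slope
log_n_c = intercept / alpha  # since intercept = alpha * log(N_c)

def effective_params(loss: float) -> float:
    """Invert the fitted law: the reference-scale size that attains `loss`."""
    return float(np.exp(log_n_c - np.log(loss) / alpha))

# Density rho = N_hat / N for a hypothetical 1B-parameter model whose
# measured benchmark loss is 2.30 (an illustrative value, not a real result).
actual_n, measured_loss = 1e9, 2.30
n_hat = effective_params(measured_loss)
print(f"effective params ~ {n_hat:.3g}, density rho ~ {n_hat / actual_n:.2f}")
```

Fitting in log-log space keeps the regression linear and numerically stable; any other functional form would be inverted the same way, by solving the fitted law for \\(N\\) at the measured loss.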