---
library_name: transformers
tags:
- language-model
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Card for AICrossSim/clm-60m

A 60M parameter language model trained on `22 * 60M` (≈1.32B) tokens from the FineWeb-Edu dataset.

## Model Details

aixsim-60M is a transformer-based language model with approximately 60 million parameters (excluding embedding-layer parameters).
It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
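
The RMSNorm used here presumably follows the standard formulation: scale each activation vector by the reciprocal of its root mean square, with a learned gain and no mean subtraction or bias. A minimal PyTorch sketch, assuming the common `eps = 1e-6` default (the exact epsilon and norm placement for this model are not stated here):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (no centering, no bias)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps  # assumed default; not confirmed for this model
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal RMS computed over the hidden dimension.
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight
```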

- **Developed by:** AICrossSim
- **Funded by:** [ARIA](https://www.aria.org.uk/)
- **Model type:** Transformer Language Model
- **Language(s) (NLP):** English
- **Tokenizer:** [HuggingFaceTB/cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)
- **Repository:** [AICrossSim/NewComputeBench](https://github.com/AICrossSim/NewComputeBench)

## Training Details

The experiment setup and training logs are available in the [wandb run](https://wandb.ai/cz98/torchtitan/runs/7kttp3qt?nw=nwusercz98).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AICrossSim/clm-60m"

# Download the model weights and the matching tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
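
A quick generation check follows; the prompt and sampling parameters are illustrative, not tuned for this model:

```python
# Encode a prompt, sample a short continuation, and decode it back to text.
inputs = tokenizer("The Moon is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```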

## Evaluation (lm-evaluation-harness)

| Tasks  |Version|Filter|n-shot|    Metric     |   | Value  |   |Stderr|
|--------|------:|------|-----:|---------------|---|-------:|---|------|
|wikitext|      2|none  |     0|bits_per_byte  |↓  |  1.6693|±  |   N/A|
|        |       |none  |     0|byte_perplexity|↓  |  3.1806|±  |   N/A|
|        |       |none  |     0|word_perplexity|↓  |486.5306|±  |   N/A|
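
The table above should be reproducible with lm-evaluation-harness. A sketch using its Python API, assuming lm-eval v0.4+ (the harness version and batch size used for the reported run are not stated):

```python
import lm_eval

# Evaluate the pretrained checkpoint on WikiText (0-shot), as in the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AICrossSim/clm-60m",
    tasks=["wikitext"],
    num_fewshot=0,
)
print(results["results"]["wikitext"])
```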