---
library_name: transformers
tags:
- language-model
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
# Model Card for AICrossSim/clm-400m
A 400M parameter language model trained on `22 * 400M` (~8.8B) tokens from the FineWeb-Edu dataset.
## Model Details
aixsim-400M is a transformer-based language model with approximately 400 million parameters (excluding embedding-layer parameters).
It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
- **Developed by:** AICrossSim
- **Funded by:** [ARIA](https://www.aria.org.uk/)
- **Model type:** Transformer Language Model
- **Language(s) (NLP):** English
- **Tokenizer:** [HuggingFaceTB/cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)
- **Repository:** [AICrossSim/NewComputeBench](https://github.com/AICrossSim/NewComputeBench)
## Training Details
Experiment setup and training logs can be found at [wandb run](https://wandb.ai/cz98/torchtitan/runs/cic7m3cx?nw=nwusercz98).
## Usage
```python
import transformers

model_name = "AICrossSim/clm-400m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
```
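Once loaded, the model can be used with the standard `generate` API. A minimal sketch (the prompt and sampling parameters below are illustrative, not recommendations from the model authors):

```python
import transformers

model_name = "AICrossSim/clm-400m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# Tokenize a prompt and generate a continuation.
inputs = tokenizer("The history of computing began", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```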
## Evaluation Results (lm-evaluation-harness)
| Tasks    | Version | Filter | n-shot | Metric          |   |   Value |   | Stderr |
|----------|--------:|--------|-------:|-----------------|---|--------:|---|--------|
| wikitext |       2 | none   |      0 | bits_per_byte   | ↓ |  0.9886 | ± | N/A    |
|          |         | none   |      0 | byte_perplexity | ↓ |  1.9843 | ± | N/A    |
|          |         | none   |      0 | word_perplexity | ↓ | 39.0317 | ± | N/A    |