---
library_name: transformers
tags:
- language-model
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Card for AICrossSim/clm-60m

A 60M-parameter language model trained on `22 * 60M` (≈1.32B) tokens from the FineWeb-Edu dataset.

## Model Details

aixsim-60M is a transformer-based language model with approximately 60 million parameters (excluding embedding-layer parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.

- **Developed by:** AICrossSim
- **Funded by:** [ARIA](https://www.aria.org.uk/)
- **Model type:** Transformer Language Model
- **Language(s) (NLP):** English
- **Tokenizer:** [HuggingFaceTB/cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)
- **Repository:** [AICrossSim/NewComputeBench](https://github.com/AICrossSim/NewComputeBench)
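
The architecture hyper-parameters (layer count, hidden size, vocabulary size) can be inspected without downloading the weights. A minimal sketch, assuming the repository ships a standard `transformers` config:

```python
from transformers import AutoConfig

# Fetch only the config (no weights) and print the architecture hyper-parameters.
config = AutoConfig.from_pretrained("AICrossSim/clm-60m")
print(config)
```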

## Training Details

The experiment setup and training logs are available in the [wandb run](https://wandb.ai/cz98/torchtitan/runs/7kttp3qt?nw=nwusercz98).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model weights and the matching tokenizer from the Hugging Face Hub.
model_name = "AICrossSim/clm-60m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
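
For a quick sanity check, the loaded model can generate text. A minimal sketch; the prompt and generation settings below are illustrative assumptions, not part of the original card:

```python
# Minimal generation sketch; prompt and settings are illustrative.
inputs = tokenizer("The Roman Empire was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```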

## lm-evaluation-harness

| Tasks    | Version | Filter | n-shot | Metric          |   |    Value |   | Stderr |
|----------|--------:|--------|-------:|-----------------|---|---------:|---|--------|
| wikitext |       2 | none   |      0 | bits_per_byte   | ↓ |   1.6693 | ± | N/A    |
|          |         | none   |      0 | byte_perplexity | ↓ |   3.1806 | ± | N/A    |
|          |         | none   |      0 | word_perplexity | ↓ | 486.5306 | ± | N/A    |
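
The table follows the standard lm-evaluation-harness output format (lower is better for all three metrics). A sketch of how such numbers can be reproduced with the harness's Python API, assuming a v0.4+ install; the exact API surface may differ across versions:

```python
# Sketch: evaluate the checkpoint on wikitext with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+); API details may vary by version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AICrossSim/clm-60m",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```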