---
library_name: transformers
tags:
- language-model
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Card for AICrossSim/clm-200m

A 200M parameter language model trained on `22 * 200M` (≈4.4B) tokens from the FineWeb-Edu dataset.

## Model Details

aixsim-200M is a transformer-based language model with approximately 200 million parameters (excluding embedding parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.

- **Developed by:** AICrossSim
- **Funded by:** [ARIA](https://www.aria.org.uk/)
- **Model type:** Transformer Language Model
- **Language(s) (NLP):** English
- **Tokenizer:** [HuggingFaceTB/cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)
- **Repository:** [AICrossSim/NewComputeBench](https://github.com/AICrossSim/NewComputeBench)

## Training Details

The experiment setup and training logs are available in this [wandb run](https://wandb.ai/cz98/torchtitan/runs/uhnlw6k8?nw=nwusercz98).

## Usage

```python
import transformers

model_name = "AICrossSim/clm-200m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
```

## lm-evaluation-harness

| Tasks    | Version | Filter | n-shot | Metric          |   |   Value |   | Stderr |
|----------|--------:|--------|-------:|-----------------|---|--------:|---|--------|
| wikitext |       2 | none   |      0 | bits_per_byte   | ↓ |  1.0994 | ± | N/A    |
|          |         | none   |      0 | byte_perplexity | ↓ |  2.1427 | ± | N/A    |
|          |         | none   |      0 | word_perplexity | ↓ | 58.8531 | ± | N/A    |
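
The table above was produced with EleutherAI's lm-evaluation-harness. A minimal sketch of how the wikitext scores might be reproduced via the harness's Python API (assuming `lm-eval` is installed, e.g. `pip install lm-eval`; exact numbers may vary slightly across harness versions and hardware):

```python
# Sketch: zero-shot wikitext evaluation with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # use the Hugging Face transformers backend
    model_args="pretrained=AICrossSim/clm-200m",
    tasks=["wikitext"],
    num_fewshot=0,
)
# Print the wikitext metrics (bits_per_byte, byte_perplexity, word_perplexity).
print(results["results"]["wikitext"])
```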