---
library_name: transformers
tags:
- language-model
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---

# Model Card for AICrossSim/clm-60m

A 60M-parameter language model trained on `22 * 60M` (≈1.32B) tokens from the FineWeb-Edu dataset.

## Model Details

aixsim-60M is a transformer-based language model with approximately 60 million parameters (excluding embedding-layer parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.

- **Developed by:** AICrossSim
- **Funded by:** [ARIA](https://www.aria.org.uk/)
- **Model type:** Transformer Language Model
- **Language(s) (NLP):** English
- **Tokenizer:** [HuggingFaceTB/cosmo2-tokenizer](https://huggingface.co/HuggingFaceTB/cosmo2-tokenizer)
- **Repository:** [AICrossSim/NewComputeBench](https://github.com/AICrossSim/NewComputeBench)
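
The architecture hyper-parameters (layer count, hidden size, vocabulary size) can be inspected without downloading the weights. A minimal sketch, assuming the repository ships a standard `transformers` config:

```python
from transformers import AutoConfig

# Fetch only the config (no weights) and print the architecture hyper-parameters.
config = AutoConfig.from_pretrained("AICrossSim/clm-60m")
print(config)
```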

## Training Details

The experiment setup and training logs are available in the [wandb run](https://wandb.ai/cz98/torchtitan/runs/7kttp3qt?nw=nwusercz98).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model weights and the matching tokenizer from the Hugging Face Hub.
model_name = "AICrossSim/clm-60m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
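
For a quick sanity check, the loaded model can generate text. A minimal sketch; the prompt and generation settings below are illustrative assumptions, not part of the original card:

```python
# Minimal generation sketch; prompt and settings are illustrative.
inputs = tokenizer("The Roman Empire was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```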

## lm-evaluation-harness

| Tasks    | Version | Filter | n-shot | Metric          |   |    Value |   | Stderr |
|----------|--------:|--------|-------:|-----------------|---|---------:|---|--------|
| wikitext |       2 | none   |      0 | bits_per_byte   | ↓ |   1.6693 | ± | N/A    |
|          |         | none   |      0 | byte_perplexity | ↓ |   3.1806 | ± | N/A    |
|          |         | none   |      0 | word_perplexity | ↓ | 486.5306 | ± | N/A    |
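
The table follows the standard lm-evaluation-harness output format (lower is better for all three metrics). A sketch of how such numbers can be reproduced with the harness's Python API, assuming a v0.4+ install; the exact API surface may differ across versions:

```python
# Sketch: evaluate the checkpoint on wikitext with lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+); API details may vary by version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AICrossSim/clm-60m",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```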