---
license: apache-2.0
datasets:
- allenai/dolma
---
# Training run comparing Mixture-of-Depths and BitNet
[Wandb Report](https://api.wandb.ai/links/tulasiram/pw76q41i)

#### 4 models trained for 100k steps on Dolma
- OLMo-50M: 50M-parameter baseline model
- OLMo-50M-bitlinear: 50M-parameter BitNet model
- OLMo-50M-mod: 50M-parameter Mixture-of-Depths model
- OLMo-50M-mod-bitlinear: 50M-parameter Mixture-of-Depths BitNet model

The repo contains zip files with the training states and other files for each model. I am not the author of the Mixture-of-Depths implementation; it can be found [here](https://github.com/thepowerfuldeez/OLMo).