AlpineLLM-Tiny-10M-Base
A domain-specific language model for alpine storytelling.
Try asking about mountain adventures! 🏔️
Author: Bartek Borzyszkowski
⚙️ AlpineLLM GitHub | 🤗 AlpineLLM Live Demo
About AlpineLLM
AlpineLLM-Tiny-10M-Base is a lightweight base language model with ~10.8 million trainable parameters. It was pre-trained from scratch on raw text corpora drawn primarily from public-domain literature on alpinism, including expedition narratives and climbing essays. Note that AlpineLLM is a base model: it has not been fine-tuned for downstream tasks such as summarization or dialogue, and its outputs reflect patterns learned directly from the training texts.
Model Details
- Model name and version: AlpineLLM-Tiny-10M-Base
- Model type: Language Model (Base Model)
- Architecture: TransformerDecoder
- Parameters: ~10M
- Context window size: 256 tokens
- Tokenization: CharacterLevelTokenizer
- Vocabulary size: 96 tokens
- Training language: English
- Training set size: ~1M tokens
- Training iterations: ~15K iterations
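Since the model uses character-level tokenization with a 96-token vocabulary, each character of input text maps to one token ID. A minimal sketch of how such a tokenizer typically works is shown below; the class name, method names, and corpus here are illustrative assumptions, and the actual `CharacterLevelTokenizer` in the AlpineLLM repository may differ in its details.

```python
# Minimal character-level tokenizer sketch (hypothetical; the real
# CharacterLevelTokenizer in the AlpineLLM repo may be implemented differently).
class CharTokenizer:
    def __init__(self, corpus: str):
        # The vocabulary is the sorted set of unique characters seen in the
        # corpus; AlpineLLM reports a 96-token vocabulary for its corpus.
        chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, text: str) -> list[int]:
        # One token per character.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


# Tiny demonstration corpus (not the actual training data).
tok = CharTokenizer("the alpine summit")
ids = tok.encode("alps")
print(tok.decode(ids))  # round-trips back to "alps"
```

Character-level vocabularies keep the embedding table small (96 rows here versus tens of thousands for subword tokenizers), at the cost of longer token sequences for the same text, which is why the 256-token context window covers roughly a sentence or two of prose.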
Training Setup
- Hardware: 1× NVIDIA RTX A4000
- Software: Ubuntu 22.04 with CUDA 12.2 and Python 3.10
- Pre-training duration: ~1h
For a complete list of dependencies, refer to the source training repository.
Usage
Check out the AlpineLLM Live Demo repository on Hugging Face Spaces for a practical implementation.
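Because the model is autoregressive with a fixed 256-token context window, generation proceeds one token at a time, cropping the input to the most recent 256 tokens at each step. The sketch below illustrates this sampling loop in plain Python; `dummy_model`, `generate`, and the constants are illustrative stand-ins, not the actual AlpineLLM API.

```python
# Sketch of autoregressive generation under a fixed context window.
# `dummy_model` is a hypothetical placeholder for the real TransformerDecoder.
import random

CONTEXT_WINDOW = 256  # AlpineLLM's context size, in tokens
VOCAB_SIZE = 96       # AlpineLLM's character-level vocabulary size


def dummy_model(ids: list[int]) -> list[float]:
    # Stand-in forward pass: returns uniform scores over the vocabulary.
    # The real model would return logits conditioned on `ids`.
    return [1.0] * VOCAB_SIZE


def generate(prompt_ids: list[int], max_new_tokens: int, model=dummy_model) -> list[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-CONTEXT_WINDOW:]   # crop to the last 256 tokens
        scores = model(window)
        # Stand-in for softmax sampling over the model's logits.
        next_id = random.randrange(len(scores))
        ids.append(next_id)
    return ids


out = generate([1, 2, 3], max_new_tokens=10)
print(len(out))  # prompt length + new tokens
```

The cropping step is what keeps inference memory bounded: however long the generated story grows, the model only ever attends over the most recent 256 tokens.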
Citation
@misc{alpinellm2025,
  title={AlpineLLM: A Domain-Specific Language Model for Alpine Storytelling},
  author={Borzyszkowski, Bartek},
  year={2025},
  publisher={Hugging Face},
  howpublished={https://github.com/Borzyszkowski/AlpineLLM}
}