---
license: mit
datasets:
- vesteinn/babylm
---

# rootxhacker/arthemis-lm

Building capable language models shouldn't require massive corporate budgets. While the industry pushes toward increasingly large models, this project explores what's possible with neuromorphic architectures and limited resources.

I developed this 155.8M parameter Llama-SNN-LTC model with specific constraints:

- Budget limit: Under $50 using Google Colab Pro Plus
- From-scratch pretraining with a fully open-source dataset
- No fine-tuning or synthetic data generation from existing LLMs
- Focus on architectural innovation over scale

## Model Details

This project incorporates **Spiking Neural Networks (SNNs)** and **Liquid Time Constants (LTCs)** into the Llama architecture, creating a neuromorphic language model. I spent under $50 on Google Colab Pro Plus and used the first 1M samples from the BabyLM challenge dataset, which contain approximately 100M tokens. On the benchmarks reported in the Evaluation section, the model scores close to google/bert-large-uncased.

- **Model Type**: Causal Language Model with Neuromorphic Enhancements
- **Supported Languages**: English
- **Number of Parameters**: 155.8M
- **Context Length**: 1024 tokens
- **Base Architecture**: Llama with SNN/LTC modifications
- **Training Data**: BabyLM (vesteinn/babylm) - 1M samples (~100M tokens)

### Architecture Features

- **Spiking Neural Networks** in attention mechanisms for temporal processing
- **Liquid Time Constants** in feed-forward layers for adaptive dynamics
- **12-layer transformer backbone** with neuromorphic enhancements
- **RoPE positional encoding** for sequence understanding
- **Custom surrogate gradient training** for differentiable spike computation

Here are my major model configurations:

```
hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
```

## Usage

### Install dependencies
```bash
pip install transformers torch numpy
```

## Inference

The full inference code is available in this gist:

https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea

## Evaluation

I performed the evaluation using this script: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300

### Results Comparison

| Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
|-------|--------|--------|-----------|------|------------|-------|-------|-------|-----|
| **rootxhacker/arthemis-lm** | **155.8M** | **<$50** | **24.65** | **20.60** | **48.10** | **28.20** | **22.20** | **39.80** | **30.59** |
| google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |

## Observations

- **Budget Efficiency**: The model reaches an average score of 30.59 on a budget under $50, within about 1.4 points of BERT-large-uncased (32.03), demonstrating that meaningful language models can be built with limited resources.
- **Neuromorphic Advantages**: The model's highest score is on WinoGrande (48.10%), which may reflect reasoning benefits from temporal dynamics, although that score sits close to the 50% chance level of this two-choice task.
- **Parameter Efficiency**: With 155.8M parameters, the model performs comparably to BERT-large-uncased (336M parameters) at less than half the size.
- **Room for Improvement**: More training data and compute would likely improve performance, but the current results support the viability of the neuromorphic approach.
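To make the comparison above easier to check, here is a minimal sketch that recomputes the `Avg` column and the per-task gaps from the scores in the results table. The variable and function names are my own for illustration and are not part of the evaluation gist.

```python
# Scores copied from the Results Comparison table (percentages).
TASKS = ["HellaSwag", "OBQA", "WinoGrande", "ARC_e", "ARC_c", "BoolQ"]
SCORES = {
    "rootxhacker/arthemis-lm": [24.65, 20.60, 48.10, 28.20, 22.20, 39.80],
    "google/bert-large-uncased": [24.53, 26.20, 49.80, 25.08, 25.68, 40.86],
}


def mean(values):
    """Unweighted mean over the six tasks."""
    return sum(values) / len(values)


def per_task_gap(model_a, model_b):
    """Score of model_a minus score of model_b, for each task."""
    return {
        task: a - b
        for task, a, b in zip(TASKS, SCORES[model_a], SCORES[model_b])
    }


averages = {model: mean(scores) for model, scores in SCORES.items()}
gaps = per_task_gap("rootxhacker/arthemis-lm", "google/bert-large-uncased")
# Tasks where arthemis-lm scores higher than BERT-large-uncased.
wins = [task for task, gap in gaps.items() if gap > 0]

if __name__ == "__main__":
    for model, avg in averages.items():
        print(f"{model}: average {avg:.2f}")
    for task, gap in gaps.items():
        print(f"{task}: {gap:+.2f}")
    print("arthemis-lm ahead on:", ", ".join(wins))
```

Run against the table, this shows arthemis-lm ahead on HellaSwag and ARC_e and behind on the other four tasks, which is where the gap in the averages comes from.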
Technical specifications:

```
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
```

## Training Details

The model was pretrained from scratch using:

- **Dataset**: BabyLM (vesteinn/babylm) - First 1M samples (~100M tokens)
- **Hardware**: Google Colab Pro Plus (A100 GPU)
- **Training Steps**: 20,000 steps
- **Batch Size**: 8 with gradient accumulation
- **Learning Rate**: 3e-4 with linear warmup
- **Precision**: FP32 for stability with neuromorphic components

### Key Innovations

- **Custom SNN Implementation**: Leaky Integrate-and-Fire neurons with surrogate gradients
- **Liquid Time Constants**: Adaptive time dynamics in feed-forward layers
- **Budget-Conscious Training**: Optimized for maximum performance per dollar spent
- **Neuromorphic Language Modeling**: To my knowledge, the first integration of SNNs and LTCs in a causal LM

## Future Work

- Scale to larger datasets with increased compute budget
- Explore different spiking neuron models (e.g., Adaptive LIF, Izhikevich)
- Implement more sophisticated LTC architectures
- Fine-tune for specific downstream tasks
- Compare energy efficiency with standard transformers

## Model Sources

- **Repository**: [Coming Soon]
- **Paper**: [In Progress]
- **Hugging Face**: [rootxhacker/arthemis-lm](https://huggingface.co/rootxhacker/arthemis-lm)

## Uses

This model can be used for:

- Text generation and completion
- Few-shot learning tasks
- Research into neuromorphic language models
- Educational purposes for understanding SNN/LTC architectures
- Base model for fine-tuning on specific tasks

## Limitations

- **Training Data**: Limited to ~100M tokens (much smaller than typical LLMs)
- **Context Length**: Maximum 1024 tokens
- **Domain**: Primarily trained on English text
- **Compute**: Training limited by budget constraints
- **Performance**: Lower than
larger, more extensively trained models

## Acknowledgments

Special thanks to **keeeeenw** for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work builds upon those principles while exploring neuromorphic computing approaches to language modeling.

## Citation

```bibtex
@misc{arthemis-lm-2024,
  title={Arthemis-LM: A Neuromorphic Language Model with Spiking Neural Networks and Liquid Time Constants},
  author={rootxhacker},
  year={2024},
  howpublished={\url{https://huggingface.co/rootxhacker/arthemis-lm}}
}
```

## License

Apache License 2.0