|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- vesteinn/babylm |
|
|
--- |
|
|
# rootxhacker/arthemis-lm |
|
|
|
|
|
Building capable language models shouldn't require massive corporate budgets. While the industry pushes toward increasingly large models, this project explores what's possible with neuromorphic architectures and limited resources. |
|
|
|
|
|
I developed this 155.8M-parameter Llama-SNN-LTC model under specific constraints:
|
|
|
|
|
- Budget limit: Under $50 using Google Colab Pro Plus |
|
|
- From-scratch pretraining on a fully open-source dataset
|
|
- No fine-tuning or synthetic data generation from existing LLMs |
|
|
- Focus on architectural innovation over scale |
|
|
|
|
|
## Model Details |
|
|
|
|
|
This project incorporates **Spiking Neural Networks (SNNs)** and **Liquid Time Constants (LTCs)** into the Llama architecture, creating a neuromorphic language model. I spent under $50 on Google Colab Pro Plus and trained on the first 1M samples (approximately 100M tokens) of the BabyLM challenge dataset.
|
|
On the benchmarks below, this model performs roughly on par with google/bert-large-uncased.
|
|
|
|
|
**Model Type**: Causal Language Model with Neuromorphic Enhancements |
|
|
**Supported Languages**: English |
|
|
**Number of Parameters**: 155.8M |
|
|
**Context Length**: 1024 tokens |
|
|
**Base Architecture**: Llama with SNN/LTC modifications |
|
|
**Training Data**: BabyLM (vesteinn/babylm) - 1M samples (~100M tokens) |
|
|
|
|
|
### Architecture Features |
|
|
- **Spiking Neural Networks** in attention mechanisms for temporal processing |
|
|
- **Liquid Time Constants** in feed-forward layers for adaptive dynamics |
|
|
- **12-layer transformer backbone** with neuromorphic enhancements |
|
|
- **RoPE positional encoding** for sequence understanding |
|
|
- **Custom surrogate gradient training** for differentiable spike computation |
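
The spiking components are custom to this model, so the snippet below is only a minimal, illustrative sketch of the surrogate-gradient idea behind them: a Leaky Integrate-and-Fire activation in PyTorch. The threshold value mirrors `spiking_threshold` from the config below; everything else (decay factor, surrogate shape) is an assumption for illustration, not the model's actual code.

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Hard threshold (spike) in the forward pass, smooth surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane_potential, threshold):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Derivative of a fast sigmoid centered at the threshold (an assumed surrogate choice).
        surrogate = 1.0 / (1.0 + 10.0 * (membrane_potential - ctx.threshold).abs()) ** 2
        return grad_output * surrogate, None


class LIFNeuron(torch.nn.Module):
    """Leaky Integrate-and-Fire unit applied over the sequence (time) axis."""

    def __init__(self, threshold: float = 1.0, decay: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.decay = decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); integrate input, emit spikes, soft-reset on spike.
        membrane = torch.zeros_like(x[:, 0])
        spikes = []
        for t in range(x.size(1)):
            membrane = self.decay * membrane + x[:, t]
            spike = SpikeFunction.apply(membrane, self.threshold)
            membrane = membrane - spike * self.threshold
            spikes.append(spike)
        return torch.stack(spikes, dim=1)
```

In the full model these spike trains modulate the attention computation; the sketch only shows how a hard threshold can be made trainable through a smooth surrogate gradient.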
|
|
|
|
|
Here are the main model configuration values:
|
|
|
|
|
```python
|
|
hidden_size = 768 |
|
|
intermediate_size = 2048 |
|
|
num_hidden_layers = 12 |
|
|
num_attention_heads = 12 |
|
|
num_key_value_heads = 12 |
|
|
max_position_embeddings = 1024 |
|
|
vocab_size = 50257 |
|
|
spiking_threshold = 1.0 |
|
|
ltc_hidden_size = 256 |
|
|
ltc_layers = 2 |
|
|
``` |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Install dependencies |
|
|
```bash |
|
|
pip install transformers torch numpy |
|
|
``` |
|
|
|
|
|
## Inference |
|
|
The full inference code is available in this gist:

https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea
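
The gist is the reference implementation. As a rough sketch, loading the checkpoint through `transformers` should look something like the following; `trust_remote_code=True` is an assumption about how the custom SNN/LTC classes are packaged, and if the standard auto classes cannot load the model you should fall back to the model code in the gist.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rootxhacker/arthemis-lm"

# trust_remote_code is assumed because the SNN/LTC layers are custom;
# if loading fails, use the model classes from the inference gist instead.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "The children went to the park because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```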
|
|
|
|
|
## Evaluation |
|
|
|
|
|
I evaluated the model using the script in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300
|
|
|
|
|
### Results Comparison |
|
|
|
|
|
| Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg | |
|
|
|-------|--------|--------|-----------|------|------------|-------|-------|-------|-----| |
|
|
| **rootxhacker/arthemis-lm** | **155.8M** | **<$50** | **24.65** | **20.60** | **48.10** | **28.20** | **22.20** | **39.80** | **30.59** | |
|
|
| google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 | |
|
|
|
|
|
## Observations |
|
|
|
|
|
- **Budget Efficiency**: The model achieves competitive performance on a budget of under $50, demonstrating that meaningful language models can be built with limited resources.
|
|
- **Neuromorphic Advantages**: The SNN-LTC architecture posts its strongest relative result on WinoGrande (48.10%, close to bert-large-uncased's 49.80%), which may reflect a benefit from the temporal dynamics.
|
|
- **Parameter Efficiency**: With 155.8M parameters, the model performs comparably to google/bert-large-uncased (336M parameters) while being less than half the size.
|
|
- **Room for Improvement**: More training data and compute would likely improve performance, but the current results validate the neuromorphic approach. |
|
|
|
|
|
|
|
|
|
|
|
## Model Specifications

```
|
|
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants |
|
|
Hidden Size: 768 |
|
|
Intermediate Size: 2048 |
|
|
Attention Heads: 12 |
|
|
Layers: 12 |
|
|
Max Position Embeddings: 1024 |
|
|
Vocabulary Size: 50,257 |
|
|
Spiking Threshold: 1.0 |
|
|
LTC Hidden Size: 256 |
|
|
Training Precision: FP32 |
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
The model was pretrained from scratch using: |
|
|
- **Dataset**: BabyLM (vesteinn/babylm) - First 1M samples (~100M tokens) |
|
|
- **Hardware**: Google Colab Pro Plus (A100 GPU) |
|
|
- **Training Steps**: 20,000 steps |
|
|
- **Batch Size**: 8 with gradient accumulation |
|
|
- **Learning Rate**: 3e-4 with linear warmup |
|
|
- **Precision**: FP32 for stability with neuromorphic components |
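
As a rough illustration of the stated hyperparameters (not the actual training script), the optimizer and schedule can be set up along these lines; the warmup length and gradient-accumulation factor are assumed placeholders, and `model` / `train_dataloader` stand in for the custom Llama-SNN-LTC setup.

```python
import torch
from transformers import get_linear_schedule_with_warmup

# model and train_dataloader come from the custom Llama-SNN-LTC setup (not shown here).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

total_steps = 20_000
warmup_steps = 1_000   # assumed; the card only states "linear warmup"
grad_accum_steps = 4   # assumed; the card states batch size 8 with gradient accumulation

scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)

for step, batch in enumerate(train_dataloader):
    outputs = model(**batch, labels=batch["input_ids"])
    loss = outputs.loss / grad_accum_steps
    loss.backward()
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```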
|
|
|
|
|
### Key Innovations |
|
|
- **Custom SNN Implementation**: Leaky Integrate-and-Fire neurons with surrogate gradients |
|
|
- **Liquid Time Constants**: Adaptive time dynamics in feed-forward layers |
|
|
- **Budget-Conscious Training**: Optimized for maximum performance per dollar spent |
|
|
- **Neuromorphic Language Modeling**: To my knowledge, the first integration of SNNs and LTCs in a causal language model
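
The LTC layers are likewise custom to this model. As a generic, minimal sketch of the liquid-time-constant idea, an input-dependent time constant controls how fast the hidden state adapts; the default `hidden_size=256` mirrors `ltc_hidden_size` from the config above, while the rest is an illustrative assumption rather than the model's actual implementation.

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Minimal liquid-time-constant style cell: the state's decay rate depends on the input."""

    def __init__(self, input_size: int, hidden_size: int = 256, dt: float = 1.0):
        super().__init__()
        self.input_map = nn.Linear(input_size, hidden_size)
        self.recurrent_map = nn.Linear(hidden_size, hidden_size)
        self.tau_map = nn.Linear(input_size + hidden_size, hidden_size)
        self.dt = dt

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # Input-conditioned time constant tau > 0 controls how quickly the state moves.
        tau = nn.functional.softplus(self.tau_map(torch.cat([x, h], dim=-1))) + 1e-3
        target = torch.tanh(self.input_map(x) + self.recurrent_map(h))
        # One Euler step of dh/dt = (target - h) / tau.
        return h + self.dt * (target - h) / tau
```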
|
|
|
|
|
## Future Work |
|
|
|
|
|
- Scale to larger datasets with increased compute budget |
|
|
- Explore different spiking neuron models (e.g., Adaptive LIF, Izhikevich) |
|
|
- Implement more sophisticated LTC architectures |
|
|
- Fine-tune for specific downstream tasks |
|
|
- Compare energy efficiency with standard transformers |
|
|
|
|
|
## Model Sources |
|
|
|
|
|
- **Repository**: [Coming Soon] |
|
|
- **Paper**: [In Progress] |
|
|
- **Hugging Face**: [rootxhacker/arthemis-lm](https://huggingface.co/rootxhacker/arthemis-lm) |
|
|
|
|
|
## Uses |
|
|
|
|
|
This model can be used for: |
|
|
- Text generation and completion |
|
|
- Few-shot learning tasks |
|
|
- Research into neuromorphic language models |
|
|
- Educational purposes for understanding SNN/LTC architectures |
|
|
- Base model for fine-tuning on specific tasks |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Training Data**: Limited to 100M tokens (much smaller than typical LLMs) |
|
|
- **Context Length**: Maximum 1024 tokens |
|
|
- **Domain**: Primarily trained on English text |
|
|
- **Compute**: Training limited by budget constraints |
|
|
- **Performance**: Lower than larger, more extensively trained models |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Special thanks to **keeeeenw** for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work builds upon those principles while exploring neuromorphic computing approaches to language modeling. |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{arthemis-lm-2024, |
|
|
title={Arthemis-LM: A Neuromorphic Language Model with Spiking Neural Networks and Liquid Time Constants}, |
|
|
author={rootxhacker}, |
|
|
year={2024}, |
|
|
howpublished={\url{https://huggingface.co/rootxhacker/arthemis-lm}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
MIT License