# One-Layer Simple Transformer

A 1-layer transformer described in [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html).

## Usage

```python
from transformers import LlamaConfig
from migrate_models import OneLayerTransformer

# Load the pretrained checkpoint from the Hugging Face Hub
model = OneLayerTransformer.from_pretrained('Butanium/simple-stories-one-layer-simple-transformer')

# Or build an untrained model from a config
# (num_attention_heads=4 matches the training details below)
config = LlamaConfig(vocab_size=4096, hidden_size=128, num_hidden_layers=1, num_attention_heads=4)
model = OneLayerTransformer(config)
```
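
For quick inference, something like the following should work, assuming the repository ships a tokenizer loadable via `AutoTokenizer` and that the model's forward pass returns next-token logits (both are assumptions; check the repo's file list and the `migrate_models` source):

```python
import torch
from transformers import AutoTokenizer
from migrate_models import OneLayerTransformer

repo = 'Butanium/simple-stories-one-layer-simple-transformer'
tokenizer = AutoTokenizer.from_pretrained(repo)  # assumes a tokenizer is bundled with the repo
model = OneLayerTransformer.from_pretrained(repo)
model.eval()

inputs = tokenizer('Once upon a time', return_tensors='pt')
with torch.no_grad():
    logits = model(inputs['input_ids'])  # assumes forward returns (batch, seq, vocab) logits

# Greedy next-token prediction from the final position
next_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_id]))
```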
## Model Architecture

This model consists of:

- Token embeddings
- Single self-attention layer with residual connection
- Linear output head

It serves as a minimal transformer for understanding attention mechanisms and transformer circuits.
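
For reference, a minimal PyTorch sketch of this architecture might look like the following. This is an illustrative reimplementation, not the code in `migrate_models`, and the hyperparameter defaults are taken from the training details below:

```python
import torch
import torch.nn as nn

class TinyOneLayerTransformer(nn.Module):
    """Illustrative sketch: embeddings -> one attention layer + residual -> linear head."""

    def __init__(self, vocab_size=4096, hidden_size=128, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)            # token embeddings
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.lm_head = nn.Linear(hidden_size, vocab_size)             # linear output head

    def forward(self, input_ids):
        # NOTE: positional information is omitted here for brevity
        x = self.embed(input_ids)                                     # (batch, seq, hidden)
        # Causal mask: True entries mark positions a token may NOT attend to
        seq_len = input_ids.shape[1]
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)
        x = x + attn_out                                              # residual connection
        return self.lm_head(x)                                        # next-token logits
```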
## Training Details

- Trained on the SimpleStories dataset
- Vocabulary size: 4096
- Hidden size: 128
- Single self-attention layer
- 4 attention heads
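
These hyperparameters map directly onto a `LlamaConfig`, as in the sketch below (all values taken from the list above; the parameter count assumes `OneLayerTransformer` subclasses `nn.Module`):

```python
from transformers import LlamaConfig
from migrate_models import OneLayerTransformer

config = LlamaConfig(
    vocab_size=4096,        # SimpleStories vocabulary
    hidden_size=128,
    num_hidden_layers=1,    # single self-attention layer
    num_attention_heads=4,
)
model = OneLayerTransformer(config)
print(sum(p.numel() for p in model.parameters()))  # rough parameter count
```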