Butanium committed (verified)
Commit 480c1c7 · 1 parent: ca63a14

Upload README.md with huggingface_hub

Files changed (1): README.md (+29 −26)
README.md CHANGED
@@ -1,29 +1,32 @@
- ---
- {}
- ---
- 0-layer transformer described in [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html).Load with ```python
- class ZeroLayerTransformer(nn.Module, PyTorchModelHubMixin):
-     def __init__(self, config):
-         super().__init__()
-         self.config = config
-         self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
-         self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
- 
-     def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
-         hidden_states = self.embed_tokens(input_ids)
-         logits = self.lm_head(hidden_states)
- 
-         loss = None
-         if labels is not None:
-             shift_logits = logits[..., :-1, :].contiguous()
-             shift_labels = labels[..., 1:].contiguous()
-             loss_fct = nn.CrossEntropyLoss()
-             loss = loss_fct(
-                 shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
-             )
- 
-         return {"loss": loss, "logits": logits}
+ # Zero-Layer Simple Transformer
  
+ A 0-layer transformer described in [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html).
  
+ ## Usage
+ 
+ ```python
+ from transformers import LlamaConfig
+ from migrate_models import ZeroLayerTransformer
+ 
+ # Load the model
+ model = ZeroLayerTransformer.from_pretrained('Butanium/simple-stories-zero-layer-simple-transformer')
+ 
+ # Or create from config
+ config = LlamaConfig(vocab_size=4096, hidden_size=128, num_hidden_layers=0)
+ model = ZeroLayerTransformer(config)
  ```
- model = ZeroLayerTransformer.from_pretrained('Butanium/simple-stories-zero-layer-simple-transformer')
+ 
+ ## Model Architecture
+ 
+ This model consists of only:
+ - Token embeddings
+ - Linear output head (no transformer layers)
+ 
+ It serves as a baseline for understanding transformer circuits and the importance of attention layers.
+ 
+ ## Training Details
+ 
+ - Trained on SimpleStories dataset
+ - Vocabulary size: 4096
+ - Hidden size: 128
+ - No transformer layers (0-layer architecture)
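
As the linked paper observes, a model with only token embeddings and an unembedding head collapses to a bigram table: each position's logits depend on the current token alone. A minimal NumPy sketch of that equivalence, using toy random weights (`W_E`, `W_U` are stand-ins, not the trained checkpoint):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 4

# Toy embedding and unembedding matrices (stand-ins for the trained weights)
W_E = rng.normal(size=(vocab_size, hidden_size))  # embed_tokens.weight
W_U = rng.normal(size=(hidden_size, vocab_size))  # lm_head weight, transposed

input_ids = np.array([3, 1, 4, 1])

# Forward pass of a zero-layer transformer: embed, then unembed
logits = W_E[input_ids] @ W_U  # shape (seq_len, vocab_size)

# The whole model collapses to a single (vocab, vocab) bigram table
bigram_table = W_E @ W_U
assert np.allclose(logits, bigram_table[input_ids])

# Logits depend only on the current token: positions 1 and 3 both hold token 1
assert np.allclose(logits[1], logits[3])
```

This is why the model serves as a baseline for circuit analysis: anything beyond bigram statistics must come from attention layers, which this architecture omits.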