Update README.md
## llama 3 1b

WIP effort to make a merging-compatible llama model.
## comparison to palmer-004

Apart from depth, there are no differences between these models, but for some reason I'm constantly hitting this error when doing a passthrough merge:
```
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/teamspace/studios/this_studio/mergekit/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/teamspace/studios/this_studio/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```
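A likely cause of the missing tensor: the Llama 3.2 1B checkpoint ties the output head to the input embeddings (`tie_word_embeddings`), so `lm_head.weight` is never stored on disk and only `model.embed_tokens.weight` exists. A minimal pure-Python sketch of this diagnosis, assuming you have extracted the checkpoint's tensor names (e.g. from its safetensors index); the function name is hypothetical:

```python
def diagnose_missing_lm_head(tensor_names):
    """Report whether lm_head.weight is stored in a checkpoint, or whether
    the output head is probably tied to the input embeddings."""
    names = set(tensor_names)
    if "lm_head.weight" in names:
        return "lm_head.weight present"
    if "model.embed_tokens.weight" in names:
        # Tied-embedding checkpoints store only the embedding matrix
        return ("lm_head.weight missing; the head is probably tied to "
                "model.embed_tokens.weight")
    return "neither tensor present; the checkpoint may use different key names"

# Llama-3.2-1B-style checkpoint: embeddings stored, head omitted
print(diagnose_missing_lm_head(["model.embed_tokens.weight", "model.norm.weight"]))
```

If tying is indeed the cause, one workaround is to materialize the head first (copy `model.embed_tokens.weight` into an explicit `lm_head.weight` and save the untied checkpoint) before running the merge.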
| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
|-----------|------------|------------|-------------------------------------|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
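The "add 6 more layers" rows above can be turned into a concrete passthrough layer plan. A minimal pure-Python sketch that grows 16 layers into 22 by replaying the last six; the function name and the choice of repeating the final layers (rather than some middle range) are assumptions for illustration, not something mergekit prescribes:

```python
def passthrough_layer_plan(src_layers: int, dst_layers: int):
    """Return the source-layer indices that a passthrough-style merge would
    stack to grow src_layers into dst_layers, repeating the last
    (dst_layers - src_layers) layers."""
    extra = dst_layers - src_layers
    if extra < 0:
        raise ValueError("destination must have at least as many layers as source")
    # All original layers in order, then the final `extra` layers again
    return list(range(src_layers)) + list(range(src_layers - extra, src_layers))

plan = passthrough_layer_plan(16, 22)
print(plan)  # 16 original layers followed by layers 10..15 repeated
```

Each entry maps a destination layer to the source layer whose q_proj/k_proj/v_proj/o_proj, MLP, and layernorm weights it copies, which keeps the per-layer structure consistent as the table requires.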