## comparison to palmer-004

| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
|-----------|-------------|--------------|--------------------------------------|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
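
The "add 6 more layers" column above can be made concrete. The sketch below is only an illustration of that idea in plain `transformers`/PyTorch, not the passthrough merge this README is actually about; the choice to duplicate the last six decoder layers and the output directory name are assumptions made for the example.

```python
# hypothetical sketch: grow the 16-layer model to 22 layers by duplicating
# its last six decoder layers, so the new layers have identical structure
# to the existing ones (which layers to copy is an arbitrary choice here).
import copy
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
layers = model.model.layers                    # ModuleList with 16 decoder layers
for i in range(16, 22):
    new_layer = copy.deepcopy(layers[i - 6])   # reuse layers 10..15 as layers 16..21
    new_layer.self_attn.layer_idx = i          # keep KV-cache indexing consistent
    layers.append(new_layer)

model.config.num_hidden_layers = len(layers)   # now 22, matching palmer-004's depth
model.save_pretrained("llama-3.2-1b-22layer")  # hypothetical output name
```

In mergekit the same depth extension is normally expressed declaratively with a `passthrough` merge over overlapping `layer_range` slices, which is the route this README takes.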

## further investigation

aside from the layer count, there don't seem to be any differences between these models, but for some reason i'm constantly facing this error when doing a passthrough merge:

```
Traceback (most recent call last):
  ...
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```
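
A quick first check for this error is whether the 1B checkpoint actually stores an `lm_head.weight` tensor at all: if it doesn't and `tie_word_embeddings` is set in its config, the output head is shared with `model.embed_tokens.weight`, which would be consistent with mergekit reporting the tensor as missing. A minimal sketch (it assumes the checkpoint is a single `model.safetensors` shard and that you have access to the gated repo):

```python
# sketch: list the tensor names stored in the checkpoint and inspect the config.
# assumes a single-shard "model.safetensors" file and access to the gated repo.
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoConfig

repo = "meta-llama/Llama-3.2-1B"
path = hf_hub_download(repo, "model.safetensors")
with safe_open(path, framework="pt") as f:
    names = set(f.keys())

print("lm_head.weight present:", "lm_head.weight" in names)
print("embed_tokens present:", "model.embed_tokens.weight" in names)
print("tie_word_embeddings:", AutoConfig.from_pretrained(repo).tie_word_embeddings)
```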