Update README.md
## llama 3 1b

WIP effort to make a merging-compatible llama model.
## comparison to palmer-004

Apart from depth, there are no differences between these models, but for some reason I'm constantly hitting this error when doing a passthrough merge:
```
Traceback (most recent call last):
  File "/home/zeus/miniconda3/envs/cloudspace/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/teamspace/studios/this_studio/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/teamspace/studios/this_studio/mergekit/mergekit/merge.py", line 96, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/teamspace/studios/this_studio/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
    raise RuntimeError(
RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
```
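A likely cause of the missing tensor: the Llama 3.2 1B checkpoint ties the output head to the input embeddings (`tie_word_embeddings`), so `lm_head.weight` is never stored on disk and only `model.embed_tokens.weight` exists. A minimal pure-Python sketch of this diagnosis, assuming you have extracted the checkpoint's tensor names (e.g. from its safetensors index); the function name is hypothetical:

```python
def diagnose_missing_lm_head(tensor_names):
    """Report whether lm_head.weight is stored in a checkpoint, or whether
    the output head is probably tied to the input embeddings."""
    names = set(tensor_names)
    if "lm_head.weight" in names:
        return "lm_head.weight present"
    if "model.embed_tokens.weight" in names:
        # Tied-embedding checkpoints store only the embedding matrix
        return ("lm_head.weight missing; the head is probably tied to "
                "model.embed_tokens.weight")
    return "neither tensor present; the checkpoint may use different key names"

# Llama-3.2-1B-style checkpoint: embeddings stored, head omitted
print(diagnose_missing_lm_head(["model.embed_tokens.weight", "model.norm.weight"]))
```

If tying is indeed the cause, one workaround is to materialize the head first (copy `model.embed_tokens.weight` into an explicit `lm_head.weight` and save the untied checkpoint) before running the merge.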
| Component | palmer-004 | llama 3 1b | How to Make Second Similar to First |
|-----------|------------|------------|-------------------------------------|
| Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with identical structure to existing layers |
| Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
| Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
| MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
| Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer normalization weights |
| Final Normalization | model.norm.weight | model.norm.weight | Already identical |
| Language Model Head | lm_head.weight | lm_head.weight | Already identical |
| Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
| Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
| Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
| Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
| Position Embeddings | Not explicitly mentioned (might be part of embed_tokens) | Not explicitly mentioned (might be part of embed_tokens) | Ensure position embeddings support the maximum sequence length of the first model |
| Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
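The "add 6 more layers" rows above can be turned into a concrete passthrough layer plan. A minimal pure-Python sketch that grows 16 layers into 22 by replaying the last six; the function name and the choice of repeating the final layers (rather than some middle range) are assumptions for illustration, not something mergekit prescribes:

```python
def passthrough_layer_plan(src_layers: int, dst_layers: int):
    """Return the source-layer indices that a passthrough-style merge would
    stack to grow src_layers into dst_layers, repeating the last
    (dst_layers - src_layers) layers."""
    extra = dst_layers - src_layers
    if extra < 0:
        raise ValueError("destination must have at least as many layers as source")
    # All original layers in order, then the final `extra` layers again
    return list(range(src_layers)) + list(range(src_layers - extra, src_layers))

plan = passthrough_layer_plan(16, 22)
print(plan)  # 16 original layers followed by layers 10..15 repeated
```

Each entry maps a destination layer to the source layer whose q_proj/k_proj/v_proj/o_proj, MLP, and layernorm weights it copies, which keeps the per-layer structure consistent as the table requires.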