appvoid committed
Commit 22cdfc4 · verified · 1 Parent(s): 19dd077

Update README.md

Files changed (1): README.md (+46 -1)
README.md CHANGED
@@ -221,4 +221,49 @@ extra_gated_button_content: Submit

## llama 3 1b

- wip effort to make merging compatible llama model
+ wip effort to make a merging-compatible llama model
+
+ ## comparison to palmer-004
+
+ there are no differences between these models, but for some reason i'm constantly facing this error when doing a passthrough merge:
+
+ ```
+ Traceback (most recent call last):
+   File "/home/zeus/miniconda3/envs/cloudspace/bin/mergekit-yaml", line 8, in <module>
+     sys.exit(main())
+   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
+     return self.main(*args, **kwargs)
+   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1078, in main
+     rv = self.invoke(ctx)
+   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
+     return ctx.invoke(self.callback, **ctx.params)
+   File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/click/core.py", line 783, in invoke
+     return __callback(*args, **kwargs)
+   File "/teamspace/studios/this_studio/mergekit/mergekit/options.py", line 82, in wrapper
+     f(*args, **kwargs)
+   File "/teamspace/studios/this_studio/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
+     run_merge(
+   File "/teamspace/studios/this_studio/mergekit/mergekit/merge.py", line 96, in run_merge
+     for _task, value in exec.run(quiet=options.quiet):
+   File "/teamspace/studios/this_studio/mergekit/mergekit/graph.py", line 197, in run
+     res = task.execute(**arguments)
+   File "/teamspace/studios/this_studio/mergekit/mergekit/io/tasks.py", line 86, in execute
+     raise RuntimeError(
+ RuntimeError: Tensor lm_head.weight required but not present in model meta-llama/Llama-3.2-1B
+ ```
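+
+ the missing tensor is almost certainly because Llama-3.2-1B ships with tied word embeddings (`tie_word_embeddings: true`), so the checkpoint never stores `lm_head.weight` as its own tensor and mergekit can't fetch it. here's a minimal sketch of one workaround: clone the embedding matrix into a real head and merge from the untied copy (the local output path is just an example):
+
+ ```python
+ # sketch: untie Llama-3.2-1B's word embeddings so the saved checkpoint
+ # contains the lm_head.weight tensor that mergekit is asking for
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "meta-llama/Llama-3.2-1B", torch_dtype=torch.bfloat16
+ )
+ model.config.tie_word_embeddings = False
+ # clone so lm_head stops sharing storage with embed_tokens
+ model.lm_head.weight = torch.nn.Parameter(model.model.embed_tokens.weight.clone())
+
+ model.save_pretrained("./llama-3.2-1b-untied")  # example path, pick your own
+ AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B").save_pretrained("./llama-3.2-1b-untied")
+ ```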
+
+ | Component | palmer-004 | llama 3 1b | How to Make llama 3 1b Match palmer-004 |
+ |-----------|------------|------------|------------------------------------------|
+ | Total Layers | 22 (0 to 21) | 16 (0 to 15) | Add 6 more layers (16 to 21) with the same structure as the existing layers |
+ | Embedding Layer | model.embed_tokens.weight | model.embed_tokens.weight | Already identical |
+ | Self-Attention Layers | 22 sets of (q_proj, k_proj, v_proj, o_proj) weights | 16 sets of (q_proj, k_proj, v_proj, o_proj) weights | Add 6 more sets of self-attention weights |
+ | MLP Layers | 22 sets of (gate_proj, up_proj, down_proj) weights | 16 sets of (gate_proj, up_proj, down_proj) weights | Add 6 more sets of MLP weights |
+ | Layer Normalization | 22 sets of (input_layernorm, post_attention_layernorm) weights | 16 sets of (input_layernorm, post_attention_layernorm) weights | Add 6 more sets of layer-normalization weights |
+ | Final Normalization | model.norm.weight | model.norm.weight | Already identical |
+ | Language Model Head | lm_head.weight | Tied to embed_tokens; no separate lm_head.weight tensor is stored | Untie and materialize lm_head.weight (this is the missing tensor from the error above) |
+ | Layer Structure | Consistent across all 22 layers | Consistent across all 16 layers | Maintain the same structure when adding new layers |
+ | Hidden Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same hidden size |
+ | Attention Heads | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same number of attention heads |
+ | Intermediate MLP Size | Likely consistent (inferred from weight names) | Likely consistent (inferred from weight names) | Ensure new layers use the same intermediate MLP size |
+ | Position Embeddings | Not stored as weights (llama-style models compute rotary embeddings at runtime) | Not stored as weights (llama-style models compute rotary embeddings at runtime) | Nothing to copy; ensure the rope settings support the same maximum sequence length |
+ | Vocabulary Size | Determined by embed_tokens and lm_head dimensions | Determined by embed_tokens and lm_head dimensions | Already identical (assuming dimensions match) |
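+
+ once the head is untied, a passthrough config along these lines should stack the 16-layer model up to palmer-004's 22 layers. this is a sketch only: the repeated layer range and the local model path are illustrative assumptions, not a tested recipe.
+
+ ```yaml
+ # passthrough merge: keep layers 0-15, then repeat layers 10-15
+ # so the stacked model ends up with 22 layers
+ slices:
+   - sources:
+       - model: ./llama-3.2-1b-untied   # untied copy from the sketch above
+         layer_range: [0, 16]
+   - sources:
+       - model: ./llama-3.2-1b-untied
+         layer_range: [10, 16]
+ merge_method: passthrough
+ dtype: bfloat16
+ ```
+
+ run it with `mergekit-yaml config.yaml ./merged` as usual.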