---
base_model:
- kawagoshi-llm-team/llma3_manydata_our_data_rope
- kawagoshi-llm-team/llma3_manydata_not_our_data_rope
- kawagoshi-llm-team/llama3_sft_many_chat
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- ja
- en
---
# final_merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with [kawagoshi-llm-team/llma3_manydata_our_data_rope](https://huggingface.co/kawagoshi-llm-team/llma3_manydata_our_data_rope) as the base model.
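
In DARE TIES, each fine-tuned model is reduced to a delta (task vector) against the base; DARE randomly drops a fraction `1 - density` of that delta and rescales the survivors by `1 / density`, and TIES then elects a per-parameter sign and sums only the weighted deltas that agree with it. The sketch below is a minimal, illustrative per-tensor version of that idea, not mergekit's actual implementation; `dare_ties_tensor` and its arguments are hypothetical names, with `density`, `weight`, and `normalize` mirroring the parameters in the YAML configuration further down.

```python
import torch

def dare_ties_tensor(base, tuned, densities, weights, normalize=True):
    """Illustrative per-tensor sketch of DARE TIES (not mergekit's code).

    base:      parameter tensor from the base model
    tuned:     list of matching tensors from the fine-tuned models
    densities: fraction of delta parameters each model keeps (DARE drops 1 - density)
    weights:   per-model merge weights, as in the YAML config below
    """
    deltas = []
    for ft, density in zip(tuned, densities):
        delta = ft - base                                      # task vector
        keep = torch.bernoulli(torch.full_like(delta, density))
        deltas.append(delta * keep / density)                  # DARE: random drop + rescale

    deltas = torch.stack(deltas)                               # [n_models, ...]
    w = torch.tensor(weights, dtype=deltas.dtype).view(-1, *([1] * base.dim()))

    elected = torch.sign((w * deltas).sum(dim=0))              # TIES: per-parameter sign election
    agree = (torch.sign(deltas) == elected).to(deltas.dtype)   # keep only agreeing deltas

    merged = (w * agree * deltas).sum(dim=0)
    if normalize:                                              # normalize: 1.0 in the config
        merged = merged / (w * agree).sum(dim=0).clamp_min(1e-8)
    return base + merged
```

Note that with a density of 1.0 (used in several slices of the configuration) the DARE dropout step is a no-op, so that model's contribution reduces to a plain weighted TIES merge.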

### Models Merged

The following models were included in the merge:
* [kawagoshi-llm-team/llama3_sft_many_chat](https://huggingface.co/kawagoshi-llm-team/llama3_sft_many_chat)
* [kawagoshi-llm-team/llma3_manydata_not_our_data_rope](https://huggingface.co/kawagoshi-llm-team/llma3_manydata_not_our_data_rope)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: kawagoshi-llm-team/llma3_manydata_our_data_rope
dtype: bfloat16
merge_method: dare_ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 8]
    model: kawagoshi-llm-team/llma3_manydata_our_data_rope
    parameters:
      density: 0.7653375506603464
      weight: 0.13767610478325062
  - layer_range: [0, 8]
    model: kawagoshi-llm-team/llma3_manydata_not_our_data_rope
    parameters:
      density: 0.7336602489449524
      weight: 0.3666639544856324
  - layer_range: [0, 8]
    model: kawagoshi-llm-team/llama3_sft_many_chat
    parameters:
      density: 1.0
      weight: 0.3030835610677404
- sources:
  - layer_range: [8, 16]
    model: kawagoshi-llm-team/llma3_manydata_our_data_rope
    parameters:
      density: 0.9861387586510485
      weight: 0.3948174181228292
  - layer_range: [8, 16]
    model: kawagoshi-llm-team/llma3_manydata_not_our_data_rope
    parameters:
      density: 0.8413699662162298
      weight: 0.45739982954282526
  - layer_range: [8, 16]
    model: kawagoshi-llm-team/llama3_sft_many_chat
    parameters:
      density: 1.0
      weight: 0.30274586211044396
- sources:
  - layer_range: [16, 24]
    model: kawagoshi-llm-team/llma3_manydata_our_data_rope
    parameters:
      density: 0.9503146891835705
      weight: 0.2849061463174477
  - layer_range: [16, 24]
    model: kawagoshi-llm-team/llma3_manydata_not_our_data_rope
    parameters:
      density: 0.832031377573231
      weight: 0.6047693096979141
  - layer_range: [16, 24]
    model: kawagoshi-llm-team/llama3_sft_many_chat
    parameters:
      density: 0.9442991059236329
      weight: 0.4002445342115458
- sources:
  - layer_range: [24, 32]
    model: kawagoshi-llm-team/llma3_manydata_our_data_rope
    parameters:
      density: 0.8517897851608993
      weight: 0.3362716927810899
  - layer_range: [24, 32]
    model: kawagoshi-llm-team/llma3_manydata_not_our_data_rope
    parameters:
      density: 1.0
      weight: 0.2909336827183003
  - layer_range: [24, 32]
    model: kawagoshi-llm-team/llama3_sft_many_chat
    parameters:
      density: 1.0
      weight: 0.3474712168573882
- sources:
  - layer_range: [32, 40]
    model: kawagoshi-llm-team/llma3_manydata_our_data_rope
    parameters:
      density: 1.0
      weight: 0.27727322046805786
  - layer_range: [32, 40]
    model: kawagoshi-llm-team/llma3_manydata_not_our_data_rope
    parameters:
      density: 0.8394275769864135
      weight: 0.4724670213437233
  - layer_range: [32, 40]
    model: kawagoshi-llm-team/llama3_sft_many_chat
    parameters:
      density: 1.0
      weight: 0.31333702280148296
tokenizer_source: base
```
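
To reproduce the merge, the YAML above can be saved to a file and passed to the `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./final_merge`), or run through mergekit's Python entry point. The sketch below is a hedged example assuming mergekit's documented Python API (`MergeConfiguration`, `run_merge`, `MergeOptions`) and a hypothetical local file name `config.yaml`; adjust paths and options to your environment. The last lines show loading the merged model with Transformers.

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_PATH = "config.yaml"    # the YAML above, saved locally (hypothetical path)
OUTPUT_PATH = "./final_merge"  # where the merged weights will be written

# Parse the merge recipe and run it.
with open(CONFIG_PATH, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU if one is available
        copy_tokenizer=True,             # write a tokenizer into the output directory
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)

# Load the merged model for inference.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(OUTPUT_PATH)
model = AutoModelForCausalLM.from_pretrained(OUTPUT_PATH, torch_dtype=torch.bfloat16)
```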