Davidsv committed
Commit 140d11e · verified · 1 Parent(s): adb97da

Update README.md

Files changed (1)
  1. README.md +9 -35
README.md CHANGED
@@ -8,44 +8,18 @@ tags:
  - mergekit
  - lazymergekit
  - mistral
- - optimized
  ---
- # SUONG-4 (3B Parameters)

- This is an optimized merge of pre-trained language models created using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing), reducing the original 7B models to approximately 3B parameters while maintaining core capabilities.

- ## About Me
- I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and resource usage.
-
- 🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)
-
- ## Model Size Optimization
- The reduction from 7B to 3B parameters was achieved through (see the sketch after this list):
- - Layer reduction from 32 to 12 layers
- - Conversion to bfloat16 format (half precision)
- - Selective layer range implementation
- - SLERP merge method optimization with progressive fusion
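
For illustration only, a minimal sketch of what such a layer-reduced SLERP merge could look like in mergekit's `slices` syntax follows. The `[0, 12]` range is hypothetical and mirrors the 12-layer claim above; the configuration shown further down keeps the full `[0, 32]` range.

```yaml
# Hypothetical sketch: a 12-layer SLERP merge in mergekit's slices syntax.
# Mirrors the layer-reduction claim above; NOT the configuration shipped below.
slices:
  - sources:
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 12]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 12]
merge_method: slerp
base_model: mlabonne/NeuralHermes-2.5-Mistral-7B
parameters:
  t:
    - value: 0.45  # flat interpolation ratio, matching the global ratio below
dtype: bfloat16
```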

  ## Merge Details
- ### Models Merged
- * [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)
- * [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)

- ### Configuration
- ```yaml
- slices:
-   - sources:
-       - model: mlabonne/NeuralHermes-2.5-Mistral-7B
-         layer_range: [0, 32]
-       - model: teknium/OpenHermes-2.5-Mistral-7B
-         layer_range: [0, 32]
- merge_method: slerp
- base_model: mlabonne/NeuralHermes-2.5-Mistral-7B
- parameters:
-   t:
-     - filter: self_attn
-       value: [0, 0.3, 0.6, 0.9, 1] # Progressive fusion of the attention layers
-     - filter: mlp
-       value: [1, 0.7, 0.4, 0.1, 0] # Inverse transition for the MLP layers
-     - value: 0.45 # Global fusion ratio
- dtype: bfloat16
- ```
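
For reference, SLERP interpolates each pair of weight tensors along the arc between them rather than along a straight line; a standard formulation (the textbook definition, not mergekit's literal code) is:

$$
\mathrm{slerp}(w_1, w_2; t) \;=\; \frac{\sin\big((1-t)\,\theta\big)}{\sin\theta}\, w_1 \;+\; \frac{\sin(t\,\theta)}{\sin\theta}\, w_2,
\qquad
\theta \;=\; \arccos\!\left(\frac{w_1 \cdot w_2}{\lVert w_1 \rVert \, \lVert w_2 \rVert}\right)
$$

Here $t$ comes from the schedules in the configuration above: attention weights move from the base model ($t = 0$) toward OpenHermes ($t = 1$) with depth, the MLP weights make the inverse transition, and all remaining tensors use the flat $0.45$ ratio.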
 
  - mergekit
  - lazymergekit
  - mistral
  ---
+ # SUONG-4 (7B Parameters)

+ This is a merge of pre-trained language models created using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing), combining the strengths of the NeuralHermes and OpenHermes models.

+ [... rest of the personal introduction ...]

  ## Merge Details
+ ### Merge Method
+ This model uses SLERP (Spherical Linear Interpolation) with a progressive fusion approach (see the excerpt after this list):
+ - Progressive attention layer fusion (t ramps from 0 to 1)
+ - Inverse MLP layer transition (t ramps from 1 to 0)
+ - Global fusion ratio of 0.45 for the remaining tensors
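
These bullets summarize the `t` schedules from the mergekit configuration in the previous revision of the card (now elided from it); a minimal excerpt of that block:

```yaml
# SLERP interpolation schedules (t) behind the bullets above,
# mirroring the configuration shown in the removed section.
parameters:
  t:
    - filter: self_attn
      value: [0, 0.3, 0.6, 0.9, 1]   # attention: base model -> OpenHermes across depth
    - filter: mlp
      value: [1, 0.7, 0.4, 0.1, 0]   # MLP: the inverse transition
    - value: 0.45                    # everything else: flat global ratio
```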

+ [... rest of the model card, with the configuration and usage code ...]