Update README.md
README.md
@@ -18,7 +18,11 @@

# Merge Details
### Merge Method

This merge uses mergekit's passthrough method to expand the blocks of the `CorticalStack/pastiche-crown-clown-7b-dare-dpo` model. Every 5th layer of the expanded model is a newly added layer, with its `o_proj` and `down_proj` parameters initialized to zero, mirroring the approach used in LLaMA Pro.
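
The exact configuration used for this model is reproduced further down in this card. Purely as an illustration of the shape such a config takes (the layer ranges below are placeholder assumptions, not the values used here), a LLaMA-Pro-style passthrough expansion in mergekit looks roughly like this:

```
# Illustrative sketch only - the layer ranges are placeholders, not the actual config.
slices:
  - sources:
      - model: CorticalStack/pastiche-crown-clown-7b-dare-dpo
        layer_range: [0, 4]        # first group of original layers
  - sources:
      - model: CorticalStack/pastiche-crown-clown-7b-dare-dpo
        layer_range: [3, 4]        # duplicate of the preceding layer becomes the new 5th layer
        parameters:
          scale:
            - filter: o_proj       # zero the attention output projection of the copy
              value: 0.0
            - filter: down_proj    # zero the MLP down projection of the copy
              value: 0.0
            - value: 1.0           # every other tensor keeps its original scale
  # ...further slices repeat this pattern for each group of layers...
merge_method: passthrough
dtype: bfloat16
```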

### It's important to note that this configuration has not undergone fine-tuning. Therefore, when fine-tuning, ensure that only every 5th layer is trainable, while all other layers remain frozen.

A helper that implements this freezing pattern is provided below under "Function to freeze layers".

### Models Merged

@@ -181,7 +185,47 @@ The following YAML configuration was used to produce this model:

```
merge_method: passthrough
dtype: bfloat16
```

# Function to freeze layers

```
from transformers import AutoModelForCausalLM


def enable_grad_only_every_nth(model, n):
    """
    Enable gradient computation only for every nth decoder layer (the newly added
    blocks) and freeze everything else, including the token embeddings and the
    lm_head. This lets a fine-tuning run update only the inserted layers while the
    pre-trained components keep their original weights.
    """
    # Freeze embeddings.
    for param in model.model.embed_tokens.parameters():
        param.requires_grad = False

    # Freeze lm_head.
    for param in model.lm_head.parameters():
        param.requires_grad = False

    # Enable gradients only for every nth layer.
    layers = model.model.layers  # ModuleList containing the decoder layers
    for index, layer in enumerate(layers):
        if (index + 1) % n == 0:  # for n = 5: layers 4, 9, 14, ... (0-indexed)
            for param in layer.parameters():
                param.requires_grad = True
        else:
            for param in layer.parameters():
                param.requires_grad = False


model = AutoModelForCausalLM.from_pretrained(
    "arcee-ai/Mistral-7B-Instruct-v0.2-expanded"
)
# Update layer gradients; set n to match the expansion interval of your model.
n = 5
enable_grad_only_every_nth(model, n)
```
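
As a quick sanity check (a minimal sketch, assuming `model` has already been loaded and frozen as above), you can confirm that only the inserted layers remain trainable before starting a fine-tuning run:

```
# Sanity check: list the layer indices that remain trainable and count trainable parameters.
trainable_layers = [
    i for i, layer in enumerate(model.model.layers)
    if any(p.requires_grad for p in layer.parameters())
]
print(trainable_layers)  # expected: [4, 9, 14, ...] for n = 5

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{num_trainable:,} trainable parameters")
```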