---
tags:
- merge
- abacaj/phi-2-super
base_model:
- abacaj/phi-2-super
---
# phi-2-DLEC
The DLEC (Distributive Layer Expansion Curve) methodology offers a novel approach to improving neural network models by strategically duplicating the most effective layers. Developed to enhance model performance, DLEC identifies key layers within the model's architecture and amplifies their impact.
Below is an overview of the method and its implementation, particularly how it integrates with the Hugging Face Transformers library and uses PyTorch and BitsAndBytes for efficient operation.
## Overview
1. **Setting Up:** First, the script ensures all necessary components are in place, from libraries to the model and dataset.
2. **Database for Activations:** A SQLite database is established to track layer activations, providing a clear view into how individual neurons react and which layers are most influential; these are our "beneficial layers."
3. **Analyzing and Identifying:** By analyzing the activation data, the script pinpoints which layers contribute most to the model's performance.
4. **Configuring DLEC:** A configuration is then created, guiding how the model should incorporate duplicates of these beneficial layers to boost effectiveness without unnecessarily increasing complexity.
5. **Reconfiguring and Running the Model:** Finally, the model is adjusted according to DLEC's insights, focusing enhancement on the identified layers.
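As a toy illustration of the activation-tracking step, the sketch below registers forward hooks on a small stand-in model and logs per-layer activation statistics to SQLite. The table schema and the statistic used (mean absolute activation) are assumptions for demonstration, not the actual script:

```python
import sqlite3

import torch
import torch.nn as nn

# Toy stand-in for a transformer stack; in practice these would be
# the decoder layers of the loaded model.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(4)])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activations (layer INTEGER, mean_abs REAL)")

def make_hook(idx):
    def hook(module, inputs, output):
        # Record the mean absolute activation for this layer.
        conn.execute(
            "INSERT INTO activations VALUES (?, ?)",
            (idx, output.abs().mean().item()),
        )
    return hook

for i, layer in enumerate(model):
    layer.register_forward_hook(make_hook(i))

with torch.no_grad():
    model(torch.randn(8, 16))

# Rank layers by average activation magnitude, highest first.
rows = conn.execute(
    "SELECT layer, AVG(mean_abs) FROM activations GROUP BY layer ORDER BY 2 DESC"
).fetchall()
print(rows)
```

The top-ranked rows would then serve as candidate "beneficial layers" for duplication.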
## Key Features
- **Selective Layer Duplication:** DLEC doesn't just add more layers; it doubles down on the ones that matter most. This methodical selection makes the most of the model's capabilities without wasteful expansion.
- **Smart Resource Management:** By homing in on specific areas for improvement, DLEC makes better use of computational and memory resources, promoting more efficient learning without adding undue complexity to the model.
This approach is about making informed, strategic enhancements to the model architecture, prioritizing efficiency and effectiveness over raw size.
# This method is still in development. I do not expect "game-changing" results, nor will I oversell it; it is purely done for fun. Please let me know how the model works for you.
## 🧩 Configuration
```yaml
dtype: bfloat16
merge_method: passthrough
slices:
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [0, 3] # Introduces 0, 3
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [3, 8] # Duplicates 3, introduces 4, 7, 8
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [7, 12] # Duplicates 7, 8, introduces 11, 12
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [11, 16] # Duplicates 11, 12, introduces 15, 16
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [15, 20] # Duplicates 15, 16, introduces 19, 20
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [19, 24] # Duplicates 19, 20, introduces 23, 24
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [23, 28] # Duplicates 23, 24, introduces 27, 28
  - sources:
      - model: abacaj/phi-2-super
        layer_range: [27, 32] # Duplicates 27, 28, introduces 31, 32
```
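The overlapping `layer_range` schedule above follows a regular pattern: a short initial slice, then fixed-width windows whose start steps back by one layer, so boundary layers appear in two consecutive slices. A small helper (purely illustrative, not part of the merge itself) can generate the same slice list:

```python
def overlapping_slices(n_layers, first=3, width=5, overlap=1):
    """Generate layer ranges where consecutive slices overlap by
    `overlap` layers, so boundary layers are duplicated in the
    merged model. The defaults reproduce the configuration shown
    in this card."""
    slices = [[0, first]]
    start = first
    while start < n_layers:
        end = min(start + width, n_layers)
        slices.append([start, end])
        if end == n_layers:
            break
        start = end - overlap
    return slices

print(overlapping_slices(32))
# [[0, 3], [3, 8], [7, 12], [11, 16], [15, 20], [19, 24], [23, 28], [27, 32]]
```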
## 💻 Usage
```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "TheSkullery/phi-2-DLEC"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```