broadfield-dev/codegemma-7b-it-8L-2048h-sliced-healed
Model Description
This model is a reforged version of broadfield-dev/codegemma-7b-it-8L-2048h-sliced, created with Tensor-Centric Model Reforging.
The architecture was reduced ("sliced") and then "healed" to produce a compact, efficient model whose residual-stream subspaces remain consistent with the original.
Reforging Configuration
- Original Model: broadfield-dev/codegemma-7b-it-8L-2048h-sliced
- Target Layers: N/A
- Target Hidden Size: N/A
- Target Vocab: N/A
- Slicing Method: Magnitude-based Structured Pruning (Global Coherence)
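To illustrate the slicing method, here is a minimal sketch of magnitude-based structured pruning with a global importance mask. The layer shapes and channel-scoring rule are illustrative assumptions, not the actual reforging code; the point is that every layer is sliced with the *same* channel mask so the residual stream stays coherent.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8   # toy residual-stream width (the real model: 3072 -> 2048)
target = 4   # reduced width

# Toy weight matrices that all read from the residual stream.
layers = [rng.normal(size=(16, hidden)) for _ in range(3)]

# Global importance: accumulate each channel's L2 norm across all layers,
# so every layer agrees on which channels survive (global coherence).
importance = sum(np.linalg.norm(w, axis=0) for w in layers)
keep = np.sort(np.argsort(importance)[-target:])  # kept channel indices

# Apply the single shared mask to every layer.
sliced = [w[:, keep] for w in layers]
print(sliced[0].shape)  # (16, 4)
```

Because the mask is shared, a channel either survives in every layer or is removed everywhere, which is what keeps the sliced layers mutually compatible.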
Healing Process
- Surgical Slicing: Weights were pruned using a global importance mask to maintain residual stream coherence.
- Surgical Healing (Phase 1): A "Stitch" LoRA (Rank 64) was trained on the bridge layer to realign the severed hidden states.
- Global Adaptation (Phase 2): A second LoRA (Rank 8) was trained across all layers to adapt the model to its reduced capacity.
- Dataset: databricks/databricks-dolly-15k
- Healing Steps: 200
- Global Steps: 50
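The two-phase healing above can be sketched numerically. A trained LoRA adapter modifies a weight matrix as W + (alpha / r) * B @ A; the ranks below mirror the card (64 for the "stitch" phase, 8 for the global phase), but the matrix sizes, alpha values, and random factors are toy assumptions, not the actual adapters.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 32, 32
W = rng.normal(size=(d_out, d_in))  # a sliced base weight

def lora_update(d_out, d_in, rank, alpha):
    # In real LoRA training, B starts at zero so the model begins from the
    # base weights; here both factors are random just to show shape/scale.
    A = rng.normal(size=(rank, d_in))
    B = rng.normal(size=(d_out, rank))
    return (alpha / rank) * B @ A

# Phase 1: rank-64 "stitch" adapter on the bridge layer.
W_healed = W + lora_update(d_out, d_in, rank=64, alpha=64)
# Phase 2: rank-8 adapter applied globally.
W_final = W_healed + lora_update(d_out, d_in, rank=8, alpha=8)
print(W_final.shape)  # (32, 32)
```

Note the low-rank factors add far fewer trainable parameters than full fine-tuning: rank 64 here trains 2 * 64 * 32 values per layer instead of 32 * 32 for a dense update at larger widths.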
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "broadfield-dev/codegemma-7b-it-8L-2048h-sliced-healed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

input_text = "Explain the theory of relativity."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```