broadfield-dev/codegemma-7b-it-8L-2048h-sliced-healed

Model Description

This model is a Reforged version of broadfield-dev/codegemma-7b-it-8L-2048h-sliced, created using Tensor-Centric Model Reforging.

The architecture has been reduced (to 8 layers and a 2048 hidden size, per the model name) and then "Healed" with LoRA adapters to produce a compact, efficient model whose hidden states remain consistent with the original model's subspace.

Reforging Configuration

  • Original Model: broadfield-dev/codegemma-7b-it-8L-2048h-sliced
  • Target Layers: N/A
  • Target Hidden Size: N/A
  • Target Vocab: N/A
  • Slicing Method: Magnitude-based Structured Pruning (Global Coherence)
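The Reforging tooling itself is not published with this card, so the following is only a minimal Python sketch of what magnitude-based structured pruning with a single global mask can look like; the function names and the assumption of plain nn.Linear projections are illustrative, not the actual pipeline.

import torch
import torch.nn as nn

def global_channel_importance(linears: list[nn.Linear]) -> torch.Tensor:
    """Sum per-output-channel L2 magnitudes across layers so every layer
    is later sliced with the same channel indices (global coherence)."""
    importance = torch.zeros(linears[0].out_features)
    for layer in linears:
        importance += layer.weight.norm(dim=1)  # one score per output channel
    return importance

def slice_linear(layer: nn.Linear, keep_idx: torch.Tensor) -> nn.Linear:
    """Return a smaller Linear keeping only the selected output channels.
    (A full pipeline would slice downstream in_features analogously.)"""
    sliced = nn.Linear(layer.in_features, len(keep_idx), bias=layer.bias is not None)
    sliced.weight.data = layer.weight.data[keep_idx].clone()
    if layer.bias is not None:
        sliced.bias.data = layer.bias.data[keep_idx].clone()
    return sliced

# Stand-in projections; keep the 2048 highest-magnitude channels everywhere,
# preserving channel alignment along the residual stream.
layers = [nn.Linear(4096, 4096) for _ in range(8)]
keep_idx = global_channel_importance(layers).topk(2048).indices.sort().values
sliced_layers = [slice_linear(layer, keep_idx) for layer in layers]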

Healing Process

  1. Surgical Slicing: Weights were pruned using a global importance mask to maintain residual-stream coherence.
  2. Surgical Healing (Phase 1): A "Stitch" LoRA (rank 64) was trained on the bridge layer to realign the severed hidden states.
  3. Global Adaptation (Phase 2): A second LoRA (rank 8) was trained globally to fine-tune the reduced-capacity model (both phases are sketched below).
  • Dataset: databricks/databricks-dolly-15k
  • Healing Steps: 200
  • Global Steps: 50
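The exact adapter targets and hyperparameters of the healing run are not published, so this is only a sketch of the two-phase setup using the peft library; the bridge-layer index, target module names, and alpha values are assumptions.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "broadfield-dev/codegemma-7b-it-8L-2048h-sliced"
)

# Phase 1 ("Stitch"): a high-rank adapter confined to the bridge layer,
# trained to realign the hidden states on either side of the cut.
stitch_cfg = LoraConfig(
    r=64,
    lora_alpha=128,               # assumed scaling; not documented
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    layers_to_transform=[7],      # hypothetical bridge-layer index
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, stitch_cfg)
# ... train ~200 steps on databricks/databricks-dolly-15k, then fold in:
model = model.merge_and_unload()

# Phase 2: a low-rank adapter over all layers for light global adaptation.
global_cfg = LoraConfig(
    r=8,
    lora_alpha=16,                # assumed scaling; not documented
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, global_cfg)
# ... train ~50 steps, merge again, and save the healed checkpoint.

In this sketch, the stitch adapter is merged before the global pass so the second, low-rank LoRA can spend its capacity on whole-model adaptation rather than re-learning the bridge correction.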

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "broadfield-dev/codegemma-7b-it-8L-2048h-sliced-healed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion.
input_text = "Explain the theory of relativity."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
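Since the underlying codegemma-7b-it is instruction-tuned, the prompt may work better wrapped in the chat template, assuming the sliced tokenizer still carries the Gemma template:

# Assumes the tokenizer retains the base model's chat template.
messages = [{"role": "user", "content": "Explain the theory of relativity."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))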
Model Details

  • Format: Safetensors
  • Model size: 1B params
  • Tensor type: F32