[Figure: cell_vs_llama]

IronCell Mark 1: Technical Brief

GitHub Repository: gaoang1111/IronMan
Checkpoints: HuggingFace - ddddamn/IronCell-Mark-1
Training Logs: WandB Overview


Core Efficiency Metrics

| Metric | Value / Performance |
| --- | --- |
| VRAM Footprint | Reduced by 93.75% (requirement down to 6.25%) |
| Logic Integrity (PPL) | 11.20 (FineWeb, zero-overlap) |
| Baseline (Llama 3.1 8B) | 7.40 PPL |

The Verdict: a marginal increase in perplexity exchanged for context capacity that would otherwise be impossible on consumer-grade GPUs.
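Assuming the VRAM figure refers to the memory needed to hold long-context tokens, 16:1 sequence compression leaves 1/16 = 6.25% of the original requirement, which is the 93.75% reduction quoted above.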


Cellular Differentiation Theory

The project views a pre-trained LLM as a powerful but rigid "state machine" and treats the homologous base (Llama 3.1 8B) as a "stem cell". Through induced functional differentiation, the model is split into collaborating units:

  • Compressor (cmp): Specialized in distilling raw text chunks into dense semantic latent vectors.
  • Generator (gen): A causal language model trained to reconstruct and reason based on these compressed vectors.
  • Projector (proj): A linear mapping that translates compressor hidden states into the generator's hidden space (see the sketch after this list).
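A minimal sketch of how the three units might be wired, assuming a standard PyTorch setup; the class name, dimensions, and the pseudocode flow in the comments are illustrative, not taken from the repository:

```python
import torch
import torch.nn as nn

class IronCellProjector(nn.Module):
    """Linear map from compressor hidden states to the generator's hidden space.

    Illustrative sketch: the 4096-dim hidden size assumes both units are
    Llama 3.1 8B derivatives; the projector still learns the change of basis
    between the two differentiated models.
    """
    def __init__(self, cmp_dim: int = 4096, gen_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(cmp_dim, gen_dim)

    def forward(self, cmp_hidden: torch.Tensor) -> torch.Tensor:
        # cmp_hidden: [batch, n_chunks, cmp_dim], one latent vector per chunk.
        return self.proj(cmp_hidden)

# Conceptual flow (pseudocode; actual call signatures depend on the repo):
#   latents  = compressor(raw_chunk_ids)          # distill each raw chunk
#   injected = projector(latents)                 # map into generator space
#   logits   = generator(control_chain, injected) # reconstruct and reason
```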

Zipper Layout (Masked Parallel Training)

To achieve 16:1 sequence compression, IronCell utilizes a "control chain + raw chunks" layout:

  1. Structural Chain: Formatted as `[<bos>][<soc>] V-1 [<eoc>] V0 [<eoc>] V1 [<eoc>] ... [Raw_Token chunks]`
  2. Zipper (Staircase) Mask: A custom attention mask ensures each raw segment attends only to its permitted control tokens, maintaining causal integrity without information leakage (a sketch of one such mask follows this list).
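One way to construct such a staircase mask, shown as an illustrative sketch; the chunk size, the exact set of control tokens each chunk may see, and the function name are assumptions rather than the repository's implementation:

```python
import torch

def zipper_mask(n_ctrl: int, n_chunks: int, chunk_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend) for a control-chain + raw-chunks layout.

    Layout assumed here: n_ctrl control tokens followed by n_chunks raw
    segments of chunk_len tokens each. Control tokens attend causally among
    themselves; raw chunk k attends causally within itself and only to the
    control tokens it is permitted to see (here, the first k + 1), so no raw
    text leaks across chunks.
    """
    total = n_ctrl + n_chunks * chunk_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Control chain: plain causal attention.
    mask[:n_ctrl, :n_ctrl] = torch.tril(torch.ones(n_ctrl, n_ctrl, dtype=torch.bool))

    for k in range(n_chunks):
        start = n_ctrl + k * chunk_len
        end = start + chunk_len
        # Raw chunk k sees its permitted prefix of control tokens...
        mask[start:end, : k + 1] = True
        # ...and attends causally within its own chunk only.
        mask[start:end, start:end] = torch.tril(
            torch.ones(chunk_len, chunk_len, dtype=torch.bool)
        )
    return mask

# e.g. four raw chunks of 16 tokens, each summarized by one control slot (16:1):
m = zipper_mask(n_ctrl=4, n_chunks=4, chunk_len=16)
```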

Training & Reproducibility

The entire differentiation process is reproducible in an afternoon (~5 hours) using an 8×A800 node.

Phase 1: Alignment

  • Objective: Only the projector and the newly added special-token embeddings are trained (see the sketch after this list).
  • Performance: The compressed signal aligns quickly, with loss dropping from 12.8 to 4.12 in ~20 steps.
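A hedged sketch of what this freezing scheme might look like in PyTorch; `model`, `model.projector`, and `new_token_ids` are placeholder names, not identifiers from the repository:

```python
import torch

# Phase 1: freeze everything except the projector and the embedding rows of
# the newly added special tokens (<soc>, <eoc>, control slots).
for p in model.parameters():
    p.requires_grad = False
for p in model.projector.parameters():
    p.requires_grad = True

# The embedding matrix stays trainable, but a gradient hook zeroes the rows
# of all pre-existing tokens so only the new special tokens actually move.
embed = model.get_input_embeddings()
embed.weight.requires_grad = True
new_rows = torch.zeros(embed.weight.shape[0], 1)
new_rows[new_token_ids] = 1.0

def keep_only_new_rows(grad):
    # Zero the gradient for every row except the new special-token rows.
    return grad * new_rows.to(grad.device, grad.dtype)

embed.weight.register_hook(keep_only_new_rows)
```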

Phase 2: Differentiation

  • Objective: All model weights are unfrozen and trained with L2 regularization (see the sketch after this list).
  • Performance: Eval loss declines steadily from 2.72 to 2.41.
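A minimal sketch of the Phase 2 optimizer setup. The brief does not say whether the L2 penalty pulls weights toward zero or toward the base model; the simpler weight-decay reading is shown, and the hyperparameter values are placeholders:

```python
import torch

# Phase 2: unfreeze every weight and apply an L2-style penalty via AdamW.
for p in model.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,            # placeholder learning rate
    weight_decay=0.01,  # L2-style regularization on the unfrozen weights
)
```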

Data Specifications

  • Source: FineWeb-Edu (HuggingFace).
  • Scale: Phase 2 uses 10,000 samples.
  • Length: Individual samples range from 10k to 30k characters.
  • Protocol: A zero-overlap sampling strategy was maintained throughout the first 150 training steps (a data-selection sketch follows this list).
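A sketch of how this data selection could be reproduced with the `datasets` library; the sample count and length thresholds follow the figures above, while the streaming/filtering approach and dataset identifier are assumptions:

```python
from datasets import load_dataset

# FineWeb-Edu, streamed so only the needed samples are materialized.
stream = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

# Keep documents whose raw text is between 10k and 30k characters.
filtered = stream.filter(lambda ex: 10_000 <= len(ex["text"]) <= 30_000)

# Phase 2 uses 10,000 such samples; serving disjoint slices per step
# gives the zero-overlap property described above.
samples = list(filtered.take(10_000))
```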