IronCell Mark 1: Technical Brief
GitHub Repository: gaoang1111/IronMan
Checkpoints: HuggingFace - IronCell-Mark-1
Training Logs: WandB Overview
Core Efficiency Metrics
| Metric | Value / Performance |
|---|---|
| VRAM Footprint | Reduced by 93.75% (down to 6.25% of the baseline requirement) |
| Logic Integrity (PPL) | 11.20 (FineWeb Zero-Overlap) |
| Baseline (Llama 3.1 8B) | 7.40 PPL |
The Verdict: a marginal increase in perplexity exchanged for context capacity that would otherwise be impossible on consumer-grade GPUs.
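For reference, the two VRAM figures are consistent with the 16:1 sequence compression described below, under the assumption that the context footprint scales linearly with the number of resident tokens:

$$\frac{1}{16} = 6.25\%, \qquad 1 - \frac{1}{16} = 93.75\%$$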
Cellular Differentiation Theory
The project views a pre-trained LLM as a powerful but rigid "state machine" and treats the homologous base (Llama 3.1 8B) as a "stem cell". Through induced functional differentiation, the model is split into collaborating units:
- Compressor (`cmp`): specialized in distilling raw text chunks into dense semantic latent vectors.
- Generator (`gen`): a causal language model trained to reconstruct and reason over these compressed vectors.
- Projector (`proj`): a linear mapping that translates compressor hidden states into the generator's hidden space (see the sketch after this list).
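A minimal PyTorch sketch of what such a projector could look like. The hidden sizes, the class name, and the choice of a single linear layer are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Hypothetical linear bridge from the compressor's hidden space to
    the generator's hidden space. Dimensions below are placeholders."""

    def __init__(self, cmp_hidden: int = 4096, gen_hidden: int = 4096):
        super().__init__()
        self.proj = nn.Linear(cmp_hidden, gen_hidden)

    def forward(self, cmp_states: torch.Tensor) -> torch.Tensor:
        # cmp_states: [batch, num_chunks, cmp_hidden]
        # returns:    [batch, num_chunks, gen_hidden]
        return self.proj(cmp_states)
```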
Zipper Layout (Masked Parallel Training)
To achieve 16:1 sequence compression, IronCell utilizes a "control chain + raw chunks" layout:
- Structural Chain: formatted as `[<bos>][<soc>] V-1 [<eoc>] V0 [<eoc>] V1 [<eoc>] ... [Raw_Token chunks]`
- Zipper (Staircase) Mask: a custom attention mask ensures each raw segment attends only to its permitted control tokens, maintaining causal integrity without information leakage (a mask-construction sketch follows this list).
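A rough sketch of how a staircase mask of this kind could be built, assuming one control slot per raw chunk and the "control chain first, raw chunks after" ordering shown above. The function name, slot counts, and layout details are assumptions for illustration, not the repo's actual implementation.

```python
import torch

def zipper_mask(num_chunks: int, chunk_len: int, ctrl_per_chunk: int = 1) -> torch.Tensor:
    """Boolean attention mask (True = may attend) for a layout of the form
    [ctrl_0 .. ctrl_{K-1}] [chunk_0] ... [chunk_{K-1}]:
    control tokens attend causally to earlier control tokens; raw chunk i
    attends causally within itself plus the control slots of chunks 0..i,
    and never to tokens of other raw chunks."""
    n_ctrl = num_chunks * ctrl_per_chunk
    total = n_ctrl + num_chunks * chunk_len
    mask = torch.zeros(total, total, dtype=torch.bool)

    # Control chain: standard causal mask among control tokens.
    mask[:n_ctrl, :n_ctrl] = torch.tril(torch.ones(n_ctrl, n_ctrl)).bool()

    for i in range(num_chunks):
        start = n_ctrl + i * chunk_len
        end = start + chunk_len
        # "Staircase": chunk i sees only the control slots of chunks 0..i.
        mask[start:end, : (i + 1) * ctrl_per_chunk] = True
        # Causal attention inside the chunk itself.
        mask[start:end, start:end] = torch.tril(torch.ones(chunk_len, chunk_len)).bool()
    return mask
```

Because raw chunks never attend to one another, all chunks can be packed into a single sequence and trained in parallel without leaking information across chunk boundaries.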
Training & Reproducibility
The entire differentiation process is reproducible in an afternoon (~5 hours) using an 8×A800 node.
Phase 1: Alignment
- Objective: Only the projector and new special tokens are trained.
- Performance: The compressed signal aligns quickly; loss drops from 12.8 to 4.12 in ~20 steps (a freezing sketch follows this list).
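A hedged sketch of the Phase 1 freezing logic, assuming a Hugging Face-style model object: everything is frozen except the projector and the embedding rows of the newly added special tokens. The object names (`model`, `projector`, `new_token_ids`) are placeholders, not the repo's API.

```python
import torch

def freeze_for_alignment(model, projector, new_token_ids):
    """Phase 1 (sketch): train only the projector and the new special-token
    embedding rows; all other weights stay frozen."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in projector.parameters():
        p.requires_grad_(True)

    # The embedding matrix must carry gradients, but only the rows of the
    # new special tokens should actually update; mask the rest with a hook.
    emb = model.get_input_embeddings().weight
    emb.requires_grad_(True)
    keep = torch.zeros(emb.shape[0], 1)
    keep[list(new_token_ids)] = 1.0

    def _mask_grad(grad):
        return grad * keep.to(grad.device, grad.dtype)

    emb.register_hook(_mask_grad)
```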
Phase 2: Differentiation
- Objective: The full model weights are unfrozen and trained with L2 regularization.
- Performance: Eval loss declines steadily from 2.72 to 2.41 (an optimizer sketch follows this list).
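A minimal sketch of Phase 2, assuming the L2 regularization is realized as standard weight decay in AdamW; the learning rate, decay strength, and whether the repo instead penalizes the distance to the original Llama weights are not specified here.

```python
import torch

def unfreeze_for_differentiation(model, lr: float = 1e-5, weight_decay: float = 0.01):
    """Phase 2 (sketch): unfreeze all weights and apply L2 regularization
    via AdamW's decoupled weight decay. Hyperparameters are placeholders."""
    for p in model.parameters():
        p.requires_grad_(True)
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
```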
Data Specifications
- Source: FineWeb-Edu (HuggingFace).
- Scale: Phase 2 uses 10,000 samples.
- Length: Individual documents range from 10k to 30k characters.
- Protocol: A zero-overlap sampling strategy was maintained for the first 150 training steps (a sampling sketch follows this list).
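A sampling sketch consistent with the numbers above, using the public `HuggingFaceFW/fineweb-edu` dataset in streaming mode. The config/split names, the length filter, and the use of the document `id` as the de-duplication key are assumptions; the repo's actual data pipeline may differ.

```python
from datasets import load_dataset

def sample_fineweb_edu(num_samples: int = 10_000,
                       min_chars: int = 10_000,
                       max_chars: int = 30_000) -> list[str]:
    """Stream FineWeb-Edu, keep documents in the 10k-30k character band,
    and never emit the same document twice (zero-overlap sampling)."""
    stream = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
    seen, samples = set(), []
    for row in stream:
        text = row["text"]
        if not (min_chars <= len(text) <= max_chars):
            continue
        key = row.get("id", text[:128])  # fall back to a text prefix if no id
        if key in seen:
            continue
        seen.add(key)
        samples.append(text)
        if len(samples) >= num_samples:
            break
    return samples
```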
Base model: meta-llama/Llama-3.1-8B (model tree: ddddamn/IronCell-Mark-1)