JuliaFluxGPT-fused

A symbiogenesis-fused LLaMA-style decoder-only language model. Learned representations from JuliaSLM (5M params, d=256, val_loss=3.54) were projected into the larger JuliaFluxGPT architecture (23M params, d=512) using symbiogenesis projection fusion, then fine-tuned on curated classical philosophy texts.

Model Details

| | Value |
|---|---|
| Parameters | 22.79M |
| d_model | 512 |
| Layers | 8 |
| Attention | 8 query heads / 2 KV heads (GQA) |
| Head dim | 64 |
| FFN | SwiGLU (inner dim 1344) |
| Normalization | RMSNorm (pre-norm) |
| Position encoding | RoPE (base 10000) |
| Context length | 256 tokens |
| Vocab size | 2000 (BPE) |
| Weight tying | Yes (embedding = output projection) |
| Val loss | 3.698 |
| Framework | PyTorch |
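The 8-query/2-KV grouped-query attention means each stored KV head is shared by a group of 4 query heads, cutting the KV cache to a quarter of full MHA. A minimal shape sketch (illustrative only, not the code in `juliaflux_model.py`):

```python
import torch

# GQA shape sketch: 8 query heads share 2 KV heads (group size 4)
d_model, n_q, n_kv, head_dim = 512, 8, 2, 64
B, T = 1, 16

q = torch.randn(B, n_q, T, head_dim)   # 8 query heads
k = torch.randn(B, n_kv, T, head_dim)  # only 2 KV heads are stored/cached
v = torch.randn(B, n_kv, T, head_dim)

# Expand each KV head across its group of 4 query heads before attention
k = k.repeat_interleave(n_q // n_kv, dim=1)  # (B, 8, T, 64)
v = v.repeat_interleave(n_q // n_kv, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(B, T, d_model)
print(out.shape)  # torch.Size([1, 16, 512])
```

Note that 8 heads × head dim 64 reassemble exactly to d_model = 512, consistent with the table above.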

Symbiogenesis Fusion

This model was created using a novel weight-transfer technique, symbiogenesis projection fusion, rather than training from scratch:

  1. Source: JuliaSLM (5.04M params, d=256, 6 layers, 4-head MHA, val_loss=3.54 on curated data)
  2. Target: JuliaFluxGPT architecture (22.79M params, d=512, 8 layers, 8Q/2KV GQA)
  3. Projection:
    • Embedding: zero-pad d=256 → d=512 (noise-pad extra dims at 2% of embedding std)
    • Q heads: each of the 4 source heads duplicated into 2 target heads (8 total)
    • KV heads: average pairs of source heads → 2 target KV heads
    • Output projection: split equally across duplicated head pairs
    • FFN: zero-pad inner_dim 640 → 1344
    • Layers 0-5: transferred from source; Layers 6-7: randomly initialized
  4. Fine-tuning: 7000 steps on 266M curated philosophy tokens (cosine LR 3e-4 → 1e-5)
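The projection steps above can be sketched in PyTorch. Shapes follow the numbers in the list, but the code is purely illustrative (the actual fusion script is not part of this repo), and the input-dimension padding that the attention weights would also need is omitted for brevity:

```python
import torch

torch.manual_seed(0)
d_src, d_tgt, vocab = 256, 512, 2000

# 1. Embedding: zero-pad d=256 -> d=512, with noise at 2% of the source std
emb_src = torch.randn(vocab, d_src)
pad = torch.randn(vocab, d_tgt - d_src) * (0.02 * emb_src.std())
emb_tgt = torch.cat([emb_src, pad], dim=1)            # (2000, 512)

# 2. Q heads: duplicate each of the 4 source heads into 2 target heads
wq_src = torch.randn(4, 64, d_src)                    # per-head Q weights
wq_tgt = wq_src.repeat_interleave(2, dim=0)           # (8, 64, 256)

# 3. KV heads: average pairs of source heads -> 2 target KV heads
wk_src = torch.randn(4, 64, d_src)
wk_tgt = wk_src.reshape(2, 2, 64, d_src).mean(dim=1)  # (2, 64, 256)

print(emb_tgt.shape, wq_tgt.shape, wk_tgt.shape)
```

The noise-padding keeps the new embedding dimensions from being exactly symmetric (zero columns would stay zero under tied gradients at the first step), while staying small enough not to disturb the transferred representation.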

Training Data

Curated classical philosophy corpus (BPE tokenized, vocab=2000). The same corpus used to train JuliaSLM, SymbioSLM, MonarchSLM, and SymbioGPT-10M.

Scaling Context

All models trained on the same curated philosophy corpus with BPE vocab=2000, ctx=256:

| Model | Params | Val Loss | Architecture |
|---|---|---|---|
| SymbioSLM | 4.07M | 3.620 | 3-organelle gated (Lux.jl) |
| MonarchSLM | 4.98M | 3.650 | Monarch Mixer (Lux.jl) |
| JuliaSLM | 5.04M | 3.540 | Standard MHA (Lux.jl) |
| SymbioGPT-10M | 11.05M | 3.563 | 4-organelle gated (PyTorch) |
| JuliaFluxGPT-fused | 22.79M | 3.698 | GQA fused from JuliaSLM (PyTorch) |

Files

| File | Description |
|---|---|
| `juliaflux_fused_best.pt` | Best checkpoint (wrapped: `{"model_state_dict": ..., "config": ..., "step": ..., "val_loss": ...}`) |
| `juliaflux_model.py` | Model architecture definition (PyTorch) |
| `vocab.json` | BPE tokenizer vocabulary (2000 tokens, GPT-2 format) |
| `merges.txt` | BPE merge rules |

Usage

```python
import torch

from juliaflux_model import JuliaFluxConfig, JuliaFluxGPT

# Build the architecture, then load the fused checkpoint weights
config = JuliaFluxConfig()
model = JuliaFluxGPT(config)

checkpoint = torch.load("juliaflux_fused_best.pt", map_location="cpu", weights_only=True)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
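Once loaded, a simple greedy decoding loop within the 256-token context might look like the following. This is a hedged sketch: it assumes `model(input_ids)` returns logits of shape `(batch, seq, vocab)`, the conventional decoder-only interface, which has not been verified against `juliaflux_model.py` (the linked Space uses its own KV-cached generation).

```python
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=32, ctx_len=256):
    """Greedy decoding; assumes model(ids) -> logits (batch, seq, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -ctx_len:])        # respect the 256-token window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids

# Smoke test with a stand-in model (vocab=2000) instead of the real checkpoint
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 2000)
out = generate(dummy, torch.zeros(1, 4, dtype=torch.long), max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])
```

For less repetitive text, replace the `argmax` with temperature sampling via `torch.multinomial` over the softmaxed last-position logits.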

Inference server: An OpenAI-compatible API server with KV-cached generation is available at spaces/LisaMegaWatts/JuliaFluxGPT-fused.

License

MIT
