JuliaFluxGPT-fused

A symbiogenesis-fused LLaMA-style decoder-only language model. Learned representations from JuliaSLM (5M params, d=256, val_loss=3.54) were projected into the larger JuliaFluxGPT architecture (23M params, d=512) using symbiogenesis projection fusion, then fine-tuned on curated classical philosophy texts.

Model Details

| | Value |
|---|---|
| Parameters | 22.79M |
| d_model | 512 |
| Layers | 8 |
| Attention | 8 query heads / 2 KV heads (GQA) |
| Head dim | 64 |
| FFN | SwiGLU (inner dim 1344) |
| Normalization | RMSNorm (pre-norm) |
| Position encoding | RoPE (base 10000) |
| Context length | 256 tokens |
| Vocab size | 2000 (BPE) |
| Weight tying | Yes (embedding = output projection) |
| Val loss | 3.698 |
| Framework | PyTorch |
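The 8-query/2-KV grouped-query attention means each stored KV head is shared by a group of 4 query heads, cutting the KV cache to a quarter of full MHA. A minimal shape sketch (illustrative only, not the code in `juliaflux_model.py`):

```python
import torch

# GQA shape sketch: 8 query heads share 2 KV heads (group size 4)
d_model, n_q, n_kv, head_dim = 512, 8, 2, 64
B, T = 1, 16

q = torch.randn(B, n_q, T, head_dim)   # 8 query heads
k = torch.randn(B, n_kv, T, head_dim)  # only 2 KV heads are stored/cached
v = torch.randn(B, n_kv, T, head_dim)

# Expand each KV head across its group of 4 query heads before attention
k = k.repeat_interleave(n_q // n_kv, dim=1)  # (B, 8, T, 64)
v = v.repeat_interleave(n_q // n_kv, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(B, T, d_model)
print(out.shape)  # torch.Size([1, 16, 512])
```

Note that 8 heads × head dim 64 reassemble exactly to d_model = 512, consistent with the table above.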

Symbiogenesis Fusion

This model was created using a novel weight-transfer technique, symbiogenesis projection fusion, rather than training from scratch:

  1. Source: JuliaSLM (5.04M params, d=256, 6 layers, 4-head MHA, val_loss=3.54 on curated data)
  2. Target: JuliaFluxGPT architecture (22.79M params, d=512, 8 layers, 8Q/2KV GQA)
  3. Projection:
    • Embedding: zero-pad d=256 → d=512 (noise-pad extra dims at 2% of embedding std)
    • Q heads: each of the 4 source heads duplicated into 2 target heads (8 total)
    • KV heads: average pairs of source heads → 2 target KV heads
    • Output projection: split equally across duplicated head pairs
    • FFN: zero-pad inner_dim 640 → 1344
    • Layers 0-5: transferred from source; Layers 6-7: randomly initialized
  4. Fine-tuning: 7000 steps on 266M curated philosophy tokens (cosine LR 3e-4 → 1e-5)
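The projection steps above can be sketched in PyTorch. Shapes follow the numbers in the list, but the code is purely illustrative (the actual fusion script is not part of this repo), and the input-dimension padding that the attention weights would also need is omitted for brevity:

```python
import torch

torch.manual_seed(0)
d_src, d_tgt, vocab = 256, 512, 2000

# 1. Embedding: zero-pad d=256 -> d=512, with noise at 2% of the source std
emb_src = torch.randn(vocab, d_src)
pad = torch.randn(vocab, d_tgt - d_src) * (0.02 * emb_src.std())
emb_tgt = torch.cat([emb_src, pad], dim=1)            # (2000, 512)

# 2. Q heads: duplicate each of the 4 source heads into 2 target heads
wq_src = torch.randn(4, 64, d_src)                    # per-head Q weights
wq_tgt = wq_src.repeat_interleave(2, dim=0)           # (8, 64, 256)

# 3. KV heads: average pairs of source heads -> 2 target KV heads
wk_src = torch.randn(4, 64, d_src)
wk_tgt = wk_src.reshape(2, 2, 64, d_src).mean(dim=1)  # (2, 64, 256)

print(emb_tgt.shape, wq_tgt.shape, wk_tgt.shape)
```

The noise-padding keeps the new embedding dimensions from being exactly symmetric (zero columns would stay zero under tied gradients at the first step), while staying small enough not to disturb the transferred representation.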

Training Data

Curated classical philosophy corpus (BPE tokenized, vocab=2000). The same corpus used to train JuliaSLM, SymbioSLM, MonarchSLM, and SymbioGPT-10M.

Scaling Context

All models trained on the same curated philosophy corpus with BPE vocab=2000, ctx=256:

| Model | Params | Val Loss | Architecture |
|---|---|---|---|
| SymbioSLM | 4.07M | 3.620 | 3-organelle gated (Lux.jl) |
| MonarchSLM | 4.98M | 3.650 | Monarch Mixer (Lux.jl) |
| JuliaSLM | 5.04M | 3.540 | Standard MHA (Lux.jl) |
| SymbioGPT-10M | 11.05M | 3.563 | 4-organelle gated (PyTorch) |
| JuliaFluxGPT-fused | 22.79M | 3.698 | GQA fused from JuliaSLM (PyTorch) |

Files

| File | Description |
|---|---|
| `juliaflux_fused_best.pt` | Best checkpoint (wrapped: `{"model_state_dict": ..., "config": ..., "step": ..., "val_loss": ...}`) |
| `juliaflux_model.py` | Model architecture definition (PyTorch) |
| `vocab.json` | BPE tokenizer vocabulary (2000 tokens, GPT-2 format) |
| `merges.txt` | BPE merge rules |

Usage

```python
import torch

from juliaflux_model import JuliaFluxConfig, JuliaFluxGPT

# Build the architecture, then load the fused checkpoint weights
config = JuliaFluxConfig()
model = JuliaFluxGPT(config)

checkpoint = torch.load("juliaflux_fused_best.pt", map_location="cpu", weights_only=True)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
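Once loaded, a simple greedy decoding loop within the 256-token context might look like the following. This is a hedged sketch: it assumes `model(input_ids)` returns logits of shape `(batch, seq, vocab)`, the conventional decoder-only interface, which has not been verified against `juliaflux_model.py` (the linked Space uses its own KV-cached generation).

```python
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=32, ctx_len=256):
    """Greedy decoding; assumes model(ids) -> logits (batch, seq, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -ctx_len:])        # respect the 256-token window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids

# Smoke test with a stand-in model (vocab=2000) instead of the real checkpoint
dummy = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 2000)
out = generate(dummy, torch.zeros(1, 4, dtype=torch.long), max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])
```

For less repetitive text, replace the `argmax` with temperature sampling via `torch.multinomial` over the softmaxed last-position logits.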

Inference server: An OpenAI-compatible API server with KV-cached generation is available at spaces/LisaMegaWatts/JuliaFluxGPT-fused.

License

MIT
