# JuliaFluxGPT-fused
A symbiogenesis-fused LLaMA-style decoder-only language model. Learned representations from JuliaSLM (5M params, d=256, val_loss=3.54) were projected into the larger JuliaFluxGPT architecture (23M params, d=512) using symbiogenesis projection fusion, then fine-tuned on curated classical philosophy texts.
## Model Details

| | Value |
|---|---|
| Parameters | 22.79M |
| d_model | 512 |
| Layers | 8 |
| Attention | 8 query heads / 2 KV heads (GQA) |
| Head dim | 64 |
| FFN | SwiGLU (inner dim 1344) |
| Normalization | RMSNorm (pre-norm) |
| Position encoding | RoPE (base 10000) |
| Context length | 256 tokens |
| Vocab size | 2000 (BPE) |
| Weight tying | Yes (embedding = output projection) |
| Val loss | 3.698 |
| Framework | PyTorch |
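The attention configuration above (8 query heads sharing 2 KV heads) means each KV head serves a group of 4 query heads. A minimal shape-level sketch of that sharing, using illustrative tensors rather than the model's actual code:

```python
import torch

# Illustrative GQA shapes (not the model's real implementation):
# 8 query heads share 2 KV heads, so each KV head serves 4 query heads.
batch, seq, d_model = 1, 16, 512
n_q_heads, n_kv_heads, head_dim = 8, 2, 64

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head to cover its group of 4 query heads.
group = n_q_heads // n_kv_heads  # 4
k = k.repeat_interleave(group, dim=1)  # (1, 8, 16, 64)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(batch, seq, n_q_heads * head_dim)
print(out.shape)  # torch.Size([1, 16, 512])
```

The KV cache only needs to store 2 heads instead of 8, which is the main memory win of GQA at inference time.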
## Symbiogenesis Fusion

This model was created with a novel weight-transfer technique, symbiogenesis projection fusion, rather than trained from scratch:
- Source: JuliaSLM (5.04M params, d=256, 6 layers, 4-head MHA, val_loss=3.54 on curated data)
- Target: JuliaFluxGPT architecture (22.79M params, d=512, 8 layers, 8Q/2KV GQA)
- Projection:
  - Embedding: expand d=256 → d=512, filling the new dimensions with noise at 2% of the embedding std
  - Q heads: each of the 4 source heads duplicated into 2 target heads (8 total)
  - KV heads: pairs of source heads averaged → 2 target KV heads
  - Output projection: split equally across duplicated head pairs
  - FFN: zero-pad inner_dim 640 → 1344
- Layers 0-5: transferred from source; layers 6-7: randomly initialized
- Fine-tuning: 7000 steps on 266M curated philosophy tokens (cosine LR 3e-4 → 1e-5)
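The projection steps above can be sketched as tensor operations. This is a hypothetical illustration with random stand-in weights, not the actual fusion script; only the axes named in the list are shown (the d_model input axis of the attention/FFN matrices would be expanded the same way as the embedding):

```python
import torch

torch.manual_seed(0)
vocab, d_src, d_tgt, head_dim = 2000, 256, 512, 64

# Embedding: keep source dims, fill the new dims with noise at 2% of the
# source embedding's std.
emb_src = torch.randn(vocab, d_src)
pad = torch.randn(vocab, d_tgt - d_src) * (0.02 * emb_src.std())
emb_tgt = torch.cat([emb_src, pad], dim=1)               # (2000, 512)

# Q heads: duplicate each of the 4 source heads into 2 target heads.
q_src = torch.randn(4, head_dim, d_src)                  # per-head projections
q_tgt = q_src.repeat_interleave(2, dim=0)                # (8, 64, 256)

# KV heads: average pairs of source heads down to 2 target KV heads.
kv_src = torch.randn(4, head_dim, d_src)
kv_tgt = kv_src.view(2, 2, head_dim, d_src).mean(dim=1)  # (2, 64, 256)

# FFN: zero-pad the inner dimension 640 -> 1344.
w_ffn_src = torch.randn(640, d_src)
w_ffn_tgt = torch.zeros(1344, d_src)
w_ffn_tgt[:640] = w_ffn_src
```

The duplicated Q heads start out computing identical attention patterns; fine-tuning then lets the pairs diverge, while the zero-padded FFN dims start inert and are gradually recruited.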
## Training Data
Curated classical philosophy corpus (BPE tokenized, vocab=2000). The same corpus used to train JuliaSLM, SymbioSLM, MonarchSLM, and SymbioGPT-10M.
## Scaling Context
All models trained on the same curated philosophy corpus with BPE vocab=2000, ctx=256:
| Model | Params | Val Loss | Architecture |
|---|---|---|---|
| SymbioSLM | 4.07M | 3.620 | 3-organelle gated (Lux.jl) |
| MonarchSLM | 4.98M | 3.650 | Monarch Mixer (Lux.jl) |
| JuliaSLM | 5.04M | 3.540 | Standard MHA (Lux.jl) |
| SymbioGPT-10M | 11.05M | 3.563 | 4-organelle gated (PyTorch) |
| JuliaFluxGPT-fused | 22.79M | 3.698 | GQA fused from JuliaSLM (PyTorch) |
## Files

| File | Description |
|---|---|
| `juliaflux_fused_best.pt` | Best checkpoint (wrapped: `{"model_state_dict": ..., "config": ..., "step": ..., "val_loss": ...}`) |
| `juliaflux_model.py` | Model architecture definition (PyTorch) |
| `vocab.json` | BPE tokenizer vocabulary (2000 tokens, GPT-2 format) |
| `merges.txt` | BPE merge rules |
## Usage

```python
import torch

from juliaflux_model import JuliaFluxConfig, JuliaFluxGPT

# Build the architecture, then load the fused checkpoint into it.
config = JuliaFluxConfig()
model = JuliaFluxGPT(config)

checkpoint = torch.load("juliaflux_fused_best.pt", map_location="cpu", weights_only=True)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
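Once loaded, the model can drive a simple greedy decoding loop. A minimal sketch, assuming the forward pass returns logits of shape `(batch, seq, vocab)` (the actual interface is defined in `juliaflux_model.py`); a toy logits function stands in for the model so the snippet runs standalone:

```python
import torch

def greedy_generate(logits_fn, prompt_ids, max_new_tokens=8, ctx_len=256):
    """Greedy decoding: repeatedly append the arg-max next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        x = torch.tensor([ids[-ctx_len:]])  # (1, T), cropped to context length
        logits = logits_fn(x)               # assumed shape: (1, T, vocab)
        ids.append(int(logits[0, -1].argmax()))
    return ids

# Toy stand-in for model(x): deterministic fake logits over a 2000-token vocab.
def fake_logits(x):
    torch.manual_seed(int(x.sum()))
    return torch.randn(x.shape[0], x.shape[1], 2000)

out = greedy_generate(fake_logits, [1, 2, 3], max_new_tokens=5)
print(len(out))  # 8
```

In practice you would pass the loaded `model` (under `torch.no_grad()`) in place of `fake_logits`, and use sampling with temperature rather than pure arg-max.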
Inference server: An OpenAI-compatible API server with KV-cached generation is available at spaces/LisaMegaWatts/JuliaFluxGPT-fused.
## Links
- Inference API: HF Space
- Source model: LisaMegaWatts/JuliaSLM
- Source code: DavinciDreams/SymbioGPT
- W&B: symbiogenesis project
## License
MIT
## Evaluation results

- Val Loss on Curated Philosophy Corpus (self-reported): 3.698
- Val Perplexity on Curated Philosophy Corpus (self-reported): 40.400
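The reported perplexity is consistent with the validation loss: perplexity is the exponential of the mean token cross-entropy, and exp(3.698) ≈ 40.4:

```python
import math

val_loss = 3.698               # reported validation cross-entropy (nats/token)
perplexity = math.exp(val_loss)
print(round(perplexity, 1))    # 40.4
```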