Edge-2 Large — Preview Checkpoint

Status: Pre-release · evaluation snapshot. NOT for production use.

Model facts

Parameters: 1,191,282,688 (~1.19B)
Architecture: Transformer · RoPE + SwiGLU + GQA (Gemma/LLaMA-style)
- dim: 2048 · n_heads: 16 · n_kv_heads: 2 (GQA) · n_layers: 24 · max_seq_len: 1024
Tokenizer: BPE-32K (custom, trained from scratch · vocab_size 32,768)
Training framework: MLX (Apple Silicon native)
Training corpus: master_corpus_v3_32k (multi-source · 23.6 GB token file)
This checkpoint: step 112000 · val_loss 2.92757 · perplexity 18.68
Checkpoint timestamp: 2026-05-14T11:47:12 UTC
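
The parameter total can be cross-checked against the listed dimensions. The sketch below is a back-of-envelope count, not the project's own code: `dim`, `n_heads`, `n_kv_heads`, `n_layers`, and the vocabulary size come from the facts above, while the SwiGLU hidden width (`FFN_HIDDEN = 5632`), untied input/output embeddings, bias-free linear layers, and single-vector RMSNorm weights are assumptions, chosen because together they reproduce the stated total.

```python
# Values taken from the model facts above.
DIM = 2048
N_HEADS = 16
N_KV_HEADS = 2
N_LAYERS = 24
VOCAB_SIZE = 32_768

# Assumptions (not stated in the card): SwiGLU hidden width, untied
# embeddings, no biases, one weight vector per norm.
FFN_HIDDEN = 5632  # hypothetical; picked because it reproduces the total

head_dim = DIM // N_HEADS                    # width of one attention head
kv_dim = N_KV_HEADS * head_dim               # K/V projection width under GQA
queries_per_kv_head = N_HEADS // N_KV_HEADS  # query heads sharing one KV head


def attention_params() -> int:
    # Wq and Wo are DIM x DIM; Wk and Wv are DIM x kv_dim under GQA.
    return 2 * DIM * DIM + 2 * DIM * kv_dim


def swiglu_params() -> int:
    # Gate, up, and down projections.
    return 3 * DIM * FFN_HIDDEN


def layer_params() -> int:
    # Two norm weight vectors per block (pre-attention and pre-MLP).
    return attention_params() + swiglu_params() + 2 * DIM


def total_params() -> int:
    embeddings = 2 * VOCAB_SIZE * DIM  # token embedding + untied output head
    final_norm = DIM
    return embeddings + N_LAYERS * layer_params() + final_norm


print(f"{total_params():,}")  # 1,191,282,688
```

Under these assumptions each KV head serves 8 query heads, and the K/V projections are one eighth the width of the query projection.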

What this is

The lowest-validation-loss snapshot of an in-progress training run, published as evidence of training methodology and reproducible artifacts. Training continued past this checkpoint, but val_loss degraded at later steps, so this snapshot is the current best.

What this is NOT

Not the final Edge-2 release
Not instruction-tuned (no SFT, no RLHF)
Not safety-evaluated for general deployment
Not benchmarked against published LLMs (in-corpus eval only)

Edge roadmap

This checkpoint is one step in an active foundational-model trajectory. The full Edge family:

Generation	Parameters	Status
Edge-1	812K	Shipped
Edge-2 Medium	85M	Trained
Edge-2 Large	1.2B	This preview · step 112k of 500k · in active training
Edge-3B	3B (depth-upcycled)	In development
Edge-3 Multimodal	TBD	Future
Edge-11B Instruct	11B	Direct-sales target

This preview is step 112,000 of a planned 500,000 and the lowest-validation-loss checkpoint to date. It is not the final state of this run, and certainly not the final shape of Edge.

Active research areas

Architectural and training-pipeline decisions are under active refinement. The next checkpoint will reflect:

Scaled-curriculum corpus engineering: a 100B-token training curriculum with three-stage warmup-stable-decay (WSD) scheduling, license-tracked across web, math, code, and diversity tiers (FineWeb-Edu, Nemotron-CC-Math, The Stack v2, Common Pile v0.1, and others).
Depth-upcycling architecture: Edge-3B is being built via middle-block stacking from this checkpoint, evaluating the LiGO and MIDAS (NeurIPS 2024) approaches.
Alternative optimizer infrastructure: Muon (arXiv:2502.16982) is under evaluation as a successor to the AdamW optimizer used here.
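
For readers unfamiliar with three-stage WSD scheduling, a minimal sketch of a warmup-stable-decay learning-rate schedule follows. It is illustrative only: the function name `wsd_lr`, the linear warmup and linear decay shapes, and every numeric value in the usage lines are assumptions, not the settings planned for Edge.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_steps: int, decay_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Stage 1: linear warmup to peak_lr over warmup_steps.
    Stage 2: constant peak_lr.
    Stage 3: linear decay to min_lr over the last decay_steps.
    All shapes and defaults here are assumptions for illustration.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return peak_lr
    # Clamp so steps at or past total_steps stay at min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical settings: 1,000 steps, 10 warmup steps, 100 decay steps.
for s in (0, 9, 500, 950, 1000):
    print(s, wsd_lr(s, total_steps=1000, peak_lr=1e-3,
                    warmup_steps=10, decay_steps=100))
```

The long constant stage is what distinguishes WSD from a cosine schedule: a checkpoint taken from the stable stage can be branched into a short decay run without committing to a total step count in advance.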

Do not treat this preview as representative of Edge's mature state. It is a transparent snapshot of a foundational-model project mid-trajectory.

Citation guidance

If you reference this checkpoint, please cite as:

Edge-2 Large preview · step 112k · val_loss 2.92757 · 2026-05-14 · AXE Technology Canada. Superseded by subsequent checkpoints; see HF repo for current state.

Reproducibility

Training run start: 2026-05-10T02:28:41 UTC
Checkpoint timestamp: 2026-05-14T11:47:12 UTC
Steps to this checkpoint: 112,000 of 500,000 planned
Hardware: Apple Silicon (Mac Studio M1 Max 64GB, JL1)
Framework: MLX · Python 3.14 · seed 42
Throughput: ~94.5 tokens/sec at this stage (large model · single-device)
Optimizer: AdamW · lr 8.959e-05 · weight_decay 0.1 · grad_clip 1.0
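
The two timestamps and the step count above allow a rough wall-clock estimate. The sketch below assumes the run was continuous between the listed start and checkpoint times (no pauses or resumes, which the card does not confirm) and that the average step time stays constant, so the projection is indicative only.

```python
from datetime import datetime

# Timestamps and step counts as listed above (both UTC).
run_start = datetime.fromisoformat("2026-05-10T02:28:41")
checkpoint = datetime.fromisoformat("2026-05-14T11:47:12")
STEPS_DONE = 112_000
STEPS_PLANNED = 500_000

# Assumption: uninterrupted training between the two timestamps.
elapsed_s = (checkpoint - run_start).total_seconds()
sec_per_step = elapsed_s / STEPS_DONE

# Assumption: the remaining steps run at the same average rate.
remaining_days = (STEPS_PLANNED - STEPS_DONE) * sec_per_step / 86_400

print(f"elapsed: {elapsed_s / 3600:.1f} h")
print(f"average: {sec_per_step:.2f} s/step")
print(f"remaining at this rate: {remaining_days:.1f} days")
```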

Training metrics at best checkpoint

Metric	Value
Step	112,000
val_loss	2.92757
train_loss	3.4838
perplexity	18.68
grad_norm	1.3713
learning_rate	8.959e-05
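
The perplexity figure follows directly from val_loss, assuming val_loss is the mean per-token cross-entropy in nats (the usual convention, though the card does not state it). A short check:

```python
import math

val_loss = 2.92757  # from the table above; assumed to be nats per token

perplexity = math.exp(val_loss)
bits_per_token = val_loss / math.log(2)  # same loss expressed in bits

print(f"perplexity: {perplexity:.2f}")  # perplexity: 18.68
print(f"bits per token: {bits_per_token:.3f}")
```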

Files in this repo

File	Description
`model.safetensors`	Model weights (4.4 GB)
`tokenizer.json`	BPE-32K tokenizer
`state.json`	Training state + config at this checkpoint
`run_manifest.json`	Data provenance + environment metadata
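
As a sanity check on the weights file, the expected size can be estimated from the parameter count. This sketch assumes every parameter is stored as a 4-byte F32 value and ignores the small safetensors header; under those assumptions the listed 4.4 GB appears to be in binary gigabytes (GiB).

```python
N_PARAMS = 1_191_282_688  # parameter count from the model facts
BYTES_PER_PARAM = 4       # assumption: all tensors stored as F32

size_bytes = N_PARAMS * BYTES_PER_PARAM
size_gb = size_bytes / 10**9   # decimal gigabytes
size_gib = size_bytes / 2**30  # binary gigabytes

print(f"{size_bytes:,} bytes")
print(f"{size_gb:.2f} GB (decimal), {size_gib:.2f} GiB (binary)")
```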

Author

AXE Technology Canada · James Lewis · james@virul.co

Safetensors · Model size: 1B params · Tensor type: F32 · MLX

Paper for axetechnologies/edge-2-large-preview

Muon is Scalable for LLM Training · arXiv:2502.16982 · Published Feb 24, 2025