Edge-2 Large — Preview Checkpoint

Status: Pre-release · evaluation snapshot. NOT for production use.

Model facts

  • Parameters: 1,191,282,688 (~1.19B)
  • Architecture: Transformer · RoPE + SwiGLU + GQA (Gemma/LLaMA-style)
    • dim: 2048 · n_heads: 16 · n_kv_heads: 2 (GQA) · n_layers: 24 · max_seq_len: 1024
  • Tokenizer: BPE-32K (custom, trained from scratch · vocab_size 32,768)
  • Training framework: MLX (Apple Silicon native)
  • Training corpus: master_corpus_v3_32k (multi-source · 23.6 GB token file)
  • This checkpoint: step 112000 · val_loss 2.92757 · perplexity 18.68
  • Checkpoint timestamp: 2026-05-14T11:47:12 UTC
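
The stated parameter count can be cross-checked against the listed dimensions. The sketch below is a back-of-the-envelope tally, not the project's own code. It assumes three things the card does not state: a SwiGLU hidden width of 5632, untied input and output embeddings, and no bias terms, with one RMSNorm-style weight vector before attention, one before the FFN, and one final norm. Under those assumptions the tally reproduces the stated total exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeConfig:
    # Values stated in the model card.
    dim: int = 2048
    n_heads: int = 16
    n_kv_heads: int = 2
    n_layers: int = 24
    vocab_size: int = 32_768
    # ASSUMPTIONS (not stated in the card): FFN width and untied embeddings.
    ffn_hidden: int = 5632
    tied_embeddings: bool = False


def count_parameters(cfg: EdgeConfig) -> int:
    """Tally weights for a bias-free RoPE + SwiGLU + GQA decoder."""
    head_dim = cfg.dim // cfg.n_heads            # 128
    kv_dim = cfg.n_kv_heads * head_dim           # 256: K/V are shared across query heads
    attn = 2 * cfg.dim * cfg.dim + 2 * cfg.dim * kv_dim   # (Wq, Wo) + (Wk, Wv)
    ffn = 3 * cfg.dim * cfg.ffn_hidden           # gate, up, and down projections
    norms = 2 * cfg.dim                          # pre-attention and pre-FFN norm weights
    per_layer = attn + ffn + norms
    embed = cfg.vocab_size * cfg.dim
    embeddings = embed if cfg.tied_embeddings else 2 * embed
    final_norm = cfg.dim
    return embeddings + cfg.n_layers * per_layer + final_norm


total = count_parameters(EdgeConfig())
print(f"{total:,}")  # 1,191,282,688
```

With n_kv_heads = 2, the key and value projections are one eighth the size of the query projection, which is where GQA saves parameters and KV-cache memory relative to full multi-head attention.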

What this is

The lowest-validation-loss snapshot of an in-progress training run, published as evidence of training methodology and reproducible artifacts. Training continued past this step, but later checkpoints showed higher val_loss, so this snapshot remains the best to date.

What this is NOT

  • Not the final Edge-2 release
  • Not instruction-tuned (no SFT, no RLHF)
  • Not safety-evaluated for general deployment
  • Not benchmarked against published LLMs (in-corpus eval only)

Edge roadmap

This checkpoint is one step in an active foundational-model trajectory. The full Edge family:

| Generation | Parameters | Status |
| --- | --- | --- |
| Edge-1 | 812K | Shipped |
| Edge-2 Medium | 85M | Trained |
| Edge-2 Large | 1.2B | This preview · step 112k of 500k · in active training |
| Edge-3B | 3B (depth-upcycled) | In development |
| Edge-3 Multimodal | TBD | Future |
| Edge-11B Instruct | 11B | Direct-sales target |

This preview is step 112,000 of a planned 500,000 and has the lowest validation loss recorded so far. It is neither the final state of this run nor the final shape of Edge.

Active research areas

Architectural and training-pipeline decisions are under active refinement. The next checkpoint will reflect:

  • Scaled-curriculum corpus engineering: a 100B-token training curriculum with three-stage WSD scheduling, license-tracked across web, math, code, and diversity tiers (FineWeb-Edu, Nemotron-CC-Math, The Stack v2, Common Pile v0.1, and others).
  • Depth-upcycling architecture: Edge-3B is being built via middle-block stacking from this checkpoint, evaluating LiGO and MIDAS (NeurIPS 2024) approaches.
  • Alternative optimizer infrastructure: Muon (arXiv:2502.16982) is under evaluation as a successor to the AdamW optimizer used here.
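
For readers unfamiliar with WSD (warmup-stable-decay) scheduling, the three stages can be sketched as a plain function of the step index. This is a generic illustration, not the project's schedule: the warmup and decay fractions, the linear decay shape, and the `min_lr` floor are all hypothetical choices made for the example.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.01, decay_frac: float = 0.1,
           min_lr: float = 0.0) -> float:
    """Warmup-stable-decay learning rate (all fractions are hypothetical)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    decay_steps = max(1, int(total_steps * decay_frac))
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Stage 1: linear ramp up to the peak rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stage 2: hold the peak rate constant.
        return peak_lr
    # Stage 3: linear decay from the peak down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


# Example with made-up numbers: 1,000 steps, 10% warmup, 20% decay.
for s in (0, 99, 500, 900, 1000):
    print(s, wsd_lr(s, 1000, 1e-3, warmup_frac=0.1, decay_frac=0.2))
```

One practical appeal of this shape is that the stable stage has no dependence on the total step count, so a run can be extended or branched into a decay phase from any stable-stage checkpoint.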

Do not treat this preview as representative of Edge's mature state. It is a transparent snapshot of a foundational-model project mid-trajectory.
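
As a rough picture of the middle-block stacking mentioned under depth-upcycling, the sketch below builds a layer layout for a deeper model by keeping the first and last few layers and repeating the block in between. It is one simple reading of the term, not an implementation of LiGO or MIDAS and not the Edge-3B recipe. The function name and the keep/repeat counts are hypothetical.

```python
def stack_middle_block(n_layers: int, n_keep_head: int,
                       n_keep_tail: int, repeats: int) -> list[int]:
    """Return source-layer indices for a deeper model (illustrative only).

    Entry i of the result names the source layer whose weights would
    initialize layer i of the grown model.
    """
    if n_keep_head < 0 or n_keep_tail < 0 or repeats < 1:
        raise ValueError("counts must be non-negative and repeats >= 1")
    if n_keep_head + n_keep_tail >= n_layers:
        raise ValueError("no middle block left to stack")

    layers = list(range(n_layers))
    head = layers[:n_keep_head]
    middle = layers[n_keep_head:n_layers - n_keep_tail]
    tail = layers[n_layers - n_keep_tail:]
    # Repeat the whole middle block, preserving its internal order.
    return head + middle * repeats + tail


# Hypothetical example: grow a 24-layer stack by doubling its middle 16 layers.
layout = stack_middle_block(24, n_keep_head=4, n_keep_tail=4, repeats=2)
print(len(layout))  # 40
```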

Citation guidance

If you reference this checkpoint, please cite as:

Edge-2 Large preview · step 112k · val_loss 2.92757 · 2026-05-14 · AXE Technology Canada. Superseded by subsequent checkpoints; see HF repo for current state.

Reproducibility

  • Training run start: 2026-05-10T02:28:41 UTC
  • Checkpoint timestamp: 2026-05-14T11:47:12 UTC
  • Steps to this checkpoint: 112,000 of 500,000 planned
  • Hardware: Apple Silicon (Mac Studio M1 Max 64GB, JL1)
  • Framework: MLX · Python 3.14 · seed 42
  • Throughput: ~94.5 tokens/sec at this stage (large model · single-device)
  • Optimizer: AdamW · lr 8.959e-05 · weight_decay 0.1 · grad_clip 1.0
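
The two timestamps and the step count above are enough to estimate the average step time. The sketch below does that arithmetic, assuming the run was continuous between the two timestamps (no pauses or restarts) and that the rate stays constant for the remaining steps. Neither assumption is confirmed by the card, so the projection is only indicative.

```python
from datetime import datetime

# Values from the Reproducibility list (both timestamps are UTC).
run_start = datetime.fromisoformat("2026-05-10T02:28:41")
checkpoint = datetime.fromisoformat("2026-05-14T11:47:12")
steps_done = 112_000
steps_planned = 500_000

elapsed_s = (checkpoint - run_start).total_seconds()
sec_per_step = elapsed_s / steps_done

# ASSUMPTION: constant step time for the rest of the run.
remaining_days = (steps_planned - steps_done) * sec_per_step / 86_400

print(f"elapsed: {elapsed_s:.0f} s")            # elapsed: 379111 s
print(f"average: {sec_per_step:.2f} s/step")
print(f"remaining at this rate: {remaining_days:.1f} days")
```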

Training metrics at best checkpoint

| Metric | Value |
| --- | --- |
| Step | 112,000 |
| val_loss | 2.92757 |
| train_loss | 3.4838 |
| perplexity | 18.68 |
| grad_norm | 1.3713 |
| learning_rate | 8.959e-05 |
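
The perplexity figure follows from the validation loss, assuming val_loss is the mean per-token cross-entropy in nats (the usual convention, though the card does not say so explicitly). A quick check:

```python
import math

val_loss = 2.92757  # from the table above; assumed to be nats per token

perplexity = math.exp(val_loss)
bits_per_token = val_loss / math.log(2)

print(f"perplexity: {perplexity:.2f}")  # perplexity: 18.68
print(f"bits per token: {bits_per_token:.3f}")
```

Because perplexity is per token, it depends on the tokenizer; values are only comparable between models that share the same BPE-32K vocabulary and evaluation set.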

Files in this repo

| File | Description |
| --- | --- |
| model.safetensors | Model weights (4.4 GB) |
| tokenizer.json | BPE-32K tokenizer |
| state.json | Training state + config at this checkpoint |
| run_manifest.json | Data provenance + environment metadata |
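
The weights file can be inspected without loading any tensors, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header describing each tensor. The sketch below uses only the standard library to total the parameter count and data size from that header. The function name is hypothetical, and the tensor names inside this checkpoint are not documented here, so the code makes no assumptions about them.

```python
import json
import struct
from math import prod


def summarize_safetensors(path: str) -> dict:
    """Read only the JSON header of a safetensors file and summarize it."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))

    # Optional free-form metadata entry; it does not describe a tensor.
    header.pop("__metadata__", None)

    n_params = 0
    n_bytes = 0
    dtypes = set()
    for info in header.values():
        n_params += prod(info["shape"])       # prod([]) == 1 for scalars
        start, end = info["data_offsets"]
        n_bytes += end - start
        dtypes.add(info["dtype"])

    return {
        "tensors": len(header),
        "parameters": n_params,
        "data_bytes": n_bytes,
        "dtypes": sorted(dtypes),
    }


# Usage, once the file has been downloaded locally:
#   print(summarize_safetensors("model.safetensors"))
```

If the weights are stored as 32-bit floats, 1,191,282,688 parameters occupy 4,765,130,752 bytes, or about 4.44 GiB, which is consistent with the 4.4 GB listed for model.safetensors.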

Author

AXE Technology Canada · James Lewis · james@virul.co

Safetensors · Model size: 1B params · Tensor type: F32 · MLX