Edge-2 Large — Preview Checkpoint
Status: Pre-release · evaluation snapshot. NOT for production use.
Model facts
- Parameters: 1,191,282,688 (~1.19B)
- Architecture: Transformer · RoPE + SwiGLU + GQA (Gemma/LLaMA-style)
- dim: 2048 · n_heads: 16 · n_kv_heads: 2 (GQA) · n_layers: 24 · max_seq_len: 1024
- Tokenizer: BPE-32K (custom, trained from scratch · vocab_size 32,768)
- Training framework: MLX (Apple Silicon native)
- Training corpus: master_corpus_v3_32k (multi-source · 23.6 GB token file)
- This checkpoint: step 112000 · val_loss 2.92757 · perplexity 18.68
- Checkpoint timestamp: 2026-05-14T11:47:12 UTC
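
The parameter count listed above can be reproduced from the stated dimensions, given a few assumptions the card does not spell out: a SwiGLU hidden size of 5632 (inferred only because it makes the arithmetic match), untied input and output embeddings, no bias terms, and two norm weight vectors per layer plus one final norm. With 16 query heads sharing 2 KV heads, the K and V projections are one eighth the width of the Q projection. A minimal sketch under those assumptions:

```python
# Sketch: reproduce the reported parameter count from the listed dimensions.
# ASSUMPTIONS (not stated in the card): hidden_dim=5632, untied embeddings,
# no bias terms, and norm layers that hold a single weight vector of size dim.

def count_params(dim=2048, n_heads=16, n_kv_heads=2, n_layers=24,
                 vocab_size=32_768, hidden_dim=5632, tied_embeddings=False):
    head_dim = dim // n_heads            # 128 per head
    kv_dim = n_kv_heads * head_dim       # 256: width of the K and V projections (GQA)

    attn = dim * dim + 2 * dim * kv_dim + dim * dim   # Wq, Wk, Wv, Wo
    mlp = 3 * dim * hidden_dim                        # SwiGLU: gate, up, down
    norms = 2 * dim                                   # pre-attention and pre-MLP norms
    per_layer = attn + mlp + norms

    embed = vocab_size * dim
    lm_head = 0 if tied_embeddings else vocab_size * dim
    final_norm = dim
    return n_layers * per_layer + embed + lm_head + final_norm


total = count_params()
print(f"{total:,}")  # 1,191,282,688 under the assumptions above
```

If the real configuration differs (for example tied embeddings with a different hidden size), `state.json` in this repo is the place to check.
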
What this is
The lowest-validation-loss snapshot of an in-progress training run, published as evidence of training methodology and reproducible artifacts. Training continued past this step, but val_loss degraded at later steps, so this snapshot is the current best.
What this is NOT
- Not the final Edge-2 release
- Not instruction-tuned (no SFT, no RLHF)
- Not safety-evaluated for general deployment
- Not benchmarked against published LLMs (in-corpus eval only)
Edge roadmap
This checkpoint is one step in an active foundational-model trajectory. The full Edge family:
| Generation | Parameters | Status |
|---|---|---|
| Edge-1 | 812K | Shipped |
| Edge-2 Medium | 85M | Trained |
| Edge-2 Large | 1.2B | This preview · step 112k of 500k · in active training |
| Edge-3B | 3B (depth-upcycled) | In development |
| Edge-3 Multimodal | TBD | Future |
| Edge-11B Instruct | 11B | Direct-sales target |
This preview represents step 112,000 of a planned 500,000 — the lowest-validation-loss checkpoint to date. It is not the final state of this run, and certainly not the final shape of Edge.
Active research areas
Architectural and training-pipeline decisions are under active refinement. The next checkpoint will reflect:
- Scaled-curriculum corpus engineering — a 100B-token training curriculum with three-stage WSD scheduling, license-tracked across web, math, code, and diversity tiers (FineWeb-Edu, Nemotron-CC-Math, The Stack v2, Common Pile v0.1, and others).
- Depth-upcycling architecture — Edge-3B is being built via middle-block stacking from this checkpoint, evaluating LiGO and MIDAS (NeurIPS 2024) approaches.
- Alternative optimizer infrastructure — Muon (arxiv:2502.16982) under evaluation as a successor to the AdamW optimizer used here.
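
WSD is commonly read as warmup-stable-decay: a short linear warmup to a peak learning rate, a long constant plateau, then a decay phase at the end of training. The sketch below illustrates that three-stage shape only. The function name `wsd_lr`, the peak and minimum rates, the stage fractions, and the linear decay shape are all hypothetical placeholders, not values from this project.

```python
# Sketch of a three-stage warmup-stable-decay (WSD) learning-rate schedule.
# All defaults are illustrative assumptions, not the project's settings.

def wsd_lr(step, total_steps, peak_lr=3e-4, min_lr=3e-5,
           warmup_frac=0.01, decay_frac=0.10):
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Stage 1: linear warmup from near zero up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stage 2: constant plateau at peak_lr.
        return peak_lr
    # Stage 3: linear decay from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


for s in (0, 5_000, 250_000, 475_000, 500_000):
    print(s, f"{wsd_lr(s, total_steps=500_000):.3e}")
```

One practical appeal of this shape is that the plateau can be extended or branched into a decay phase from any stable-stage checkpoint, which fits a curriculum that changes data mix between stages.
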
Do not treat this preview as representative of Edge's mature state. It is a transparent snapshot of a foundational-model project mid-trajectory.
Citation guidance
If you reference this checkpoint, please cite as:
Edge-2 Large preview · step 112k · val_loss 2.92757 · 2026-05-14 · AXE Technology Canada. Superseded by subsequent checkpoints; see HF repo for current state.
Reproducibility
- Training run start: 2026-05-10T02:28:41 UTC
- Checkpoint timestamp: 2026-05-14T11:47:12 UTC
- Steps to this checkpoint: 112,000 of 500,000 planned
- Hardware: Apple Silicon (Mac Studio M1 Max 64GB, JL1)
- Framework: MLX · Python 3.14 · seed 42
- Throughput: ~94.5 tokens/sec at this stage (large model · single-device)
- Optimizer: AdamW · lr 8.959e-05 · weight_decay 0.1 · grad_clip 1.0
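
The two timestamps above give a rough wall-clock pace for the run. The sketch below computes elapsed time and average seconds per step, then extrapolates to the planned 500,000 steps. It assumes the run began at step 0 at the stated start time, ran without pauses or resumes, and will hold the same pace; none of that is confirmed by the card, so treat the projection as a back-of-the-envelope figure.

```python
# Sketch: wall-clock pace derived from the timestamps in this section.
# ASSUMPTIONS: step 0 coincides with the run start, no pauses, constant pace.
from datetime import datetime

run_start = datetime.fromisoformat("2026-05-10T02:28:41")
checkpoint = datetime.fromisoformat("2026-05-14T11:47:12")
steps_done = 112_000
steps_planned = 500_000

elapsed_seconds = int((checkpoint - run_start).total_seconds())
sec_per_step = elapsed_seconds / steps_done
remaining_days = (steps_planned - steps_done) * sec_per_step / 86_400

print(f"elapsed: {elapsed_seconds / 86_400:.2f} days")
print(f"average: {sec_per_step:.3f} s/step")
print(f"projected remaining at this pace: {remaining_days:.1f} days")
```
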
Training metrics at best checkpoint
| Metric | Value |
|---|---|
| Step | 112,000 |
| val_loss | 2.92757 |
| train_loss | 3.4838 |
| perplexity | 18.68 |
| grad_norm | 1.3713 |
| learning_rate | 8.959e-05 |
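
The perplexity in the table is consistent with the exponential of the validation loss, assuming val_loss is the mean per-token cross-entropy measured in nats (the card does not state the unit). A quick check:

```python
# Sketch: perplexity as exp(mean cross-entropy), assuming the loss is in nats.
import math

val_loss = 2.92757
perplexity = math.exp(val_loss)
print(f"{perplexity:.2f}")  # 18.68
```

Because the tokenizer is custom, this figure is only comparable to other models evaluated on the same corpus with the same tokenizer.
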
Files in this repo
| File | Description |
|---|---|
| model.safetensors | Model weights (4.4 GB) |
| tokenizer.json | BPE-32K tokenizer |
| state.json | Training state + config at this checkpoint |
| run_manifest.json | Data provenance + environment metadata |
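
The weights file can be inspected without loading 4.4 GB into memory, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header describing each tensor. The sketch below reads only that header using the standard library. The local path is a hypothetical download location, and the tensor names inside this particular file are not documented in the card, so the code makes no assumption about them.

```python
# Sketch: list tensors in a safetensors file by reading only its JSON header.
# The path below is a hypothetical local download location.
import json
import os
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def total_elements(header):
    """Sum the element counts of all tensors described in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total


if __name__ == "__main__":
    path = "edge-2-large-preview/model.safetensors"  # hypothetical location
    if os.path.exists(path):
        header = read_safetensors_header(path)
        for name, info in sorted(header.items()):
            print(name, info["dtype"], info["shape"])
        print(f"total elements: {total_elements(header):,}")
```

Comparing the printed total against the parameter count in the model facts is a cheap integrity check after download.
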
Author
AXE Technology Canada · James Lewis · james@virul.co