
Aurora LLMs: Benchmark Catalog

Every Aurora-tuned LLM trained in this project, kept together for systematic benchmarking across model size (270M → 8B, with 12B / 20B / 27B / 70B / 120B slots reserved), training recipe (chat / coder / ML / ops), and training data (multi-rank vs. single-rank distillation).

For everyday use, prefer the standalone repos at shazzadulimun:

| Pick | Why |
|------|-----|
| llama31-8b-aurora-chat-v3-gguf | Best (eval 2.80/5, +59% over base). 16 GB. |
| llama32-3b-aurora-chat-v3 | Mid-size for laptop GPU. 6 GB. |
| gemma3-270m-aurora-ml-v3-gguf | Smallest. 518 MB. Runs anywhere. |

Layout

aurora/
├── llama31-8b-aurora-chat-v3/      ← best chat (eval 2.80/5)
├── llama31-8b-aurora-chat-v2/      ← size-sweep recipe (eval pending)
├── llama31-8b-aurora-chat-v1/      ← single-rank distillation ablation (2.45)
├── llama31-8b-aurora-coder-v3/     ← SYCL / OpenMP / oneAPI / CMake specialist
├── llama31-8b-aurora-ml-v3/        ← PyTorch-XPU / IPEX / vLLM specialist
├── llama31-8b-aurora-ops-v3/       ← PBS / mpiexec / DAOS / Lustre specialist
├── llama32-3b-aurora-chat-v3/      ← 3B chat
├── llama32-1b-aurora-chat-v3/      ← 1B chat
├── gemma3-1b-aurora-coder-v3/
├── gemma3-1b-aurora-ml-v3/
├── gemma3-270m-aurora-coder-v3/
└── gemma3-270m-aurora-ml-v3/

Each subfolder contains either a single GGUF (*.gguf) or the full Transformers layout (config.json, model.safetensors, tokenizer.json, etc.), depending on how the model was published.

Model index

| Subfolder | Base | Format | Train loss | Holdout (53-Q, 0–5) |
|-----------|------|--------|------------|----------------------|
| llama31-8b-aurora-chat-v3/ | meta-llama/Llama-3.1-8B-Instruct | GGUF f16 | 0.6224 | 2.80 |
| llama31-8b-aurora-chat-v2/ | meta-llama/Llama-3.1-8B-Instruct | merged 16-bit | 0.64 | pending |
| llama31-8b-aurora-chat-v1/ | meta-llama/Llama-3.1-8B-Instruct | GGUF f16 | 0.6338 | 2.45 |
| llama31-8b-aurora-coder-v3/ | meta-llama/Llama-3.1-8B-Instruct | GGUF f16 | 0.6851 | 1.97 |
| llama31-8b-aurora-ml-v3/ | meta-llama/Llama-3.1-8B-Instruct | GGUF f16 | 0.6630 | 2.13 |
| llama31-8b-aurora-ops-v3/ | meta-llama/Llama-3.1-8B-Instruct | GGUF f16 | 0.6523 | 2.31 |
| llama32-3b-aurora-chat-v3/ | meta-llama/Llama-3.2-3B-Instruct | merged 16-bit | 0.72 | pending |
| llama32-1b-aurora-chat-v3/ | meta-llama/Llama-3.2-1B-Instruct | merged 16-bit | 0.84 | pending |
| gemma3-1b-aurora-coder-v3/ | unsloth/gemma-3-1b-it | GGUF f16 | 1.0268 | pending |
| gemma3-1b-aurora-ml-v3/ | unsloth/gemma-3-1b-it | GGUF f16 | 0.9609 | pending |
| gemma3-270m-aurora-coder-v3/ | unsloth/gemma-3-270m-it | GGUF f16 | 1.3203 | – |
| gemma3-270m-aurora-ml-v3/ | unsloth/gemma-3-270m-it | GGUF f16 | 1.2462 | – |

Download

# Whole catalog (~100 GB)
hf download shazzadulimun/aurora --local-dir ./aurora-catalog

# Just one model
hf download shazzadulimun/aurora --include "llama31-8b-aurora-chat-v3/*" --local-dir ./aurora-catalog
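
The same downloads can be scripted from Python with huggingface_hub (a minimal sketch; the local paths are just examples):

from huggingface_hub import snapshot_download

# Whole catalog
snapshot_download(repo_id="shazzadulimun/aurora", local_dir="./aurora-catalog")

# Just one model subfolder
snapshot_download(
    repo_id="shazzadulimun/aurora",
    allow_patterns="llama31-8b-aurora-chat-v3/*",
    local_dir="./aurora-catalog",
)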

How to use a model from this catalog

There are two formats in the catalog and they load differently.

A. Merged 16-bit checkpoints (most subfolders)

These load directly with Hugging Face Transformers, no extra steps:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo, sub = "shazzadulimun/aurora", "llama32-3b-aurora-chat-v3"
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
mdl = AutoModelForCausalLM.from_pretrained(
    repo, subfolder=sub, torch_dtype=torch.bfloat16, device_map="auto"
)
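
A quick smoke test of the loaded model (a minimal sketch; the prompt is illustrative, not from the eval set):

messages = [{"role": "user", "content": "How do I request an interactive node on Aurora?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(mdl.device)
out = mdl.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))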

Or grab single-file GGUFs from the standalone repos (shazzadulimun/<name>-gguf) for llama.cpp / Ollama / LM Studio.
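
If you go the GGUF route, llama-cpp-python can fetch a file straight from one of those standalone repos. A sketch, assuming the 8B chat repo; the filename glob is a guess, so check the repo's file list for the exact name:

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="shazzadulimun/llama31-8b-aurora-chat-v3-gguf",
    filename="*.gguf",   # adjust to the actual GGUF filename in the repo
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What filesystems does Aurora mount?"}]
)
print(out["choices"][0]["message"]["content"])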

B. LoRA-only entries (70B and 120B, for now)

llama31-70b-aurora-chat-v3/ and gpt-oss-120b-aurora-chat-v3/ contain only the LoRA adapter (the mega-train job ran out of walltime before it could write the full merged checkpoint). Until the merged and GGUF versions land here, load the base model plus adapter with PEFT; this works on any laptop or GPU box that fits the base model:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 70B
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto",
)
m = PeftModel.from_pretrained(
    base, "shazzadulimun/aurora", subfolder="llama31-70b-aurora-chat-v3"
)
tok = AutoTokenizer.from_pretrained("shazzadulimun/aurora", subfolder="llama31-70b-aurora-chat-v3")

# 120B (gpt-oss MoE)
base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b",
    torch_dtype=torch.bfloat16, device_map="auto",
)
m = PeftModel.from_pretrained(
    base, "shazzadulimun/aurora", subfolder="gpt-oss-120b-aurora-chat-v3"
)
tok = AutoTokenizer.from_pretrained("shazzadulimun/aurora", subfolder="gpt-oss-120b-aurora-chat-v3")

You can also call m = m.merge_and_unload() and then m.save_pretrained("./70b-merged") to produce a self-contained merged copy locally.
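
For example (a minimal sketch; paths are illustrative, and saving the tokenizer alongside is an optional extra step):

merged = m.merge_and_unload()           # fold the LoRA weights into the base model
merged.save_pretrained("./70b-merged")  # write a self-contained checkpoint
tok.save_pretrained("./70b-merged")     # keep the tokenizer next to it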

Training data + recipe

All models are LoRA fine-tunes (r=32, α=64, lr 2e-4 cosine, bf16, 2 epochs) of their respective base, trained on synthetic ChatML data distilled from gpt-oss-120b (ALCF Sophia) over docs.alcf.anl.gov/aurora. Three dataset variants are used across the catalog (multi-rank, single-rank ablation, topic-filtered specialists). Full provenance: SIslamMun/Generator @ aurora-datasets-2026-04-30.
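
For reference, the recipe above maps onto a PEFT configuration roughly like this (a sketch only; the target modules and dropout are assumptions, not taken from the actual training scripts):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                # LoRA rank, as listed above
    lora_alpha=64,       # α = 64
    lora_dropout=0.0,    # assumption: dropout not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: typical attention projections
)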

License

Apache-2.0 for the weights, training data (gpt-oss-120b synthetic), and source corpus (public ALCF docs). Each base model retains its own license; check it before redistributing merged checkpoints.
