Instructions to use shazzadulimun/aurora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use shazzadulimun/aurora with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="shazzadulimun/aurora",
	filename="gemma3-1b-aurora-coder-v3/gemma1b_C1-f16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use shazzadulimun/aurora with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf shazzadulimun/aurora:F16
# Run inference directly in the terminal:
llama cli -hf shazzadulimun/aurora:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf shazzadulimun/aurora:F16
# Run inference directly in the terminal:
llama cli -hf shazzadulimun/aurora:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf shazzadulimun/aurora:F16
# Run inference directly in the terminal:
./llama-cli -hf shazzadulimun/aurora:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf shazzadulimun/aurora:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf shazzadulimun/aurora:F16

Use Docker

docker model run hf.co/shazzadulimun/aurora:F16

LM Studio
Jan
Ollama
How to use shazzadulimun/aurora with Ollama:
```
ollama run hf.co/shazzadulimun/aurora:F16
```

Unsloth Studio

How to use shazzadulimun/aurora with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shazzadulimun/aurora to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shazzadulimun/aurora to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for shazzadulimun/aurora to start chatting

Atomic Chat new
Docker Model Runner
How to use shazzadulimun/aurora with Docker Model Runner:
```
docker model run hf.co/shazzadulimun/aurora:F16
```

Lemonade

How to use shazzadulimun/aurora with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull shazzadulimun/aurora:F16

Run and chat with the model

lemonade run user.aurora-F16

List all available models

lemonade list

Aurora LLMs — Benchmark Catalog

Every Aurora-tuned LLM trained in this project, kept together for systematic benchmarking across model size (270M → 8B, with 12B / 20B / 27B / 70B / 120B slots reserved), training recipe (chat / coder / ML / ops), and training data (multi-rank vs. single-rank distillation).

For everyday use, prefer the standalone repos at shazzadulimun:

Pick	Why
`llama31-8b-aurora-chat-v3-gguf`	Best (eval 2.80/5, +59% over base). 16 GB.
`llama32-3b-aurora-chat-v3`	Mid-size for laptop GPU. 6 GB.
`gemma3-270m-aurora-ml-v3-gguf`	Smallest. 518 MB. Runs anywhere.

Layout

aurora/
├── llama31-8b-aurora-chat-v3/      ← best chat (eval 2.80/5)
├── llama31-8b-aurora-chat-v2/      ← size-sweep recipe (eval pending)
├── llama31-8b-aurora-chat-v1/      ← single-rank distillation ablation (2.45)
├── llama31-8b-aurora-coder-v3/     ← SYCL / OpenMP / oneAPI / CMake specialist
├── llama31-8b-aurora-ml-v3/        ← PyTorch-XPU / IPEX / vLLM specialist
├── llama31-8b-aurora-ops-v3/       ← PBS / mpiexec / DAOS / Lustre specialist
├── llama32-3b-aurora-chat-v3/      ← 3B chat
├── llama32-1b-aurora-chat-v3/      ← 1B chat
├── gemma3-1b-aurora-coder-v3/
├── gemma3-1b-aurora-ml-v3/
├── gemma3-270m-aurora-coder-v3/
└── gemma3-270m-aurora-ml-v3/

Each subfolder contains either a single GGUF (*.gguf) or the full Transformers shape (config.json, model.safetensors, tokenizer.json, etc.) — depending on how the model was published.

Models — index

Subfolder	Base	Format	Train loss	Holdout (53-Q, 0–5)
`llama31-8b-aurora-chat-v3/`	meta-llama/Llama-3.1-8B-Instruct	GGUF f16	0.6224	2.80
`llama31-8b-aurora-chat-v2/`	meta-llama/Llama-3.1-8B-Instruct	merged 16-bit	0.64	pending
`llama31-8b-aurora-chat-v1/`	meta-llama/Llama-3.1-8B-Instruct	GGUF f16	0.6338	2.45
`llama31-8b-aurora-coder-v3/`	meta-llama/Llama-3.1-8B-Instruct	GGUF f16	0.6851	1.97
`llama31-8b-aurora-ml-v3/`	meta-llama/Llama-3.1-8B-Instruct	GGUF f16	0.6630	2.13
`llama31-8b-aurora-ops-v3/`	meta-llama/Llama-3.1-8B-Instruct	GGUF f16	0.6523	2.31
`llama32-3b-aurora-chat-v3/`	meta-llama/Llama-3.2-3B-Instruct	merged 16-bit	0.72	pending
`llama32-1b-aurora-chat-v3/`	meta-llama/Llama-3.2-1B-Instruct	merged 16-bit	0.84	pending
`gemma3-1b-aurora-coder-v3/`	unsloth/gemma-3-1b-it	GGUF f16	1.0268	pending
`gemma3-1b-aurora-ml-v3/`	unsloth/gemma-3-1b-it	GGUF f16	0.9609	pending
`gemma3-270m-aurora-coder-v3/`	unsloth/gemma-3-270m-it	GGUF f16	1.3203	—
`gemma3-270m-aurora-ml-v3/`	unsloth/gemma-3-270m-it	GGUF f16	1.2462	—

Download

# Whole catalog (~100 GB)
hf download shazzadulimun/aurora --local-dir ./aurora-catalog

# Just one model
hf download shazzadulimun/aurora --include "llama31-8b-aurora-chat-v3/*" --local-dir ./aurora-catalog

How to use a model from this catalog

There are two formats in the catalog and they load differently.

A. Merged 16-bit checkpoints (most subfolders)

Drop-in HuggingFace Transformers — no extra steps:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo, sub = "shazzadulimun/aurora", "llama32-3b-aurora-chat-v3"
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
mdl = AutoModelForCausalLM.from_pretrained(
    repo, subfolder=sub, torch_dtype=torch.bfloat16, device_map="auto"
)

Or grab single-file GGUFs from the standalone repos (shazzadulimun/<name>-gguf) for llama.cpp / Ollama / LM Studio.

B. LoRA-only entries (70B and 120B — for now)

llama31-70b-aurora-chat-v3/ and gpt-oss-120b-aurora-chat-v3/ contain only the LoRA adapter (the mega-train job ran out of walltime before it could write the full merged checkpoint). Until the merged + GGUF versions land here, load the base model + adapter with PEFT — works on any laptop / GPU box that fits the base:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# 70B
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16, device_map="auto",
)
m = PeftModel.from_pretrained(
    base, "shazzadulimun/aurora", subfolder="llama31-70b-aurora-chat-v3"
)
tok = AutoTokenizer.from_pretrained("shazzadulimun/aurora", subfolder="llama31-70b-aurora-chat-v3")

# 120B (gpt-oss MoE)
base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-120b",
    torch_dtype=torch.bfloat16, device_map="auto",
)
m = PeftModel.from_pretrained(
    base, "shazzadulimun/aurora", subfolder="gpt-oss-120b-aurora-chat-v3"
)
tok = AutoTokenizer.from_pretrained("shazzadulimun/aurora", subfolder="gpt-oss-120b-aurora-chat-v3")

You can also m = m.merge_and_unload() and m.save_pretrained("./70b-merged") to get a self-contained merged copy locally.

Training data + recipe

All models are LoRA fine-tunes (r=32, α=64, lr 2e-4 cosine, bf16, 2 epochs) of their respective base, trained on synthetic ChatML data distilled from gpt-oss-120b (ALCF Sophia) over docs.alcf.anl.gov/aurora. Three dataset variants are used across the catalog (multi-rank, single-rank ablation, topic- filtered specialists). Full provenance: SIslamMun/Generator @ aurora-datasets-2026-04-30.

License

Apache-2.0 — weights, training data (gpt-oss-120b synthetic), and source corpus (public ALCF docs). Each base model retains its own license; check before redistributing merged checkpoints.

Downloads last month: 7

GGUF

Model size

1.0B params

Architecture

gemma3

Hardware compatibility

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support