Nexus1-124M-v1

Nexus1 is the first prototype in the Nexus LLM series. It is a 124M-parameter, decoder-only transformer pre-trained from scratch on high-quality educational web data.

Model Details

Architecture: GPT-2 style
Parameters: 124 Million
Context Window: 1024 tokens
Vocabulary Size: 50,257 (Custom BPE)
Training Stage: Base Pre-training (Complete)
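
The 124M figure can be sanity-checked with a little arithmetic. The sketch below counts parameters for a GPT-2 style decoder using the vocabulary size and context window listed above. The layer count (12), hidden size (768), and the tied output head are assumptions taken from the standard GPT-2 small configuration; this card does not state them.

```python
def gpt2_param_count(vocab_size, n_positions, n_layer, d_model):
    """Count parameters of a GPT-2 style decoder whose output head is tied
    to the token embedding (so the head adds no extra weights)."""
    token_emb = vocab_size * d_model
    pos_emb = n_positions * d_model
    layer_norm = 2 * d_model  # scale + bias
    # Attention: fused QKV projection plus output projection, each with bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, each with bias.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    block = 2 * layer_norm + attn + mlp
    # Embeddings + transformer blocks + final LayerNorm.
    return token_emb + pos_emb + n_layer * block + layer_norm


# n_layer and d_model are assumed (GPT-2 small); the other two come from this card.
total = gpt2_param_count(vocab_size=50_257, n_positions=1024, n_layer=12, d_model=768)
print(f"{total:,} parameters")  # 124,439,808 parameters
```

Under these assumptions the count rounds to the 124 million stated above. A different layer count or hidden size would give a different total.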

Training Data

Pre-trained on the FineWeb-Edu (10B Sample) dataset, which focuses on high-quality educational content from the web. The dataset was chosen with the aim of improving reasoning capabilities in a small model.

Purpose

Nexus1 serves as a proof of concept for the Nexus Training Pipeline. It is intended as a reference model for knowledge distillation into the larger Nexus2-1B model.
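
For readers unfamiliar with knowledge distillation, the sketch below shows the common soft-target loss: the student is trained to match the reference model's temperature-softened next-token distribution. This card does not say which loss the Nexus pipeline uses, so the function names, the temperature value, and the toy logits are illustrative assumptions, not the author's method.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]


def distillation_loss(reference_logits, student_logits, temperature=2.0):
    """KL(reference || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(reference_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Toy logits over a 4-token vocabulary (hypothetical values).
reference = [2.0, 1.0, 0.1, -1.0]
student = [1.5, 1.2, 0.3, -0.5]
print(distillation_loss(reference, student))
```

The loss is zero when both models produce the same distribution and grows as they diverge. In practice it is usually combined with the ordinary next-token cross-entropy loss.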

How to use

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the weights and tokenizer from the Hugging Face Hub (cached after the first call).
model = AutoModelForCausalLM.from_pretrained("mnnobi/Nexus1-124M-v1")
tokenizer = AutoTokenizer.from_pretrained("mnnobi/Nexus1-124M-v1")
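
Once the model and tokenizer are loaded, text can be generated with the standard `generate` method. The helper below is a minimal sketch: the name `complete`, the sampling settings (`top_k=50`, `temperature=0.8`), and the default of 40 new tokens are illustrative assumptions, not recommendations from the author. The prompt is truncated so that prompt plus generated tokens fit in the 1024-token context window.

```python
def complete(model, tokenizer, prompt, max_new_tokens=40, context_window=1024):
    """Continue `prompt` with sampled text from a causal language model.

    `model` and `tokenizer` are the objects returned by `from_pretrained`.
    """
    # Leave room for the generated tokens inside the context window.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=context_window - max_new_tokens,
    )
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sample instead of greedy decoding
        top_k=50,
        temperature=0.8,
    )
    # generate() returns the prompt tokens followed by the new tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `model` and `tokenizer` loaded as shown above, `print(complete(model, tokenizer, "Photosynthesis is the process by which"))` prints a sampled continuation. Because this is a base pre-trained model, it continues text rather than following instructions.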

