Organization Profile

AutonomousX

Open Source Research for Building Large Language Models from Scratch and Finetuning on TPUs & GPUs

Rohit Yadav | B.Tech 3rd Year

Dr. B.R. Ambedkar National Institute of Technology (NIT) Jalandhar, India

📧 yrohit1825@gmail.com 🔗 LinkedIn 💻 GitHub

Mission

AutonomousX aims to make LLM training infrastructure accessible and reproducible for researchers, students, and developers. While modern language models are widely used, complete end-to-end guides for training LLMs from scratch on TPUs remain scarce, particularly for beginners working with JAX and distributed TPU training. AutonomousX focuses on filling this gap by publishing fully reproducible open-source pipelines that demonstrate how to train language models from scratch using limited compute resources.

Compute for developing this organization's models was provided by Google's TPU Research Cloud (TRC) program.

Research Focus

The organization explores multiple aspects of efficient LLM training on TPUs, including:

Custom transformer architectures
Variants with and without RoPE (Rotary Positional Embeddings)
Memory-efficient training techniques
Custom optimizer experiments
Training pipeline optimization using JAX + pmap
Efficient dataset streaming and preprocessing

The goal is to demonstrate that meaningful LLM research can be conducted even in compute-limited environments.
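
As a rough illustration of the JAX + pmap item above, here is a minimal data-parallel training step. It is a sketch only, not code from the AutonomousX repositories: the toy linear model, the `train_step` name, the plain SGD update, and the 0.1 learning rate are all assumptions chosen to keep the example short. It also runs on a single CPU device, where `pmap` maps over a leading axis of size 1.

```python
import functools

import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Toy linear model standing in for a transformer forward pass (assumption).
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, x, y):
    # Each device computes the loss and gradients on its own batch shard.
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # Average across devices so every replica applies the same update.
    grads = jax.lax.pmean(grads, axis_name="devices")
    loss = jax.lax.pmean(loss, axis_name="devices")
    # Plain SGD with a hypothetical learning rate of 0.1.
    params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
    return params, loss


n_dev = jax.local_device_count()
params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
# Replicate parameters by adding a leading axis of size n_dev.
replicated = jax.tree_util.tree_map(lambda p: jnp.stack([p] * n_dev), params)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (n_dev, 8, 4))  # (devices, per-device batch, features)
y = jnp.sum(x, axis=-1, keepdims=True)     # synthetic regression target

losses = []
for _ in range(20):
    replicated, loss = train_step(replicated, x, y)
    losses.append(float(loss[0]))
```

The key design point is the `pmean` call: each replica sees different data, but because gradients are averaged before the update, the parameter copies stay identical across devices.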

Instinct Model Family

AutonomousX develops the Instinct family of language models. These models are built entirely from scratch, including the tokenizer, architecture, training pipeline, and TPU training infrastructure. Instinct models explore different configurations, such as:

Transformer architectures with and without RoPE
Custom training optimizers
TPU-optimized training pipelines using JAX + pmap
Memory-efficient training for limited hardware environments

The models are designed to demonstrate how modern language models can be trained on small TPU pods such as TPU v4-8.
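
For readers unfamiliar with RoPE, the following sketch shows one common way to apply rotary positional embeddings to a single attention head. The source does not say which layout the Instinct models use, so the "rotate-half" pairing, the `apply_rope` name, and the base of 10000 are assumptions made for illustration.

```python
import jax.numpy as jnp


def apply_rope(x, base=10000.0):
    """Rotate feature pairs of x, shape (seq_len, head_dim), by position-dependent angles."""
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even"
    half = head_dim // 2
    # One frequency per feature pair: base^(-2i/d) for i = 0 .. d/2 - 1.
    inv_freq = 1.0 / jnp.power(base, jnp.arange(half) / half)
    angles = jnp.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, half)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    # "Rotate-half" layout (assumption): pair feature i with feature i + half.
    x1, x2 = x[:, :half], x[:, half:]
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


# Same query and key vector at every position, so only position differs.
q = jnp.tile(jnp.array([[1.0, 2.0, 3.0, 4.0]]), (6, 1))
k = jnp.tile(jnp.array([[0.5, -1.0, 2.0, 1.5]]), (6, 1))
scores = apply_rope(q) @ apply_rope(k).T  # (6, 6) attention logits
```

Because each pair is rotated by an angle proportional to its position, the logit between positions m and n depends only on the offset m - n. In the example, `scores[1, 0]` and `scores[4, 3]` therefore agree up to floating-point error, which is the property that distinguishes RoPE variants from models without it.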

Compute Strategy

One of the core goals of AutonomousX is to explore efficient training on limited compute resources. Research focuses on training models:

Up to ~1.5B parameters
On small TPU v4-8 pods
Across hundreds of billions of tokens

By optimizing training pipelines and architecture design, AutonomousX investigates how far efficient training can scale without access to massive GPU clusters.
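
To give a sense of scale, the back-of-the-envelope calculation below estimates training cost for the upper end of that range. Every input is an assumption rather than a measured AutonomousX result: the 6 x parameters x tokens rule of thumb for training FLOPs, 200B tokens as one illustrative point within "hundreds of billions", 4 chips at 275 TFLOP/s peak each for a TPU v4-8, a hypothetical 40% sustained utilization, and 16 bytes per parameter for fp32 weights, gradients, and two Adam moments.

```python
def training_estimate(n_params, n_tokens, n_chips=4,
                      peak_flops_per_chip=275e12, utilization=0.4):
    """Rough training cost estimate; all defaults are illustrative assumptions."""
    # Common approximation: ~6 FLOPs per parameter per token (forward + backward).
    total_flops = 6 * n_params * n_tokens
    # Sustained throughput is peak throughput scaled by an assumed utilization.
    sustained_flops_per_s = n_chips * peak_flops_per_chip * utilization
    days = total_flops / sustained_flops_per_s / 86400
    # fp32 weights (4 B) + gradients (4 B) + two Adam moments (8 B) per parameter.
    state_gb = n_params * 16 / 1e9
    return total_flops, days, state_gb


flops, days, state_gb = training_estimate(n_params=1.5e9, n_tokens=200e9)
print(f"{flops:.2e} FLOPs, about {days:.0f} days, {state_gb:.0f} GB of training state")
```

Under these assumptions the run needs about 1.8e21 FLOPs, roughly 47 days on one v4-8, and 24 GB of weight and optimizer state before activations. Numbers like these are why memory-efficient training and pipeline optimization matter at this scale.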

Open Source Philosophy

AutonomousX publishes complete, reproducible implementations, including:

Dataset pipelines
Tokenizer training
Model architectures
TPU training scripts
Checkpointing systems
Inference pipelines

All repositories aim to provide transparent and educational implementations so the open-source community can learn how large language models are trained from the ground up.

Why This Matters

Many tutorials focus only on using pretrained models, but very few resources explain:

How to train LLMs from scratch
How to run training pipelines on TPUs
How distributed JAX training works
How datasets like The Pile are processed at scale

AutonomousX aims to make these processes accessible, understandable, and reproducible.