# PTBR-40M Base
PTBR-40M Base is a small Portuguese causal language model (≈40M parameters) designed for experimentation and fast training on consumer GPUs such as the NVIDIA T4.
This repository contains the randomly initialized base model before training.
The model is intended for:
- educational purposes
- rapid LLM prototyping
- small-scale Portuguese experiments
- architecture research
## Model Details

### Architecture
The model uses a GPT-NeoX style transformer decoder architecture.
Key characteristics:
| Property | Value |
|---|---|
| Parameters | ~40M |
| Layers | 12 |
| Hidden size | 512 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Positional encoding | Rotary (RoPE) |
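As a sanity check, the table's values can be turned into a back-of-envelope parameter count. The vocabulary size below is an assumption (the usage example pairs the model with the GPT-2 tokenizer, 50,257 entries); with it, the non-embedding parameters alone come to roughly 38M, consistent with the ≈40M figure above.

```python
# Back-of-envelope parameter count from the table above.
# vocab_size is an assumption (GPT-2 tokenizer, 50257 entries).
hidden = 512
layers = 12
vocab_size = 50257

# Per layer: attention (QKV + output projection, ~4*d^2) plus the
# 4x-expansion MLP (~8*d^2); biases and layer norms omitted.
per_layer = 12 * hidden * hidden
non_embedding = layers * per_layer
embedding = vocab_size * hidden

print(f"non-embedding: {non_embedding / 1e6:.1f}M")  # ~37.7M
print(f"embedding:     {embedding / 1e6:.1f}M")      # ~25.7M
```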
Framework:
- Transformers
## Intended Use
Because its weights are randomly initialized, this base model cannot yet generate meaningful language.
It is intended to be used as:
- a starting point for pretraining
- a fine-tuning base
- a toy LLM architecture
Example use cases:
- research experiments
- educational demonstrations
- low-resource language model training
## Training Procedure
This model contains randomly initialized weights.
Typical training setup used with this architecture:
- optimizer: AdamW
- learning rate: 4e-4
- context length: 256
- batch size: 16–32 (depending on hardware)
The architecture is sized so that small-scale experiments can run on a single T4 GPU.
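The setup above can be sketched as a single pretraining step. Everything here is illustrative: the config mirrors the architecture table, `vocab_size` is an assumption (GPT-2 tokenizer, 50,257 entries), and random token ids stand in for tokenized Portuguese text.

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Config mirroring the architecture table; vocab_size is an assumption.
config = GPTNeoXConfig(
    vocab_size=50257,
    hidden_size=512,
    num_hidden_layers=12,
    num_attention_heads=8,
    intermediate_size=2048,       # 4x hidden size
    max_position_embeddings=256,  # context length from the table
)
model = GPTNeoXForCausalLM(config)  # randomly initialized, as shipped

# AdamW at 4e-4, as listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

# Random token ids as a stand-in batch; small for a quick illustration
# (the list above suggests batch size 16-32 at 256 tokens).
input_ids = torch.randint(0, config.vocab_size, (4, 64))

model.train()
outputs = model(input_ids=input_ids, labels=input_ids)  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {outputs.loss.item():.2f}")  # roughly ln(50257) ~ 10.8 at init
```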
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the randomly initialized base model and the GPT-2 tokenizer.
model = AutoModelForCausalLM.from_pretrained("username/ptbr-40m-base")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Forward pass; the logits are meaningless until the model is pretrained.
inputs = tokenizer("Olá mundo", return_tensors="pt")
outputs = model(**inputs)
```