PTBR-40M Base

PTBR-40M Base is a small causal language model for Portuguese (≈40M parameters), designed for experimentation and fast training on consumer GPUs such as an NVIDIA T4.

This repository contains the randomly initialized base model before training.

The model is intended for:

  • educational purposes
  • rapid LLM prototyping
  • small-scale Portuguese experiments
  • architecture research

Model Details

Architecture

The model uses a GPT-NeoX style transformer decoder architecture.

Key characteristics:

Property             Value
Parameters           ~40M
Layers               12
Hidden size          512
Attention heads      8
Context length       256 tokens
Positional encoding  Rotary (RoPE)
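As a sanity check, the hyperparameters above can be plugged into a rough parameter-count estimate for a GPT-NeoX-style decoder. The vocabulary size, the 4× MLP width, and the untied output head in this sketch are assumptions (the card does not state them); a GPT-2-sized vocabulary of 50,304 is used for illustration. Under those assumptions the non-embedding count lands close to the ~40M headline, while the full checkpoint (token embeddings plus an untied LM head) is roughly twice that.

```python
# Rough parameter count for a GPT-NeoX-style decoder, using the table above.
# Assumptions (not stated in the card): vocab size 50304, MLP width 4*hidden,
# untied input/output embeddings, biases on all linear layers.

def neox_param_estimate(hidden=512, layers=12, vocab=50304):
    qkv = hidden * 3 * hidden + 3 * hidden                  # fused QKV projection + bias
    attn_out = hidden * hidden + hidden                     # attention output projection + bias
    mlp = 2 * (hidden * 4 * hidden) + 4 * hidden + hidden   # up/down projections + biases
    norms = 2 * 2 * hidden                                  # two LayerNorms (weight + bias) per block
    block = qkv + attn_out + mlp + norms

    embed = vocab * hidden                                  # input token embeddings
    head = vocab * hidden                                   # untied LM head (assumption)
    final_norm = 2 * hidden
    non_embedding = layers * block
    total = non_embedding + embed + head + final_norm
    return non_embedding, total

non_embedding, total = neox_param_estimate()
print(f"non-embedding: {non_embedding/1e6:.1f}M, total: {total/1e6:.1f}M")
# → non-embedding: 37.8M, total: 89.3M
```

The gap between the two numbers is why small models are often quoted by non-embedding parameters: at this scale the embedding matrices dominate the checkpoint size.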

Framework: Hugging Face Transformers

Intended Use

This base model is not trained and therefore cannot generate meaningful language.

It is intended to be used as:

  • a starting point for pretraining
  • a fine-tuning base
  • a toy LLM architecture

Example use cases:

  • research experiments
  • educational demonstrations
  • low-resource language model training

Training Procedure

This model contains randomly initialized weights.

Typical training setup used with this architecture:

  • optimizer: AdamW
  • learning rate: 4e-4
  • context length: 256
  • batch size: 16–32 (depending on hardware)

The architecture is sized so that small-scale experiments can be trained on a single NVIDIA T4 GPU.
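The AdamW setting above can be illustrated with a single decoupled-weight-decay update step. This is a minimal pure-Python sketch of the optimizer math for one scalar parameter; the learning rate matches the card (4e-4), while the betas, epsilon, and weight-decay values are assumed common defaults, not taken from the card.

```python
import math

# One AdamW step for a single scalar parameter (sketch of the update rule).
# lr matches the card (4e-4); betas, eps, and weight_decay are assumed defaults.
def adamw_step(theta, grad, m, v, t, lr=4e-4,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: subtracted from the parameter directly,
    # rather than being folded into the gradient as in classic L2/Adam.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

theta, m, v = adamw_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

In a real run this update is applied per tensor by `torch.optim.AdamW`; the point here is only the decoupled-decay term, which is what distinguishes AdamW from plain Adam.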


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the (randomly initialized) base model; replace "username" with the actual namespace.
model = AutoModelForCausalLM.from_pretrained("username/ptbr-40m-base")

# The card pairs the model with the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Forward pass: outputs.logits holds next-token scores,
# which are meaningless until the model has been trained.
inputs = tokenizer("Olá mundo", return_tensors="pt")
outputs = model(**inputs)