# PTBR-40M Base
PTBR-40M Base is a small Portuguese causal language model (≈40M parameters) designed for experimentation and fast training on consumer GPUs such as the NVIDIA T4.
This repository contains the randomly initialized base model before training.
The model is intended for:
- educational purposes
- rapid LLM prototyping
- small-scale Portuguese experiments
- architecture research
## Model Details

### Architecture
The model uses a GPT-NeoX style transformer decoder architecture.
Key characteristics:
| Property | Value |
|---|---|
| Parameters | ~40M |
| Layers | 12 |
| Hidden size | 512 |
| Attention heads | 8 |
| Context length | 256 tokens |
| Positional encoding | Rotary (RoPE) |
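As a sanity check, the table's values can be turned into a back-of-envelope parameter count. The vocabulary size below is an assumption (the usage example pairs the model with the GPT-2 tokenizer, 50,257 entries); with it, the non-embedding parameters alone come to roughly 38M, consistent with the ≈40M figure above.

```python
# Back-of-envelope parameter count from the table above.
# vocab_size is an assumption (GPT-2 tokenizer, 50257 entries).
hidden = 512
layers = 12
vocab_size = 50257

# Per layer: attention (QKV + output projection, ~4*d^2) plus the
# 4x-expansion MLP (~8*d^2); biases and layer norms omitted.
per_layer = 12 * hidden * hidden
non_embedding = layers * per_layer
embedding = vocab_size * hidden

print(f"non-embedding: {non_embedding / 1e6:.1f}M")  # ~37.7M
print(f"embedding:     {embedding / 1e6:.1f}M")      # ~25.7M
```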
Framework:
- Transformers
## Intended Use
Because its weights are randomly initialized, this base model cannot yet generate meaningful language.
It is intended to be used as:
- a starting point for pretraining
- a fine-tuning base
- a toy LLM architecture
Example use cases:
- research experiments
- educational demonstrations
- low-resource language model training
## Training Procedure
This model contains randomly initialized weights.
Typical training setup used with this architecture:
- optimizer: AdamW
- learning rate: 4e-4
- context length: 256
- batch size: 16–32 (depending on hardware)
The architecture is sized so that small-scale experiments can run on a single T4 GPU.
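The setup above can be sketched as a single pretraining step. Everything here is illustrative: the config mirrors the architecture table, `vocab_size` is an assumption (GPT-2 tokenizer, 50,257 entries), and random token ids stand in for tokenized Portuguese text.

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXForCausalLM

# Config mirroring the architecture table; vocab_size is an assumption.
config = GPTNeoXConfig(
    vocab_size=50257,
    hidden_size=512,
    num_hidden_layers=12,
    num_attention_heads=8,
    intermediate_size=2048,       # 4x hidden size
    max_position_embeddings=256,  # context length from the table
)
model = GPTNeoXForCausalLM(config)  # randomly initialized, as shipped

# AdamW at 4e-4, as listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

# Random token ids as a stand-in batch; small for a quick illustration
# (the list above suggests batch size 16-32 at 256 tokens).
input_ids = torch.randint(0, config.vocab_size, (4, 64))

model.train()
outputs = model(input_ids=input_ids, labels=input_ids)  # causal LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {outputs.loss.item():.2f}")  # roughly ln(50257) ~ 10.8 at init
```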
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the randomly initialized base model and the GPT-2 tokenizer.
model = AutoModelForCausalLM.from_pretrained("username/ptbr-40m-base")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Forward pass; the logits are meaningless until the model is pretrained.
inputs = tokenizer("Olá mundo", return_tensors="pt")
outputs = model(**inputs)
```