MicroBananaMind-v1

MicroBananaMind-v1 is a very small causal language model trained from scratch on FineWeb-Edu, FineMath, and Cosmopedia-v2.

The model has 902,272 parameters and uses a custom 1,536-token byte-level BPE tokenizer with digit-aware tokenization. It is our smallest model so far that is not a TinyStories-only model.

Model Details

| Field | Value |
|---|---|
| Parameters | 902,272 |
| Architecture | Custom Llama-style decoder |
| Layers | 4 |
| Hidden size | 128 |
| Intermediate size | 352 |
| Attention heads | 4 |
| KV heads | 1 |
| Vocabulary size | 1,536 |
| Context length | 1,024 |
| Embeddings | Tied input/output embeddings |
| Weight format | safetensors |
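The parameter total can be cross-checked from the other figures in the table. The sketch below assumes a conventional Llama-style layout that the card does not spell out: head dimension equal to hidden size divided by attention heads, no bias terms, a gated MLP with three projection matrices, two norm weight vectors per layer, and one final norm. The function name `llama_style_param_count` is hypothetical.

```python
def llama_style_param_count(vocab, hidden, intermediate, layers, heads, kv_heads):
    """Count weights for an assumed bias-free Llama-style decoder with tied embeddings."""
    head_dim = hidden // heads                    # assumption: 128 / 4 = 32
    embed = vocab * hidden                        # shared by input and output (tied)
    q_proj = hidden * heads * head_dim
    kv_proj = 2 * hidden * kv_heads * head_dim    # K and V share the single KV head width
    o_proj = heads * head_dim * hidden
    mlp = 3 * hidden * intermediate               # assumption: gate, up, and down projections
    norms = 2 * hidden                            # assumption: two norm weights per layer
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    final_norm = hidden
    return embed + layers * per_layer + final_norm


total = llama_style_param_count(
    vocab=1536, hidden=128, intermediate=352, layers=4, heads=4, kv_heads=1
)
print(f"{total:,}")  # 902,272
```

Under these assumptions the count matches the stated 902,272, with the tied embedding matrix (196,608 weights) accounting for roughly a fifth of the model.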

Tokenizer

MicroBananaMind-v1 uses our digit-aware 1536-token tokenizer.
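The card does not define "digit-aware". One common reading is that every digit becomes its own piece before BPE merges are applied, so numbers are never fused into multi-digit tokens. The sketch below illustrates only that assumed pre-splitting step; the helper `split_digits` is hypothetical and is not the model's actual tokenizer code.

```python
import re


def split_digits(text):
    """Split text so each digit stands alone (assumed behaviour, for illustration only)."""
    # \d matches a single digit; \D+ matches a maximal run of non-digit characters.
    return re.findall(r"\d|\D+", text)


pieces = split_digits("pi is 3.14")
print(pieces)  # ['pi is ', '3', '.', '1', '4']
```

To see how the real tokenizer treats numbers, tokenize a string containing digits with the released tokenizer and inspect the resulting tokens.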

Training Data

| Dataset | Tokens |
|---|---|
| FineWeb-Edu sample-10BT, retokenized with the 1536 digit tokenizer | 16,799,039,898 |
| FineMath, retokenized with the 1536 digit tokenizer | 1,740,373,303 |
| Cosmopedia-v2, retokenized with the 1536 digit tokenizer | 3,458,958,651 |

Training setup:

| Field | Value |
|---|---|
| Sequence length | 1,024 |
| FineWeb sampling ratio | 70% |
| FineMath sampling ratio | 10% |
| Cosmopedia sampling ratio | 20% |
| Batch size | 128 |
| Gradient accumulation | 8 |
| Tokens per optimizer step | 1,048,576 |
| Training steps | 20,963 |
| Approx. training tokens seen | 21,981,298,688 |
| Learning rate | 8e-4 |
| Minimum learning rate | 8e-5 |
| Warmup steps | 500 |
| Weight decay | 0.1 |
| Seed | 1337 |
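The token figures in the training setup follow from the batch settings, and the sampling ratios give a rough idea of how many passes were made over each dataset. The sketch below reproduces that arithmetic. The per-dataset pass estimate assumes the sampling ratios are shares of training tokens (rather than of documents or sequences), which the card does not state.

```python
batch_size = 128        # sequences per micro-batch
grad_accum = 8          # micro-batches per optimizer step
seq_len = 1024          # tokens per sequence
steps = 20_963

tokens_per_step = batch_size * grad_accum * seq_len
tokens_seen = tokens_per_step * steps
print(f"{tokens_per_step:,}")  # 1,048,576
print(f"{tokens_seen:,}")      # 21,981,298,688

# Token counts from the Training Data section.
dataset_tokens = {
    "FineWeb-Edu": 16_799_039_898,
    "FineMath": 1_740_373_303,
    "Cosmopedia-v2": 3_458_958_651,
}
# Assumption: ratios are fractions of the tokens seen during training.
ratios = {"FineWeb-Edu": 0.70, "FineMath": 0.10, "Cosmopedia-v2": 0.20}

passes = {
    name: ratios[name] * tokens_seen / dataset_tokens[name]
    for name in dataset_tokens
}
for name, value in passes.items():
    print(f"{name}: about {value:.2f} passes")
```

Under that assumption, FineWeb-Edu is seen slightly less than once, while FineMath and Cosmopedia-v2 are each seen somewhat more than once.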

We recommend decoding with a temperature of 0 (greedy decoding) or 0.1.
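As a minimal sketch, the temperature recommendation can be turned into keyword arguments for the Transformers `generate` method. The helper name `generation_kwargs` is hypothetical. Temperature 0 is mapped to greedy decoding with `do_sample=False`, since sampling with a temperature of exactly 0 is not meaningful.

```python
def generation_kwargs(temperature, max_new_tokens=64):
    """Build keyword arguments for model.generate() from a temperature setting."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always pick the most likely next token.
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    # Low-temperature sampling: close to greedy, with a little randomness.
    return {
        "do_sample": True,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
    }


print(generation_kwargs(0))
print(generation_kwargs(0.1))
```

The result can be passed on as `model.generate(input_ids=input_ids, **generation_kwargs(0.1))`.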

Usage

This model uses custom architecture code, so load it with trust_remote_code=True.

Install the dependencies:

pip install -U transformers safetensors torch

Then load the model and generate text:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MicroBananaMind-v1"

# Use the GPU when one is available; otherwise fall back to CPU in float32.
if torch.cuda.is_available():
    device = "cuda"
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    device = "cpu"
    dtype = torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # required for the custom architecture code
    torch_dtype=dtype,
).to(device).eval()

prompt = "The color of the sky is "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Greedy decoding (do_sample=False) corresponds to a temperature of 0.
with torch.no_grad():
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=False,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

License

Apache 2.0
