MicroBananaMind-v1

MicroBananaMind-v1 is a very small causal language model trained from scratch on FineWeb-Edu, FineMath, and Cosmopedia-v2.

The model has 902,272 parameters and uses a custom 1536-token byte-level BPE tokenizer with digit-aware tokenization. It is our smallest model to date that is not purely a TinyStories model.

Model Details

Field	Value
Parameters	902,272
Architecture	Custom Llama-style decoder
Layers	4
Hidden size	128
Intermediate size	352
Attention heads	4
KV heads	1
Vocabulary size	1,536
Context length	1,024
Embeddings	Tied input/output embeddings
Weight format	safetensors
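The parameter count can be cross-checked from the table above. The sketch below assumes a standard Llama-style layout that the card does not spell out: no bias terms, head dimension equal to hidden size divided by attention heads, gated MLP with three projections, two RMSNorm weight vectors per layer plus one final norm, and tied embeddings counted once. The function name llama_param_count is illustrative and not part of the released code.

```python
def llama_param_count(vocab, hidden, intermediate, layers, heads, kv_heads, tied=True):
    """Count parameters for an assumed bias-free Llama-style decoder."""
    head_dim = hidden // heads                   # assumed: hidden size split evenly across heads
    embed = vocab * hidden                       # token embedding matrix
    attn = hidden * hidden                       # q_proj
    attn += 2 * hidden * kv_heads * head_dim     # k_proj and v_proj (grouped-query attention)
    attn += hidden * hidden                      # o_proj
    mlp = 3 * hidden * intermediate              # gate_proj, up_proj, down_proj
    norms = 2 * hidden                           # two RMSNorm weight vectors per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # + final norm
    if not tied:
        total += vocab * hidden                  # separate output head when embeddings are untied
    return total


total = llama_param_count(
    vocab=1536, hidden=128, intermediate=352, layers=4, heads=4, kv_heads=1
)
print(total)  # 902272
```

Under these assumptions the result matches the reported 902,272 parameters, with 196,608 of them in the shared embedding matrix.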

Tokenizer

MicroBananaMind-v1 uses our digit-aware 1536-token tokenizer.

Training Data

Dataset	Tokens
FineWeb-Edu sample-10BT, retokenized with the 1536-token digit-aware tokenizer	16,799,039,898
FineMath, retokenized with the 1536-token digit-aware tokenizer	1,740,373,303
Cosmopedia-v2, retokenized with the 1536-token digit-aware tokenizer	3,458,958,651

Training setup:

Field	Value
Sequence length	1,024
FineWeb sampling ratio	70%
FineMath sampling ratio	10%
Cosmopedia sampling ratio	20%
Batch size	128
Gradient accumulation	8
Tokens per optimizer step	1,048,576
Training steps	20,963
Approx training tokens seen	21,981,298,688
Learning rate	8e-4
Minimum learning rate	8e-5
Warmup steps	500
Weight decay	0.1
Seed	1337
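The token figures in the table follow from the batch settings, and the short calculation below reproduces them. It also estimates how many passes each dataset receives, which rests on two assumptions not stated in the card: that the batch size is counted in sequences, and that the sampling ratios apply at the token level.

```python
batch_size = 128      # sequences per micro-batch (assumed unit)
grad_accum = 8        # micro-batches per optimizer step
seq_len = 1024        # tokens per sequence
steps = 20_963        # optimizer steps

tokens_per_step = batch_size * grad_accum * seq_len
tokens_seen = tokens_per_step * steps

dataset_tokens = {
    "FineWeb-Edu": 16_799_039_898,
    "FineMath": 1_740_373_303,
    "Cosmopedia-v2": 3_458_958_651,
}
sampling_ratio = {"FineWeb-Edu": 0.70, "FineMath": 0.10, "Cosmopedia-v2": 0.20}

# Approximate epochs per dataset, assuming ratios are token-level mixing weights.
epochs = {
    name: sampling_ratio[name] * tokens_seen / size
    for name, size in dataset_tokens.items()
}

print(tokens_per_step)  # 1048576
print(tokens_seen)      # 21981298688
for name, value in epochs.items():
    print(f"{name}: {value:.2f} epochs")
```

If those assumptions hold, FineWeb-Edu is seen for slightly less than one pass (about 0.92 epochs), while FineMath and Cosmopedia-v2 are each repeated (about 1.26 and 1.27 epochs).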

We recommend a generation temperature of 0 (greedy decoding) or 0.1.
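In transformers, a temperature of exactly 0 is not accepted when sampling is enabled, so greedy decoding is normally requested with do_sample=False instead. The helper below is a hypothetical convenience (generation_kwargs is not part of the model's code) that maps a recommended temperature to keyword arguments for model.generate.

```python
def generation_kwargs(temperature: float, max_new_tokens: int = 64) -> dict:
    """Build keyword arguments for generate() from a temperature setting."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature == 0:
        # Temperature 0 corresponds to greedy decoding: disable sampling.
        kwargs["do_sample"] = False
    else:
        # Low temperatures such as 0.1 sample from a sharply peaked distribution.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs


print(generation_kwargs(0))
print(generation_kwargs(0.1))
```

The returned dictionary can be unpacked into a call such as model.generate(input_ids=input_ids, **generation_kwargs(0.1)).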

Usage

This model uses custom architecture code, so load it with trust_remote_code=True.

pip install -U transformers safetensors torch

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BananaMind/MicroBananaMind-v1"

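# Use the GPU when one is available; otherwise fall back to CPU in float32.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device).eval()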

prompt = "The color of the sky is "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=64,
        do_sample=False,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))

License

Apache 2.0
