# PebbleLM-117M
A 117.5M-parameter language model trained from scratch. Small but solid: designed for edge deployment and educational use.
## Model Description
PebbleLM-117M is a decoder-only transformer trained on a diverse corpus of text. Despite its small size, it demonstrates basic language understanding and generation capabilities.
| Property | Value |
|---|---|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,384 BPE tokens |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | GELU |
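As a sanity check, the 117.5M figure is consistent with the hyperparameters above under two common assumptions that the repo does not confirm: tied input/output embeddings and a 4× feed-forward expansion.

```python
# Rough parameter count from the table above.
# Assumptions (not confirmed by the repo): tied embeddings, 4x feed-forward
# expansion, no biases; RMSNorm adds a negligible number of parameters.
vocab, hidden, layers = 16384, 1024, 8

embedding = vocab * hidden             # token embeddings (tied with the LM head)
attention = 4 * hidden * hidden        # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)        # up- and down-projections, GELU in between
total = embedding + layers * (attention + mlp)

print(f"{total / 1e6:.1f}M")  # ~117.4M, matching the reported 117.5M
```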
## Training Data
Pretrained on 1.17M samples from diverse sources:
| Dataset | Samples | Description | Link |
|---|---|---|---|
| Wikipedia | 488,906 | Encyclopedic knowledge | wikipedia |
| OpenWebText | 500,000 | Diverse web content | openwebtext |
| TinyStories | 188,067 | Simple narrative structure | roneneldan/TinyStories |
| **Total** | **1,176,973** | | |
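By sample count, the mixture works out to roughly 41.5% Wikipedia, 42.5% OpenWebText, and 16.0% TinyStories. (Proportions by tokens would differ, since document lengths vary across sources.)

```python
# Sample-level mixture proportions from the table above.
# Note: token-level proportions would differ, since document lengths vary.
counts = {"Wikipedia": 488_906, "OpenWebText": 500_000, "TinyStories": 188_067}
total = sum(counts.values())
assert total == 1_176_973  # matches the Total row

for name, n in counts.items():
    print(f"{name}: {n / total:.1%}")
```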
## Training Details

- **Epochs:** 3
- **Batch Size:** 48
- **Gradient Accumulation:** 2
- **Effective Batch Size:** 96
- **Learning Rate:** 3e-4
- **Warmup Ratio:** 0.1
- **Precision:** FP16
- **Hardware:** NVIDIA A100 80GB
- **Training Time:** ~4.5 hours
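From these settings, the optimizer step count can be estimated back-of-the-envelope (the actual count depends on how samples are packed into 1024-token sequences, which the repo does not specify):

```python
import math

# Rough optimizer step count from the training settings above.
# Assumes one sample per sequence; sequence packing would change this.
samples, epochs = 1_176_973, 3
effective_batch = 48 * 2  # batch size x gradient accumulation = 96

steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = epochs * steps_per_epoch
warmup_steps = int(0.1 * total_steps)  # warmup ratio 0.1

print(total_steps, warmup_steps)  # ~36,783 steps, ~3,678 of them warmup
```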
## Benchmark Results
Evaluated on 500 samples per benchmark:
| Benchmark | Accuracy | Random Baseline | Above Random (pp) |
|---|---|---|---|
| HellaSwag | 32.20% | 25% | +7.2% |
| ARC-Easy | 35.80% | 25% | +10.8% |
| WinoGrande | 52.80% | 50% | +2.8% |
| PIQA | 58.20% | 50% | +8.2% |
| Average | 44.75% | - | - |
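All four benchmarks are multiple-choice. A common way to score a base model on them is length-normalized log-likelihood: pick the answer whose continuation the model assigns the highest average per-token log-probability. This is the standard approach (as in lm-evaluation-harness), not necessarily the exact harness used for the numbers above.

```python
# Multiple-choice scoring by length-normalized log-likelihood (standard
# approach; shown with toy values rather than real model outputs).
def pick_answer(choice_logprobs):
    """choice_logprobs: one list of per-token log-probs per answer choice."""
    scores = [sum(lps) / len(lps) for lps in choice_logprobs]  # length-normalize
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy example: choice 1 has the highest average log-prob per token.
logprobs = [
    [-2.0, -3.0],        # choice 0: mean -2.5
    [-1.0, -1.5, -1.1],  # choice 1: mean -1.2
    [-4.0],              # choice 2: mean -4.0
]
print(pick_answer(logprobs))  # → 1
```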
## Usage

### Installation

```bash
pip install torch tokenizers
```
### Download Model

```python
from huggingface_hub import hf_hub_download

# Download model weights and tokenizer
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="tokenizer.json")
```
### Load and Generate

```python
import torch
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Model architecture is included in this repo
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)

# Checkpoints may wrap the weights in a training-state dict
state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()

# Generate text greedily, one token at a time
prompt = "The quick brown fox"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids).logits[:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)
```
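Greedy decoding tends to loop on a model this small; a top-k sampling step is a common drop-in replacement for the argmax line. This is a sketch, not part of the repo's API, and the `k`/`temperature` values are illustrative defaults.

```python
import torch

# Top-k sampling step: keep the k highest logits, renormalize, and sample.
def sample_top_k(logits, k=40, temperature=0.8):
    logits = logits / temperature
    topk = torch.topk(logits, k, dim=-1)            # values and indices of top k
    probs = torch.softmax(topk.values, dim=-1)      # renormalize over the top k
    idx = torch.multinomial(probs, num_samples=1)   # sample within the top k
    return torch.gather(topk.indices, -1, idx)      # map back to vocabulary ids

# In the generation loop above, replace the argmax line with:
# next_token = sample_top_k(logits)
```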
### For Chat/Q&A Use

This base model is for language modeling only. For conversational Q&A, use the finetuned version: PebbleLM-117M-Chat.
## Intended Use

Appropriate for:
- Edge deployment experiments
- Educational purposes (learning transformer architecture)
- Research on small language models
- Baseline comparisons
Not recommended for:
- Production applications
- Factual question answering
- Complex reasoning tasks
## Limitations

At 117M parameters, this model sits at the small end of usable language models:
- Limited knowledge capacity - Cannot reliably store extensive world knowledge
- Weak reasoning - Not enough parameters for complex logical relationships
- Inconsistent outputs - May produce repetitive or off-topic responses
- English only - Trained exclusively on English text
For production-quality results, consider models with 1B+ parameters.
## Model Files

| File | Description |
|---|---|
| `model.pt` | PyTorch model weights |
| `config.json` | Model configuration |
| `tokenizer.json` | BPE tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
## Citation

```bibtex
@misc{pebblellm2026,
  author = {Sakthivel},
  title = {PebbleLM-117M: A Small Language Model for Edge Deployment},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M}}
}
```
## Acknowledgments

### Training Data
- Wikipedia - Wikimedia Foundation
- OpenWebText - Aaron Gokaslan and Vanya Cohen
- TinyStories - Ronen Eldan and Yuanzhi Li
### Infrastructure
- Google Cloud Platform (A100 GPU)
- Weights & Biases (experiment tracking)
### Frameworks
- PyTorch
- Hugging Face Tokenizers
## License
MIT License