PebbleLM-117M

A 117.5M-parameter language model trained from scratch. Small but solid: designed for edge deployment and educational use.

Model Description

PebbleLM-117M is a decoder-only transformer trained on a diverse corpus of text. Despite its small size, it demonstrates basic language understanding and generation capabilities.

| Property | Value |
|----------|-------|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,384 BPE tokens |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | GELU |
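As a sanity check, the 117.5M figure can be reproduced from the table above. This back-of-envelope sketch assumes tied input/output embeddings, bias-free linear layers, and a 4x MLP expansion, none of which are stated explicitly in the card:

```python
# Back-of-envelope parameter count from the architecture table.
# Assumptions (not stated in the card): tied embeddings, no biases, 4x MLP.
vocab, d, n_layers = 16384, 1024, 8

embeddings = vocab * d                  # token embedding (tied with LM head)
attn = 4 * d * d                        # Q, K, V, O projections
mlp = 2 * d * (4 * d)                   # up- and down-projection around GELU
norms = (2 * n_layers + 1) * d          # two RMSNorms per layer + final norm

total = embeddings + n_layers * (attn + mlp) + norms
print(f"{total / 1e6:.1f}M")  # 117.5M
```

The count lands on 117,457,920 parameters, matching the advertised size almost exactly, which suggests these assumptions are close to the real architecture.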

Training Data

Pretrained on 1.17M samples from diverse sources:

| Dataset | Samples | Description | Link |
|---------|---------|-------------|------|
| Wikipedia | 488,906 | Encyclopedic knowledge | wikipedia |
| OpenWebText | 500,000 | Diverse web content | openwebtext |
| TinyStories | 188,067 | Simple narrative structure | roneneldan/TinyStories |
| **Total** | **1,176,973** | | |

Training Details

Epochs: 3
Batch Size: 48
Gradient Accumulation: 2
Effective Batch Size: 96
Learning Rate: 3e-4
Warmup Ratio: 0.1
Precision: FP16
Hardware: NVIDIA A100 80GB
Training Time: ~4.5 hours
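With a warmup ratio of 0.1, the first 10% of optimizer steps ramp the learning rate up to the 3e-4 peak. A minimal sketch of such a schedule, assuming linear warmup followed by cosine decay to zero (the decay shape is an assumption; the card only states the warmup ratio and peak LR):

```python
import math

def lr_at(step: int, total_steps: int,
          peak_lr: float = 3e-4, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (shape assumed)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# The LR peaks at the end of warmup and falls toward zero by the last step.
print(lr_at(99, 1000), lr_at(999, 1000))
```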

Benchmark Results

Evaluated on 500 samples per benchmark:

| Benchmark | Accuracy | Random Baseline | Above Random |
|-----------|----------|-----------------|--------------|
| HellaSwag | 32.20% | 25% | +7.2% |
| ARC-Easy | 35.80% | 25% | +10.8% |
| WinoGrande | 52.80% | 50% | +2.8% |
| PIQA | 58.20% | 50% | +8.2% |
| **Average** | **44.75%** | – | – |
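Zero-shot multiple-choice benchmarks like these are typically scored by comparing the model's log-likelihood of each candidate continuation and picking the highest. The exact harness used here isn't stated; a minimal sketch of the core scoring step, which takes the model's logits for a full prompt+continuation sequence:

```python
import torch

def continuation_logprob(logits: torch.Tensor, ids: torch.Tensor,
                         n_prompt: int) -> float:
    """Sum of log-probs of the continuation tokens ids[:, n_prompt:].

    logits: (1, seq, vocab) model outputs for the prompt+continuation.
    ids:    (1, seq) the corresponding token ids.
    """
    # Position t predicts token t+1, so shift by one before gathering.
    logp = torch.log_softmax(logits[:, :-1, :], dim=-1)
    tok_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return tok_logp[0, n_prompt - 1:].sum().item()
```

Running this once per answer choice and taking the argmax gives the predicted choice; harnesses often also length-normalize the sum so longer continuations aren't penalized.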

Usage

Installation

```shell
pip install torch tokenizers
```

Download Model

```python
from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="tokenizer.json")
```

Load and Generate

```python
import torch
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Model architecture is included in this repo
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)

# Checkpoints may wrap the weights in a training-state dict
state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()

# Generate text with greedy decoding
prompt = "The quick brown fox"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids).logits[:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)
```
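Greedy argmax decoding, as used above, tends to loop on a model this small. A common alternative is temperature plus top-k sampling over the same final-position logits; a sketch with illustrative, untuned hyperparameters:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.8,
                top_k: int = 50) -> torch.Tensor:
    """Sample one next-token id from (batch, vocab) logits."""
    logits = logits / temperature
    # Restrict sampling to the top-k most likely tokens.
    topk_vals, topk_idx = torch.topk(logits, top_k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)  # index into top-k set
    return torch.gather(topk_idx, -1, choice)         # map back to vocab ids

# Demo on dummy logits (vocab of 16384, matching the tokenizer size).
logits = torch.randn(1, 16384)
token = sample_next(logits)
print(token.shape)  # torch.Size([1, 1])
```

Replacing the `torch.argmax` line in the loop above with `sample_next(logits)` is enough to switch decoding strategies.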

For Chat/Q&A Use

This base model is for language modeling only. For conversational Q&A, use the fine-tuned version: PebbleLM-117M-Chat

Intended Use

Appropriate for:

  • Edge deployment experiments
  • Educational purposes (learning transformer architecture)
  • Research on small language models
  • Baseline comparisons

Not recommended for:

  • Production applications
  • Factual question answering
  • Complex reasoning tasks

Limitations

This is a 117M-parameter model, near the small end of functional language models:

  • Limited knowledge capacity - Cannot reliably store extensive world knowledge
  • Weak reasoning - Not enough parameters for complex logical relationships
  • Inconsistent outputs - May produce repetitive or off-topic responses
  • English only - Trained exclusively on English text

For production-quality results, consider models with 1B+ parameters.

Model Files

| File | Description |
|------|-------------|
| model.pt | PyTorch model weights |
| config.json | Model configuration |
| tokenizer.json | BPE tokenizer |
| tokenizer_config.json | Tokenizer configuration |

Citation

```bibtex
@misc{pebblellm2026,
  author = {Sakthivel},
  title = {PebbleLM-117M: A Small Language Model for Edge Deployment},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M}}
}
```

Acknowledgments

Training Data

  • Wikipedia
  • OpenWebText
  • TinyStories (roneneldan/TinyStories)
Infrastructure

  • Google Cloud Platform (A100 GPU)
  • Weights & Biases (experiment tracking)

Frameworks

  • PyTorch
  • Hugging Face Tokenizers

License

MIT License
