# PebbleLM-117M
A 117.5M-parameter language model trained from scratch. Small but solid: designed for edge deployment and educational use.
## Model Description
PebbleLM-117M is a decoder-only transformer trained on a diverse corpus of text. Despite its small size, it demonstrates basic language understanding and generation capabilities.
| Property | Value |
|---|---|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,384 BPE tokens |
| Position Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | GELU |
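As a sanity check, the 117.5M figure is consistent with the hyperparameters above under two common assumptions that the repo does not confirm: tied input/output embeddings and a 4× feed-forward expansion.

```python
# Rough parameter count from the table above.
# Assumptions (not confirmed by the repo): tied embeddings, 4x feed-forward
# expansion, no biases; RMSNorm adds a negligible number of parameters.
vocab, hidden, layers = 16384, 1024, 8

embedding = vocab * hidden             # token embeddings (tied with the LM head)
attention = 4 * hidden * hidden        # Q, K, V, and output projections
mlp = 2 * hidden * (4 * hidden)        # up- and down-projections, GELU in between
total = embedding + layers * (attention + mlp)

print(f"{total / 1e6:.1f}M")  # ~117.4M, matching the reported 117.5M
```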
## Training Data
Pretrained on 1.17M samples from diverse sources:
| Dataset | Samples | Description | Link |
|---|---|---|---|
| Wikipedia | 488,906 | Encyclopedic knowledge | wikipedia |
| OpenWebText | 500,000 | Diverse web content | openwebtext |
| TinyStories | 188,067 | Simple narrative structure | roneneldan/TinyStories |
| **Total** | **1,176,973** | | |
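By sample count, the mixture works out to roughly 41.5% Wikipedia, 42.5% OpenWebText, and 16.0% TinyStories. (Proportions by tokens would differ, since document lengths vary across sources.)

```python
# Sample-level mixture proportions from the table above.
# Note: token-level proportions would differ, since document lengths vary.
counts = {"Wikipedia": 488_906, "OpenWebText": 500_000, "TinyStories": 188_067}
total = sum(counts.values())
assert total == 1_176_973  # matches the Total row

for name, n in counts.items():
    print(f"{name}: {n / total:.1%}")
```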
## Training Details

- **Epochs:** 3
- **Batch Size:** 48
- **Gradient Accumulation:** 2
- **Effective Batch Size:** 96
- **Learning Rate:** 3e-4
- **Warmup Ratio:** 0.1
- **Precision:** FP16
- **Hardware:** NVIDIA A100 80GB
- **Training Time:** ~4.5 hours
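From these settings, the optimizer step count can be estimated back-of-the-envelope (the actual count depends on how samples are packed into 1024-token sequences, which the repo does not specify):

```python
import math

# Rough optimizer step count from the training settings above.
# Assumes one sample per sequence; sequence packing would change this.
samples, epochs = 1_176_973, 3
effective_batch = 48 * 2  # batch size x gradient accumulation = 96

steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = epochs * steps_per_epoch
warmup_steps = int(0.1 * total_steps)  # warmup ratio 0.1

print(total_steps, warmup_steps)  # ~36,783 steps, ~3,678 of them warmup
```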
## Benchmark Results
Evaluated on 500 samples per benchmark:
| Benchmark | Accuracy | Random Baseline | Above Random (pp) |
|---|---|---|---|
| HellaSwag | 32.20% | 25% | +7.2% |
| ARC-Easy | 35.80% | 25% | +10.8% |
| WinoGrande | 52.80% | 50% | +2.8% |
| PIQA | 58.20% | 50% | +8.2% |
| Average | 44.75% | - | - |
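All four benchmarks are multiple-choice. A common way to score a base model on them is length-normalized log-likelihood: pick the answer whose continuation the model assigns the highest average per-token log-probability. This is the standard approach (as in lm-evaluation-harness), not necessarily the exact harness used for the numbers above.

```python
# Multiple-choice scoring by length-normalized log-likelihood (standard
# approach; shown with toy values rather than real model outputs).
def pick_answer(choice_logprobs):
    """choice_logprobs: one list of per-token log-probs per answer choice."""
    scores = [sum(lps) / len(lps) for lps in choice_logprobs]  # length-normalize
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy example: choice 1 has the highest average log-prob per token.
logprobs = [
    [-2.0, -3.0],        # choice 0: mean -2.5
    [-1.0, -1.5, -1.1],  # choice 1: mean -1.2
    [-4.0],              # choice 2: mean -4.0
]
print(pick_answer(logprobs))  # → 1
```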
## Usage

### Installation

```bash
pip install torch tokenizers
```
### Download Model

```python
from huggingface_hub import hf_hub_download

# Download model weights and tokenizer
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M", filename="tokenizer.json")
```
### Load and Generate

```python
import torch
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Model architecture is included in this repo
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)

# Checkpoints may wrap the weights in a training-state dict
state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()

# Generate text greedily, one token at a time
prompt = "The quick brown fox"
input_ids = torch.tensor([tokenizer.encode(prompt).ids])

with torch.no_grad():
    for _ in range(50):
        logits = model(input_ids).logits[:, -1, :]
        next_token = torch.argmax(logits, dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)

output = tokenizer.decode(input_ids[0].tolist())
print(output)
```
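Greedy decoding tends to loop on a model this small; a top-k sampling step is a common drop-in replacement for the argmax line. This is a sketch, not part of the repo's API, and the `k`/`temperature` values are illustrative defaults.

```python
import torch

# Top-k sampling step: keep the k highest logits, renormalize, and sample.
def sample_top_k(logits, k=40, temperature=0.8):
    logits = logits / temperature
    topk = torch.topk(logits, k, dim=-1)            # values and indices of top k
    probs = torch.softmax(topk.values, dim=-1)      # renormalize over the top k
    idx = torch.multinomial(probs, num_samples=1)   # sample within the top k
    return torch.gather(topk.indices, -1, idx)      # map back to vocabulary ids

# In the generation loop above, replace the argmax line with:
# next_token = sample_top_k(logits)
```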
### For Chat/Q&A Use

This base model is for language modeling only. For conversational Q&A, use the finetuned version: PebbleLM-117M-Chat.
## Intended Use

Appropriate for:
- Edge deployment experiments
- Educational purposes (learning transformer architecture)
- Research on small language models
- Baseline comparisons
Not recommended for:
- Production applications
- Factual question answering
- Complex reasoning tasks
## Limitations

At 117M parameters, this model sits at the small end of usable language models:
- Limited knowledge capacity - Cannot reliably store extensive world knowledge
- Weak reasoning - Not enough parameters for complex logical relationships
- Inconsistent outputs - May produce repetitive or off-topic responses
- English only - Trained exclusively on English text
For production-quality results, consider models with 1B+ parameters.
## Model Files

| File | Description |
|---|---|
| `model.pt` | PyTorch model weights |
| `config.json` | Model configuration |
| `tokenizer.json` | BPE tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
## Citation

```bibtex
@misc{pebblellm2026,
  author = {Sakthivel},
  title = {PebbleLM-117M: A Small Language Model for Edge Deployment},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M}}
}
```
## Acknowledgments

### Training Data
- Wikipedia - Wikimedia Foundation
- OpenWebText - Aaron Gokaslan and Vanya Cohen
- TinyStories - Ronen Eldan and Yuanzhi Li
### Infrastructure
- Google Cloud Platform (A100 GPU)
- Weights & Biases (experiment tracking)
### Frameworks
- PyTorch
- Hugging Face Tokenizers
## License
MIT License