# PebbleLM-117M-Chat

A 117.5M-parameter language model finetuned for conversational Q&A. Small but solid, designed for edge deployment.

**Base Model:** PebbleLM-117M
## Model Description
PebbleLM-117M-Chat is finetuned from PebbleLM-117M on focused Q&A datasets to provide direct, concise answers to questions.
| Property | Value |
|---|---|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,384 BPE tokens |
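The headline parameter count can be sanity-checked from the table above. A back-of-the-envelope estimate (assuming tied input/output embeddings and a 4x MLP, and ignoring biases and LayerNorm weights — none of these details are confirmed in this card):

```python
# Rough parameter count for PebbleLM-117M's config (hedged sketch:
# assumes tied embeddings and a 4x MLP ratio; biases/LayerNorms ignored).

def approx_params(vocab_size=16_384, hidden=1024, layers=8, mlp_ratio=4):
    embedding = vocab_size * hidden        # token embedding (tied with LM head)
    attn = 4 * hidden * hidden             # Q, K, V, and output projections
    mlp = 2 * mlp_ratio * hidden * hidden  # up- and down-projection
    return embedding + layers * (attn + mlp)

print(f"{approx_params() / 1e6:.1f}M")  # → 117.4M
```

This lands within rounding of the stated 117.5M, with the small remainder plausibly coming from the terms the estimate ignores.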
## Inference Examples

```text
User: What is 2 + 2?
Assistant: 2 + 2 equals 4.

User: What is the capital of France?
Assistant: The capital of France is Paris.

User: What is the largest planet?
Assistant: Jupiter is the largest planet in our solar system.

User: Who wrote Romeo and Juliet?
Assistant: William Shakespeare wrote Romeo and Juliet.

User: How do you say hello in Spanish?
Assistant: Hello in Spanish is 'Hola'.

User: What is AI?
Assistant: AI (artificial intelligence) is the ability of machines to perform tasks that normally require human intelligence, such as learning, problem-solving, and decision making.

User: Thank you
Assistant: Goodbye! Have a great day!
```
## Training Data

### Pretraining (Base Model)
| Dataset | Samples | Link |
|---|---|---|
| Wikipedia | 488,906 | wikipedia |
| OpenWebText | 500,000 | openwebtext |
| TinyStories | 188,067 | roneneldan/TinyStories |
### Finetuning (This Model)
| Dataset | Samples | Description | Link |
|---|---|---|---|
| Alpaca-cleaned | 20,000 | Instruction-response pairs | yahma/alpaca-cleaned |
| Databricks Dolly | 10,991 | Q&A pairs | databricks/databricks-dolly-15k |
| Simple Q&A | 1,500 | Hand-crafted basic facts | Custom |
| **Total** | **32,491** | | |
## Training Details

- **Base Checkpoint:** PebbleLM-117M
- **Epochs:** 5
- **Batch Size:** 48
- **Gradient Accumulation:** 2
- **Learning Rate:** 5e-5
- **Final Training Loss:** 1.55
- **Hardware:** NVIDIA A100 80GB
- **Training Time:** ~40 minutes
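The effective batch size and optimizer-step count follow directly from the numbers above (a quick check, assuming full epochs over the 32,491 finetuning samples with no packing or dropping):

```python
import math

samples = 32_491   # total finetuning samples
batch_size = 48
grad_accum = 2
epochs = 5

effective_batch = batch_size * grad_accum             # samples per optimizer step
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = epochs * steps_per_epoch
print(effective_batch, steps_per_epoch, total_steps)  # → 96 339 1695
```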
## Benchmark Results
| Benchmark | Base Model | Chat Model | Change (pp) |
|---|---|---|---|
| HellaSwag | 32.20% | 31.80% | -0.4 |
| ARC-Easy | 35.80% | 40.00% | +4.2 |
| WinoGrande | 52.80% | 49.20% | -3.6 |
| PIQA | 58.20% | 56.00% | -2.2 |
| **Average** | 44.75% | 44.25% | -0.5 |
> **Note:** The slight benchmark decrease is expected: the model is optimized for Q&A quality rather than reasoning benchmarks, and the real improvement shows in conversational responses.
## Usage

### Installation

```shell
pip install torch tokenizers huggingface_hub
```
### Download Model

```python
from huggingface_hub import hf_hub_download

# Download model weights and tokenizer
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="tokenizer.json")
```
### Load Model

```python
import torch
from tokenizers import Tokenizer

# Model architecture is included in this repo
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Load model
config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)
state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()
```
### Prompt Format

```text
<|user|>
Your question here
<|assistant|>
```
### Generate Response

```python
def generate(prompt, max_tokens=128, temperature=0.3):
    formatted = f"<|user|>\n{prompt}\n<|assistant|>\n"
    input_ids = torch.tensor([tokenizer.encode(formatted).ids])
    with torch.no_grad():
        for _ in range(max_tokens):
            logits = model(input_ids).logits[:, -1, :]
            logits = logits / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            input_ids = torch.cat([input_ids, next_token], dim=-1)
            # Stop on EOS or user token
            if next_token.item() in [tokenizer.token_to_id("<|eos|>"),
                                     tokenizer.token_to_id("<|user|>")]:
                break
    response = tokenizer.decode(input_ids[0].tolist())
    return response.split("<|assistant|>")[-1].replace("<|eos|>", "").strip()

# Example
print(generate("What is the capital of France?"))
# Output: The capital of France is Paris.
print(generate("What is 2 + 2?"))
# Output: 2 + 2 equals 4.
```
### Recommended Settings

```python
temperature = 0.3         # Lower = more consistent
top_k = 50                # Limit token choices
top_p = 0.9               # Nucleus sampling
repetition_penalty = 1.2  # Reduce repetition
max_tokens = 128          # Keep responses short
```
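The `generate()` example applies only `temperature`; the other settings can be layered into the per-step sampling. A minimal sketch of one sampling step with all four filters (the helper name and filter order are my own, mirroring the common Hugging Face-style implementation — not a confirmed PebbleLM API):

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.3, top_k=50,
                      top_p=0.9, repetition_penalty=1.2):
    """One sampling step. `logits` is a (vocab_size,) tensor for the
    last position; `generated_ids` holds the tokens produced so far."""
    logits = logits.clone()
    # Repetition penalty: dampen tokens that already appeared.
    for tok in set(generated_ids.tolist()):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens.
    if top_k > 0:
        kth = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits[logits < kth] = float("-inf")
    # Top-p (nucleus): keep the smallest prefix of sorted tokens whose
    # cumulative probability exceeds top_p; always keep the top token.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    cutoff = cumprobs > top_p
    cutoff[1:] = cutoff[:-1].clone()
    cutoff[0] = False
    logits[sorted_idx[cutoff]] = float("-inf")
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).item()
```

The filter order matters: the repetition penalty is applied to raw logits before temperature scaling, and top-k runs before the nucleus cutoff so top-p operates on an already-truncated distribution.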
## Intended Use

**Appropriate for:**
- Edge deployment demos
- Simple Q&A applications
- Educational purposes
- IoT/embedded device experiments
**Not recommended for:**
- Production chatbots
- Factual accuracy-critical applications
- Complex multi-turn conversations
## Limitations

- ~60% accuracy on simple factual questions
- Inconsistent on complex or unusual questions
- May hallucinate incorrect facts
- English only
- The 117M parameter budget limits knowledge capacity

For production quality, consider 1B+ parameter models.
## Model Files

| File | Description |
|---|---|
| `model.pt` | PyTorch model weights |
| `config.json` | Model configuration |
| `tokenizer.json` | BPE tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
## Citation

```bibtex
@misc{pebblellmchat2026,
  author = {Sakthivel},
  title = {PebbleLM-117M-Chat: A Small Conversational Language Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M-Chat}}
}
```
## Acknowledgments

### Training Data

- Wikipedia - Wikimedia Foundation
- OpenWebText - Aaron Gokaslan and Vanya Cohen
- TinyStories - Ronen Eldan and Yuanzhi Li
- Alpaca-cleaned - yahma (cleaned version of Stanford Alpaca)
- Databricks Dolly - Databricks
### Infrastructure

- Google Cloud Platform (A100 GPU)
- Weights & Biases (experiment tracking)

### Frameworks

- PyTorch
- Hugging Face Tokenizers

## License

MIT License