PebbleLM-117M-Chat

A 117.5M-parameter language model finetuned for conversational Q&A. Small but solid, designed for edge deployment.

Base Model: PebbleLM-117M

Model Description

PebbleLM-117M-Chat is finetuned from PebbleLM-117M on focused Q&A datasets to provide direct, concise answers to questions.

| Property | Value |
|---|---|
| Parameters | 117.5M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Context Length | 1024 tokens |
| Vocabulary | 16,000 BPE tokens |
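As a sanity check, the reported parameter count is consistent with a standard GPT-style decoder at these dimensions. The back-of-envelope sketch below assumes tied input/output embeddings, a 4x MLP expansion, and the vocab_size=16384 used in the loading code further down (layer norms and biases are ignored); the exact architecture is defined in the repo.

```python
# Back-of-envelope parameter count for a GPT-style decoder
# (tied embeddings, 4x MLP expansion; layer norms and biases ignored).
vocab, hidden, layers = 16_384, 1_024, 8

embedding = vocab * hidden              # token embedding matrix (tied with output head)
attention = 4 * hidden * hidden         # Q, K, V and output projections
mlp = 2 * (4 * hidden) * hidden         # up- and down-projections
total = embedding + layers * (attention + mlp)

print(f"{total / 1e6:.1f}M")  # ~117.4M, in line with the reported 117.5M
```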

Inference Examples

User: What is 2 + 2?
Assistant: 2 + 2 equals 4.

User: What is the capital of France?
Assistant: The capital of France is Paris.

User: What is the largest planet?
Assistant: Jupiter is the largest planet in our solar system.

User: Who wrote Romeo and Juliet?
Assistant: William Shakespeare wrote Romeo and Juliet.

User: How do you say hello in Spanish?
Assistant: Hello in Spanish is 'Hola'.

User: What is AI?
Assistant: AI (artificial intelligence) is the ability of machines to perform
tasks that normally require human intelligence, such as learning,
problem-solving, and decision making.

User: Thank you
Assistant: Goodbye! Have a great day!

Training Data

Pretraining (Base Model)

| Dataset | Samples | Link |
|---|---|---|
| Wikipedia | 488,906 | wikipedia |
| OpenWebText | 500,000 | openwebtext |
| TinyStories | 188,067 | roneneldan/TinyStories |

Finetuning (This Model)

| Dataset | Samples | Description | Link |
|---|---|---|---|
| Alpaca-cleaned | 20,000 | Instruction-response pairs | yahma/alpaca-cleaned |
| Databricks Dolly | 10,991 | Q&A pairs | databricks/databricks-dolly-15k |
| Simple Q&A | 1,500 | Hand-crafted basic facts | Custom |
| Total | 32,491 | | |

Training Details

Base Checkpoint: PebbleLM-117M
Epochs: 5
Batch Size: 48
Gradient Accumulation: 2 (effective batch size: 96)
Learning Rate: 5e-5
Final Training Loss: 1.55
Hardware: NVIDIA A100 80GB
Training Time: ~40 minutes
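These hyperparameters imply roughly the following optimizer-step budget. This is a sketch assuming one sample per sequence and no sequence packing, which the card does not specify:

```python
import math

samples, epochs = 32_491, 5
batch, grad_accum = 48, 2

effective_batch = batch * grad_accum                    # 96 sequences per optimizer step
steps_per_epoch = math.ceil(samples / effective_batch)  # 339
total_steps = steps_per_epoch * epochs                  # ~1,695 optimizer steps overall
print(effective_batch, steps_per_epoch, total_steps)
```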

Benchmark Results

| Benchmark | Base Model | Chat Model | Change |
|---|---|---|---|
| HellaSwag | 32.20% | 31.80% | -0.4% |
| ARC-Easy | 35.80% | 40.00% | +4.2% |
| WinoGrande | 52.80% | 49.20% | -3.6% |
| PIQA | 58.20% | 56.00% | -2.2% |
| Average | 44.75% | 44.25% | -0.5% |

Note: The slight average decrease is expected; the model is optimized for Q&A response quality, not reasoning benchmarks. The real improvement is in conversational answers.

Usage

Installation

pip install torch tokenizers huggingface_hub

Download Model

from huggingface_hub import hf_hub_download

# Download model files
model_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="model.pt")
tokenizer_path = hf_hub_download(repo_id="nameissakthi/PebbleLM-117M-Chat", filename="tokenizer.json")

Load Model

import torch
from tokenizers import Tokenizer

# Model architecture is included in this repo
from src.model.transformer import SLMForCausalLM
from src.model.config import SLMConfig

# Load tokenizer
tokenizer = Tokenizer.from_file(tokenizer_path)

# Load model
config = SLMConfig(vocab_size=16384)
model = SLMForCausalLM(config)

state_dict = torch.load(model_path, map_location="cpu")
if "model_state_dict" in state_dict:
    state_dict = state_dict["model_state_dict"]
model.load_state_dict(state_dict)
model.eval()
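For edge deployment, dynamic int8 quantization of the linear layers can shrink the model and speed up CPU inference. This is not part of the repo, so verify output quality after applying it; the nn.Sequential below is only a stand-in for the loaded SLMForCausalLM:

```python
import torch
import torch.nn as nn

# Stand-in module; apply the same quantize_dynamic call to the loaded model instead.
demo = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 16384))

# Store nn.Linear weights as int8 and dequantize on the fly at inference time
quantized = torch.quantization.quantize_dynamic(demo, {nn.Linear}, dtype=torch.qint8)
```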

Prompt Format

<|user|>
Your question here
<|assistant|>

Generate Response

def generate(prompt, max_tokens=128, temperature=0.3):
    formatted = f"<|user|>\n{prompt}\n<|assistant|>\n"
    input_ids = torch.tensor([tokenizer.encode(formatted).ids])

    # Stop on EOS or the start of a new user turn
    stop_ids = {tokenizer.token_to_id("<|eos|>"),
                tokenizer.token_to_id("<|user|>")}

    with torch.no_grad():
        for _ in range(max_tokens):
            # Keep the input within the model's 1024-token context window
            logits = model(input_ids[:, -1024:]).logits[:, -1, :]
            logits = logits / temperature
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            input_ids = torch.cat([input_ids, next_token], dim=-1)

            if next_token.item() in stop_ids:
                break

    response = tokenizer.decode(input_ids[0].tolist())
    return response.split("<|assistant|>")[-1].replace("<|eos|>", "").strip()

# Example
print(generate("What is the capital of France?"))
# Output: The capital of France is Paris.

print(generate("What is 2 + 2?"))
# Output: 2 + 2 equals 4.

Recommended Settings

temperature = 0.3        # Lower = more consistent
top_k = 50               # Limit token choices
top_p = 0.9              # Nucleus sampling
repetition_penalty = 1.2 # Reduce repetition
max_tokens = 128         # Keep responses short
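The generate() helper above applies temperature only. Below is a sketch of how the remaining settings could be combined in a single sampling step; it is not part of the repo, and exact cutoff conventions for top-p vary between implementations:

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.3, top_k=50,
                      top_p=0.9, repetition_penalty=1.2):
    """Select the next token id from a 1D logits tensor."""
    logits = logits.clone().float()

    # Repetition penalty: push already-generated tokens toward lower probability
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    logits = logits / temperature

    # Top-k: keep only the k highest-scoring candidates
    top_values, top_indices = torch.topk(logits, min(top_k, logits.numel()))
    probs = torch.softmax(top_values, dim=-1)

    # Top-p (nucleus): keep the smallest prefix with cumulative mass <= top_p,
    # always retaining at least the single most likely token
    keep = torch.cumsum(probs, dim=-1) <= top_p
    keep[0] = True
    probs = probs[keep] / probs[keep].sum()

    choice = torch.multinomial(probs, 1).item()
    return top_indices[keep][choice].item()
```

To use it, replace the temperature-only sampling inside the generation loop with a call to this function on `logits[0]` and the ids generated so far.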

Intended Use

Appropriate for:

  • Edge deployment demos
  • Simple Q&A applications
  • Educational purposes
  • IoT/embedded device experiments

Not recommended for:

  • Production chatbots
  • Factual accuracy-critical applications
  • Complex multi-turn conversations

Limitations

  • ~60% accuracy on simple factual questions
  • Inconsistent on complex or unusual questions
  • May hallucinate incorrect facts
  • English only
  • 117M parameters limits knowledge capacity

For production quality, consider 1B+ parameter models.

Model Files

| File | Description |
|---|---|
| model.pt | PyTorch model weights |
| config.json | Model configuration |
| tokenizer.json | BPE tokenizer |
| tokenizer_config.json | Tokenizer configuration |

Citation

@misc{pebblellmchat2026,
  author = {Sakthivel},
  title = {PebbleLM-117M-Chat: A Small Conversational Language Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/nameissakthi/PebbleLM-117M-Chat}}
}

Acknowledgments

Training Data

  • Wikipedia, OpenWebText, and roneneldan/TinyStories (pretraining)
  • yahma/alpaca-cleaned and databricks/databricks-dolly-15k (finetuning)

Infrastructure

  • Google Cloud Platform (A100 GPU)
  • Weights & Biases (experiment tracking)

Frameworks

  • PyTorch
  • Hugging Face Tokenizers

License

MIT License
