
JEEVES β€” 100M Parameter Instruction-Tuned Language Model

A compact, multi-topic instruction-tuned LLM trained from scratch.

Model Summary

JEEVES is a 100M-parameter decoder-only transformer language model, designed and trained from scratch. It follows the LLaMA architecture (RMSNorm, GQA, SwiGLU, RoPE) and uses the ChatML template for conversations.

| Property | Value |
|---|---|
| Parameters | 100.1M total (75.5M non-embedding) |
| Architecture | Decoder-only Transformer |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 (Q) / 4 (KV) |
| FFN Size | 2048 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Chat Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")

messages = [
    {"role": "system", "content": "You are Jeeves, a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Format the conversation with the model's ChatML template, then generate.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
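For reference, the prompt string that `apply_chat_template` produces for the messages above looks roughly like the following. This is a hand-built sketch assuming standard ChatML special tokens (`<|im_start|>` / `<|im_end|>`), not the tokenizer's actual output:

```python
# Hand-built sketch of a ChatML prompt (assumes standard ChatML tokens;
# in practice, use tokenizer.apply_chat_template instead).
messages = [
    {"role": "system", "content": "You are Jeeves, a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

prompt = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)
# add_generation_prompt=True appends an opening assistant header, so the
# model continues from there with its reply:
prompt += "<|im_start|>assistant\n"

print(prompt)
```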

Benchmarks

Benchmark results are pending.

All evaluations will be zero-shot unless stated otherwise. The intended comparison point is SmolLM2-135M-Instruct, an instruction-tuned model of similar size.

Training

Architecture

  • Type: Decoder-only Transformer (LLaMA-style)
  • Normalization: RMSNorm (pre-norm)
  • Attention: Grouped-Query Attention (GQA) with RoPE
  • FFN: SwiGLU
  • Embeddings: Tied input/output embeddings
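The parameter counts in the summary table can be reproduced from these hyperparameters. A back-of-the-envelope check, assuming bias-free LLaMA-style linear layers and a head dimension of 64 (neither is stated explicitly above, but both are standard for this architecture):

```python
# Back-of-the-envelope parameter count from the summary table
# (assumes bias-free linear layers and head_dim = 768 / 12 = 64).
vocab, d, layers, n_q, n_kv, ffn = 32_000, 768, 12, 12, 4, 2048
head_dim = d // n_q

attn = d * d + 2 * (d * n_kv * head_dim) + d * d  # Q, K, V, O projections
mlp = 3 * d * ffn                                 # gate, up, down (SwiGLU)
norms = 2 * d                                     # two RMSNorms per layer

non_embedding = layers * (attn + mlp + norms) + d  # + final RMSNorm
embedding = vocab * d                              # tied, so counted once
total = non_embedding + embedding

print(f"{non_embedding / 1e6:.1f}M non-embedding, {total / 1e6:.1f}M total")
# -> 75.5M non-embedding, 100.1M total
```

This matches the table: the tied embedding matrix (32,000 × 768 ≈ 24.6M) accounts for the gap between the non-embedding and total counts.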

Training Details

  • Pretraining: N/A tokens on TinyStories
  • SFT Data: ~94K multi-topic examples
    • OpenHermes-2.5 (40K) β€” math, coding, science, reasoning
    • SlimOrca (25K) β€” GPT-4 verified FLAN tasks
    • Alpaca-Cleaned (15K) β€” general instructions
    • Dolly-15K (14K) β€” human-written diverse tasks
  • SFT Epochs: 3
  • SFT LR: 2e-5
  • Hardware: N/A
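The SFT counts quoted above sum to 94K examples; the mixture proportions work out as follows (a quick sanity check on the quoted numbers, nothing more):

```python
# Proportions of the SFT data mixture quoted above.
mix = {
    "OpenHermes-2.5": 40_000,
    "SlimOrca": 25_000,
    "Alpaca-Cleaned": 15_000,
    "Dolly-15K": 14_000,
}
total = sum(mix.values())  # 94,000 examples
for name, n in mix.items():
    print(f"{name}: {n / total:.1%}")
```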

Training Pipeline

  1. Phase 1 β€” Pretraining: Next-token prediction on TinyStories corpus
  2. Phase 2 β€” Chat SFT: Instruction tuning with the ChatML template; the loss is computed on assistant tokens only, with all other tokens masked out
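The assistant-only loss in Phase 2 is typically implemented by setting the labels of non-assistant tokens to -100, which PyTorch's cross-entropy ignores. A minimal sketch with toy token IDs (the span detection here is illustrative, not the actual training code):

```python
# Minimal sketch of assistant-only loss masking (illustrative, not the
# actual training code). Labels of -100 are ignored by PyTorch's
# cross-entropy loss, so only assistant tokens contribute to the loss.
IGNORE_INDEX = -100

def mask_labels(token_ids, assistant_spans):
    """assistant_spans: list of (start, end) index ranges to train on."""
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in assistant_spans:
        labels[start:end] = token_ids[start:end]
    return labels

# Toy example: a 10-token sequence where tokens 6..9 are the assistant reply.
ids = [101, 7, 8, 9, 102, 103, 42, 43, 44, 102]
labels = mask_labels(ids, [(6, 10)])
print(labels)  # [-100, -100, -100, -100, -100, -100, 42, 43, 44, 102]
```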

Limitations

  • Small model: 100M params β€” significantly less capable than larger models
  • Knowledge cutoff: Limited world knowledge due to small pretraining corpus
  • Math: Limited mathematical reasoning ability
  • Code: Can produce code snippets but may contain errors
  • Hallucination: May generate plausible but incorrect information
  • Language: English only

License

Apache 2.0

Citation

@misc{jeeves2025,
  title={JEEVES: A 100M Parameter Instruction-Tuned Language Model},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/Jeeves-Small-100M}
}