JEEVES: A 100M Parameter Instruction-Tuned Language Model
A compact, multi-topic instruction-tuned LLM trained from scratch.
Model Summary
JEEVES is a 100M parameter decoder-only transformer language model, designed and trained from scratch. It follows the LLaMA architecture (RMSNorm, GQA, SwiGLU, RoPE) and uses the ChatML template for conversations.
| Property | Value |
|---|---|
| Parameters | 100.1M total (75.5M non-embedding) |
| Architecture | Decoder-only Transformer |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 (Q) / 4 (KV) |
| FFN Size | 2048 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Chat Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
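The parameter figures in the table can be sanity-checked with a back-of-envelope count. This is a sketch assuming the standard LLaMA-style layout (separate Q/K/V/O projections, three SwiGLU matrices, weight-only RMSNorms, tied embeddings); the exact layout of this checkpoint may differ slightly.

```python
# Back-of-envelope parameter count for the configuration in the table above.
# Assumes LLaMA-style weights: Wq/Wk/Wv/Wo attention projections with GQA,
# gate/up/down SwiGLU matrices, weight-only RMSNorms, tied embeddings.
VOCAB, D, LAYERS = 32_000, 768, 12
N_Q_HEADS, N_KV_HEADS, FFN = 12, 4, 2048
HEAD_DIM = D // N_Q_HEADS  # 64

embedding = VOCAB * D  # tied input/output embeddings, counted once
attn = D * D + 2 * D * (N_KV_HEADS * HEAD_DIM) + D * D  # Wq, Wk + Wv, Wo
ffn = 3 * D * FFN  # gate, up, down projections
norms = 2 * D  # two RMSNorms per layer
per_layer = attn + ffn + norms

non_embedding = LAYERS * per_layer + D  # plus the final RMSNorm
total = non_embedding + embedding
print(f"{total / 1e6:.1f}M total, {non_embedding / 1e6:.1f}M non-embedding")
# → 100.1M total, 75.5M non-embedding
```

The result matches the reported 100.1M total / 75.5M non-embedding figures, which also confirms the embeddings are tied (otherwise the total would be about 24.6M higher).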
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")

messages = [
    {"role": "system", "content": "You are Jeeves, a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the ChatML prompt, then generate
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Benchmarks
Benchmarks pending...
All evaluations are zero-shot unless stated otherwise. Results will be reported against SmolLM2-135M-Instruct, a model of comparable scale.
Training
Architecture
- Type: Decoder-only Transformer (LLaMA-style)
- Normalization: RMSNorm (pre-norm)
- Attention: Grouped-Query Attention (GQA) with RoPE
- FFN: SwiGLU
- Embeddings: Tied input/output embeddings
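The SwiGLU feed-forward block named above can be sketched in a few lines. This is an illustrative, dependency-free version operating on plain lists (the function and weight names are hypothetical; a real implementation would use batched tensor ops):

```python
import math

def silu(x: float) -> float:
    """SiLU (a.k.a. swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(w, x):
    """Plain matrix-vector product, kept simple for the sketch."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) ), elementwise product."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])
```

Note that SwiGLU uses three projection matrices (gate, up, down) rather than the two of a classic MLP, which is why the FFN size of 2048 is smaller than the usual 4x hidden-size expansion.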
Training Details
- Pretraining: N/A tokens on TinyStories
- SFT Data: ~94K multi-topic examples
  - OpenHermes-2.5 (40K): math, coding, science, reasoning
  - SlimOrca (25K): GPT-4 verified FLAN tasks
  - Alpaca-Cleaned (15K): general instructions
  - Dolly-15K (14K): human-written diverse tasks
- SFT Epochs: 3
- SFT LR: 2e-5
- Hardware: N/A
Training Pipeline
- Phase 1 (Pretraining): Next-token prediction on the TinyStories corpus
- Phase 2 (Chat SFT): Instruction tuning with the ChatML template, loss masked to assistant tokens only
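The loss masking in Phase 2 amounts to copying the input IDs into the label sequence and blanking everything outside the assistant's reply. A minimal sketch, assuming assistant spans are given as index ranges (real pipelines derive them from the ChatML special tokens; `mask_labels` and `IGNORE_INDEX` are illustrative names):

```python
IGNORE_INDEX = -100  # the conventional "ignore" label for cross-entropy loss

def mask_labels(input_ids, assistant_spans):
    """Copy input_ids into labels, keeping loss only on assistant tokens.

    assistant_spans: list of (start, end) half-open index ranges covering
    the assistant's reply tokens. All other positions get IGNORE_INDEX,
    so system and user tokens contribute nothing to the training loss.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels
```

With this masking, the model is never penalized for how the prompt is worded, only for how it replies.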
Limitations
- Small model: at 100M parameters, significantly less capable than larger models
- Knowledge cutoff: Limited world knowledge due to small pretraining corpus
- Math: Limited mathematical reasoning ability
- Code: Can produce code snippets but may contain errors
- Hallucination: May generate plausible but incorrect information
- Language: English only
License
Apache 2.0
Citation
```bibtex
@misc{jeeves2025,
  title={JEEVES: A 100M Parameter Instruction-Tuned Language Model},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/Jeeves-Small-100M}
}
```