
JEEVES β€” 100M Parameter Instruction-Tuned Language Model

A compact, multi-topic instruction-tuned LLM trained from scratch.

Model Summary

JEEVES is a 100M-parameter decoder-only transformer language model, designed and trained from scratch. It follows the LLaMA architecture (RMSNorm, GQA, SwiGLU, RoPE) and uses the ChatML template for conversations.

| Property | Value |
|---|---|
| Parameters | 100.1M total (75.5M non-embedding) |
| Architecture | Decoder-only Transformer |
| Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 (Q) / 4 (KV) |
| FFN Size | 2048 |
| Vocab Size | 32,000 |
| Max Sequence Length | 512 |
| Chat Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |

How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/Jeeves-Small-100M")

messages = [
    {"role": "system", "content": "You are Jeeves, a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Format the conversation with the model's ChatML template, then generate.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
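For reference, the prompt string that `apply_chat_template` produces for the messages above looks roughly like the following. This is a hand-built sketch assuming standard ChatML special tokens (`<|im_start|>` / `<|im_end|>`), not the tokenizer's actual output:

```python
# Hand-built sketch of a ChatML prompt (assumes standard ChatML tokens;
# in practice, use tokenizer.apply_chat_template instead).
messages = [
    {"role": "system", "content": "You are Jeeves, a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

prompt = "".join(
    f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
)
# add_generation_prompt=True appends an opening assistant header, so the
# model continues from there with its reply:
prompt += "<|im_start|>assistant\n"

print(prompt)
```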

Benchmarks

Benchmark results are pending.

All evaluations will be zero-shot unless stated otherwise. The intended comparison point is SmolLM2-135M-Instruct, an instruction-tuned model of similar size.

Training

Architecture

  • Type: Decoder-only Transformer (LLaMA-style)
  • Normalization: RMSNorm (pre-norm)
  • Attention: Grouped-Query Attention (GQA) with RoPE
  • FFN: SwiGLU
  • Embeddings: Tied input/output embeddings
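The parameter counts in the summary table can be reproduced from these hyperparameters. A back-of-the-envelope check, assuming bias-free LLaMA-style linear layers and a head dimension of 64 (neither is stated explicitly above, but both are standard for this architecture):

```python
# Back-of-the-envelope parameter count from the summary table
# (assumes bias-free linear layers and head_dim = 768 / 12 = 64).
vocab, d, layers, n_q, n_kv, ffn = 32_000, 768, 12, 12, 4, 2048
head_dim = d // n_q

attn = d * d + 2 * (d * n_kv * head_dim) + d * d  # Q, K, V, O projections
mlp = 3 * d * ffn                                 # gate, up, down (SwiGLU)
norms = 2 * d                                     # two RMSNorms per layer

non_embedding = layers * (attn + mlp + norms) + d  # + final RMSNorm
embedding = vocab * d                              # tied, so counted once
total = non_embedding + embedding

print(f"{non_embedding / 1e6:.1f}M non-embedding, {total / 1e6:.1f}M total")
# -> 75.5M non-embedding, 100.1M total
```

This matches the table: the tied embedding matrix (32,000 × 768 ≈ 24.6M) accounts for the gap between the non-embedding and total counts.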

Training Details

  • Pretraining: N/A tokens on TinyStories
  • SFT Data: ~94K multi-topic examples
    • OpenHermes-2.5 (40K) β€” math, coding, science, reasoning
    • SlimOrca (25K) β€” GPT-4 verified FLAN tasks
    • Alpaca-Cleaned (15K) β€” general instructions
    • Dolly-15K (14K) β€” human-written diverse tasks
  • SFT Epochs: 3
  • SFT LR: 2e-5
  • Hardware: N/A
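The SFT counts quoted above sum to 94K examples; the mixture proportions work out as follows (a quick sanity check on the quoted numbers, nothing more):

```python
# Proportions of the SFT data mixture quoted above.
mix = {
    "OpenHermes-2.5": 40_000,
    "SlimOrca": 25_000,
    "Alpaca-Cleaned": 15_000,
    "Dolly-15K": 14_000,
}
total = sum(mix.values())  # 94,000 examples
for name, n in mix.items():
    print(f"{name}: {n / total:.1%}")
```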

Training Pipeline

  1. Phase 1 β€” Pretraining: Next-token prediction on TinyStories corpus
  2. Phase 2 β€” Chat SFT: Instruction tuning with the ChatML template; the loss is computed on assistant tokens only, with all other tokens masked out
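The assistant-only loss in Phase 2 is typically implemented by setting the labels of non-assistant tokens to -100, which PyTorch's cross-entropy ignores. A minimal sketch with toy token IDs (the span detection here is illustrative, not the actual training code):

```python
# Minimal sketch of assistant-only loss masking (illustrative, not the
# actual training code). Labels of -100 are ignored by PyTorch's
# cross-entropy loss, so only assistant tokens contribute to the loss.
IGNORE_INDEX = -100

def mask_labels(token_ids, assistant_spans):
    """assistant_spans: list of (start, end) index ranges to train on."""
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in assistant_spans:
        labels[start:end] = token_ids[start:end]
    return labels

# Toy example: a 10-token sequence where tokens 6..9 are the assistant reply.
ids = [101, 7, 8, 9, 102, 103, 42, 43, 44, 102]
labels = mask_labels(ids, [(6, 10)])
print(labels)  # [-100, -100, -100, -100, -100, -100, 42, 43, 44, 102]
```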

Limitations

  • Small model: 100M params β€” significantly less capable than larger models
  • Knowledge cutoff: Limited world knowledge due to small pretraining corpus
  • Math: Limited mathematical reasoning ability
  • Code: Can produce code snippets but may contain errors
  • Hallucination: May generate plausible but incorrect information
  • Language: English only

License

Apache 2.0

Citation

@misc{jeeves2025,
  title={JEEVES: A 100M Parameter Instruction-Tuned Language Model},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/YOUR_USERNAME/Jeeves-Small-100M}
}