# miniJBrain-Story-SFT-v0.1

miniJBrain-Story-SFT-v0.1 is a small GPT-style causal language model fine-tuned for short, gentle, children's-story-style generation.
This release is part of the miniJBrain learning project, which covers:
- tokenizer training
- base pretraining
- supervised fine-tuning (SFT)
- lightweight style alignment
- checkpoint export and open release
Official code repository:
https://github.com/chongliujia/miniJBrain
## Model Details
- Model family: custom GPT-style causal LM
- Release checkpoint: minij_chat_story_stage2p1
- Main use case: short story and bedtime-style text generation
- Vocabulary size: 32,000
- Context length: 1,024
- Layers: 16
- Attention heads: 16
- Embedding size: 1,024
- Weights format: safetensors
This checkpoint was selected as the most balanced story-oriented SFT result in the project. It performed better overall than narrower bedtime-only variants.
## Repository Contents
This directory is an exported model package. It includes:
- `model.safetensors`
- `config.json`
- `tokenizer.json`
- `generation_config.json`
- `inference.py`
- `README.md`
Important: this is not a zero-code Hugging Face `transformers` package. The weights are present and usable, but the architecture is defined by the miniJBrain codebase rather than a standard `AutoModelForCausalLM` config.
## How To Use
The recommended way to run this model is to use the original miniJBrain model code:
https://github.com/chongliujia/miniJBrain
### Option 1: Run the local inference script
If you do not already have the model code, clone it first:
```shell
git clone https://github.com/chongliujia/miniJBrain.git
```
If this model directory sits next to the cloned miniJBrain project directory, you can run:
```shell
python inference.py \
  --device cpu \
  --prompt $'User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n' \
  --max_new_tokens 220 \
  --temperature 0.50 \
  --top_k 50 \
  --top_p 0.95 \
  --repetition_penalty 1.06
```
By default, `inference.py` loads:

- `./model.safetensors`
- `./config.json`
- `./tokenizer.json`
- `../miniJBrain` as the model-code directory
If your miniJBrain checkout lives elsewhere:
```shell
python inference.py --minijbrain-root /path/to/miniJBrain
```
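The default-path behavior described above can be sketched roughly as follows. This is an illustrative helper, not the actual `inference.py` implementation; the function name and argparse details are assumptions:

```python
import argparse
import sys
from pathlib import Path


# Sketch: default to a sibling ../miniJBrain checkout, overridable via
# --minijbrain-root. The real inference.py may differ in its details.
def resolve_minijbrain_root(argv=None) -> Path:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--minijbrain-root",
        default="../miniJBrain",
        help="path to the miniJBrain code checkout",
    )
    args, _unknown = parser.parse_known_args(argv)
    root = Path(args.minijbrain_root)
    # Put the code checkout on sys.path so `from model.gpt import GPT` works.
    sys.path.insert(0, str(root))
    return root
```

Using `parse_known_args` rather than `parse_args` lets the script accept the decoding flags shown above without the path-resolution step needing to know about them.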
### Option 2: Load it in your own Python code
Use the real miniJBrain model definition from `model/gpt.py` in the official repository:
```python
import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

minijbrain_root = Path("/path/to/miniJBrain")
sys.path.insert(0, str(minijbrain_root))

from model.gpt import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("config.json", "r", encoding="utf-8") as f:
    raw_config = json.load(f)

model = GPT(GPTConfig(**raw_config)).to(device)

state_dict = load_file("model.safetensors")
# The exported safetensors file keeps tied weights through lm_head.weight.
if "transformer.wte.weight" not in state_dict and "lm_head.weight" in state_dict:
    state_dict["transformer.wte.weight"] = state_dict["lm_head.weight"]
model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file("tokenizer.json")

prompt = "User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n"
input_ids = torch.tensor(
    [tokenizer.encode(prompt, add_special_tokens=False).ids],
    dtype=torch.long,
    device=device,
)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=220,
        temperature=0.50,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.06,
        eos_token_id=tokenizer.token_to_id("<eos>"),
        stop_on_eos=True,
    )

text = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(text)
```
## What does not work out of the box
This repository does not yet support direct loading like:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-repo-name")
```
That will not work yet because this repository does not provide a standard `transformers` architecture definition, `model_type`, or compatible modeling code.
## Prompt Format
The model works best with the chat-style prompt format used during SFT:
```text
User:
Tell me a warm short bedtime story before sleep.

Assistant:
```
It generally responds best when the prompt is short, explicit, and clearly story-oriented.
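The chat format above can be applied programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it simply reproduces the `User:`/`Assistant:` template used during SFT:

```python
# Hypothetical helper (not shipped with the model) that wraps a plain
# request in the SFT chat template: "User:\n<request>\n\nAssistant:\n".
def build_story_prompt(request: str) -> str:
    return f"User:\n{request}\n\nAssistant:\n"


prompt = build_story_prompt("Tell me a warm short bedtime story before sleep.")
print(prompt)
```

The trailing newline after `Assistant:` matters: generation should continue directly from the assistant turn rather than re-opening a user turn.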
## Recommended Decoding
Suggested defaults from generation_config.json:
max_new_tokens = 220
temperature = 0.50
top_k = 50
top_p = 0.95
repetition_penalty = 1.06
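These defaults could also be read from `generation_config.json` at runtime and merged with caller overrides. The helper below is a sketch under the assumption that the file stores the keys listed above; it is not part of the repository:

```python
import json


# Sketch: load suggested decoding defaults from generation_config.json,
# falling back to the values documented above, with caller overrides on top.
def load_decoding_defaults(path="generation_config.json", **overrides):
    with open(path, "r", encoding="utf-8") as f:
        defaults = json.load(f)
    kwargs = {
        "max_new_tokens": defaults.get("max_new_tokens", 220),
        "temperature": defaults.get("temperature", 0.50),
        "top_k": defaults.get("top_k", 50),
        "top_p": defaults.get("top_p", 0.95),
        "repetition_penalty": defaults.get("repetition_penalty", 1.06),
    }
    kwargs.update(overrides)
    return kwargs
```

The resulting dict can be splatted into a generate call, e.g. `model.generate(input_ids, **load_decoding_defaults(temperature=0.7))`.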
## Intended Use
This model is intended for:
- educational demonstration of small-LLM training and release
- toy story generation experiments
- prompt-format experiments
- decoding experiments on a compact custom LM
- studying story-heavy SFT behavior
## Out-of-Scope Use
This model is not intended for:
- factual question answering
- safety-critical applications
- production child-facing systems
- high-reliability assistant behavior
- benchmark-oriented comparison with modern instruction models
## Training Summary
This release comes from the stage2p1 story-SFT experiment in the broader miniJBrain project.
High-level training path:
- train tokenizer
- pretrain a small GPT-style base model
- run instruction/story SFT
- build a story-heavy second-stage SFT mixture
- select the most balanced checkpoint for release
The final checkpoint was chosen because it retained better prompt following and more stable generation than narrower bedtime-specialized runs.
## Data Summary
The broader miniJBrain SFT experiments used locally prepared prompt/response data assembled from public sources, including:
- HuggingFaceH4/ultrachat_200k
- databricks/databricks-dolly-15k
- Open-Orca/OpenOrca
- openai/gsm8k
- roneneldan/TinyStories
For this release checkpoint, the most important SFT sources were:
- roneneldan/TinyStories
- HuggingFaceH4/ultrachat_200k
- databricks/databricks-dolly-15k
Approximate stage2p1 training composition:

- story samples: 120,000
- UltraChat-derived samples: 17,839
- Dolly-derived samples: 3,337

Approximate validation composition:

- story samples: 6,000
- UltraChat-derived samples: 1,059

This puts the final SFT mix at roughly 85% story-style data and 15% chat/instruction-style data.
That balance was selected because pure story-only tuning narrowed prompt generalization too much, while a story-heavy mix with some chat data produced more stable behavior.
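The stated ~85/15 split follows directly from the approximate training counts above:

```python
# Sanity-check the stated ~85/15 split from the approximate sample counts.
story = 120_000
chat = 17_839 + 3_337  # UltraChat-derived + Dolly-derived samples

total = story + chat
story_share = story / total
print(f"story: {story_share:.1%}, chat: {1 - story_share:.1%}")
# → story: 85.0%, chat: 15.0%
```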
Dataset note: before any formal redistribution claims, upstream dataset licenses and usage restrictions should be reviewed source by source.
## Limitations
Known limitations include:
- repeated story structure and character patterns
- frequent reuse of certain names and motifs
- generic story arcs
- style instability across prompt phrasings
- occasional abrupt endings with short decoding limits
- imperfect specialization for bedtime-only prompts
## Release Notes
This is a learning-project release, not a benchmark-optimized or production-tuned model.
The main publication goal is transparency around:
- how the data was formatted
- how story-heavy SFT was performed
- how custom GPT-style checkpoints were exported
- how a small custom model can be shared openly