miniJBrain-Story-SFT-v0.1

miniJBrain-Story-SFT-v0.1 is a small GPT-style causal language model fine-tuned for short, gentle, children's-story-style text generation.

This release is part of the miniJBrain learning project, which covers:

  • tokenizer training
  • base pretraining
  • supervised fine-tuning (SFT)
  • lightweight style alignment
  • checkpoint export and open release

Official code repository:

  • https://github.com/chongliujia/miniJBrain

Model Details

  • Model family: custom GPT-style causal LM
  • Release checkpoint: minij_chat_story_stage2p1
  • Main use case: short story and bedtime-style text generation
  • Vocabulary size: 32,000
  • Context length: 1,024
  • Layers: 16
  • Attention heads: 16
  • Embedding size: 1,024
  • Weights format: safetensors

This checkpoint was selected as the most balanced story-oriented SFT result in the project. It performed better overall than narrower bedtime-only variants.

Repository Contents

This directory is an exported model package. It includes:

  • model.safetensors
  • config.json
  • tokenizer.json
  • generation_config.json
  • inference.py
  • README.md

Important: this is not a zero-code Hugging Face transformers package. The weights are present and usable, but the architecture is defined by the miniJBrain codebase rather than a standard AutoModelForCausalLM config.

How To Use

The recommended way to run this model is to use the original miniJBrain model code:

  • https://github.com/chongliujia/miniJBrain

Option 1: Run the local inference script

If you do not already have the model code, clone it first:

git clone https://github.com/chongliujia/miniJBrain.git

If this model directory sits next to the cloned miniJBrain project directory, you can run:

python inference.py \
  --device cpu \
  --prompt $'User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n' \
  --max_new_tokens 220 \
  --temperature 0.50 \
  --top_k 50 \
  --top_p 0.95 \
  --repetition_penalty 1.06

By default, inference.py loads:

  • ./model.safetensors
  • ./config.json
  • ./tokenizer.json
  • ../miniJBrain as the model-code directory

If your miniJBrain checkout lives elsewhere:

python inference.py --minijbrain-root /path/to/miniJBrain

Option 2: Load it in your own Python code

Use the real miniJBrain model definition from model/gpt.py in the official repository:

import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

minijbrain_root = Path("/path/to/miniJBrain")
sys.path.insert(0, str(minijbrain_root))

from model.gpt import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("config.json", "r", encoding="utf-8") as f:
    raw_config = json.load(f)

model = GPT(GPTConfig(**raw_config)).to(device)
state_dict = load_file("model.safetensors")

# The export stores the tied input/output embedding only once, under
# lm_head.weight; restore the tied copy for wte if it is missing.
if "transformer.wte.weight" not in state_dict and "lm_head.weight" in state_dict:
    state_dict["transformer.wte.weight"] = state_dict["lm_head.weight"]

model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file("tokenizer.json")
prompt = "User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n"
input_ids = torch.tensor(
    [tokenizer.encode(prompt, add_special_tokens=False).ids],
    dtype=torch.long,
    device=device,
)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=220,
        temperature=0.50,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.06,
        eos_token_id=tokenizer.token_to_id("<eos>"),
        stop_on_eos=True,
    )

text = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(text)

What Does Not Work Out of the Box

This repository does not yet support direct loading like:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-repo-name")

That will not work yet because this repository does not provide a standard transformers architecture definition, model_type, or compatible modeling code.

Prompt Format

The model works best with the chat-style prompt format used during SFT:

User:
Tell me a warm short bedtime story before sleep.

Assistant:

It generally responds best when the prompt is short, explicit, and clearly story-oriented.
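To reduce formatting mistakes when constructing prompts programmatically, the chat-style template above can be wrapped in a small helper. This is a minimal sketch; `build_prompt` is a hypothetical name, but the template string matches the SFT format shown above.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the chat-style prompt format used during SFT.

    The "User:\n...\n\nAssistant:\n" template comes from this model card;
    the helper itself is just an illustrative convenience.
    """
    return f"User:\n{user_message}\n\nAssistant:\n"


prompt = build_prompt("Tell me a warm short bedtime story before sleep.")
```

Passing the result as the `--prompt` argument (or as the prompt string in your own loading code) keeps the leading "User:" and trailing "Assistant:\n" markers consistent.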

Recommended Decoding

Suggested defaults from generation_config.json:

max_new_tokens = 220
temperature = 0.50
top_k = 50
top_p = 0.95
repetition_penalty = 1.06
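One way to use these values is to read generation_config.json and fall back to the suggested defaults when the file is absent. This sketch assumes the JSON keys match the parameter names listed above; confirm against your exported generation_config.json.

```python
import json

# Suggested defaults from this model card; generation_config.json is
# assumed (not verified here) to use these same key names.
DEFAULTS = {
    "max_new_tokens": 220,
    "temperature": 0.50,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.06,
}


def load_decoding_params(path: str = "generation_config.json") -> dict:
    """Read decoding parameters, falling back to the suggested defaults."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            loaded = json.load(f)
    except FileNotFoundError:
        loaded = {}
    params = dict(DEFAULTS)
    # Only override keys we actually pass to generate().
    params.update({k: v for k, v in loaded.items() if k in DEFAULTS})
    return params
```

The resulting dict can be unpacked directly into the `model.generate(...)` call from the Python example above, e.g. `model.generate(input_ids, **params, ...)`.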

Intended Use

This model is intended for:

  • educational demonstration of small-LLM training and release
  • toy story generation experiments
  • prompt-format experiments
  • decoding experiments on a compact custom LM
  • studying story-heavy SFT behavior

Out-of-Scope Use

This model is not intended for:

  • factual question answering
  • safety-critical applications
  • production child-facing systems
  • high-reliability assistant behavior
  • benchmark-oriented comparison with modern instruction models

Training Summary

This release comes from the stage2p1 story-SFT experiment in the broader miniJBrain project.

High-level training path:

  1. train tokenizer
  2. pretrain a small GPT-style base model
  3. run instruction/story SFT
  4. build a story-heavy second-stage SFT mixture
  5. select the most balanced checkpoint for release

The final checkpoint was chosen because it retained better prompt following and more stable generation than narrower bedtime-specialized runs.

Data Summary

The broader miniJBrain SFT experiments used locally prepared prompt/response data assembled from public sources, including:

  • HuggingFaceH4/ultrachat_200k
  • databricks/databricks-dolly-15k
  • Open-Orca/OpenOrca
  • openai/gsm8k
  • roneneldan/TinyStories

For this release checkpoint, the most important SFT sources were:

  • roneneldan/TinyStories
  • HuggingFaceH4/ultrachat_200k
  • databricks/databricks-dolly-15k

Approximate stage2p1 training composition:

  • story samples: 120,000
  • UltraChat-derived samples: 17,839
  • Dolly-derived samples: 3,337

Approximate validation composition:

  • story samples: 6,000
  • UltraChat-derived samples: 1,059

This puts the final SFT mix at roughly:

  • 85% story-style data
  • 15% chat/instruction-style data
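The 85/15 split follows directly from the sample counts above, as a quick sanity check shows:

```python
# Stage2p1 training counts as stated in this model card.
story = 120_000
ultrachat = 17_839
dolly = 3_337

total = story + ultrachat + dolly
story_share = story / total
chat_share = (ultrachat + dolly) / total

print(f"story: {story_share:.1%}, chat/instruction: {chat_share:.1%}")
# → story: 85.0%, chat/instruction: 15.0%
```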

That balance was selected because pure story-only tuning narrowed prompt generalization too much, while a story-heavy mix with some chat data produced more stable behavior.

Dataset note: review upstream dataset licenses and usage restrictions source by source before making any formal redistribution claims.

Limitations

Known limitations include:

  • repeated story structure and character patterns
  • frequent reuse of certain names and motifs
  • generic story arcs
  • style instability across prompt phrasings
  • occasional abrupt endings with short decoding limits
  • imperfect specialization for bedtime-only prompts

Release Notes

This is a learning-project release, not a benchmark-optimized or production-tuned model.

The main publication goal is transparency around:

  • how the data was formatted
  • how story-heavy SFT was performed
  • how custom GPT-style checkpoints were exported
  • how a small custom model can be shared openly