miniJBrain-Story-SFT-v0.1

miniJBrain-Story-SFT-v0.1 is a small GPT-style causal language model fine-tuned for short, gentle, children's-story-style text generation.

This release is part of the miniJBrain learning project, which covers:

  • tokenizer training
  • base pretraining
  • supervised fine-tuning (SFT)
  • lightweight style alignment
  • checkpoint export and open release

Official code repository:

  • https://github.com/chongliujia/miniJBrain

Model Details

  • Model family: custom GPT-style causal LM
  • Release checkpoint: minij_chat_story_stage2p1
  • Main use case: short story and bedtime-style text generation
  • Vocabulary size: 32,000
  • Context length: 1,024
  • Layers: 16
  • Attention heads: 16
  • Embedding size: 1,024
  • Weights format: safetensors

This checkpoint was selected as the most balanced story-oriented SFT result in the project. It performed better overall than narrower bedtime-only variants.

Repository Contents

This directory is an exported model package. It includes:

  • model.safetensors
  • config.json
  • tokenizer.json
  • generation_config.json
  • inference.py
  • README.md

Important: this is not a zero-code Hugging Face transformers package. The weights are present and usable, but the architecture is defined by the miniJBrain codebase rather than a standard AutoModelForCausalLM config.

How To Use

The recommended way to run this model is to use the original miniJBrain model code:

  • https://github.com/chongliujia/miniJBrain

Option 1: Run the local inference script

If you do not already have the model code, clone it first:

git clone https://github.com/chongliujia/miniJBrain.git

If this model directory sits next to the cloned miniJBrain project directory, you can run:

python inference.py \
  --device cpu \
  --prompt $'User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n' \
  --max_new_tokens 220 \
  --temperature 0.50 \
  --top_k 50 \
  --top_p 0.95 \
  --repetition_penalty 1.06

By default, inference.py loads:

  • ./model.safetensors
  • ./config.json
  • ./tokenizer.json
  • ../miniJBrain as the model-code directory

If your miniJBrain checkout lives elsewhere:

python inference.py --minijbrain-root /path/to/miniJBrain

Option 2: Load it in your own Python code

Use the real miniJBrain model definition from model/gpt.py in the official repository:

import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

minijbrain_root = Path("/path/to/miniJBrain")
sys.path.insert(0, str(minijbrain_root))

from model.gpt import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("config.json", "r", encoding="utf-8") as f:
    raw_config = json.load(f)

model = GPT(GPTConfig(**raw_config)).to(device)
state_dict = load_file("model.safetensors")

# The export stores the tied input/output embedding only once, under
# lm_head.weight; restore the tied copy for wte if it is missing.
if "transformer.wte.weight" not in state_dict and "lm_head.weight" in state_dict:
    state_dict["transformer.wte.weight"] = state_dict["lm_head.weight"]

model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file("tokenizer.json")
prompt = "User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n"
input_ids = torch.tensor(
    [tokenizer.encode(prompt, add_special_tokens=False).ids],
    dtype=torch.long,
    device=device,
)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=220,
        temperature=0.50,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.06,
        eos_token_id=tokenizer.token_to_id("<eos>"),
        stop_on_eos=True,
    )

text = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(text)

What Does Not Work Out of the Box

This repository does not yet support direct loading like:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-repo-name")

That will not work yet because this repository does not provide a standard transformers architecture definition, model_type, or compatible modeling code.

Prompt Format

The model works best with the chat-style prompt format used during SFT:

User:
Tell me a warm short bedtime story before sleep.

Assistant:

It generally responds best when the prompt is short, explicit, and clearly story-oriented.
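To reduce formatting mistakes when constructing prompts programmatically, the chat-style template above can be wrapped in a small helper. This is a minimal sketch; `build_prompt` is a hypothetical name, but the template string matches the SFT format shown above.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the chat-style prompt format used during SFT.

    The "User:\n...\n\nAssistant:\n" template comes from this model card;
    the helper itself is just an illustrative convenience.
    """
    return f"User:\n{user_message}\n\nAssistant:\n"


prompt = build_prompt("Tell me a warm short bedtime story before sleep.")
```

Passing the result as the `--prompt` argument (or as the prompt string in your own loading code) keeps the leading "User:" and trailing "Assistant:\n" markers consistent.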

Recommended Decoding

Suggested defaults from generation_config.json:

max_new_tokens = 220
temperature = 0.50
top_k = 50
top_p = 0.95
repetition_penalty = 1.06
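One way to use these values is to read generation_config.json and fall back to the suggested defaults when the file is absent. This sketch assumes the JSON keys match the parameter names listed above; confirm against your exported generation_config.json.

```python
import json

# Suggested defaults from this model card; generation_config.json is
# assumed (not verified here) to use these same key names.
DEFAULTS = {
    "max_new_tokens": 220,
    "temperature": 0.50,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.06,
}


def load_decoding_params(path: str = "generation_config.json") -> dict:
    """Read decoding parameters, falling back to the suggested defaults."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            loaded = json.load(f)
    except FileNotFoundError:
        loaded = {}
    params = dict(DEFAULTS)
    # Only override keys we actually pass to generate().
    params.update({k: v for k, v in loaded.items() if k in DEFAULTS})
    return params
```

The resulting dict can be unpacked directly into the `model.generate(...)` call from the Python example above, e.g. `model.generate(input_ids, **params, ...)`.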

Intended Use

This model is intended for:

  • educational demonstration of small-LLM training and release
  • toy story generation experiments
  • prompt-format experiments
  • decoding experiments on a compact custom LM
  • studying story-heavy SFT behavior

Out-of-Scope Use

This model is not intended for:

  • factual question answering
  • safety-critical applications
  • production child-facing systems
  • high-reliability assistant behavior
  • benchmark-oriented comparison with modern instruction models

Training Summary

This release comes from the stage2p1 story-SFT experiment in the broader miniJBrain project.

High-level training path:

  1. train tokenizer
  2. pretrain a small GPT-style base model
  3. run instruction/story SFT
  4. build a story-heavy second-stage SFT mixture
  5. select the most balanced checkpoint for release

The final checkpoint was chosen because it retained better prompt following and more stable generation than narrower bedtime-specialized runs.

Data Summary

The broader miniJBrain SFT experiments used locally prepared prompt/response data assembled from public sources, including:

  • HuggingFaceH4/ultrachat_200k
  • databricks/databricks-dolly-15k
  • Open-Orca/OpenOrca
  • openai/gsm8k
  • roneneldan/TinyStories

For this release checkpoint, the most important SFT sources were:

  • roneneldan/TinyStories
  • HuggingFaceH4/ultrachat_200k
  • databricks/databricks-dolly-15k

Approximate stage2p1 training composition:

  • story samples: 120,000
  • UltraChat-derived samples: 17,839
  • Dolly-derived samples: 3,337

Approximate validation composition:

  • story samples: 6,000
  • UltraChat-derived samples: 1,059

This puts the final SFT mix at roughly:

  • 85% story-style data
  • 15% chat/instruction-style data
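The 85/15 split follows directly from the sample counts above, as a quick sanity check shows:

```python
# Stage2p1 training counts as stated in this model card.
story = 120_000
ultrachat = 17_839
dolly = 3_337

total = story + ultrachat + dolly
story_share = story / total
chat_share = (ultrachat + dolly) / total

print(f"story: {story_share:.1%}, chat/instruction: {chat_share:.1%}")
# → story: 85.0%, chat/instruction: 15.0%
```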

That balance was selected because pure story-only tuning narrowed prompt generalization too much, while a story-heavy mix with some chat data produced more stable behavior.

Dataset note: review upstream dataset licenses and usage restrictions source by source before making any formal redistribution claims.

Limitations

Known limitations include:

  • repeated story structure and character patterns
  • frequent reuse of certain names and motifs
  • generic story arcs
  • style instability across prompt phrasings
  • occasional abrupt endings with short decoding limits
  • imperfect specialization for bedtime-only prompts

Release Notes

This is a learning-project release, not a benchmark-optimized or production-tuned model.

The main publication goal is transparency around:

  • how the data was formatted
  • how story-heavy SFT was performed
  • how custom GPT-style checkpoints were exported
  • how a small custom model can be shared openly