---
license: mit
language:
- en
tags:
- text-generation
- gpt
- story-generation
- sft
- educational
library_name: custom
pipeline_tag: text-generation
---

# miniJBrain-Story-SFT-v0.1

`miniJBrain-Story-SFT-v0.1` is a small GPT-style causal language model fine-tuned to generate short, gentle, children's-story-style text.

This release is part of the `miniJBrain` learning project, which covers:

- tokenizer training
- base pretraining
- supervised fine-tuning (SFT)
- lightweight style alignment
- checkpoint export and open release

Official code repository:

- `https://github.com/chongliujia/miniJBrain`

## Model Details

- Model family: custom GPT-style causal LM
- Release checkpoint: `minij_chat_story_stage2p1`
- Main use case: short story and bedtime-style text generation
- Vocabulary size: `32,000`
- Context length: `1,024`
- Layers: `16`
- Attention heads: `16`
- Embedding size: `1,024`
- Weights format: `safetensors`

This checkpoint was selected as the most balanced story-oriented SFT result in the project: it performed better overall than narrower bedtime-only variants.

## Repository Contents

This directory is an exported model package. It includes:

- `model.safetensors`
- `config.json`
- `tokenizer.json`
- `generation_config.json`
- `inference.py`
- `README.md`

Important: this is not a zero-code Hugging Face `transformers` package. The weights are present and usable, but the architecture is defined by the `miniJBrain` codebase rather than by a standard `AutoModelForCausalLM` config.
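As a quick sanity check, the hyperparameters listed under Model Details imply a model in the low-hundreds-of-millions parameter range. The sketch below is an estimate only: it assumes a standard GPT block (roughly 12 · d_model² weights per layer for attention plus a 4× MLP), learned positional embeddings, and tied input/output embeddings, none of which is confirmed by this card.

```python
# Rough parameter-count estimate from the card's hyperparameters.
# Assumes a standard GPT block (~12 * d_model^2 params per layer),
# learned positional embeddings, and tied embeddings; the actual
# miniJBrain architecture may differ.
vocab_size = 32_000
context_length = 1_024
n_layers = 16
d_model = 1_024

token_embeddings = vocab_size * d_model         # shared with lm_head if tied
position_embeddings = context_length * d_model  # only if positions are learned
per_layer = 12 * d_model * d_model              # attention + MLP weights
total = token_embeddings + position_embeddings + n_layers * per_layer

print(f"~{total / 1e6:.0f}M parameters")  # ~235M parameters
```

Under these assumptions the checkpoint sits around 235M parameters, which is consistent with "small GPT-style" as described above.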
## How To Use

The recommended way to run this model is with the original `miniJBrain` model code:

- `https://github.com/chongliujia/miniJBrain`

### Option 1: Run the local inference script

If you do not already have the model code, clone it first:

```bash
git clone https://github.com/chongliujia/miniJBrain.git
```

If this model directory sits next to the cloned `miniJBrain` project directory, you can run:

```bash
python inference.py \
  --device cpu \
  --prompt $'User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n' \
  --max_new_tokens 220 \
  --temperature 0.50 \
  --top_k 50 \
  --top_p 0.95 \
  --repetition_penalty 1.06
```

By default, `inference.py` loads:

- `./model.safetensors`
- `./config.json`
- `./tokenizer.json`
- `../miniJBrain` as the model-code directory

If your `miniJBrain` checkout lives elsewhere:

```bash
python inference.py --minijbrain-root /path/to/miniJBrain
```

### Option 2: Load it in your own Python code

Use the real `miniJBrain` model definition from `model/gpt.py` in the official repository:

```python
import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from tokenizers import Tokenizer

minijbrain_root = Path("/path/to/miniJBrain")
sys.path.insert(0, str(minijbrain_root))

from model.gpt import GPT, GPTConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

with open("config.json", "r", encoding="utf-8") as f:
    raw_config = json.load(f)

model = GPT(GPTConfig(**raw_config)).to(device)

state_dict = load_file("model.safetensors")
# The exported safetensors file keeps tied weights through lm_head.weight.
if "transformer.wte.weight" not in state_dict and "lm_head.weight" in state_dict:
    state_dict["transformer.wte.weight"] = state_dict["lm_head.weight"]

model.load_state_dict(state_dict)
model.eval()

tokenizer = Tokenizer.from_file("tokenizer.json")

prompt = "User:\nTell me a warm short bedtime story before sleep.\n\nAssistant:\n"
input_ids = torch.tensor(
    [tokenizer.encode(prompt, add_special_tokens=False).ids],
    dtype=torch.long,
    device=device,
)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=220,
        temperature=0.50,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.06,
        eos_token_id=tokenizer.token_to_id(""),
        stop_on_eos=True,
    )

text = tokenizer.decode(output_ids[0].tolist(), skip_special_tokens=True)
print(text)
```

### What does not work out of the box

This repository does not yet support direct loading like:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-repo-name")
```

That will not work yet because this repository does not provide a standard `transformers` architecture definition, `model_type`, or compatible modeling code.

## Prompt Format

The model works best with the chat-style prompt format used during SFT:

```text
User:
Tell me a warm short bedtime story before sleep.

Assistant:
```

It generally responds best when the prompt is short, explicit, and clearly story-oriented.
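If you are scripting prompts, the format above is easy to wrap in a small helper. `build_prompt` is a hypothetical name for illustration, not part of the `miniJBrain` codebase:

```python
def build_prompt(user_message: str) -> str:
    # Mirrors the chat-style SFT format: a "User:" turn followed by an
    # empty "Assistant:" turn that the model is expected to complete.
    return f"User:\n{user_message}\n\nAssistant:\n"

prompt = build_prompt("Tell me a warm short bedtime story before sleep.")
print(repr(prompt))
```

Keeping the trailing `Assistant:\n` intact matters: the model continues from exactly that point, so dropping the newline or the label tends to degrade completions.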
## Recommended Decoding

Suggested defaults from `generation_config.json`:

```text
max_new_tokens = 220
temperature = 0.50
top_k = 50
top_p = 0.95
repetition_penalty = 1.06
```

## Intended Use

This model is intended for:

- educational demonstration of small-LLM training and release
- toy story generation experiments
- prompt-format experiments
- decoding experiments on a compact custom LM
- studying story-heavy SFT behavior

## Out-of-Scope Use

This model is not intended for:

- factual question answering
- safety-critical applications
- production child-facing systems
- high-reliability assistant behavior
- benchmark-oriented comparison with modern instruction models

## Training Summary

This release comes from the `stage2p1` story-SFT experiment in the broader `miniJBrain` project.

High-level training path:

1. train tokenizer
2. pretrain a small GPT-style base model
3. run instruction/story SFT
4. build a story-heavy second-stage SFT mixture
5. select the most balanced checkpoint for release

The final checkpoint was chosen because it retained better prompt following and more stable generation than the narrower bedtime-specialized runs.
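Step 4 of the training path (building a story-heavy mixture) can be sketched as follows. This is an illustrative sketch, not the project's actual data tooling: `build_mixture` and the toy sample lists are invented names, and the real pipeline may downsample differently.

```python
import random

def build_mixture(story, chat, story_fraction=0.85, seed=0):
    # Illustrative sketch of a story-heavy SFT mixture: keep all story
    # samples and downsample chat/instruction data so the final split is
    # roughly story_fraction vs. (1 - story_fraction), then shuffle.
    n_chat = int(len(story) * (1 - story_fraction) / story_fraction)
    rng = random.Random(seed)
    mix = list(story) + rng.sample(list(chat), min(n_chat, len(chat)))
    rng.shuffle(mix)
    return mix

# Toy stand-ins for the real datasets.
story = [f"story-{i}" for i in range(850)]
chat = [f"chat-{i}" for i in range(300)]

mix = build_mixture(story, chat)
print(len(mix))  # 1000 samples: 850 story + 150 chat
```

With `story_fraction=0.85` this reproduces the roughly 85/15 story-to-chat ratio reported in the Data Summary below.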
## Data Summary

The broader `miniJBrain` SFT experiments used locally prepared prompt/response data assembled from public sources, including:

- `HuggingFaceH4/ultrachat_200k`
- `databricks/databricks-dolly-15k`
- `Open-Orca/OpenOrca`
- `openai/gsm8k`
- `roneneldan/TinyStories`

For this release checkpoint, the most important SFT sources were:

- `roneneldan/TinyStories`
- `HuggingFaceH4/ultrachat_200k`
- `databricks/databricks-dolly-15k`

Approximate `stage2p1` training composition:

- story samples: `120,000`
- UltraChat-derived samples: `17,839`
- Dolly-derived samples: `3,337`

Approximate validation composition:

- story samples: `6,000`
- UltraChat-derived samples: `1,059`

This puts the final SFT mix at roughly:

- `85%` story-style data
- `15%` chat/instruction-style data

That balance was chosen because pure story-only tuning narrowed prompt generalization too much, while a story-heavy mix with some chat data produced more stable behavior.

Dataset note: before any formal redistribution claims, upstream dataset licenses and usage restrictions should be reviewed source by source.

## Limitations

Known limitations include:

- repeated story structure and character patterns
- frequent reuse of certain names and motifs
- generic story arcs
- style instability across prompt phrasings
- occasional abrupt endings with short decoding limits
- imperfect specialization for bedtime-only prompts

## Release Notes

This is a learning-project release, not a benchmark-optimized or production-tuned model. The main publication goal is transparency around:

- how the data was formatted
- how story-heavy SFT was performed
- how custom GPT-style checkpoints were exported
- how a small custom model can be shared openly