Usage with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the BF16 GGUF from the Hub and load it with the llama.cpp bindings.
llm = Llama.from_pretrained(
    repo_id="diverWayne/mikky-64m",
    filename="mikky-64m-bf16.gguf",
)

# Plain text completion; for chat-style prompts, wrap the input in the
# MiniMind markers described under "Prompt Format" below.
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)

mikky-64m

mikky-64m is a 63,912,192-parameter small language model named mikky. It was trained by HUANG JUNZHE 黄俊哲 with the minimind-scratch codebase, based on the MiniMind project/data format.

This release is intended as a compact learning and experimentation checkpoint for local inference, model-format conversion, and small-model alignment workflows.

Training Pipeline

The released checkpoint uses the completed alignment path:

pretrain -> SFT -> mikky LoRA identity SFT -> DPO

GRPO was run only as a probe and is not part of the final release checkpoint. PPO was skipped because the local reward signal was not strong enough to justify another RL stage.
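
The DPO stage optimizes the standard direct-preference objective against a frozen reference model (typically the preceding SFT checkpoint). The snippet below is a minimal PyTorch sketch of that loss, not the project's training code; the function and argument names are illustrative.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability margins of the chosen response over the rejected one,
    # for the policy being trained and for the frozen reference model.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Standard DPO objective: push the policy margin above the reference margin.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()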

Identity

The model identity/persona is:

  • Name: mikky
  • Trainer: HUANG JUNZHE 黄俊哲
  • Origin: a small-parameter model trained from scratch in this MiniMind-based project

Files

  • mikky-64m.pth: native minimind_scratch state dict, BF16 tensors.
  • model.safetensors: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
  • mikky-64m-bf16.gguf: llama.cpp GGUF export, BF16, not quantized.
  • tokenizer.json, tokenizer_config.json: MiniMind tokenizer files.
  • config.json, generation_config.json: Qwen3-compatible metadata used for conversion and loading.

The final source checkpoint was checkpoints/dpo_768_resume.pth.
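
Because config.json and model.safetensors use the Qwen3-compatible layout, the Safetensors checkpoint should load with standard Hugging Face tooling. A minimal sketch, assuming a transformers version with Qwen3 support:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "diverWayne/mikky-64m"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Plain text completion; for chat, wrap the input in the MiniMind markers
# described under "Prompt Format" below.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))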

Prompt Format

The training code uses MiniMind chat markers:

<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
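
A minimal Python sketch for wrapping a user message in these markers before handing it to any of the runtimes below; stopping generation at <|im_end|> is an assumption based on the template:

def build_prompt(user_message: str) -> str:
    # MiniMind chat markers: the model continues after the assistant header
    # and is expected to end its turn with <|im_end|>.
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("Introduce yourself in one sentence.")
# Stop generation at "<|im_end|>" if the runtime supports stop strings.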

Native Usage

Use the project code for native scratch inference; the example prompt asks the model to introduce itself in one sentence:

python -m minimind_scratch.cli chat \
  --weight out/hf/mikky-64m/mikky-64m.pth \
  --prompt "请用一句话介绍你自己"

llama.cpp / GGUF

The GGUF file is BF16 and intentionally not quantized:

llama-cli -m mikky-64m-bf16.gguf \
  -p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
  -n 128

Notes

The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the model uses RMSNorm, SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization. The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
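
One way to repeat such a structural check locally is the gguf Python package from the llama.cpp repository; a minimal sketch, assuming the upstream gguf-py reader API:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("mikky-64m-bf16.gguf")

# Print the stored metadata keys (architecture, context length, ...).
for key in reader.fields:
    print(key)

# Confirm the tensor count and names match the expected Qwen3-style layout.
print(len(reader.tensors), "tensors")
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.shape)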

Limitations

  • This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
  • It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
  • GRPO/PPO are not part of the final release checkpoint.

Dataset And License

This model was trained with the MiniMind small-data recipe from jingyaogong/minimind_dataset. For this release, the dataset reference follows the MiniMind small-dataset license, Apache-2.0.

Main data files used by this run:

  • pretrain_t2t_mini.jsonl: pretraining data.
  • sft_t2t_mini.jsonl: supervised fine-tuning data.
  • dpo.jsonl: preference data for DPO.
  • lora_identity_mikky.jsonl: project-authored identity/persona data for mikky.
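
For a quick local look at these files, a minimal sketch that assumes only the JSONL convention of one JSON object per line; field names vary per file and are not assumed here:

import json

def peek_jsonl(path, n=2):
    # Report the record count and the keys of the first few records.
    records = 0
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i < n:
                print(sorted(json.loads(line).keys()))
            records += 1
    print(path, "->", records, "records")

peek_jsonl("lora_identity_mikky.jsonl")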

The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under Apache-2.0.
