Usage with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the BF16 GGUF from the Hub and load it with the llama.cpp bindings.
llm = Llama.from_pretrained(
    repo_id="diverWayne/mikky-64m",
    filename="mikky-64m-bf16.gguf",
)

# Plain text completion; for chat-style prompts, wrap the input in the
# MiniMind markers described under "Prompt Format" below.
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)

mikky-64m

mikky-64m is a 63,912,192-parameter small language model named mikky. It was trained by HUANG JUNZHE 黄俊哲 with the minimind-scratch codebase, based on the MiniMind project/data format.

This release is intended as a compact learning and experimentation checkpoint for local inference, model-format conversion, and small-model alignment workflows.

Training Pipeline

The released checkpoint uses the completed alignment path:

pretrain -> SFT -> mikky LoRA identity SFT -> DPO

GRPO was run only as a probe and is not part of the final release checkpoint. PPO was skipped because the local reward signal was not strong enough to justify another RL stage.
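
The DPO stage optimizes the standard direct-preference objective against a frozen reference model (typically the preceding SFT checkpoint). The snippet below is a minimal PyTorch sketch of that loss, not the project's training code; the function and argument names are illustrative.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability margins of the chosen response over the rejected one,
    # for the policy being trained and for the frozen reference model.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Standard DPO objective: push the policy margin above the reference margin.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()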

Identity

The model identity/persona is:

  • Name: mikky
  • Trainer: HUANG JUNZHE 黄俊哲
  • Origin: a small-parameter model trained from scratch in this MiniMind-based project

Files

  • mikky-64m.pth: native minimind_scratch state dict, BF16 tensors.
  • model.safetensors: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
  • mikky-64m-bf16.gguf: llama.cpp GGUF export, BF16, not quantized.
  • tokenizer.json, tokenizer_config.json: MiniMind tokenizer files.
  • config.json, generation_config.json: Qwen3-compatible metadata used for conversion and loading.

The final source checkpoint was checkpoints/dpo_768_resume.pth.
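
Because config.json and model.safetensors use the Qwen3-compatible layout, the Safetensors checkpoint should load with standard Hugging Face tooling. A minimal sketch, assuming a transformers version with Qwen3 support:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "diverWayne/mikky-64m"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Plain text completion; for chat, wrap the input in the MiniMind markers
# described under "Prompt Format" below.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))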

Prompt Format

The training code uses MiniMind chat markers:

<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
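
A minimal Python sketch for wrapping a user message in these markers before handing it to any of the runtimes below; stopping generation at <|im_end|> is an assumption based on the template:

def build_prompt(user_message: str) -> str:
    # MiniMind chat markers: the model continues after the assistant header
    # and is expected to end its turn with <|im_end|>.
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("Introduce yourself in one sentence.")
# Stop generation at "<|im_end|>" if the runtime supports stop strings.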

Native Usage

Use the project code for native scratch inference; the example prompt asks the model to introduce itself in one sentence:

python -m minimind_scratch.cli chat \
  --weight out/hf/mikky-64m/mikky-64m.pth \
  --prompt "请用一句话介绍你自己"

llama.cpp / GGUF

The GGUF file is BF16 and intentionally not quantized:

llama-cli -m mikky-64m-bf16.gguf \
  -p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
  -n 128

Notes

The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the model uses RMSNorm, SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization. The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
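
One way to repeat such a structural check locally is the gguf Python package from the llama.cpp repository; a minimal sketch, assuming the upstream gguf-py reader API:

from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("mikky-64m-bf16.gguf")

# Print the stored metadata keys (architecture, context length, ...).
for key in reader.fields:
    print(key)

# Confirm the tensor count and names match the expected Qwen3-style layout.
print(len(reader.tensors), "tensors")
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.shape)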

Limitations

  • This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
  • It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
  • GRPO/PPO are not part of the final release checkpoint.

Dataset And License

This model was trained with the MiniMind small-data recipe from jingyaogong/minimind_dataset. For this release, the dataset reference follows the MiniMind small-dataset license, Apache-2.0.

Main data files used by this run:

  • pretrain_t2t_mini.jsonl: pretraining data.
  • sft_t2t_mini.jsonl: supervised fine-tuning data.
  • dpo.jsonl: preference data for DPO.
  • lora_identity_mikky.jsonl: project-authored identity/persona data for mikky.
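
For a quick local look at these files, a minimal sketch that assumes only the JSONL convention of one JSON object per line; field names vary per file and are not assumed here:

import json

def peek_jsonl(path, n=2):
    # Report the record count and the keys of the first few records.
    records = 0
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i < n:
                print(sorted(json.loads(line).keys()))
            records += 1
    print(path, "->", records, "records")

peek_jsonl("lora_identity_mikky.jsonl")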

The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under Apache-2.0.
