Instructions for using diverWayne/mikky-64m with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use diverWayne/mikky-64m with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="diverWayne/mikky-64m")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("diverWayne/mikky-64m")
model = AutoModelForCausalLM.from_pretrained("diverWayne/mikky-64m")
- llama-cpp-python
How to use diverWayne/mikky-64m with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="diverWayne/mikky-64m",
    filename="mikky-64m-bf16.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use diverWayne/mikky-64m with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf diverWayne/mikky-64m:BF16
# Run inference directly in the terminal:
llama-cli -hf diverWayne/mikky-64m:BF16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf diverWayne/mikky-64m:BF16
# Run inference directly in the terminal:
llama-cli -hf diverWayne/mikky-64m:BF16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf diverWayne/mikky-64m:BF16
# Run inference directly in the terminal:
./llama-cli -hf diverWayne/mikky-64m:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf diverWayne/mikky-64m:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf diverWayne/mikky-64m:BF16
Use Docker
docker model run hf.co/diverWayne/mikky-64m:BF16
- LM Studio
- Jan
- vLLM
How to use diverWayne/mikky-64m with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "diverWayne/mikky-64m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "diverWayne/mikky-64m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/diverWayne/mikky-64m:BF16
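The vLLM and SGLang servers both expose an OpenAI-compatible `/v1/completions` endpoint, so you can make the curl call from Python instead. The sketch below uses only the standard library. The default URL assumes vLLM's default port 8000; for the SGLang server, use port 30000. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: `vllm serve "diverWayne/mikky-64m"` is listening on port 8000.
# For the SGLang server shown in this card, use http://localhost:30000/v1/completions.
BASE_URL = "http://localhost:8000/v1/completions"


def build_completion_payload(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": "diverWayne/mikky-64m",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, url: str = BASE_URL) -> str:
    """POST the payload and return the generated text of the first choice."""
    body = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        result = json.load(response)
    # OpenAI-style completion responses carry generated text in choices[i]["text"].
    return result["choices"][0]["text"]
```

Once a server is running, `complete("Once upon a time,")` returns the continuation.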
- SGLang
How to use diverWayne/mikky-64m with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "diverWayne/mikky-64m" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "diverWayne/mikky-64m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "diverWayne/mikky-64m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "diverWayne/mikky-64m",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Ollama
How to use diverWayne/mikky-64m with Ollama:
ollama run hf.co/diverWayne/mikky-64m:BF16
- Unsloth Studio
How to use diverWayne/mikky-64m with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for diverWayne/mikky-64m to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for diverWayne/mikky-64m to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for diverWayne/mikky-64m to start chatting
- Docker Model Runner
How to use diverWayne/mikky-64m with Docker Model Runner:
docker model run hf.co/diverWayne/mikky-64m:BF16
- Lemonade
How to use diverWayne/mikky-64m with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull diverWayne/mikky-64m:BF16
Run and chat with the model
lemonade run user.mikky-64m-BF16
List all available models
lemonade list
mikky-64m
mikky-64m is a 63,912,192-parameter small language model named mikky.
It was trained by HUANG JUNZHE 黄俊哲 with the minimind-scratch codebase, based on the MiniMind project/data format.
This release is intended as a compact learning and experimentation checkpoint for local inference, model-format conversion, and small-model alignment workflows.
Training Pipeline
The released checkpoint uses the completed alignment path:
pretrain -> SFT -> mikky LoRA identity SFT -> DPO
GRPO was only run as a probe and is not used as the final release checkpoint. PPO was skipped because the local reward signal was not strong enough to justify another RL stage.
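DPO refers to Direct Preference Optimization. For readers unfamiliar with this stage, the standard per-pair DPO loss can be sketched in plain Python as below. This is the textbook objective, not this project's training code. The function names are hypothetical, and the default `beta=0.1` is an illustrative assumption, not the value used for mikky-64m.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                  ref_chosen_logp: float, ref_rejected_logp: float,
                  beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each *_logp is the summed log-probability of a response's tokens under
    the trainable policy or the frozen reference model (e.g. the SFT model).
    beta=0.1 is an assumed, illustrative default.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    return -log_sigmoid(margin)
```

When the policy still matches the reference model, the margin is zero and the loss is log 2. Training lowers the loss by raising the chosen response's log-ratio above the rejected one's.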
Identity
The model identity/persona is:
- Name: mikky
- Trainer: HUANG JUNZHE 黄俊哲
- Origin: a small-parameter model trained from this MiniMind-based scratch project
Files
- mikky-64m.pth: native minimind_scratch state dict, BF16 tensors.
- model.safetensors: Qwen3-compatible Hugging Face tensor names, BF16 tensors.
- mikky-64m-bf16.gguf: llama.cpp GGUF export, BF16, not quantized.
- tokenizer.json, tokenizer_config.json: MiniMind tokenizer files.
- config.json, generation_config.json: Qwen3-compatible metadata used for conversion and loading.
The final source checkpoint was checkpoints/dpo_768_resume.pth.
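Each weight artifact stores BF16 tensors, so the raw weight payload follows from the parameter count at 2 bytes per parameter. The sketch below shows that arithmetic. It ignores file headers and metadata and assumes each parameter is stored exactly once, so actual file sizes will differ somewhat.

```python
PARAM_COUNT = 63_912_192   # parameter count stated in this card
BYTES_PER_BF16 = 2         # bfloat16 stores each value in 16 bits


def raw_tensor_bytes(param_count: int, bytes_per_param: int) -> int:
    """Size of the weight values alone, excluding headers and metadata."""
    return param_count * bytes_per_param


size = raw_tensor_bytes(PARAM_COUNT, BYTES_PER_BF16)
print(f"{size:,} bytes = {size / 2**20:.1f} MiB")
# prints: 127,824,384 bytes = 121.9 MiB
```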
Prompt Format
The training code uses MiniMind chat markers:
<|im_start|>user
Your question<|im_end|>
<|im_start|>assistant
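You can also build the chat-marker format above in code. The minimal sketch below assumes only user and assistant turns, since the card does not document a system role. `build_minimind_prompt` is a hypothetical helper, not part of minimind-scratch.

```python
def build_minimind_prompt(messages: list[dict[str, str]]) -> str:
    """Render chat turns with MiniMind markers and leave an assistant turn open.

    Assumption: only "user" and "assistant" roles are used.
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # The model writes its reply after this open assistant marker.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For a single user turn containing 请用一句话介绍你自己, this produces the same prompt string that the GGUF example passes to `llama-cli`.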
Native Usage
Use the project code for native scratch inference. The example prompt, 请用一句话介绍你自己, asks the model to introduce itself in one sentence:
python -m minimind_scratch.cli chat \
--weight out/hf/mikky-64m/mikky-64m.pth \
--prompt "请用一句话介绍你自己"
llama.cpp / GGUF
The GGUF file is BF16 and intentionally not quantized. This example passes a raw chat-formatted prompt with the same self-introduction request:
llama-cli -m mikky-64m-bf16.gguf \
-p "<|im_start|>user\n请用一句话介绍你自己<|im_end|>\n<|im_start|>assistant\n" \
-n 128
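Because this example feeds a raw prompt, the completion may run past the reply. How `<|im_end|>` appears depends on the runtime: it may be consumed as an end-of-generation token, or it may be printed as text. The minimal post-processing sketch below handles the second case. The helper name is hypothetical.

```python
ASSISTANT_MARKER = "<|im_start|>assistant\n"
END_MARKER = "<|im_end|>"


def extract_reply(generated: str) -> str:
    """Return only the assistant reply from raw completion text.

    If the prompt is echoed back, keep the text after the last assistant
    marker; then cut at the first end marker, if one is present.
    """
    start = generated.rfind(ASSISTANT_MARKER)
    if start != -1:
        generated = generated[start + len(ASSISTANT_MARKER):]
    end = generated.find(END_MARKER)
    if end != -1:
        generated = generated[:end]
    return generated.strip()
```

With llama-cpp-python, passing `stop=["<|im_end|>"]` to the `llm(...)` call gives a similar cut at generation time.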
Notes
The GGUF export maps the scratch model to a Qwen3-compatible tensor layout because the model uses RMSNorm, SwiGLU MLP, grouped-query attention, RoPE, and q/k normalization. The GGUF structure and metadata were verified locally. Always verify generation quality in your target runtime before treating the GGUF file as production-ready.
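The sketch below makes three of these architecture terms concrete on plain Python lists. First, RMSNorm; Qwen3-style blocks also apply it per head to query and key vectors, which is the q/k normalization. Second, the SwiGLU gating in the MLP. Third, the query-to-key/value head mapping in grouped-query attention. It is illustrative only. The default `eps` is an assumption, and the head counts in the usage note are hypothetical, not this model's configuration.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide x by its root-mean-square, then scale by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    exp_v = math.exp(v)
    return v * exp_v / (1.0 + exp_v)


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """SwiGLU: SiLU(gate projection) times up projection, elementwise.

    In the MLP, the result is then passed through the down projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_heads // n_kv_heads
    return query_head // group_size
```

For example, with 8 query heads and 2 key/value heads (hypothetical numbers), `kv_head_for` maps heads 0 to 3 to KV head 0 and heads 4 to 7 to KV head 1.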
Limitations
- This is a very small model; expect limited reasoning, math, factual recall, and safety behavior.
- It is not suitable for high-stakes medical, legal, financial, or safety-critical use.
- GRPO/PPO are not part of the final release checkpoint.
Dataset and License
This model was trained with the MiniMind small-data recipe from jingyaogong/minimind_dataset.
For this release, the dataset is referenced under the MiniMind small dataset's license, Apache-2.0.
Main data files used by this run:
- pretrain_t2t_mini.jsonl: pretraining data.
- sft_t2t_mini.jsonl: supervised fine-tuning data.
- dpo.jsonl: preference data for DPO.
- lora_identity_mikky.jsonl: project-authored identity/persona data for mikky.
The model card, exported native checkpoint, Safetensors checkpoint, and GGUF artifact are released under Apache-2.0.
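The data files listed above use the JSON Lines format, with one JSON object per line. The card does not document their field names, so this sketch does not assume a schema. Instead, it counts the top-level keys in a file so you can inspect the structure yourself. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-blank line (the JSON Lines convention)."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no}: invalid JSON ({err.msg})") from err


def summarize_keys(lines: Iterable[str]) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in iter_jsonl(lines):
        counts.update(record.keys())
    return counts
```

To inspect a file, pass it an open handle, e.g. `with open("dpo.jsonl", encoding="utf-8") as f: print(summarize_keys(f))`.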