makellm-ja-char-75m-chat


English

A character-level Japanese GPT-2 (about 75M parameters), implemented from scratch and chat-finetuned. No pretrained Hugging Face models or tokenizers are used: the model, data preprocessing, training, and GGUF conversion are all hand-written.

⚠️ This is a small, educational model. It produces grammatically natural Japanese small talk and greetings, but do not expect factual accuracy (dates, proper nouns, etc. are often wrong or fabricated). Intended for research, learning, and experimentation.

Author

Tsutomu Uchida (内田勉)

✉️ uchida@mazariba.co.jp
X: @sidodtv

Model details

Field	Value
Architecture	GPT-2 style (learned absolute position embeddings, pre-LayerNorm, weight-tied head, exact GELU)
Parameters	~75M
Layers / dim / heads	n_layer=10 / n_embd=768 / n_head=12
Context length	256
Vocabulary	7,210 (character-level) = 7,207 characters + 3 special tokens
Special tokens	<|user|>, <|bot|>, <|end|>
Format	GGUF (F16)
Trained on	AMD Radeon 8060S (gfx1151) + PyTorch ROCm
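
The table values can be cross-checked against the stated parameter count. The sketch below assumes a standard GPT-2 block layout (fused QKV projection, 4x MLP width, biases on every linear layer and LayerNorm); the author's actual implementation may differ in these details, so the result is only a plausibility check. `gpt2_param_count` is a hypothetical helper, not part of this repository.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size, n_ctx, mlp_ratio=4):
    """Count parameters of a GPT-2 style decoder with a weight-tied output head.

    Assumes biases everywhere and an MLP hidden size of mlp_ratio * n_embd.
    """
    d = n_embd
    embeddings = vocab_size * d + n_ctx * d        # token + learned position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
    hidden = mlp_ratio * d
    mlp = (d * hidden + hidden) + (hidden * d + d) # up-projection + down-projection
    layer_norms = 2 * (2 * d)                      # two LayerNorms per block (scale + bias)
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d
    # The output head shares the token embedding matrix, so it adds no parameters.
    return embeddings + n_layer * per_block + final_norm


total = gpt2_param_count(n_layer=10, n_embd=768, vocab_size=7210, n_ctx=256)
print(f"{total:,}")  # 76,614,144
```

Under these assumptions the count comes to roughly 76.6M, in line with the "~75M" figure above.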

Character-level: 1 token = 1 Unicode character. No subword tokenization (BPE/SentencePiece).
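
As a rough illustration of what "1 token = 1 Unicode character" means, here is a minimal character-level encoder and decoder. The toy corpus and the ID ordering are assumptions for the example; the real vocabulary has 7,207 characters plus the special tokens, which are left out here.

```python
def build_vocab(corpus):
    """Map every distinct character in the corpus to an integer ID."""
    itos = sorted(set(corpus))
    stoi = {ch: i for i, ch in enumerate(itos)}
    return stoi, itos


def encode(text, stoi):
    """One ID per character; raises KeyError for characters outside the vocabulary."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


# Toy corpus for illustration only.
stoi, itos = build_vocab("おはようございます。今日はいい天気ですね。")
ids = encode("おはよう", stoi)
print(len(ids))            # 4: one token per character
print(decode(ids, itos))   # おはよう
```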

Training data

All preprocessed at the character level (quality filters: Japanese-character ratio, deduplication, minimum length).
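
The three filters named above could look roughly like the sketch below. The thresholds (`min_ratio`, `min_length`), the Unicode ranges counted as Japanese, and the exact-match deduplication are all assumptions for illustration; the source does not state the values or methods actually used.

```python
import hashlib
import re

# Hiragana, katakana, and common CJK ideographs (an assumed definition of "Japanese character").
JAPANESE_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def japanese_ratio(text):
    """Fraction of characters that fall in the Japanese ranges above."""
    if not text:
        return 0.0
    return len(JAPANESE_CHAR.findall(text)) / len(text)


def filter_documents(docs, min_ratio=0.5, min_length=20):
    """Keep documents that are long enough, mostly Japanese, and not exact duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        if len(text) < min_length:
            continue
        if japanese_ratio(text) < min_ratio:
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept


docs = [
    "今日はとても良い天気なので、公園まで散歩に行きました。",
    "今日はとても良い天気なので、公園まで散歩に行きました。",  # exact duplicate
    "こんにちは",                                              # too short
    "This is an English sentence that is long enough.",        # not Japanese
]
kept = filter_documents(docs)
print(len(kept))  # 1
```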

Pretraining (~4.8B characters)

CC-100 Japanese — range3/cc100-ja (web text, main source)
Japanese Wikipedia — wikimedia/wikipedia (20231101.ja)
Aozora Bunko — globis-university/aozorabunko-clean (public-domain books)

Chat finetuning (39,591 conversations)

kunishou/oasst1-89k-ja — multi-turn dialogue (main)
kunishou/databricks-dolly-15k-ja — instruction → output
Formatted as <|user|>…<|bot|>…<|end|>, with loss applied only to the response part.
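
A minimal sketch of this formatting with a response-only loss mask follows. It assumes that each bot turn is closed by `<|end|>` and that `<|end|>` itself counts as part of the response for the loss; the source does not spell out either detail. `build_example` is a hypothetical helper.

```python
USER, BOT, END = "<|user|>", "<|bot|>", "<|end|>"


def build_example(turns):
    """Turn a list of (user_text, bot_text) pairs into tokens and a loss mask.

    mask[i] == 1 means token i is a training target; prompt tokens get 0.
    """
    tokens, mask = [], []
    for user_text, bot_text in turns:
        prompt = [USER, *user_text, BOT]   # special tokens stay atomic, text is split per character
        response = [*bot_text, END]
        tokens += prompt + response
        mask += [0] * len(prompt) + [1] * len(response)
    return tokens, mask


tokens, mask = build_example([("おはよう", "おはようございます")])
print(len(tokens), sum(mask))  # 16 10
```

During training, the per-token loss would be multiplied by this mask (shifted to align with next-token targets) so that only the response contributes to the gradient.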

Usage

This model uses a custom character-level tokenizer, so Ollama needs the chat format (<|user|>…<|bot|>…<|end|>) from the included Modelfile. Pulling the model directly with ollama run hf.co/... does not work correctly.

A. From the Ollama registry (easiest — if published there)

If the publisher has pushed to ollama.com, a single command works (no Modelfile needed):

ollama run <owner>/makellm-ja-char-75m-chat "おはよう"

B. From this Hugging Face repo (gguf + Modelfile)

# 1) Download the .gguf and the Modelfile
huggingface-cli download Sido/makellm-ja-char-75m-chat \
    makellm-ja-char-75m-chat.f16.gguf Modelfile --local-dir ./makellm

# 2) Register with Ollama and run
cd makellm
ollama create makellm -f Modelfile
ollama run makellm "おはよう"

C. llama.cpp

A Jinja chat template is embedded in the GGUF.

llama-cli -m makellm-ja-char-75m-chat.f16.gguf --jinja -p "おはよう"
# Plain generation:
llama-cli -m makellm-ja-char-75m-chat.f16.gguf -p "むかしむかし"
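
The same GGUF can also be driven from Python. The sketch below assumes the llama-cpp-python package is installed and the file has been downloaded locally; whether that binding applies the embedded chat template exactly as `llama-cli --jinja` does is not confirmed by this README. The `n_ctx=256` limit and the `<|user|>` stop string come from the notes elsewhere in this document; `completion_kwargs` and `chat_once` are hypothetical helper names.

```python
def completion_kwargs(user_text, max_new_tokens=64):
    """Build arguments for a single-turn chat completion request."""
    return {
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_new_tokens,
        "stop": ["<|user|>"],  # guard in case generation runs past the reply
    }


def chat_once(model_path, user_text):
    """Load the model and return one reply (requires llama-cpp-python)."""
    from llama_cpp import Llama  # imported lazily so the helper above works without it

    llm = Llama(model_path=model_path, n_ctx=256, verbose=False)
    result = llm.create_chat_completion(**completion_kwargs(user_text))
    return result["choices"][0]["message"]["content"]


# Example call (needs the downloaded model file):
# print(chat_once("makellm-ja-char-75m-chat.f16.gguf", "おはよう"))
```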

D. LM Studio

LM Studio has no separate registry; it uses Hugging Face as its catalog, so this repo works directly:

Search in-app (Ctrl/⌘+Shift+M) for makellm-ja-char-75m-chat and download it, or choose "Use this model → LM Studio" on the HF page.
For a local gguf: lms import makellm-ja-char-75m-chat.f16.gguf (or place it under ~/.lmstudio/models/<publisher>/<model>/).
The embedded Jinja chat template is auto-detected. If generation doesn't stop, add <|user|> as a stop string (<|end|> is already the EOS).

Notes on the character-level tokenizer

tokenizer.ggml.model is llama (SPM). The vocabulary contains only single-character tokens, so no merges fire — it matches the training-time character tokenizer exactly.
The ASCII space (U+0020) is aliased to U+2581 (▁) (an SPM requirement; without it, prompts containing spaces crash).
The 3 special tokens are CONTROL type: segmented atomically before the base tokenizer and hidden from output.
Keep num_ctx ≤ 256 (the position embeddings have only 256 rows).
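
The behavior described in these notes can be mimicked in a few lines. This is an illustrative sketch and not llama.cpp's actual tokenizer code: it splits out the special tokens first, aliases the ASCII space to U+2581, emits one token per remaining character, and checks the 256-token budget. The function names are hypothetical.

```python
import re

SPECIAL_TOKENS = ["<|user|>", "<|bot|>", "<|end|>"]
# A capturing group makes re.split keep the special tokens in the result.
SPECIAL_RE = re.compile("(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")")
MAX_CONTEXT = 256  # the position embeddings have only 256 rows


def segment(text):
    """Split text into atomic special tokens and single characters."""
    tokens = []
    for piece in SPECIAL_RE.split(text):
        if not piece:
            continue
        if piece in SPECIAL_TOKENS:
            tokens.append(piece)
        else:
            # ASCII space (U+0020) is aliased to U+2581 before per-character splitting.
            tokens.extend(piece.replace(" ", "\u2581"))
    return tokens


def fits_context(tokens, max_new_tokens=0):
    """True if the prompt plus the planned generation stays within the context window."""
    return len(tokens) + max_new_tokens <= MAX_CONTEXT


print(segment("<|user|>a b<|bot|>"))
# ['<|user|>', 'a', '▁', 'b', '<|bot|>']
```

Because every character costs one token, a 256-token window holds at most 256 characters of prompt and reply combined, which is worth checking before sending long inputs.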

Limitations

Low factuality (a small model, ~75M parameters). It mainly reproduces the "shape" of small talk, greetings, and short Q&A.
llama.cpp / Ollama's GPT-2 path uses tanh-approximation GELU, while this model was trained with exact (erf) GELU, so there is a tiny numerical difference at inference (sampled output is natural; not a practical problem).
It may reflect biases or produce inappropriate output stemming from the training data.
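
To get a feel for the size of the GELU mismatch mentioned above, the two standard formulations can be compared directly. This sketch uses the textbook definitions of exact (erf) GELU and the tanh approximation; it measures the per-activation gap only and says nothing about how the difference accumulates through 10 layers.

```python
import math


def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Scan a grid of inputs and report the largest absolute gap.
xs = [i / 100.0 for i in range(-600, 601)]
max_diff = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-6, 6]: {max_diff:.2e}")
```

The gap stays below 1e-3 for every input on this grid, which is consistent with the README's description of a tiny numerical difference.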

License

Released under CC-BY-SA-4.0 (the training data includes CC-BY-SA Wikipedia and databricks-dolly-15k-ja, so ShareAlike is respected). When redistributing or modifying, please share under the same terms and provide attribution.

Attribution (per CC-BY-SA)

CC-100 (range3/cc100-ja) — from CommonCrawl. The University of Edinburgh SMT claims no IP on the corpus preparation; the content is subject to the Common Crawl terms of use. Cite: Conneau et al. 2020 (XLM-R) / Wenzek et al. 2020 (CCNet).
Wikipedia (wikimedia/wikipedia) — © Wikipedia contributors, CC-BY-SA.
Aozora Bunko (globis-university/aozorabunko-clean) — public domain.
oasst1-89k-ja (kunishou/oasst1-89k-ja) — from OpenAssistant, Apache-2.0.
databricks-dolly-15k-ja (kunishou/databricks-dolly-15k-ja) — CC-BY-SA-3.0.

Whether model weights constitute a derivative work of training data is legally unsettled. This is a conservative (ShareAlike-respecting) choice, not legal advice.

Related

ku-nlp/gpt2-{small,medium,large}-japanese-char are similar Japanese character-level GPT-2 models. This model is an independent from-scratch implementation, additionally chat-finetuned on conversational data.

