MahjongLM 1M

MahjongLM 1M is a compact Qwen3-style causal language model trained for Mahjong game-log token generation. It uses the MahjongLM WordLevel tokenizer and a 16,384-token context window.

Training

Dataset: mitsutani/mahjonglm-dataset
Source data span: Tenhou game logs from 2011 through 2024
Views: view_complete, view_imperfect_0 to view_imperfect_3, and view_omniscient
Omniscient view: inserts a wall block immediately after round_start, consisting of the wall token followed by the reconstructed 136-tile wall, including red fives (m0, p0, s0)
Architecture: plain Qwen3 causal LM, without XSA, gated attention, attention residuals, or Mamba
Training length: 0.2 epochs
Batch settings: train/eval batch size 8, gradient accumulation 64
Optimizer: Muon+
Learning rate: 0.03, linear decay, 200 warmup steps
W&B project: mahjongLM_qwen3_plain_omniscient_sweep
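
As a rough illustration of what the batch settings imply, the sketch below multiplies them out. It assumes the batch size of 8 is per device, that training ran on a single device (num_devices is a hypothetical parameter, not stated in this card), and that the trainer/global_step value reported under Metrics is the final optimizer step count.

```python
# Values copied from the Training and Metrics lists of this card.
per_device_batch_size = 8
gradient_accumulation_steps = 64
global_step = 8519

# Assumption: a single device, so there is no data-parallel multiplier.
num_devices = 1

# Sequences consumed per optimizer step, and over the whole run.
sequences_per_step = per_device_batch_size * gradient_accumulation_steps * num_devices
sequences_seen = sequences_per_step * global_step

print(sequences_per_step)  # 512
print(sequences_seen)      # 4361728
```

If training used more than one device, both numbers scale by the device count.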

Metrics

run/parameter_count: 1049792
trainer/global_step: 8519
eval/perplexity: 3.392782144275206
eval/loss: 1.2216502763358188
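
The two evaluation numbers are related: if eval/loss is the mean per-token cross-entropy in nats (an assumption, though it is the usual convention), perplexity is its exponential. A quick check:

```python
import math

# Values copied from the Metrics list above.
eval_loss = 1.2216502763358188
reported_perplexity = 3.392782144275206

# Perplexity is exp(mean cross-entropy) when the loss is measured in nats.
perplexity = math.exp(eval_loss)
print(perplexity)

# True when the two reported metrics agree up to rounding.
matches = math.isclose(perplexity, reported_perplexity, rel_tol=1e-4)
print(matches)
```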

Prompt Format

Training adds <bos> at the beginning and <eos> at the end. A prompt should therefore usually start with <bos>, followed by the rule tokens, one view token, then game_start.

Basic four-player complete-information prompt:

<bos> rule_player_4 rule_length_hanchan view_complete game_start

Four-player imperfect-information prompt from player 0's perspective:

<bos> rule_player_4 rule_length_hanchan view_imperfect_0 game_start

Omniscient prompt:

<bos> rule_player_4 rule_length_hanchan view_omniscient game_start round_start wall
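
The three prompts above follow one pattern, which can be sketched as a small helper. build_prompt is a hypothetical name, not part of any released MahjongLM code, and the sketch assumes tokens are joined by single spaces exactly as shown in the examples.

```python
VIEWS = (
    "view_complete",
    "view_imperfect_0",
    "view_imperfect_1",
    "view_imperfect_2",
    "view_imperfect_3",
    "view_omniscient",
)

def build_prompt(players=4, length="hanchan", view="view_complete"):
    """Build a prompt string: <bos>, rule tokens, one view token, game_start."""
    if players not in (3, 4):
        raise ValueError("players must be 3 or 4")
    if length not in ("tonpu", "hanchan"):
        raise ValueError("length must be 'tonpu' or 'hanchan'")
    if view not in VIEWS:
        raise ValueError(f"unknown view token: {view}")
    tokens = [
        "<bos>",
        f"rule_player_{players}",
        f"rule_length_{length}",
        view,
        "game_start",
    ]
    # The omniscient prompt also opens the first hand and its wall block.
    if view == "view_omniscient":
        tokens += ["round_start", "wall"]
    return " ".join(tokens)

print(build_prompt())
print(build_prompt(view="view_imperfect_0"))
print(build_prompt(view="view_omniscient"))
```

The helper does not check that a view is valid for the chosen player count, for example view_imperfect_3 in a three-player game.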

For view_omniscient, the model is expected to continue the wall block with 136 tile tokens before the normal round metadata. Tile tokens are the 34 tile types plus red fives: m0, p0, and s0.
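
A generated wall block can be sanity-checked before use. The sketch below is an assumption-heavy illustration: only m0, p0, and s0 are named in this card, so the remaining tile token names (m1 to m9, p1 to p9, s1 to s9, z1 to z7) are hypothetical, as is the rule that each red five replaces one ordinary five of its suit. It also assumes a four-player game with four copies of each tile type.

```python
from collections import Counter

# Hypothetical tile names; only m0, p0, and s0 appear in the model card.
NUMBER_TILES = [f"{suit}{n}" for suit in "mps" for n in range(1, 10)]
HONOR_TILES = [f"z{n}" for n in range(1, 8)]
RED_FIVES = ["m0", "p0", "s0"]
TILE_TOKENS = set(NUMBER_TILES + HONOR_TILES + RED_FIVES)

def check_wall(tokens):
    """Return a list of problems found in a wall block (empty if none)."""
    problems = []
    if len(tokens) != 136:
        problems.append(f"expected 136 tiles, got {len(tokens)}")
    unknown = sorted({t for t in tokens if t not in TILE_TOKENS})
    if unknown:
        problems.append(f"unknown tile tokens: {unknown}")
    # Fold each red five into its suit's ordinary five before counting copies.
    counts = Counter(
        t[0] + "5" if t in RED_FIVES else t
        for t in tokens
        if t in TILE_TOKENS
    )
    for tile in NUMBER_TILES + HONOR_TILES:
        if counts[tile] != 4:
            problems.append(f"{tile}: expected 4 copies, got {counts[tile]}")
    return problems

# Example: a sorted wall with one red five per suit (assumed rule set).
wall = [tile for tile in NUMBER_TILES + HONOR_TILES for _ in range(4)]
for red in RED_FIVES:
    wall[wall.index(red[0] + "5")] = red
print(check_wall(wall))
```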

Token Grammar Notes

rule_player_3 / rule_player_4 selects three-player (sanma) or four-player (yonma) play.
rule_length_tonpu / rule_length_hanchan describes game length.
round_start opens a hand and round_end closes the hand result section.
game_start and game_end delimit the full game log.
Actions are represented by opt_*, then matching take_* or pass_* decisions in the same option order.
Win details begin with hule_{seat}; score deltas are emitted once after all win details.
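
The delimiter tokens make it straightforward to cut a generated game into hands. The following is a minimal sketch, assuming round_start and round_end always pair up without nesting; split_rounds is a hypothetical helper name, and the tokens inside each round in the example are placeholders, not real vocabulary entries.

```python
def split_rounds(tokens):
    """Return one token list per hand, each spanning round_start..round_end."""
    rounds = []
    current = None  # tokens of the hand being read, or None between hands
    for tok in tokens:
        if tok == "round_start":
            if current is not None:
                raise ValueError("round_start before the previous round_end")
            current = [tok]
        elif current is not None:
            current.append(tok)
            if tok == "round_end":
                rounds.append(current)
                current = None
    if current is not None:
        raise ValueError("last round is missing round_end")
    return rounds

# Placeholder tokens (hand_a, hand_b) stand in for real hand content.
log = (
    "<bos> rule_player_4 rule_length_hanchan view_complete game_start "
    "round_start hand_a round_end round_start hand_b round_end game_end <eos>"
).split()
rounds = split_rounds(log)
print(len(rounds))  # 2
```

Raising on an unterminated round is a design choice that suits validating complete logs; for truncated generations, returning the partial round instead may be more convenient.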

Safetensors

Model size: 1.05M params
Tensor type: BF16
