MahjongLM 1M
MahjongLM 1M is a compact Qwen3-style causal language model trained for Mahjong game-log token generation. It uses the MahjongLM WordLevel tokenizer and a 16,384-token context window.
Training
- Dataset: mitsutani/mahjonglm-dataset
- Source data span: Tenhou game logs from 2011 through 2024
- Views: view_complete, view_imperfect_0 to view_imperfect_3, and view_omniscient
- Omniscient view: adds a wall block immediately after round_start, followed by the reconstructed 136-tile wall including red fives (m0, p0, s0)
- Architecture: plain Qwen3 causal LM, without XSA, gated attention, attention residuals, or Mamba
- Training length: 0.2 epoch
- Batch settings: train/eval batch size 8, gradient accumulation 64
- Optimizer: Muon+
- Learning rate: 0.03, linear decay, 200 warmup steps
- W&B project: mahjongLM_qwen3_plain_omniscient_sweep
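The learning-rate and batch settings above can be sketched in a few lines. This is only an illustration: the exact schedule shape is an assumption (linear warmup from 0 to the peak rate, then linear decay to 0 at the final step), `learning_rate` is a hypothetical helper, and the total of 8519 steps is taken from the reported trainer/global_step. The sequences-per-step figure assumes a single device.

```python
def learning_rate(step, peak_lr=0.03, warmup_steps=200, total_steps=8519):
    """Assumed schedule: linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Sequences consumed per optimizer step, assuming one device.
per_device_batch = 8
grad_accum = 64
sequences_per_step = per_device_batch * grad_accum

for step in (0, 100, 200, 4000, 8519):
    print(step, learning_rate(step))
print("sequences per optimizer step:", sequences_per_step)
```

With more than one device, the sequences-per-step figure would scale by the device count, which the card does not state.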
Metrics
- run/parameter_count: 1049792
- trainer/global_step: 8519
- eval/perplexity: 3.392782144275206
- eval/loss: 1.2216502763358188
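The two evaluation figures are consistent with each other if eval/loss is the mean per-token cross-entropy in nats, in which case perplexity is its exponential. A minimal check under that assumption:

```python
import math

eval_loss = 1.2216502763358188  # reported eval/loss, assumed to be in nats per token

perplexity = math.exp(eval_loss)
bits_per_token = eval_loss / math.log(2)  # the same loss expressed in bits

print("perplexity:", perplexity)
print("bits per token:", bits_per_token)
```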
Prompt Format
Training adds <bos> at the beginning and <eos> at the end. A prompt should therefore usually start with <bos>, followed by the rule tokens, one view token, then game_start.
Basic four-player complete-information prompt:
<bos> rule_player_4 rule_length_hanchan view_complete game_start
Four-player imperfect-information prompt from player 0's perspective:
<bos> rule_player_4 rule_length_hanchan view_imperfect_0 game_start
Omniscient prompt:
<bos> rule_player_4 rule_length_hanchan view_omniscient game_start round_start wall
For view_omniscient, the model is expected to continue the wall block with 136 tile tokens before the normal round metadata. Tile tokens are the 34 tile types plus red fives: m0, p0, and s0.
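The three prompts above follow one pattern, which could be assembled programmatically. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, not part of the model's tooling, and it assumes tokens are separated by single spaces as in the examples. It does not check whether a given imperfect-information seat is meaningful for a three-player game.

```python
def build_prompt(players=4, length="hanchan", view="view_complete"):
    """Assemble a prompt string in the order: <bos>, rule tokens, view token, game_start."""
    valid_views = {"view_complete", "view_omniscient"}
    valid_views |= {f"view_imperfect_{seat}" for seat in range(4)}

    if players not in (3, 4):
        raise ValueError("players must be 3 or 4")
    if length not in ("tonpu", "hanchan"):
        raise ValueError("length must be 'tonpu' or 'hanchan'")
    if view not in valid_views:
        raise ValueError(f"unknown view token: {view}")

    tokens = ["<bos>", f"rule_player_{players}", f"rule_length_{length}", view, "game_start"]
    if view == "view_omniscient":
        # The omniscient prompt also opens the first hand and its wall block.
        tokens += ["round_start", "wall"]
    return " ".join(tokens)


print(build_prompt())
print(build_prompt(view="view_imperfect_0"))
print(build_prompt(view="view_omniscient"))
```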
Token Grammar Notes
- rule_player_3 / rule_player_4 describes sanma or yonma.
- rule_length_tonpu / rule_length_hanchan describes game length.
- round_start opens a hand and round_end closes the hand result section.
- game_start and game_end delimit the full game log.
- Actions are represented by opt_*, then matching take_* or pass_* decisions in the same option order.
- Win details begin with hule_{seat}; score deltas are emitted once after all win details.
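The delimiter rules in these notes lend themselves to a quick sanity check on generated output. The following is a rough sketch, not the project's own validator: `check_structure` is a hypothetical helper that assumes whitespace-separated tokens and checks only the nesting of game_start/game_end and round_start/round_end, ignoring action and win-detail tokens.

```python
def check_structure(text):
    """Return a list of nesting problems for game and round delimiters."""
    problems = []
    in_game = False
    in_round = False
    for index, token in enumerate(text.split()):
        if token == "game_start":
            if in_game:
                problems.append(f"token {index}: game_start inside an open game")
            in_game = True
        elif token == "game_end":
            if not in_game:
                problems.append(f"token {index}: game_end without game_start")
            if in_round:
                problems.append(f"token {index}: game_end inside an open round")
            in_game = False
            in_round = False
        elif token == "round_start":
            if not in_game or in_round:
                problems.append(f"token {index}: unexpected round_start")
            in_round = True
        elif token == "round_end":
            if not in_round:
                problems.append(f"token {index}: round_end without round_start")
            in_round = False
    if in_round:
        problems.append("end of text: round never closed")
    if in_game:
        problems.append("end of text: game never closed")
    return problems


complete = ("<bos> rule_player_4 rule_length_hanchan view_complete game_start "
            "round_start round_end round_start round_end game_end <eos>")
print(check_structure(complete))
```

A generation truncated by the context window would be reported as having an unclosed round or game, which may be expected rather than an error.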