MahjongLM 1M

MahjongLM 1M is a compact Qwen3-style causal language model trained to generate tokenized Mahjong game logs. It uses the MahjongLM WordLevel tokenizer and has a 16,384-token context window.

Training

  • Dataset: mitsutani/mahjonglm-dataset
  • Source data span: Tenhou game logs from 2011 through 2024
  • Views: view_complete, view_imperfect_0 to view_imperfect_3, and view_omniscient
  • Omniscient view: inserts a wall token immediately after round_start, followed by the reconstructed 136-tile wall, including the red fives (m0, p0, s0)
  • Architecture: plain Qwen3 causal LM, without XSA, gated attention, attention residuals, or Mamba
  • Training length: 0.2 epochs
  • Batch settings: train and eval batch size 8, 64 gradient accumulation steps
  • Optimizer: Muon+
  • Learning rate: 0.03, linear decay, 200 warmup steps
  • W&B project: mahjongLM_qwen3_plain_omniscient_sweep
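The learning-rate entry can be read as a linear warmup to 0.03 followed by linear decay. The sketch below assumes the decay reaches zero at step 8519 (the final trainer/global_step under Metrics) and, for the batch arithmetic, a single device; neither assumption is stated in the run settings, and `linear_lr` and `sequences_per_update` are hypothetical names.

```python
PEAK_LR = 0.03
WARMUP_STEPS = 200
TOTAL_STEPS = 8519  # assumption: decay ends at the reported final trainer/global_step


def linear_lr(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Hypothetical schedule: ramp 0 -> peak over `warmup` steps, then decay linearly to 0 at `total`."""
    if step < warmup:
        return peak * step / warmup
    remaining = max(0, total - step)
    return peak * remaining / max(1, total - warmup)


def sequences_per_update(batch_size: int = 8, grad_accum: int = 64, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer update (num_devices=1 is an assumption)."""
    return batch_size * grad_accum * num_devices


if __name__ == "__main__":
    for s in (0, 100, 200, 4359, 8519):
        print(s, round(linear_lr(s), 6))
    print(sequences_per_update())  # 8 * 64 * 1 = 512
```

This has the same shape as `get_linear_schedule_with_warmup` in Hugging Face Transformers, but the card does not say which scheduler implementation the run used, or whether the decay ends at zero.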

Metrics

  • run/parameter_count: 1049792
  • trainer/global_step: 8519
  • eval/perplexity: 3.392782144275206
  • eval/loss: 1.2216502763358188
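Assuming eval/loss is the mean per-token cross-entropy in nats (the usual convention, though this card does not say so), perplexity is its exponential. The short check below recomputes one metric from the other and rounds the parameter count to millions.

```python
import math

EVAL_LOSS = 1.2216502763358188
EVAL_PERPLEXITY = 3.392782144275206
PARAMETER_COUNT = 1049792

# Perplexity = exp(mean cross-entropy), assuming the loss is measured in nats.
recomputed = math.exp(EVAL_LOSS)
print(f"exp(eval/loss) = {recomputed:.6f}")

# The inverse direction: log(perplexity) recovers the loss.
print(f"log(eval/perplexity) = {math.log(EVAL_PERPLEXITY):.6f}")

# Parameter count expressed in millions.
print(f"{PARAMETER_COUNT / 1e6:.2f}M parameters")
```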

Prompt Format

Training adds <bos> at the beginning of each sequence and <eos> at the end. A prompt should therefore usually start with <bos>, followed by the rule tokens, exactly one view token, and then game_start.
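A minimal sketch of that ordering as a string builder. `build_prompt` is a hypothetical helper, not part of the model repository. It joins tokens with single spaces as in the example prompts on this card, and for view_omniscient it appends round_start wall so that generation begins inside the wall block, mirroring the omniscient example. It leaves out <eos>, which the model is expected to produce at the end of a game.

```python
VIEWS = {"view_complete", "view_omniscient"} | {f"view_imperfect_{i}" for i in range(4)}
PLAYERS = {3, 4}
LENGTHS = {"tonpu", "hanchan"}


def build_prompt(players: int, length: str, view: str) -> str:
    """Hypothetical helper: <bos>, rule tokens, one view token, then game_start."""
    if players not in PLAYERS:
        raise ValueError(f"players must be 3 or 4, got {players}")
    if length not in LENGTHS:
        raise ValueError(f"length must be 'tonpu' or 'hanchan', got {length!r}")
    if view not in VIEWS:
        raise ValueError(f"unknown view token {view!r}")
    tokens = ["<bos>", f"rule_player_{players}", f"rule_length_{length}", view, "game_start"]
    if view == "view_omniscient":
        # Open the first hand and its wall block so the model continues with wall tiles.
        tokens += ["round_start", "wall"]
    return " ".join(tokens)


print(build_prompt(4, "hanchan", "view_imperfect_0"))
```

Whether the tokenizer inserts <bos> by itself is not stated here; if it does, drop the literal <bos> from the string to avoid a duplicate.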

Basic four-player complete-information prompt:

<bos> rule_player_4 rule_length_hanchan view_complete game_start

Four-player imperfect-information prompt from player 0's perspective:

<bos> rule_player_4 rule_length_hanchan view_imperfect_0 game_start

Omniscient prompt:

<bos> rule_player_4 rule_length_hanchan view_omniscient game_start round_start wall

For view_omniscient, the model is expected to continue the wall block with 136 tile tokens before the normal round metadata. Tile tokens cover the 34 tile types plus the three red fives m0, p0, and s0, for 37 distinct tile tokens.
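A generated wall can be sanity-checked by counting tiles. The sketch below assumes four copies of each of the 34 tile types, with exactly one copy of each suited five replaced by its red five, which gives 37 distinct tokens and 136 tiles. Only m0, p0, and s0 are token names confirmed on this card; the other names (m1 to m9, p1 to p9, s1 to s9, z1 to z7) are hypothetical placeholders, so `expected_wall_counts` should be adjusted to the tokenizer's real vocabulary.

```python
from collections import Counter

WALL_SIZE = 136


def expected_wall_counts() -> Counter:
    """Tile multiset for one 136-tile wall with one red five per suit.

    Token names other than m0/p0/s0 are hypothetical placeholders.
    """
    counts = Counter()
    for suit in "mps":
        for rank in range(1, 10):
            counts[f"{suit}{rank}"] = 4
        counts[f"{suit}5"] = 3  # three ordinary fives...
        counts[f"{suit}0"] = 1  # ...plus one red five
    for rank in range(1, 8):
        counts[f"z{rank}"] = 4  # honors (hypothetical naming)
    return counts


def extract_wall(tokens: list[str]) -> list[str]:
    """Return the WALL_SIZE tokens that follow the first 'wall' token."""
    start = tokens.index("wall") + 1  # raises ValueError if there is no wall block
    wall = tokens[start:start + WALL_SIZE]
    if len(wall) != WALL_SIZE:
        raise ValueError(f"expected {WALL_SIZE} wall tiles, got {len(wall)}")
    return wall


def is_valid_wall(wall: list[str]) -> bool:
    """True if the wall has exactly the expected number of each tile token."""
    return len(wall) == WALL_SIZE and Counter(wall) == expected_wall_counts()


if __name__ == "__main__":
    reference = expected_wall_counts()
    print(len(reference), sum(reference.values()))  # 37 tile tokens, 136 tiles
```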

Token Grammar Notes

  • rule_player_3 / rule_player_4 indicate sanma (three-player) or yonma (four-player).
  • rule_length_tonpu / rule_length_hanchan indicate game length: tonpu (East round only) or hanchan (East and South rounds).
  • round_start opens a hand, and round_end closes that hand's result section.
  • game_start and game_end delimit the full game log.
  • Actions are represented by opt_* tokens for the available options, followed by the matching take_* or pass_* decisions in the same order as the options.
  • Win details begin with hule_{seat}; score deltas are emitted once after all win details.
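
Two of these rules can be checked mechanically on a complete token sequence. This is a rough sketch, not the dataset's own validator: `check_delimiters` only verifies that one game_start/game_end pair wraps the log and that round_start/round_end pairs do not nest, and `check_decisions` assumes a decision "matches" an option when it shares the option's suffix (for example a hypothetical opt_pon answered by take_pon or pass_pon). The real suffix format is not documented on this card.

```python
def check_delimiters(tokens: list[str]) -> bool:
    """One game_start ... game_end span; round_start/round_end alternate inside it."""
    in_game = in_round = finished = False
    for tok in tokens:
        if tok == "game_start":
            if in_game or finished:
                return False
            in_game = True
        elif tok == "game_end":
            if not in_game or in_round:
                return False
            in_game, finished = False, True
        elif tok == "round_start":
            if not in_game or in_round:
                return False
            in_round = True
        elif tok == "round_end":
            if not in_round:
                return False
            in_round = False
    return finished


def check_decisions(options: list[str], decisions: list[str]) -> bool:
    """Assumed rule: decision i is take_X or pass_X for option i = opt_X."""
    if len(options) != len(decisions):
        return False
    for opt, dec in zip(options, decisions):
        if not opt.startswith("opt_"):
            return False
        action = opt[len("opt_"):]
        if dec not in (f"take_{action}", f"pass_{action}"):
            return False
    return True


# Hypothetical option tokens, for illustration only.
print(check_decisions(["opt_pon", "opt_ron"], ["pass_pon", "take_ron"]))
```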

Checkpoint

  • Format: Safetensors
  • Model size: 1.05M params
  • Tensor type: BF16