Blackjack Q-table Policy

This repository contains the trained tabular Q-learning policy used in the IEMS5726 Blackjack Reinforcement Learning Trainer project.

Model

  • Algorithm: tabular Q-learning
  • Policy artifact: q_model.json
  • Actions: 0 = stand, 1 = hit, 2 = double_down
  • State representation: (player_total, dealer_upcard, usable_ace, can_double, true_count_bucket)
  • Rules: one player versus the dealer, finite 6-deck shoe, Hi-Lo running count converted to a bucketed true count, double down allowed, dealer stands on soft 17, natural Blackjack pays 3:2.
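The state and action definitions above can be tied together in a short Python sketch of tabular Q-learning. This is a minimal illustration, not the project's training code. The card encoding (ace as 11, face cards as 10), the true-count bucket edges in BUCKET_EDGES, the learning rate alpha, the discount gamma and all function names are assumptions made for this example. Only the Hi-Lo tags, the state tuple layout and the action ids come from the standard method and the model description.

```python
import bisect
from collections import defaultdict

STAND, HIT, DOUBLE_DOWN = 0, 1, 2

# Standard Hi-Lo tags. Cards are encoded by rank value (an assumption of this
# sketch): 2-9 as themselves, 10/J/Q/K as 10, ace as 11.
HI_LO_TAG = {2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 0, 8: 0, 9: 0, 10: -1, 11: -1}

# Hypothetical bucket edges; the trained model's real boundaries may differ.
BUCKET_EDGES = (-2.0, 0.0, 2.0)


def update_running_count(running_count, card):
    """Add the Hi-Lo tag of a newly seen card to the running count."""
    return running_count + HI_LO_TAG[card]


def true_count_bucket(running_count, cards_remaining, edges=BUCKET_EDGES):
    """True count = running count / decks remaining, mapped to a bucket index."""
    decks_remaining = max(cards_remaining, 1) / 52  # floor avoids division by zero
    true_count = running_count / decks_remaining
    return bisect.bisect_right(edges, true_count)  # 0 .. len(edges)


def legal_actions(state):
    """state = (player_total, dealer_upcard, usable_ace, can_double, true_count_bucket)."""
    can_double = state[3]
    return (STAND, HIT, DOUBLE_DOWN) if can_double else (STAND, HIT)


def greedy_action(q, state):
    """Pick the legal action with the highest Q-value (ties go to the lower action id)."""
    return max(legal_actions(state), key=lambda a: q[state][a])


def q_learning_update(q, state, action, reward, next_state, done, alpha=0.05, gamma=1.0):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = reward
    if not done:
        # Bootstrap only over actions that are legal in the next state.
        target += gamma * max(q[next_state][a] for a in legal_actions(next_state))
    q[state][action] += alpha * (target - q[state][action])


# Unseen states start with Q = 0 for stand, hit and double_down.
q = defaultdict(lambda: [0.0, 0.0, 0.0])
```

Masking double_down through can_double keeps the greedy policy from picking an illegal action, and gamma = 1.0 reflects that each hand is a short episode with a single terminal reward.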

Evaluation

The selected policy was evaluated over 1,000,000 Blackjack hands.

Metric                      Value
Average reward (per hand)   -0.0086975
Win rate                    0.433611
Loss rate                   0.481673
Draw rate                   0.084716
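The metrics above can be reproduced from per-hand results with a short Python sketch. It assumes each hand's net reward is measured in units of the initial bet, and that a hand counts as a win if its reward is positive, a loss if negative, and a draw (push) if zero. The EvalSummary and summarize names are hypothetical. Under these assumptions the three rates always sum to 1, as they do in the table. The average reward is not simply win rate minus loss rate (0.433611 - 0.481673 = -0.048062) because naturals pay 1.5 units and doubled hands win or lose 2 units.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    average_reward: float
    win_rate: float
    loss_rate: float
    draw_rate: float


def summarize(rewards):
    """Summarize per-hand net rewards, measured in units of the initial bet.

    Assumed classification: positive net reward is a win, negative is a
    loss, and zero is a draw (push).
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one hand")
    wins = sum(1 for r in rewards if r > 0)
    losses = sum(1 for r in rewards if r < 0)
    draws = n - wins - losses
    return EvalSummary(sum(rewards) / n, wins / n, losses / n, draws / n)


# A natural (+1.5), a plain loss, a push, a won double (+2), a lost double (-2).
print(summarize([1.5, -1.0, 0.0, 2.0, -2.0]))
```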

The file q_learning_model_comparison.csv compares the selected expert-prior/count policy against 5,000,000-hand fine-tuning variants.

Files

  • q_model.json: trained Q-table policy and evaluation metadata
  • policy_table.csv: exported policy table
  • policy_heatmap.svg: policy visualization
  • training_history.csv: training history generated by the training pipeline
  • q_learning_model_comparison.csv: final model comparison table

Usage

The browser demo can load q_model.json directly from this repository's public URL. For the project submission, put that public link in application/model_link.txt.
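Outside the browser, the same model file could be loaded in Python along the lines of the sketch below. The JSON layout is an assumption: this example supposes a top-level "q_table" object whose keys are comma-separated states ("player_total,dealer_upcard,usable_ace,can_double,true_count_bucket") and whose values are [Q_stand, Q_hit, Q_double_down]. The actual q_model.json schema may differ. The function names are hypothetical; only json, pathlib and urllib.request from the standard library are used.

```python
import json
import urllib.request
from pathlib import Path


def read_model_link(path="application/model_link.txt"):
    """Return the public model URL stored in the submission's link file."""
    return Path(path).read_text(encoding="utf-8").strip()


def load_q_table(source):
    """Load a Q-table from an http(s) URL or a local file path.

    ASSUMED schema: data["q_table"] maps "total,upcard,ace,can_double,bucket"
    to a list of three Q-values ordered stand, hit, double_down.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            data = json.load(resp)
    else:
        with open(source, encoding="utf-8") as f:
            data = json.load(f)
    table = {}
    for key, values in data["q_table"].items():
        state = tuple(int(part) for part in key.split(","))
        table[state] = [float(v) for v in values]
    return table


def best_action(table, state):
    """Greedy action: 0 = stand, 1 = hit, 2 = double_down (only if can_double)."""
    qs = table[state]
    actions = (0, 1, 2) if state[3] else (0, 1)
    return max(actions, key=lambda a: qs[a])
```

For example, load_q_table(read_model_link()) would fetch the JSON from the link stored in application/model_link.txt, and best_action then returns the greedy action for a given state.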
