Blackjack Q-table Policy

This repository contains the trained tabular Q-learning policy used in the IEMS5726 Blackjack Reinforcement Learning Trainer project.

Model

Algorithm: tabular Q-learning
Policy artifact: q_model.json
Actions: 0 = stand, 1 = hit, 2 = double_down
State representation: (player_total, dealer_upcard, usable_ace, can_double, true_count_bucket)
Rules: one player versus the dealer, finite 6-deck shoe, double down allowed, dealer stands on soft 17, natural Blackjack pays 3:2. Card counting: Hi-Lo running count, reduced to a true-count bucket for the state.
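
As a rough illustration of how the state tuple and tabular Q-learning fit together, here is a minimal Python sketch. It is not the project's training code. The bucket range [-2, 2], the half-deck floor on the divisor, the `alpha` and `gamma` values, the boolean encoding of `usable_ace` and `can_double`, and the `(state, action)` dictionary layout are all assumptions made for the example; only the action ids and the order of the state fields come from this card.

```python
from collections import defaultdict

ACTIONS = {0: "stand", 1: "hit", 2: "double_down"}  # ids as listed in this card


def hi_lo_value(rank: int) -> int:
    """Hi-Lo tag for a rank, where 1 is an ace and 10 is any ten-valued card."""
    if 2 <= rank <= 6:
        return 1
    if 7 <= rank <= 9:
        return 0
    return -1  # aces and ten-valued cards


def true_count_bucket(running_count: int, decks_remaining: float) -> int:
    """Running count per remaining deck, rounded and clipped to [-2, 2].

    The clipping range and the 0.5-deck floor are assumptions for this sketch.
    """
    true_count = running_count / max(decks_remaining, 0.5)
    return max(-2, min(2, round(true_count)))


def legal_actions(can_double: bool) -> list:
    """Double down is only offered when the state says it is available."""
    return [0, 1, 2] if can_double else [0, 1]


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=1.0):
    """One tabular Q-learning step: Q <- Q + alpha * (target - Q)."""
    if done:
        target = reward
    else:
        best_next = max(q[(next_state, a)] for a in legal_actions(next_state[3]))
        target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def greedy_action(q, state):
    """Pick the legal action with the highest Q-value (first one wins ties)."""
    return max(legal_actions(state[3]), key=lambda a: q[(state, a)])


# State order: (player_total, dealer_upcard, usable_ace, can_double, true_count_bucket)
q = defaultdict(float)
state = (16, 10, False, True, true_count_bucket(running_count=0, decks_remaining=6.0))
q_update(q, state, action=0, reward=-1.0, next_state=None, done=True)
```

After this single losing "stand" update, `greedy_action(q, state)` no longer prefers action 0, because its Q-value has dropped below the untouched defaults of the other actions.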

Evaluation

The selected policy was evaluated over 1,000,000 Blackjack hands.

Metric	Value
Average reward	-0.0086975
Win rate	0.433611
Loss rate	0.481673
Draw rate	0.084716
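
The reported figures can be cross-checked with a few lines of Python. This is only a consistency check on the numbers in the table above, not a re-evaluation of the policy; the variable names are chosen for the example.

```python
# Values copied from the evaluation table in this card.
hands = 1_000_000
avg_reward = -0.0086975
win_rate, loss_rate, draw_rate = 0.433611, 0.481673, 0.084716

total_rate = win_rate + loss_rate + draw_rate   # should be 1 if outcomes are exhaustive
net_units = avg_reward * hands                  # total reward summed over all hands
win_minus_loss = win_rate - loss_rate           # expected reward if every hand paid +1 or -1

print(f"rates sum to {total_rate:.6f}")
print(f"net result over {hands:,} hands: {net_units:.1f} units")
print(f"win rate minus loss rate: {win_minus_loss:.6f}")
```

The three rates sum to 1. The average reward is noticeably closer to zero than the win rate minus the loss rate, which is consistent with some hands paying more than one unit (doubled bets and 3:2 naturals, as listed under the rules).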

q_learning_model_comparison.csv compares the selected expert-prior/count policy against 5,000,000-hand fine-tuning variants.

Files

q_model.json: trained Q-table policy and evaluation metadata
policy_table.csv: exported policy table
policy_heatmap.svg: policy visualization
training_history.csv: training-history file generated by the training pipeline
q_learning_model_comparison.csv: final model comparison table

Usage

The browser demo can load q_model.json directly from its public URL. For the project submission, place that public link in application/model_link.txt.
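
For offline inspection outside the browser demo, the same JSON can be read with the Python standard library. The sketch below is an assumption-laden helper, not part of the project: `load_q_model` is a hypothetical name, and the internal layout of q_model.json is not documented in this card, so the example only parses the file and leaves its keys for the reader to explore.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def load_q_model(source: str):
    """Parse the model JSON from a local path or an http(s) URL.

    Hypothetical helper for this example; it makes no assumption about
    the keys stored inside the file.
    """
    if source.startswith(("http://", "https://")):
        with urlopen(source, timeout=30) as response:
            return json.load(response)
    return json.loads(Path(source).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Assumes a local copy of q_model.json in the working directory;
    # pass the public link from application/model_link.txt instead to fetch it.
    model = load_q_model("q_model.json")
    if isinstance(model, dict):
        print("top-level keys:", sorted(model))
```

Listing the top-level keys first is a cautious way to discover where the Q-values and the evaluation metadata live before writing any lookup code against them.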
