Blackjack Q-table Policy
This repository contains the trained tabular Q-learning policy used in the IEMS5726 Blackjack Reinforcement Learning Trainer project.
Model
- Algorithm: tabular Q-learning
- Policy artifact: q_model.json
- Actions: 0 = stand, 1 = hit, 2 = double_down
- State representation: (player_total, dealer_upcard, usable_ace, can_double, true_count_bucket)
- Rules: one player versus dealer, finite 6-deck shoe, Hi-Lo running count, true-count bucket, double down, dealer stands on soft 17, natural Blackjack pays 3:2.
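The count-related part of the state can be illustrated with a minimal sketch. The Hi-Lo tag values are the standard ones; everything else here is an assumption for illustration, not the project's confirmed implementation: the function names, the half-deck floor on the remaining shoe, and the bucket range of -2 to 2 are all hypothetical.

```python
from typing import Iterable, Tuple

def hi_lo_tag(rank: int) -> int:
    """Standard Hi-Lo tag. rank: 1 = ace, 2..9 = pip cards, 10 = any ten-valued card."""
    if 2 <= rank <= 6:
        return 1
    if 7 <= rank <= 9:
        return 0
    return -1  # ace or ten-valued card

def running_count(seen_ranks: Iterable[int]) -> int:
    """Sum of Hi-Lo tags over every card dealt from the shoe so far."""
    return sum(hi_lo_tag(r) for r in seen_ranks)

def true_count(count: int, cards_left: int) -> float:
    """Running count divided by decks remaining.

    Assumption: decks remaining is floored at half a deck so the ratio
    stays bounded near the end of the shoe.
    """
    decks_left = max(cards_left / 52.0, 0.5)
    return count / decks_left

def true_count_bucket(tc: float, lo: int = -2, hi: int = 2) -> int:
    """Hypothetical bucketing: round to the nearest integer, then clamp to [lo, hi]."""
    return max(lo, min(hi, round(tc)))

def make_state(player_total: int, dealer_upcard: int, usable_ace: bool,
               can_double: bool, bucket: int) -> Tuple[int, int, bool, bool, int]:
    """Pack the five state components in the order listed above."""
    return (player_total, dealer_upcard, usable_ace, can_double, bucket)

# Example: half of a 6-deck shoe remains (156 cards) with a running count of +6.
bucket = true_count_bucket(true_count(6, 156))
state = make_state(16, 10, False, True, bucket)
```

Bucketing the true count keeps the Q-table small: each extra bucket multiplies the number of table rows, so a coarse range is a common trade-off between count sensitivity and how often each state is visited during training.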
Evaluation
The selected policy was evaluated over 1,000,000 Blackjack hands.
| Metric | Value |
|---|---|
| Average reward | -0.0086975 |
| Win rate | 0.433611 |
| Loss rate | 0.481673 |
| Draw rate | 0.084716 |
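As a rough illustration of how metrics of this kind could be derived, the sketch below summarizes a list of per-hand rewards. It assumes (this is not confirmed by the repository) that a hand counts as a win when its reward is positive, a loss when negative, and a draw when zero; the `summarize` helper is hypothetical. The last lines check that the three reported rates sum to 1.

```python
from typing import Dict, Sequence

def summarize(rewards: Sequence[float]) -> Dict[str, float]:
    """Average reward and win/loss/draw rates from per-hand rewards.

    Assumption: the sign of the reward decides the outcome of the hand.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("at least one hand is required")
    wins = sum(1 for r in rewards if r > 0)
    losses = sum(1 for r in rewards if r < 0)
    draws = n - wins - losses
    return {
        "average_reward": sum(rewards) / n,
        "win_rate": wins / n,
        "loss_rate": losses / n,
        "draw_rate": draws / n,
    }

# Toy input: 1.5 stands for a natural paid 3:2, +/-2.0 for a doubled hand.
toy = summarize([1.0, -1.0, 0.0, 1.5, -2.0, 2.0])

# Consistency check on the rates reported in the table above.
reported = {"win_rate": 0.433611, "loss_rate": 0.481673, "draw_rate": 0.084716}
rates_sum_to_one = abs(sum(reported.values()) - 1.0) < 1e-9
```

Note that the average reward is not simply win rate minus loss rate (which would be -0.048062 here), because doubled hands and 3:2 naturals pay more than one unit.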
q_learning_model_comparison.csv compares the selected expert-prior/count policy against 5,000,000-hand fine-tuning variants.
Files
- q_model.json: trained Q-table policy and evaluation metadata
- policy_table.csv: exported policy table
- policy_heatmap.svg: policy visualization
- training_history.csv: training-history file generated by the training pipeline
- q_learning_model_comparison.csv: final model comparison table
Usage
The browser demo can load this model JSON directly from the public URL. In the project submission, place the public link in application/model_link.txt.
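For use outside the browser demo, a greedy lookup against the Q-table might look like the sketch below. The JSON layout is an assumption: the actual schema of q_model.json is not documented here, so the `q_table` field, the comma-joined state key, and the three-element Q-value lists are hypothetical, and the Q-values in `SAMPLE` are illustrative placeholders rather than values from the trained model.

```python
import json
from typing import Dict, List

ACTIONS = {0: "stand", 1: "hit", 2: "double_down"}

# Hypothetical layout: state key -> [Q(stand), Q(hit), Q(double_down)].
SAMPLE = """
{"q_table": {
    "16,10,0,1,0": [-0.54, -0.51, -1.02],
    "11,6,0,1,1": [-0.10, 0.30, 0.60]
}}
"""

def state_key(player_total: int, dealer_upcard: int, usable_ace: bool,
              can_double: bool, bucket: int) -> str:
    """Assumed key format: the five state components joined by commas."""
    parts = (player_total, dealer_upcard, usable_ace, can_double, bucket)
    return ",".join(str(int(v)) for v in parts)

def greedy_action(q_values: List[float], can_double: bool) -> int:
    """Pick the highest-valued action, excluding double_down when it is not allowed."""
    allowed = [0, 1, 2] if can_double else [0, 1]
    return max(allowed, key=lambda a: q_values[a])

def choose(q_table: Dict[str, List[float]], player_total: int, dealer_upcard: int,
           usable_ace: bool, can_double: bool, bucket: int, default: int = 0) -> int:
    """Look up a state and act greedily; fall back to `default` for unseen states."""
    q_values = q_table.get(state_key(player_total, dealer_upcard,
                                     usable_ace, can_double, bucket))
    if q_values is None:
        return default
    return greedy_action(q_values, can_double)

model = json.loads(SAMPLE)
action = choose(model["q_table"], 16, 10, False, True, 0)
action_name = ACTIONS[action]
```

To try this against the published file, replace `SAMPLE` with the downloaded contents of q_model.json and adjust `state_key` and the `q_table` field name to match the real schema.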