Figure 1: Overview plot of the training process and its parameters.
Figure 2: Performance plot of overall performance in each training phase.
Figure 3: 2D win-contribution plot showing each model's performance trend.
Figure 4: 3D win-contribution map showing each model's performance.
Figure 5: 3D win-prediction map forecasting future model performance using KNN.
Try Alpha Four: the AI achieved superhuman-level performance.

Project Information

  • Category: Web App Development & Deep Learning
  • Personal Project
  • Technologies Used: Reinforcement Learning (Q-Learning), Deep Neural Networks, CUDA.jit, Python, Flask, PyTorch, Amazon Web Services (AWS)
  • Date: September 2021 - March 2025
  • GitHub (AlphaFour): github/Alpha_Four
Download Alpha Four

Project Overview

This project centers on developing a Connect Four game featuring an intelligent, adaptive opponent. The standout innovation is a hybrid reinforcement learning system that fuses Deep Q-Networks (DQN) with an optimized, dynamic Monte Carlo Tree Search (MCTS), delivering both rapid state evaluation and deep, robust decision-making.
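
The interplay between the two components can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the project's actual API: the `QNetwork` architecture, the `q_gap_threshold` heuristic, and the `mcts_search` callable are all hypothetical. The idea is that the DQN scores candidate columns quickly, and MCTS is invoked only when the Q-values are too close to call.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Tiny value network over 6x7 Connect Four boards (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(6 * 7, 128), nn.ReLU(),
            nn.Linear(128, 7),          # one Q-value per column
        )

    def forward(self, board):           # board: (batch, 6, 7) float tensor
        return self.net(board)

def choose_move(board, q_net, mcts_search, q_gap_threshold=0.1):
    """Fast path: trust the DQN when one column clearly dominates.
    Slow path: defer to MCTS when the Q-values are close (ambiguous position)."""
    with torch.no_grad():
        q = q_net(board.unsqueeze(0)).squeeze(0)
    top2 = torch.topk(q, 2).values
    if (top2[0] - top2[1]) > q_gap_threshold:
        return int(torch.argmax(q))     # rapid state evaluation via DQN
    return mcts_search(board, prior=torch.softmax(q, dim=0))  # deep, robust search
```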

Highlights include:

  • Hybrid Architecture: Seamlessly integrates DQN for rapid state evaluation with MCTS for robust action selection, achieving a balance between speed and strategic depth.
  • Dynamic Leveling: Features an adaptive mechanism that scales the MCTS simulation budget against a fixed 90% win-rate threshold, ensuring additional computational resources are deployed only when the agent consistently performs at a high level (see the leveling sketch after this list).
  • Optimized Training: Boosts training efficiency by blending TD targets from the DQN with MCTS rollouts, and accelerates MCTS simulations using CUDA.jit, yielding up to a 1000% speedup over CPU implementations (see the blended-target and GPU rollout sketches below).
  • State Management & Checkpointing: Maintains dynamic training state in memory (tracking simulation levels, exploration rate, and episodes per level) with periodic disk checkpointing every 100 episodes to ensure reliable training resumption (see the checkpoint sketch below).
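
The dynamic-leveling rule can be sketched in a few lines. Only the fixed 90% win-rate threshold comes from the project description; the window size, the step size, and the `LevelState` name are illustrative assumptions.

```python
from collections import deque

WIN_RATE_THRESHOLD = 0.90  # fixed threshold from the project description

class LevelState:
    """Scales the MCTS simulation budget once the agent wins consistently
    (window and step sizes are illustrative assumptions)."""
    def __init__(self, sims=100, window=100, step=50):
        self.sims = sims                    # current MCTS simulations per move
        self.recent = deque(maxlen=window)  # rolling record of game outcomes
        self.step = step

    def record_game(self, won: bool):
        self.recent.append(won)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= WIN_RATE_THRESHOLD:
            self.sims += self.step  # deploy extra compute only at high skill
            self.recent.clear()     # restart measurement at the new level
```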
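
The blended training target could look like the following minimal sketch. The mixing weight `beta` and the function name are assumptions; the project only states that TD targets from the DQN are combined with MCTS rollouts.

```python
def blended_target(reward, next_q_max, mcts_return, gamma=0.99, beta=0.5):
    """Mix the one-step TD target from the DQN with a Monte Carlo return
    obtained from MCTS rollouts (beta is an illustrative mixing weight)."""
    td_target = reward + gamma * next_q_max  # standard Q-learning target
    return beta * td_target + (1.0 - beta) * mcts_return
```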
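
GPU rollouts are the kind of workload Numba's `cuda.jit` accelerates well, because thousands of independent playouts map naturally onto threads. The kernel below is an assumed, simplified sketch of this idea, not the project's actual implementation: the board encoding (0 empty, 1 and 2 for the players, stacked bottom-up), the uniform-random playout policy, and the assumption that player 1 moves next are all illustrative.

```python
import numpy as np
from numba import cuda, int8
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float32)

ROWS, COLS = 6, 7

@cuda.jit(device=True)
def _wins(board, row, col, player):
    """True if the piece just placed at (row, col) completes four in a row."""
    drs = (0, 1, 1, 1)    # row steps: horizontal, vertical, two diagonals
    dcs = (1, 0, 1, -1)   # column steps
    for k in range(4):
        count = 1
        for sign in (-1, 1):
            r = row + sign * drs[k]
            c = col + sign * dcs[k]
            while r >= 0 and r < ROWS and c >= 0 and c < COLS and board[r, c] == player:
                count += 1
                r += sign * drs[k]
                c += sign * dcs[k]
        if count >= 4:
            return True
    return False

@cuda.jit
def rollout_kernel(boards, rng_states, results):
    """One random playout per thread; results[i] is +1 / -1 / 0
    for a player-1 win, player-2 win, or draw."""
    i = cuda.grid(1)
    if i >= results.size:
        return
    board = cuda.local.array((ROWS, COLS), int8)
    heights = cuda.local.array(COLS, int8)
    moves_left = ROWS * COLS
    for c in range(COLS):
        h = 0
        for r in range(ROWS):
            board[r, c] = boards[i, r, c]
            if boards[i, r, c] != 0:
                h += 1
        heights[c] = h
        moves_left -= h
    player = 1                              # assumption: player 1 moves next
    results[i] = 0
    while moves_left > 0:
        col = int(xoroshiro128p_uniform_float32(rng_states, i) * COLS)
        while heights[col] >= ROWS:        # linear probe past full columns
            col = (col + 1) % COLS
        row = heights[col]
        board[row, col] = player
        heights[col] += 1
        moves_left -= 1
        if _wins(board, row, col, player):
            results[i] = 1 if player == 1 else -1
            return
        player = 2 if player == 1 else 1

n = 4096                                   # number of parallel playouts
boards = np.zeros((n, ROWS, COLS), dtype=np.int8)   # start from empty boards
results = np.zeros(n, dtype=np.int8)
rng = create_xoroshiro128p_states(n, seed=42)
threads = 128
rollout_kernel[(n + threads - 1) // threads, threads](boards, rng, results)
print("player-1 win rate:", float((results == 1).mean()))
```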
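
Checkpointing every 100 episodes can be handled by a small helper like the one below. The interval comes from the project description; the field names, the `level_state` object (the hypothetical `LevelState` from the leveling sketch), and the file path are illustrative assumptions.

```python
import torch

CHECKPOINT_EVERY = 100  # interval from the project description

def maybe_checkpoint(episode, model, optimizer, level_state, epsilon,
                     path="checkpoint.pt"):
    """Persist the in-memory training state to disk so an interrupted run
    can resume; the field names here are illustrative, not the project's schema."""
    if episode % CHECKPOINT_EVERY != 0:
        return
    torch.save({
        "episode": episode,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "simulation_level": level_state.sims,  # hypothetical LevelState above
        "epsilon": epsilon,                    # current exploration rate
    }, path)
```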