Dota 2 Winning Team Prediction


Business Agenda:

In the increasingly popular area of Esports, there is an extremely popular game called Defense of the Ancients 2 (Dota 2). While many play the game simply for enjoyment, at the top echelons it is very competitive, and large sums of money are on the line in every game. In 2019, a single best-of-five series decided the difference between a first prize of $15.6 million and a second prize of $4.5 million.

The game begins with each side choosing 5 heroes for its team from a pool of over 100 unique heroes. With this much on the line, could a machine learning approach determine who has an advantage before the game even starts, based only on the heroes that are chosen? Apparently, it can! Using historical data from previous match results, we can create a model that predicts the likelihood of a team winning based on the hero matchups between the two teams.

I created a project to predict the winner of a Dota 2 professional match based on hero composition:

  • Pulled about 5700 professional matches from the OpenDota API using requests
  • Substituted hero ids with hero names in the dataframe
  • Created a dummy variable dataframe for the hero picks for each side
  • Ran and optimized Random Forest and LightGBM classifier models
  • Pickled the Random Forest model
  • Created a Django interface for users to use the model

Steps taken:

Data Collection
Using the OpenDota API and requests, I pulled the match ids for about 5700 professional games and then looped over the match ids to pull the following for each match:

  • Radiant Hero 1
  • Radiant Hero 2
  • Radiant Hero 3
  • Radiant Hero 4
  • Radiant Hero 5
  • Dire Hero 1
  • Dire Hero 2
  • Dire Hero 3
  • Dire Hero 4
  • Dire Hero 5
  • Radiant Victory
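
The collection step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the endpoint paths (`/proMatches`, `/matches/{match_id}`) and field names (`players`, `hero_id`, `player_slot`, `radiant_win`) reflect my understanding of the public OpenDota API and should be checked against its documentation. It also assumes that `player_slot` values below 128 belong to Radiant.

```python
from typing import Any

API_ROOT = "https://api.opendota.com/api"


def fetch_pro_match_ids() -> list[int]:
    """Fetch one page of recent professional match ids (assumed endpoint)."""
    import requests  # imported lazily so the parsing helper works offline

    response = requests.get(f"{API_ROOT}/proMatches", timeout=30)
    response.raise_for_status()
    return [match["match_id"] for match in response.json()]


def fetch_match(match_id: int) -> dict[str, Any]:
    """Download the full record for a single match (assumed endpoint)."""
    import requests

    response = requests.get(f"{API_ROOT}/matches/{match_id}", timeout=30)
    response.raise_for_status()
    return response.json()


def match_to_row(match: dict[str, Any]) -> dict[str, Any]:
    """Flatten one match record into the eleven fields listed above."""
    players = match["players"]
    # Assumption: slots 0-4 are Radiant, slots 128-132 are Dire.
    radiant = [p["hero_id"] for p in players if p["player_slot"] < 128]
    dire = [p["hero_id"] for p in players if p["player_slot"] >= 128]

    row: dict[str, Any] = {}
    for i, hero_id in enumerate(radiant, start=1):
        row[f"radiant_hero_{i}"] = hero_id
    for i, hero_id in enumerate(dire, start=1):
        row[f"dire_hero_{i}"] = hero_id
    row["radiant_win"] = bool(match["radiant_win"])
    return row
```

A full run would call `fetch_match` for each id returned by `fetch_pro_match_ids` and collect the rows, ideally with a short pause between requests to stay within the API's rate limits.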

Data Cleaning and Feature Engineering:
After collecting the data, I needed to clean it up for use in the model. I made the following changes:

  • Cleaned up various issues related to apostrophes and spaces in the Hero names
  • One-hot encoded each hero name and side combination
    • First created empty sets for each combination of hero and side
    • Created a loop which edited the hero name with the cleanup issues mentioned above
    • Checked if hero name appeared in one of five columns for each side
      • Appended a 1 if hero was present on the side for that match and 0 if not
    • Created a data frame off the set for hero and side combination and transposed said data frame
    • Concatenated the data frames for each side

Model Building:
Created a LightGBM model and a Random Forest model to predict which side would win the game.
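
A minimal sketch of the Random Forest half of this step, using scikit-learn, is shown below. The data is synthetic (random picks and random labels), so the accuracy it produces says nothing about the real model; the hyperparameters, split size, and variable names are illustrative assumptions, not the tuned settings used in the project. The final lines show an in-memory pickle round trip of the fitted model.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_matches, n_heroes = 400, 20  # small stand-in sizes, not the real dataset

# Synthetic one-hot features: 5 distinct heroes per side, no hero on both sides.
X = np.zeros((n_matches, 2 * n_heroes), dtype=int)
for i in range(n_matches):
    picks = rng.choice(n_heroes, size=10, replace=False)
    X[i, picks[:5]] = 1             # Radiant columns
    X[i, n_heroes + picks[5:]] = 1  # Dire columns
y = rng.integers(0, 2, size=n_matches)  # random "Radiant Victory" labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Serialize the fitted model and load it back, as a deployed app would.
restored = pickle.loads(pickle.dumps(model))
```

The `LGBMClassifier` class from the lightgbm package follows the same `fit` and `predict` interface, so it could be swapped in for the comparison with only the constructor line changed.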

Model Performance:
The Random Forest model proved to be the most effective and outperformed the LightGBM model, so I chose to package the Random Forest model.

  • Random Forest: 70% Accuracy
  • LightGBM: 67% Accuracy

  • Category: Classifier Prediction Algorithm
  • Date: July 2020
  • GitHub: View on GitHub