Dota 2 Prediction Project

Description

Business Agenda:

In the fast-growing world of esports, one of the most popular games is Defense of the Ancients 2 (Dota 2). While many play simply for enjoyment, the top tier is intensely competitive, with large sums of money riding on every game. In 2019, a single best-of-five series decided the difference between a $15.6 million first prize and a $4.5 million second prize.

The game begins with each side choosing 5 heroes from a pool of over 100 unique heroes. With this much on the line, could a machine learning approach determine which team has the advantage before the game even starts, based only on the heroes chosen? Apparently, it can! Using historical data from previous match results, we can build a model that predicts the likelihood of a team winning based on the hero matchups between the two teams.

Overview:
Created a project to predict the winner of a Dota 2 professional match based on hero composition.

Pulled about 5700 professional matches from the OpenDota API using requests
Substituted hero IDs with hero names in the data frame
Created a dummy-variable data frame for the hero picks on each side
Trained and tuned Random Forest and LightGBM classifier models
Pickled the Random Forest model
Created a Django interface so users can try the model

Steps taken:

Data Collection
Using the OpenDota API and requests, pulled the match IDs for about 5700 professional games, then looped over the match IDs to pull the following for each match:

Radiant Hero 1
Radiant Hero 2
Radiant Hero 3
Radiant Hero 4
Radiant Hero 5
Dire Hero 1
Dire Hero 2
Dire Hero 3
Dire Hero 4
Dire Hero 5
Radiant Victory
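
The collection step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (fetch_json, fetch_pro_match_ids, match_to_row) and the output column names are hypothetical, and it assumes the OpenDota proMatches and matches/{match_id} endpoints return the match_id, players, hero_id, player_slot and radiant_win fields used here.

```python
API_ROOT = "https://api.opendota.com/api"


def fetch_json(path, params=None):
    """GET an OpenDota endpoint and return the decoded JSON body."""
    # Imported lazily so the parsing helper below works without requests installed.
    import requests

    response = requests.get(f"{API_ROOT}/{path}", params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_pro_match_ids(target_count):
    """Page backwards through proMatches until target_count ids are collected."""
    match_ids = []
    params = {}
    while len(match_ids) < target_count:
        page = fetch_json("proMatches", params)
        if not page:
            break
        match_ids.extend(m["match_id"] for m in page)
        # Next request: only matches older than the oldest one seen so far.
        params = {"less_than_match_id": min(m["match_id"] for m in page)}
    return match_ids[:target_count]


def match_to_row(match):
    """Flatten one match response into the eleven columns listed above."""
    # Player slots 0-4 belong to Radiant, 128-132 to Dire.
    radiant = [p["hero_id"] for p in match["players"] if p["player_slot"] < 128]
    dire = [p["hero_id"] for p in match["players"] if p["player_slot"] >= 128]
    row = {f"radiant_hero_{i}": h for i, h in enumerate(radiant, start=1)}
    row.update({f"dire_hero_{i}": h for i, h in enumerate(dire, start=1)})
    row["radiant_win"] = bool(match["radiant_win"])
    return row


# Hypothetical usage (performs network calls, so it is left commented out):
# rows = [match_to_row(fetch_json(f"matches/{mid}"))
#         for mid in fetch_pro_match_ids(5700)]
```

The public API is rate limited, so a real run over thousands of matches would likely need a pause between requests and some retry handling.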

Data Cleaning and Feature Engineering:
After scraping the data, I needed to clean it up for use in the model. Made the following changes:

Cleaned up various issues related to apostrophes and spaces in the hero names
One-hot encoded each hero name and side combination

First created empty sets for each combination of hero and side
Looped over the hero names, applying the cleanup fixes mentioned above
Checked if hero name appeared in one of five columns for each side

Appended a 1 if hero was present on the side for that match and 0 if not

Created a data frame from the set for each hero and side combination and transposed it
Concatenated the data frames for each side
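
The cleaning and encoding steps above could look something like the sketch below. It is written in plain Python rather than reproducing the original data-frame code, and the function names, the "radiant"/"dire" keys and the underscore naming scheme are assumptions made for illustration.

```python
import re


def clean_hero_name(name):
    """Turn a display name such as "Nature's Prophet" into a column-safe token."""
    name = name.replace("'", "")  # drop apostrophes
    return re.sub(r"\s+", "_", name.strip())  # replace runs of spaces with "_"


def one_hot_rows(matches, hero_names):
    """Build one 0/1 indicator per (side, hero) pair for every match."""
    heroes = sorted(clean_hero_name(h) for h in hero_names)
    rows = []
    for match in matches:
        row = {}
        for side in ("radiant", "dire"):
            picked = {clean_hero_name(h) for h in match[side]}
            for hero in heroes:
                # 1 if the hero was picked on this side in this match, else 0.
                row[f"{side}_{hero}"] = 1 if hero in picked else 0
        rows.append(row)
    return rows


# Tiny illustrative draft (two heroes per side instead of five, for brevity).
example = one_hot_rows(
    [{"radiant": ["Nature's Prophet", "Axe"], "dire": ["Crystal Maiden", "Lina"]}],
    ["Nature's Prophet", "Axe", "Crystal Maiden", "Lina"],
)
```

Passing the resulting list of dicts to pandas.DataFrame(rows) would give one row per match and one column per hero and side combination, which is the shape the models expect.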

Model Building:
Created a LightGBM model and a Random Forest Model to predict which side would win the game
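
As a rough sketch of this step, the example below trains a scikit-learn RandomForestClassifier on synthetic one-hot drafts. Everything about the data is an assumption: the drafts and labels are randomly generated placeholders, the 20-hero pool and the hyperparameters are arbitrary, and the split ratio is not taken from the project. With random labels the score should hover around chance, so it says nothing about the real results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_drafts(n_matches, n_heroes, seed=0):
    """Random placeholder data: 5 distinct heroes per side, random winner."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_matches, 2 * n_heroes), dtype=int)
    for i in range(n_matches):
        drafted = rng.choice(n_heroes, size=10, replace=False)
        X[i, drafted[:5]] = 1              # Radiant indicator columns
        X[i, n_heroes + drafted[5:]] = 1   # Dire indicator columns
    y = rng.integers(0, 2, size=n_matches)  # 1 = Radiant victory
    return X, y


def train_and_score(X, y, seed=0):
    """Fit a random forest on a train split and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


X, y = make_synthetic_drafts(n_matches=400, n_heroes=20)
model, accuracy = train_and_score(X, y)
print(f"held-out accuracy on placeholder data: {accuracy:.2f}")
```

LightGBM's LGBMClassifier follows the same fit/predict interface, so it could be swapped in for the random forest in train_and_score to compare the two models on the same split.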

Model Performance:
The Random Forest model proved the more effective of the two, outperforming the LightGBM model, so I chose to package the Random Forest model.

Random Forest: 70% Accuracy
LightGBM: 67% Accuracy

Test Now!

Details

Category: Classifier Prediction Algorithm
Date: July 2020
GitHub: View on GitHub

Dota 2 Winning Team Prediction