NBA games are well suited to logistic regression because each one has exactly two possible outcomes: a win or a loss. We don't have to deal with draws, even in the regular season.
We can use individual team effects in a logistic regression to estimate the probability that a specific team beats another given team. Stated concretely, we might ask: based on the last 3 seasons of games, what’s the probability that the Toronto Raptors beat the Golden State Warriors in a game played in the Bay Area?
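To make the idea of team effects concrete, here is a minimal sketch that fits an ordinary (non-hierarchical, maximum-likelihood) logistic regression with `glm()` on simulated games. Everything in it is an assumption for illustration: the three teams, their `strength` values, the `home_adv` value, and the +1/-1 coding of home and away teams are made up and are not taken from the real data.

```r
set.seed(1)

# Hypothetical teams and made-up "true" strengths on the log-odds scale.
teams    <- c("BOS", "GSW", "TOR")
strength <- c(BOS = 0.0, GSW = 0.5, TOR = 0.3)
home_adv <- 0.4   # made-up home-court advantage (log-odds)

# Simulate 600 games: pick a home team, then a different away team.
n    <- 600
home <- sample(teams, n, replace = TRUE)
away <- unname(vapply(home, function(h) sample(setdiff(teams, h), 1), character(1)))

# The home team wins with probability inverse-logit(home_adv + home - away).
p_home   <- plogis(home_adv + strength[home] - strength[away])
home_win <- rbinom(n, 1, p_home)

# Team-effect coding: +1 if the team is at home, -1 if away, 0 if not playing.
X <- sapply(teams, function(tm) (home == tm) - (away == tm))

# Drop the first team (BOS) as the reference level so the model is identified.
design <- as.data.frame(X[, -1])
design$home_win <- home_win

# The intercept plays the role of the home-court advantage.
fit <- glm(home_win ~ GSW + TOR, data = design, family = binomial())

# Toronto visiting Golden State: GSW is home (+1), TOR is away (-1).
p_gsw_home <- predict(fit, newdata = data.frame(GSW = 1, TOR = -1), type = "response")
p_tor_wins <- 1 - p_gsw_home
print(p_tor_wins)
```

The hierarchical Bayesian model discussed later replaces these independent coefficients with team effects that share a common prior, but the way a matchup probability is read off the fitted effects is the same.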
The nbastatR package for R is a convenient API wrapper for pulling rich player and game statistics from the NBA Stats API and Basketball-Reference (as well as other sources for non-NBA basketball leagues). You can install the package from GitHub with devtools.
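A minimal installation sketch follows. It assumes the package is still hosted in the `abresler/nbastatR` GitHub repository and that you have network access; adjust the repository path if it has moved.

```r
# Install devtools first if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Assumed repository location: abresler/nbastatR on GitHub.
devtools::install_github("abresler/nbastatR")

library(nbastatR)
```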
We’ll load in the team and player reference tables, as well as game data for the past 3 seasons. We’ll then reshape the data in preparation for analysis. To start, let’s create dummy variables for the winner and loser of each game.
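As a rough illustration of the dummy-variable step, the sketch below builds winner and loser indicator columns from a tiny hand-made table. The column names `idGame`, `slugTeamWinner` and `slugTeamLoser` match fields that appear in the game logs, but the rows are invented, and the `win_`/`lose_` prefixes are hypothetical naming choices rather than anything the package produces.

```r
# Toy game table: real column names, made-up rows.
games <- data.frame(
  idGame         = 1:4,
  slugTeamWinner = c("TOR", "GSW", "TOR", "BOS"),
  slugTeamLoser  = c("GSW", "BOS", "BOS", "GSW"),
  stringsAsFactors = FALSE
)

# Every team that appears as a winner or a loser.
teams <- sort(unique(c(games$slugTeamWinner, games$slugTeamLoser)))

# One column per team: 1 if that team won (or lost) the game, 0 otherwise.
winner_dummies <- sapply(teams, function(tm) as.integer(games$slugTeamWinner == tm))
loser_dummies  <- sapply(teams, function(tm) as.integer(games$slugTeamLoser == tm))
colnames(winner_dummies) <- paste0("win_", teams)
colnames(loser_dummies)  <- paste0("lose_", teams)

# Combine into one wide table keyed by game id.
game_dummies <- cbind(games["idGame"], winner_dummies, loser_dummies)
print(game_dummies)
```

Each row has exactly one winner flag and one loser flag set. The sketch runs offline on the invented rows; the download messages that follow come from the real API calls.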
## Acquiring NBA basic player game logs for the 2018-19 Regular Season
## Warning: All elements of `...` must be named.
## Did you want `dataTables = c(typeSeason, dateGame, idGame, numberGameTeamSeason, nameTeam,
## idTeam, isB2B, isB2BFirst, isB2BSecond, locationGame, slugMatchup,
## slugTeam, countDaysRestTeam, countDaysNextGameTeam, slugOpponent,
## slugTeamWinner, slugTeamLoser, outcomeGame, namePlayer, numberGamePlayerSeason,
## countDaysRestPlayer, countDaysNextGamePlayer, idPlayer, isWin,
## fgm, fga, pctFG, fg3m, fg3a, pctFG3, pctFT, hasVideo, fg2m,
## fg2a, pctFG2, minutes, ftm, fta, oreb, dreb, treb, ast, stl,
## blk, tov, pf, pts, plusminus, fpts, urlTeamSeasonLogo, urlPlayerStats,
## urlPlayerThumbnail, urlPlayerHeadshot, urlPlayerActionPhoto,
## urlPlayerPhoto)`?
## Warning: `cols` is now required when using unnest().
## Please use `cols = c(dataTables)`
## Acquiring NBA basic team game logs for the 2018-19 Regular Season
## Warning: All elements of `...` must be named.
## Did you want `dataTables = c(typeSeason, dateGame, idGame, numberGameTeamSeason, nameTeam,
## idTeam, isB2B, isB2BFirst, isB2BSecond, locationGame, slugMatchup,
## slugTeam, countDaysRestTeam, countDaysNextGameTeam, slugOpponent,
## slugTeamWinner, slugTeamLoser, outcomeGame, isWin, fgmTeam,
## fgaTeam, pctFGTeam, fg3mTeam, fg3aTeam, pctFG3Team, pctFTTeam,
## hasVideo, fg2mTeam, fg2aTeam, pctFG2Team, minutesTeam, ftmTeam,
## ftaTeam, orebTeam, drebTeam, trebTeam, astTeam, stlTeam,
## blkTeam, tovTeam, pfTeam, ptsTeam, plusminusTeam, urlTeamSeasonLogo)`?
## Acquiring NBA basic team game logs for the 2019-20 Regular Season
## Acquiring NBA basic team game logs for the 2020-21 Regular Season
Hierarchical Bayesian Logistic Regression