Instructions for Preparing Your Files

Before preparing your files, you should first read the Learning Pages about the Shapley Value and the calculation of SRC and OSWC.  These pages provide useful information about event-trigger scorekeeping.

OVERVIEW

SRC and OSWC can be calculated by hand for innings and games that have only very simple play-by-play sequences.  In general, however, the best way to calculate SRC and OSWC is to have a computer calculate them using the information that you provide about the plays in the game.

To calculate SRC and OSWC on this web site, you must provide the computer with two files:

  • The play-by-play csv data file that lists the event triggers for every play of the game.
  • A transition-matrix csv file that has the trigger transition information that the computer will use to simulate hypothetical sequences of play for sub-coalitions of the players.

To create the play-by-play csv file, the play-by-play scorekeeping of the game must be redone as event-trigger scorekeeping.  Event-trigger scorekeeping consists of replacing the official scorekeeper’s designation for each play with an event-trigger label that corresponds to a multi-dimensional, event-trigger description of that play.  Most users of this web site will do the event-trigger scorekeeping manually.  Instructions for doing this are provided below on this page.

The transition-matrix csv file must contain the event-trigger transition information for all event triggers that are cited in the play-by-play csv file.

The formatting of these files must match that of the sample files.

SAMPLE FILES

Download these two sample files.

FILE | DESCRIPTION
2020ws1.csv | The play-by-play data file for Game 1 of the 2020 World Series between the Tampa Bay Rays and the Los Angeles Dodgers.  This file uses my own event-trigger scorekeeping in Column 7.
transitionmatrix.csv | The transition matrix file used by the python code to

The table below describes each column of the play-by-play csv file.  The example in the third column is from the first row of the sample play-by-play csv file.

COLUMN | ITEM | EXAMPLE | DESCRIPTION
1 | Game label | LAN202010200 | A name for the baseball game from which the play-by-play data is taken.  This should be the same in all rows of column 1.  For example, it could designate the home field and date of the game.
2 | Visiting team | TBR | The name of the visiting team.  This should be the same in all rows of column 2.
3 | Home team | LAN | The name of the home team.  This should be the same in all rows of column 3.
4 | Inning | 1 | The inning in which the play “event trigger” in column 7 occurred.  This should be an integer that is 1 or higher.
5 | Half-inning | 0 | The half-inning in which the play “event trigger” in column 7 occurred.  This should be either 0 or 1, where 0 denotes the top of the inning and 1 denotes the bottom of the inning.
6 | Player | Diaz | The name of the player who is to receive credit for the play “event trigger” in column 7.  This name must be spelled exactly the same in every row with an “event trigger” that is credited to this player.
7 | Event-trigger label | SGL_A | The name of the event trigger that designates how this player’s play will change the state of play in that half-inning.  See details below.

The first three columns use naming conventions from retrosheet.org, but you can use other naming conventions.  Notice that the retrosheet.org naming convention differs from the baseball-reference.com naming convention.  E.g., for the Los Angeles Dodgers, retrosheet.org uses LAN while baseball-reference.com uses LAD.

In the example row, Diaz led off the inning with a single that changed the actual state of the game from “nobody on, nobody out” to “runner on first, nobody out.”  This play was assigned the event trigger “SGL_A,” in which at the conclusion of the play the batter reaches first base safely and all other runners, should there be any, safely advance one base.  Notice that assigning event trigger SGL_A conveys more information than just coding the play as a “single,” because the event trigger describes how that offensive event would change the state of play for every possible initial state.  Thus, event-trigger scorekeeping provides a more detailed description of a play than typical baseball scorekeeping.

The sample transition-matrix csv file lists over 150 event triggers that were used to calculate the SRC and SRCC statistics reported on this website.  Each column (except the first) corresponds to a different initial state, which is listed in the top row of the file.  The initial state is the base-out state at the start of the play.  The first three digits denote the location of runners on third-second-first:  a 0 denotes that the base is empty, and a 1 denotes that a runner is on the base.  The number after the hyphen is the number of outs.  For example, the state 110-1 means “runners on third and second but not first with one out.”  Each cell gives the state that results from the event trigger, followed by the number of runs scored on the play.  For instance, the first event trigger in the transition-matrix csv file is BALK, which moves the state of play from 110-1 to 100-1-1 (runner on third only, one out, one run scored).
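
The numbering convention can be illustrated with a short Python sketch.  This is not the code used by this web site; the names BaseOutState, parse_state, and parse_result are hypothetical helpers introduced only to show how the labels break down.

```python
from typing import NamedTuple, Tuple


class BaseOutState(NamedTuple):
    """Hypothetical container for a base-out state such as 110-1."""
    third: bool
    second: bool
    first: bool
    outs: int


def parse_state(label: str) -> BaseOutState:
    """Split an initial-state label like '110-1' into runners and outs."""
    bases, outs = label.strip().split("-")
    if len(bases) != 3 or set(bases) - {"0", "1"}:
        raise ValueError(f"bad base digits in state label: {label!r}")
    # Digits are ordered third-second-first, as described above.
    return BaseOutState(bases[0] == "1", bases[1] == "1", bases[2] == "1", int(outs))


def parse_result(label: str) -> Tuple[BaseOutState, int]:
    """Split a result label like '100-1-1' into the new state and runs scored."""
    bases, outs, runs = label.strip().split("-")
    return parse_state(f"{bases}-{outs}"), int(runs)


# The BALK example from the text: 110-1 becomes 100-1-1.
print(parse_state("110-1"))
print(parse_result("100-1-1"))
```

Here parse_state("110-1") reports runners on third and second with one out, and parse_result("100-1-1") reports a runner on third only, one out, and one run scored.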

PREPARING YOUR FILES

Your play-by-play csv file must have the data organized in the same columns as the sample file.  Specifically:

  • There must not be a header row.
  • Each individual play must be on a separate row with the appropriate game, team, and inning labels in the first 5 columns.
  • The individual plays must be in the proper sequence, from the first play of the game which must be the top row in the file, to the last play which must be the last row in the file.
  • Every event trigger label that is referenced in the last column for at least one play must be included in the transition-matrix csv file.
  • The file must be saved in csv format.
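
Before submitting, you may want to check these requirements yourself.  The following Python sketch is one possible way to do so; it is not part of this web site, and the function name check_play_by_play is hypothetical.  It assumes the seven-column layout described in the table above.

```python
import csv
import io


def check_play_by_play(rows):
    """Return a list of problems found in play-by-play rows (empty if none)."""
    rows = [[cell.strip() for cell in row] for row in rows]
    problems = []
    previous_key = None
    for number, row in enumerate(rows, start=1):
        if len(row) != 7:
            problems.append(f"row {number}: expected 7 columns, found {len(row)}")
            continue
        game, visitor, home, inning, half, player, trigger = row
        # Columns 1-3 must be identical in every row.
        if row[:3] != rows[0][:3]:
            problems.append(f"row {number}: game or team labels differ from row 1")
        # A header row would fail this check, since its inning is not a number.
        if not (inning.isdecimal() and int(inning) >= 1):
            problems.append(f"row {number}: inning must be an integer of 1 or higher")
            continue
        if half not in ("0", "1"):
            problems.append(f"row {number}: half-inning must be 0 or 1")
            continue
        if not player or not trigger:
            problems.append(f"row {number}: player and event trigger must not be blank")
        # Plays must run from the first play of the game to the last.
        key = (int(inning), int(half))
        if previous_key is not None and key < previous_key:
            problems.append(f"row {number}: plays are out of sequence")
        previous_key = key
    return problems


# The first row of the sample play-by-play file, as shown in the table above.
sample = "LAN202010200,TBR,LAN,1,0,Diaz,SGL_A\n"
print(check_play_by_play(csv.reader(io.StringIO(sample))))
```

To check a file on disk, pass csv.reader(open("yourfile.csv", newline="")) instead of the in-memory sample.  The sketch cannot tell whether plays within the same half-inning are in the right order; that still has to be checked by eye.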

Your transition-matrix csv file must have the data arranged in a format similar to that of the sample file.  Specifically:

  • The first row must be a header row which states the initial states using the same numerical convention (e.g., 110-1 means runners on third and second but not first with 1 out).
  • The first column must contain the labels used to designate the event triggers.
  • Each cell in the row for an event trigger must contain the resulting base-out state and runs scored, using the same numerical convention (e.g., 101-1-1 means runners on third and first but not second with one out and one run scored on the play).
  • The event triggers can be arranged in any order (e.g., they do not have to be arranged in the same order as they are referenced in the play-by-play file).  The sample file has them arranged in alphabetical order.
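
One way to confirm that every event trigger cited in the play-by-play file also appears in the transition-matrix file is sketched below in Python.  This is an illustration under the layout described above, not the code used by this web site; load_transition_matrix and missing_triggers are hypothetical names, and the text of the first header cell ("trigger" here) is an assumption that the sketch ignores.

```python
import csv
import io


def load_transition_matrix(handle):
    """Read a transition-matrix csv into {trigger: {initial_state: result}}."""
    reader = csv.reader(handle)
    header = next(reader)
    states = [cell.strip() for cell in header[1:]]  # first header cell is ignored
    matrix = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        results = [cell.strip() for cell in row[1:]]
        matrix[row[0].strip()] = dict(zip(states, results))
    return matrix


def missing_triggers(play_rows, matrix):
    """List event-trigger labels (column 7) that have no row in the matrix."""
    cited = {row[6].strip() for row in play_rows}
    return sorted(cited - set(matrix))


# A one-column, one-row matrix built from the BALK example in the text.
matrix_csv = "trigger,110-1\nBALK,100-1-1\n"
matrix = load_transition_matrix(io.StringIO(matrix_csv))

plays = [["LAN202010200", "TBR", "LAN", "1", "0", "Diaz", "SGL_A"]]
print(missing_triggers(plays, matrix))  # prints ['SGL_A']
```

Because the comparison is an exact string match, it also catches labels that are spelled differently in the two files.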

TIPS FOR PREPARING YOUR FILES

  1. Create the files using your favorite spreadsheet program (e.g., Microsoft Excel), but save as a csv file.
  2. When deciding what event trigger to assign to a play, you can first look at the event triggers in the sample transition-matrix csv file to see if one of those event triggers would be appropriate to assign to the play.
    • If you find an appropriate event trigger, then assign its label in your play-by-play file, copy that event trigger’s row from the sample transition-matrix file, and paste it into your transition-matrix csv file.
    • If you do not find an appropriate event trigger, then create a name for a new event trigger, add the event trigger’s state-change information in the respective columns of your transition-matrix csv file, and assign its label in your play-by-play csv file.  Be sure that the label that is assigned to the event trigger is spelled exactly the same in your transition-matrix csv file as it is when assigned in the play-by-play csv file.
  3. Rather than making your own transition-matrix csv file, you can instead use the sample file.  However, if you do not find an appropriate event trigger in the sample file, then create a name for a new event trigger, add the event trigger’s state-change information in the respective columns of your transition-matrix csv file, and assign its label in your play-by-play csv file.  Be sure that the label that is assigned to the event trigger is spelled exactly the same in your transition-matrix csv file as it is when assigned in the play-by-play csv file.
  4. An individual play in the game may be split into separate plays in the file if you decide that different players involved in the play deserve their own credit for their part of the play.  (E.g., see the splitting of a wild pitch into two plays for Barnes and Betts on the How to Calculate SRC page.)
  5. The event triggers can be arranged in any order in the transition-matrix csv file (e.g., they do not have to be arranged in the same order as they are referenced in the play-by-play file).  The sample file has them arranged in alphabetical order for ease in finding event triggers to assign while doing the event-trigger scorekeeping.
  6. The game file you submit can have any number of innings.  E.g., you can submit a file that has just one inning or five innings or any other number.  You might want to start by submitting a file for just one inning to verify that you are on the right track.
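
If you build your own transition-matrix file as described in tip 2, the copy-and-paste step can also be scripted.  The Python sketch below is one possible approach and is not part of this web site; copy_cited_rows is a hypothetical name, and the tiny matrix is a stand-in for the sample file whose two cells follow from the BALK and SGL_A descriptions above.  The label MY_NEW_TRIGGER is made up to show what happens when a trigger is not in the sample file.

```python
import csv
import io


def copy_cited_rows(sample_rows, cited_labels):
    """Keep the header plus the sample rows whose label is cited.

    Returns (rows_to_write, labels_not_found).  Labels that are not found
    need a new row written by hand.
    """
    header, *body = sample_rows
    wanted = set(cited_labels)
    kept = [row for row in body if row and row[0].strip() in wanted]
    found = {row[0].strip() for row in kept}
    return [header] + kept, sorted(wanted - found)


# Stand-in for the rows of the sample transition-matrix file.
sample_rows = [
    ["trigger", "110-1"],  # the first header cell name is an assumption
    ["BALK", "100-1-1"],
    ["SGL_A", "101-1-1"],
]

rows, not_found = copy_cited_rows(sample_rows, ["SGL_A", "MY_NEW_TRIGGER"])

# Write the result as csv text; use open("mymatrix.csv", "w", newline="")
# in place of StringIO to write a real file.
output = io.StringIO()
csv.writer(output).writerows(rows)
print(output.getvalue())
print("Add rows by hand for:", not_found)
```

The rows keep the order of the sample file, which is acceptable because the event triggers can be arranged in any order.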

When your files are ready for calculation, submit them on the CALCULATE page.

PRINCIPLES FOR EVENT-TRIGGER SCOREKEEPING

Human judgments matter in baseball scorekeeping.  One scorekeeper might rule a ball in play to be a hit while another rules it an error, and this decision affects BA and ERA calculations.  To improve consistency, baseball scorekeepers follow official scorekeeping guidelines and apply years of accumulated wisdom.  Yet, perfect consistency is not possible.

The same is true of event-trigger scorekeeping.  The assignment of the event triggers determines the value function and, consequently, the Shapley Value calculations, so different event-trigger scorekeepers can apply different judgments, which would result in different Shapley Value calculations.

To improve consistency, the calculations presented on this web site have followed a few principles for assigning event triggers.

The first is actually a strict rule rather than a principle:  the change in state that arose in the game must also result from the event trigger when in that same initial state.  If this rule is not followed, then the run value for the full coalition will not necessarily equal the number of runs that actually scored, and this results in an incorrect distribution of credit.
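
A rough way to test this rule is to replay a half-inning through the transition matrix and compare the run total with the runs that actually scored.  The Python sketch below illustrates the idea; it is not the code used by this web site.  The function name replay is hypothetical, the matrix is assumed to be held as a nested dictionary, and each half-inning is assumed to start in state 000-0 unless another start is given.  The two matrix entries follow from the SGL_A and BALK descriptions on this page.

```python
def replay(triggers, matrix, start="000-0"):
    """Apply event triggers in order; return (final_state, total_runs).

    matrix maps trigger label -> {initial_state: 'bases-outs-runs'}.
    A KeyError means a trigger or initial state is missing from the matrix.
    """
    state, runs = start, 0
    for label in triggers:
        result = matrix[label][state]
        bases, outs, scored = result.split("-")
        state = f"{bases}-{outs}"  # the result becomes the next initial state
        runs += int(scored)
    return state, runs


matrix = {
    "SGL_A": {"000-0": "001-0-0"},  # leadoff single: runner on first, nobody out
    "BALK": {"110-1": "100-1-1"},   # balk with runners on third and second
}

print(replay(["SGL_A"], matrix))                # ('001-0', 0)
print(replay(["BALK"], matrix, start="110-1"))  # ('100-1', 1)
```

If the run total from replaying all plays of a half-inning differs from the runs that actually scored, at least one assigned event trigger does not reproduce the change in state that occurred in the game.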

The next principle is “conservatism”:  in initial states not observed, a runner should not advance beyond a natural minimum.  For example, a single that in the real game advanced a runner from first to second should not be assumed to score a runner from second in a simulated half-inning.  This principle 

Another principle is “essence”:  the key feature of a play should be reflected in changes in base-out states from initial states that did not arise in the actual game.  For example, if a defender makes a throwing error in the actual game, then that same throwing error is assumed to occur in states with a different set of base runners.  The error is deemed to be the essence of the play that carries through to the other states.

These last two principles usually align, but sometimes they imply different event-trigger assignments.  When they differ, the calculations on this website have typically followed essence if there was a dominant essence of the play.  However, perfect consistency in applying these principles is not possible.

If you are new to assigning event triggers, then just try your best.  You can experiment with different event triggers, recalculate as many times as you would like, and learn how different event triggers change the Shapley Values.