Ultimately the goal for a team is to win, so we want to evaluate players on how much they help their teams win games. Unfortunately, the number of games that the team actually wins is a lousy way to judge individual players. There are too many things out of the control of an individual player, so we prefer to judge players based upon things that they have control over. Depending upon your preferences, there are several routes you can take.

Let's start with offense, as the theory is better understood here, both by the most advanced sabermetricians and by the users of this site. The two components of scoring runs are getting on base and advancing runners around the base paths. This is why OPS works so well: it takes both of those main components and combines them in a crude way. The problem with OPS is that it doesn't give the events their proper value. Specifically, OBP (i.e. not making outs) is much more important than SLG...a better formula would actually be 1.7*OBP + SLG, but I digress.

One possible approach, then, would be to find the proper value of each offensive event (single, double, etc.) averaged over all situations. This is a little complicated, but luckily Tom Tango did so in The Book (which I highly recommend). The value is dependent upon the run environment, but in a run environment like we have today a home run is worth about 2.2 times what a single is, for instance. Compare this with OBP, which treats them as equals, and SLG, which says a HR is 4 times as valuable as a single. These correct values can then be converted into a rate stat, wOBA, or a counting stat like wRAA.

Stats like wOBA are not without their problems though. First of all, the premise is that we are trying to judge things over which the players have control, but the hitters obviously do not control the defense that they are hitting into, the quality of pitcher they face, or the park in which they are playing. There are adjustments you can make for things like that, but I won't get into them here.
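To make the idea of weighting events concrete, here is a rough Python sketch of a wOBA-style rate stat next to the 1.7*OBP + SLG tweak mentioned above. The weights are NOT the published values from The Book. They are placeholders chosen only so that a home run comes out at about 2.2 singles, and dividing by plate appearances is a simplification. The function names and the season line are made up for illustration.

```python
# Placeholder linear weights (an assumption, not the published values).
# They are scaled only so that HR / 1B is about 2.2, as in the text.
WEIGHTS = {"BB": 0.70, "1B": 0.90, "2B": 1.25, "3B": 1.60, "HR": 1.98}


def weighted_rate(counts, plate_appearances):
    """wOBA-style rate: total weighted value of events per plate appearance."""
    total = sum(WEIGHTS[event] * n for event, n in counts.items())
    return total / plate_appearances


def weighted_ops(obp, slg):
    """The rough OPS fix from the text: count OBP 1.7 times as much as SLG."""
    return 1.7 * obp + slg


# Hypothetical season line, invented for this example.
season = {"BB": 60, "1B": 100, "2B": 30, "3B": 3, "HR": 25}

print(round(weighted_rate(season, 600), 3))
print(round(weighted_ops(0.400, 0.500), 3))
print(round(WEIGHTS["HR"] / WEIGHTS["1B"], 2))  # ratio of HR to single
```

Swapping in real weights for a given run environment only means changing the WEIGHTS table. The rest of the calculation stays the same.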
The other main issue is that the relative values for each event are averaged out over all situations. Even though on average a HR is 2.2 times as good as a single, this is not always true. With the bases loaded in the bottom of the 9th in a tie game, a HR is worth exactly the same amount as a single with respect to contributing towards a team win, and there are similar examples where a HR is worth far more than 2.2 singles. The solution here is to take the context into consideration in valuing each event. If you want to consider the context and not simply average the values over all situations, then win probability added (WPA) might be what you are looking for.

At any given point in a game, a team finds itself in one of the many possible base-out states: bases loaded with nobody out, runners at the corners with 2 outs, etc. There are 24 such combinations. When you factor in the inning, the difference in the score, and whether you are the home team or road team, there are a lot more possible states. We have historical data that indicates how likely a team is to win a game from each of the various states. For instance, in the bottom of the 9th with 2 outs and the bases empty in a tie game, the home team is expected to win about 53.2% of the time based on historical data. A home run in this situation means that the home team now wins 100% of the time, so the batter has added 46.8% of a win. A single, on the other hand, moves the game to a state where the home team wins 56.2% of the time, so a single in this situation adds 3% of a win. The difference in win probability before and after each player's contribution is WPA. This stat gives the appropriate value to events based upon the context in which they took place.

Now you might wonder if this is fair. This stat gives a lot of weight to positive events that happen to occur in high leverage situations. Depending upon your beliefs, the way a player's events are spread across situations may or may not be random.
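The WPA arithmetic from the bottom-of-the-9th example can be written out in a few lines of Python. This is only a sketch: the three win probabilities are the ones quoted above, while the dictionary keys and the function name are made up for illustration. A real implementation would look the probabilities up in a full historical table.

```python
# Home-team win probabilities quoted in the text for a tie game,
# bottom of the 9th, 2 outs, bases empty. The keys are made-up labels.
WIN_PROB = {
    "before_plate_appearance": 0.532,
    "after_home_run": 1.000,
    "after_single": 0.562,
}


def win_probability_added(before, after):
    """WPA for one event: win probability after minus win probability before."""
    return after - before


before = WIN_PROB["before_plate_appearance"]
hr_wpa = win_probability_added(before, WIN_PROB["after_home_run"])
single_wpa = win_probability_added(before, WIN_PROB["after_single"])

print(round(hr_wpa, 3))      # 0.468
print(round(single_wpa, 3))  # 0.03
```

A player's WPA over a game or season is just the sum of these per-event differences.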
However, regardless of whether or not the batter can control how much better he plays in important situations, he is not in control of the baserunners that are on when he bats. Some might then say that WPA is not fair to the hitter because it is measuring a lot of factors over which the batter is not in control. If this is your belief, you may want to consider WPA relative to how important the situations were when the hitter batted. For this you want WPA/LI, where LI stands for leverage index. Leverage index is just a way to indicate how crucial the situations were in which a hitter happened to bat. If you think players can do a lot to adjust their approach depending upon the game situation, then this might be for you.

So you have some flexibility over what you think is fair to consider when evaluating a player's offense, but the above metrics are very good at measuring this stuff. If you want to consider who is worthy of an MVP award, you may want to look at WPA, because it indicates who did the most to help their team win. However, WPA does not have a lot of predictive value, largely because there can be great variation in the relative importance of the situations in which a given player happens to come up to bat. If you find this unfair, you can switch to WPA/LI to counter this aspect. If you are of the belief that it's not fair to give someone extra credit just because their big hits happen to come at important times of the game, then you can just use the averaged values of different events and use something like wOBA or wRAA. Take your pick depending upon what you want to measure.

Evaluating pitchers is a bit more complicated. Pitchers obviously have a great deal of control over whether they strike out the batter, walk the batter, or hit the batter.
They also have a good deal of control over whether or not they give up a home run, but this depends largely upon the park, and there is some evidence that it is really the fly ball percentage that they have control over. Regardless, these are the things pitchers can mostly affect. However, when a ball is hit into the field of play and not over the fence, the defense matters a lot. Pitchers have little control over whether a ball in play is converted into an out or not. There is some difference between pitchers, but it is not huge and there is a lot of noise. There is so much noise that it is not terribly uncommon for one of the worst pitchers in BABIP in one year to be one of the best in the following year. So most pitchers will have a true mean ability for BABIP of something like .290-.300, but there is a lot of fluctuation in a given year.

One year's worth of data is not really enough to say if a pitcher has a lot of skill in keeping a low BABIP, but over the course of their careers some pitchers have demonstrated a degree of skill in this area. Unfortunately, by the time you have enough data to measure this skill it's likely that the pitcher's skill has changed, so there's not a ton to be gained by treating a pitcher as having anything but league average skill in this area. There might be something to be gained if you want to get into some hardcore details (things like line drive percentage), but we won't lose a ton by ignoring it completely.

There are a couple of explanations as to why there is not a lot of variation in BABIP skill. One is that pitchers are naturally selected for this skill, so all pitchers at the MLB level must have demonstrated this skill over their amateur careers and are all pretty good at it. If they were bad at it, they wouldn't be Major Leaguers. Another explanation is just that it is inherently difficult for a pitcher to control this.
Regardless, there is an effect that can be measured if we really care to do some hardcore stuff, but for our intents and purposes we are fairly safe ignoring the existence of this skill.

If we assume every pitcher has the league average skill relating to BABIP, then a great way to evaluate pitching performance is through FIP (fielding independent pitching). Essentially this is what a pitcher's ERA should be independent of the defense, park, and luck. Tom Tango created this using similar ideas to properly value the various events over which a pitcher has control. I believe it has been shown to be a better predictor of future ERA than ERA itself. If you want to take another step and notice that the home run to fly ball ratio a pitcher has is not really a skill either, then xFIP normalizes for this factor. There are further steps you can take if you are really interested in examining things like line drive percentage.

Now you may be wondering about WAR...what is it good for? It can take the events over which a player has control and estimate how many actual wins those events should be responsible for (over a replacement level player). There are some fairly straightforward conversions to change things like wOBA and FIP into runs, and then runs can be converted into wins based upon the Pythagorean expectation.

The main issue with WAR for hitters is that it uses defense, and many people don't trust defensive stats. I don't know all the details about UZR, but from what I do know it's the best defensive stat widely available, and there are ways in which you can convert UZR to run values as well. UZR is not without noise, so in addition to not being well understood, it can lead to counter-intuitive results over small sample sizes. Personally I've struggled with the question of how important defense is relative to offense and pitching, but given the fact that there are many things over which a pitcher has little control, it seems that for years I have been drastically underestimating the value of defense.
I've been doing some of my own research recently to estimate the relative importance of offense, defense, and pitching, and the preliminary results are that defense is much more important than I used to suspect (but still secondary to offense and pitching, of course).

Because I trust wOBA/wRAA, UZR, and FIP, I believe WAR is the best widely available metric to judge a player's overall past performance. That doesn't mean it's perfect (or all that matters), but it's very good at what it does. If you try to use it for something it's not designed to do, then you may get some weird results. For instance, if you want to predict future performance, you probably want to look at similar players as PECOTA does, or do a lot of regression towards the mean as MARCEL does. If you don't like some of the assumptions about wOBA/wRAA, then you might try WPA or WPA/LI. If all this stuff sounds good, you might want to check out FanGraphs and The Book (The Book goes into more detail about how a lot of these stats were derived).
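For anyone who wants to see the FIP and runs-to-wins ideas in concrete form, here is a rough Python sketch. The post itself does not spell out either formula, so take these as the commonly published forms: FIP as (13*HR + 3*(BB+HBP) - 2*K)/IP plus a league constant, and the Pythagorean expectation with an exponent of 2. The constant of 3.10 is a placeholder (the real one changes by season), and the pitcher and team numbers are invented.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP in its commonly published form.

    The constant puts FIP on the ERA scale and varies by league and
    season; 3.10 here is a placeholder assumption, not a real value.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical pitcher season: 20 HR, 50 BB, 5 HBP, 200 K in 200 IP.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200), 3))

# Hypothetical team: 800 runs scored, 700 runs allowed.
print(round(pythagorean_win_pct(800, 700), 3))
```

Only events the pitcher mostly controls (home runs, walks, hit batters, strikeouts) show up in the FIP formula, which is the whole point: balls in play are left out entirely.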