Saturday, February 18, 2012

The Draft Man Cometh Part 1

The Draft.
My heart enters into that elevated and erratic pitter-patter that would put less fit people in the hospital. My too-close-together eyes dilate as if thrown into blinding sunlight from a dark lair. My blood boils into my wang like that tsunami that wiped Japan off the map a year ago.
The Draft.
The hours spent preparing, the petaflops spent calculating, the photons generated deep inside Sol travelling through the vacuum of space to hit printed words in Fantasy Baseball magazines and then be deposited into my rods or cones. All wasted when Lenny Dykstra picks Miguel Cabrera 4th and my entire strategy is thrown into disarray.
The Draft.
120 seconds stretches out to infinity as all sounds die away and my world distills down to myself, my computer screen, and a long list of players. Hunter Pence? Brandon Phillips? I could get a top-line starter here. If I don't take a second baseman now, I will have to get someone at the bottom of the barrel. Ichiro will get hits and steals, but without the Ding-a-lings, how can I make up RBIs and OPS? Bill James predicts a big year from Joseph Bats, but that hack comedy writer Matthew Berry doesn't buy in; who to trust! I love numbers but yet Berry has a podcast! What does that mean! What does any of it mean! The timer begins to beep and what had been an eternity is now but 4 seconds. Ichiro? Phillips? Bats? Rios? Help me Jebus! "Jayson Werth!" I hear a disembodied voice call out. And just like that, my fantasy season is destroyed.
The Draft.

But wait a minute here, what does the draft really mean? The year that my beloved Ackbars strode easily to victory, my first pick was solid, but my next 4 were pretty much garbagio. Other than Woo Woo, we pretty much all make our hay on the free agent wire; the drafted players do yeoman work, but they are rarely the decisive factor. Is the draft pointless? Does draft position have anything to do with how strong a player will be after all accounts have been closed? Unclear. Let's take a look at some hard data and see if we can draw any conclusions.
I looked back at the drafts from the past 3 years and then pulled out the ESPN power rankings of the top 500 players for these years as they stood at the end of the year. If a player finished outside of the top 500 for that year, he got a power ranking of 500. All keepers have been excluded in this analysis. Before you look below, think about what you would expect: The guys at the beginning of the draft would be the highest rated at the end of the year, so you would expect pretty much a linear pattern with the power rankings going up as the draft picks do. Is this what we get? Of course it isn't!
Much to my surprise, there are picks all over the place all the time. Early in the draft, there are picks that end up being total duds. Later in the draft, there are picks who end up being studly leaders who hoist their teams on their backs and make their owners consider homosexuality or, in Justine's case, heterosexuality. I figured that the data is noisy, sure, but perhaps there were trends lurking that I would need the computer to tease out for me. But no! That is not the case! You can see that the linear regression, such as it is, fits the data insanely poorly, yielding an R-squared of but 0.13. To the un-stat-savvy, or as my wife likes to call them, "normal people," this means more or less that draft position accounts for only 13% of the variation in where players finish the year!
"Well," you say, casting a critical eye to my graph, "that is because the late rounds skew the data too much. If you look early, there will be a much stronger regression. True?"
No! Untrue! There is even less of a trend in the early rounds than in the late (Again, looking at R-squared shows that draft position explains only 2% of the variation in this data! 2%! There is more cream in milk than that! Well, only in half and half. It is the exact same amount as is in 2%. But even that is too rich for me!)! I think that the only thing that you can say is that the first 20 guys picked will be pretty good, but really, they will sort of be a similar mish-mash of strong players. While there is a good chance that you will get a guy who will carry your team in those first rounds, there is also an equally good chance that you will get a guy who is just OK or, even worse, totally stinks up the joint!
How can this be! How can the draft mean so little in the final reckoning? There are several notable cases that bear looking at:
-The top player at the end of the season has only come from the top 20 picks in the draft twice (Albert Pujols picked number 2 in 2009 and Matt Kemp picked number 15 last year). The other top player was Carlos Gonzalez in 2010 picked all the way in the 11th round at pick 109.
-The 5th highest rated player has never come from a round earlier than the 4th, with the earliest being Joseph Batthews picked #35 in 2011, followed by Josh Hamilton picked #44 in 2010 and Joe Mauer picked #81 in 2009.
-The average number pick for the top-10 in the final power ranking was #26, as in you would be more likely to find a top-10 guy in the 3rd round or above than in the first round (*nb - I have a more thorough analysis of this type of phenomenon coming up later, but rest assured, this statement is not a wild misinterpretation of the data).
-3 players picked in the top 5 picks have finished ranked #100 or below: Hanley Ramirez picked 2nd in 2011 finished a disastrous #258, Jose Reyes picked 4th in 2009 finished a basically-didn't-matter 456th, and Chase Utley picked #5 in 2010 finished #110. How are those Marlins of Miami's chances looking now, boyee?
-The safest player in the last 3 years: Albert Pujols. He has been picked #2, #2, and #1 and finished #1, #2, and #11 in 2009, 2010, and 2011, respectively. Miguel Cabrera, the first off the board in a lot of the literature I have been reading, was picked #6, #8, and #4 and finished #9, #4, and #4. Two new situations, probably no new outcomes. Another new situation: Ryan Braun. He was picked #5, #4, and #5 and finished #3, #13, and #2 in the past. With these 50 games possibly being levied against him for whatevers, where will he go in the draft?
-Consistently overrated: Captain Crunch Sabathia. Picked #16, #19, and #24, he finished only #48, #56, and #61. With wins out of the scoring this year, I can't imagine CC being on any of my teams, which of course means that I will get nervous and take him waaaaay too early. Also overrated: Jimmy Rollins. Picked #9, #18, and #66, he finished #123, #369, and #78.
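
For anyone who wants to poke at the regression themselves, here is a minimal sketch in Python. It assumes the same rule as above (anyone outside the top 500 gets a 500) and uses only the pick/finish pairs quoted in these bullets, which is a cherry-picked handful and not the full three-year data set behind the graphs. The names `PAIRS`, `cap_rank`, and `fit_line` are made up for illustration, and the number it prints should not be expected to match the 0.13 from the full data.

```python
# (draft pick, final power ranking) pairs quoted in the bullets above.
# This is a small, cherry-picked sample, NOT the full data set.
PAIRS = [
    (2, 1), (2, 2), (1, 11),         # Pujols 2009-2011
    (15, 1),                         # Kemp 2011
    (109, 1),                        # Carlos Gonzalez 2010
    (35, 5), (44, 5), (81, 5),       # Bautista 2011, Hamilton 2010, Mauer 2009
    (2, 258), (4, 456), (5, 110),    # Hanley 2011, Reyes 2009, Utley 2010
    (6, 9), (8, 4), (4, 4),          # Cabrera 2009-2011
    (5, 3), (4, 13), (5, 2),         # Braun 2009-2011
    (16, 48), (19, 56), (24, 61),    # Sabathia 2009-2011
    (9, 123), (18, 369), (66, 78),   # Rollins 2009-2011
]

RANK_CAP = 500  # players finishing outside the top 500 are scored as 500


def cap_rank(rank, cap=RANK_CAP):
    """Apply the 'outside the top 500 counts as 500' rule."""
    return min(rank, cap)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept, plus its R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y that the line accounts for.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared


picks = [pick for pick, _ in PAIRS]
ranks = [cap_rank(rank) for _, rank in PAIRS]
slope, intercept, r_squared = fit_line(picks, ranks)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, R-squared={r_squared:.3f}")
```

The R-squared here is the share of the variation in final ranking that a straight line through draft position accounts for, so a value near 0 means the line tells you next to nothing.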

Friday, February 17, 2012

Batting Facts



Looking at the teams' batting statistics is a different beast than the pitching. Whereas a good starter will maybe put in 35 starts, a good hitter will easily clear 500 plate appearances. The sheer weight of the data is enough to change the tilt of the Earth I say! The tilt! To start off, let's just look at a few interesting facts from last year:

-Kemporer Mattpaltene (that would be Matt Kemp) was not only an integral cog in his fantasy machines last year, but in his real life team as well. Matt Kemp accounted for the greatest part of his team's offense of anyone in the league last year, with 18.4% of his team's runs and fully 21% of his team's RBIs. To give you an idea of how large a proportion of the team's offense was centered around this one guy, the next largest proportion of runs on a team came from Curtis Granderson, who put up only 15.7% of the Yankees' many, many runs. That means that the Kemporer's share of his team's runs was about 17% larger than that of the next guy down the line. Alternately, compare it to Prince Fielder, who accounted for the second highest proportion of his team's RBIs with but 17.7% of his team's total RBIs. On the low end of these metrics are the Giants, whose leading scorer was the Kung Fu Panda, who accounted for just under 10% of his team's runs, and the Metropolitans, whose leading RBI guy was Carlos Beltran with just under 10% of his team's RBIs.
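
The gap between the Kemporer and the runners-up is just a ratio of shares, which can be checked in a couple of lines of Python. This is a quick sketch using only the percentages quoted above; `relative_gap` is a made-up helper name. Note that it compares each player's share of his own team's total, not raw run counts.

```python
def relative_gap(leader_share, runner_up_share):
    """How much bigger the leader's share is, relative to the runner-up's."""
    return leader_share / runner_up_share - 1


runs_gap = relative_gap(18.4, 15.7)  # Kemp vs. Granderson, share of team runs
rbi_gap = relative_gap(21.0, 17.7)   # Kemp vs. Fielder, share of team RBIs

print(f"Runs share gap: {runs_gap:.1%}")
print(f"RBI share gap:  {rbi_gap:.1%}")
```

By these numbers the runs share comes out a little over 17% larger and the RBI share a little under 19% larger.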

-The entire premise of Moneyball was that despite normal metrics used to assess baseball, Jonah Hill is better when he is fat. Also, getting on base is the most important thing, regardless of how you do it. With this in mind, I looked at which teams were most effective in converting men on base into men without pants, hubba hubba! The Rangers bring fully 40.5% of the runners they send out to the bases back home, a healthy 854 runs from 2109 men on base (hits+walks+hbp). They were followed by the Yankees, who brought home 40.3% of their runners (867 runs, 2152 men on base). This was surprising as it always seemed like the Yankees couldn't get guys in when they most needed it, but perhaps it was simply that they scored so often that the times when they didn't were glaring or, more as I remember it, heart rending, soul crushing, and enough to make it impossible to believe in anything anymore. The team that stranded the highest proportion of its men on the base paths was the sad Giants, who brought in only 31.2% of their runners.
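
The conversion rate above is nothing fancier than runs divided by men on base, where men on base is hits + walks + hit-by-pitches. Here is a minimal Python sketch using only the Rangers and Yankees totals quoted in this bullet; the function and variable names are made up for illustration.

```python
def men_on_base(hits, walks, hit_by_pitch):
    """Men on base as defined in the post: hits + walks + HBP."""
    return hits + walks + hit_by_pitch


def conversion_rate(runs, on_base):
    """Fraction of men on base who came around to score."""
    return runs / on_base


# (runs, men on base) totals quoted in the post for 2011.
TEAMS = {
    "Rangers": (854, 2109),
    "Yankees": (867, 2152),
}

rates = {name: conversion_rate(runs, on_base)
         for name, (runs, on_base) in TEAMS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

The `men_on_base` helper is there only to spell out the definition; the post gives the totals directly, so the component hits, walks, and HBP are not filled in here.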

-For pitchers and their fantasy managers, the unearned run is something to be pleased with. For batters and their fantasy managers, the exact opposite is true. Interestingly, both are diametrically opposed to what their real life teams feel. The team that had to earn the highest percentage of the runs that they scored was the Brewers, who had to earn 97.3% of their runs. They were followed by the Orioles, who had to earn 96.9% of their runs. The Twinkies had to earn only 92.2% of their runs, the lowest in the league. It is interesting that these numbers did not at all mesh with the number of stolen bases that a team has. I had always thought that one of the best things about the speedsters is that they distract the pitcher and cause confusion among the infielders, leading to errors on the basepaths and more unearned runs. This is not the case, however, as the teams with the greatest number of stolen bases, the Rays and the 'Dres (155 and 162, respectively), had middling percentages of earned runs (both 95.2%).

Friday, February 3, 2012

Stats Dependent on Teams: Saves vs. Wins

One of the reasons that has been cited for the recent and somewhat common move from wins to quality starts is that pitchers should be rewarded for pitching well rather than being at the whim of their offense or bullpen (Actually, no reasons have even been given for this authoritarian directive that was handed down without notice or discussion. Historically, people have liked having rules imposed on them by a tyrannical regime.). However, saves remain a vital category in this league as in many others. Do saves rely on the strength of a team in the same way as wins do? How do quality starts fit into the mix? Let us examine these questions in absurd detail.
The first question is whether wins do, in fact, rely on the strength of a team. I looked at the number of wins from a team's starting staff versus the number of wins that a team put up in total.

Not surprisingly, these two numbers were pretty well associated. From the last 5 years of pitching, each additional team win was associated with 0.84 of a starter win (the slope of the line, the number before the "x" in the equation). Moreover, this association was quite strong as the R-square was up at 0.65. No one would be surprised that when teams win, their starters win, but for fantasy purposes, this statistic becomes somewhat troubling; independently of the other pitching categories, if you pick a pitcher on a team that wins a lot, he will quite likely put up some wins. The true value of the pitcher will be watered down by the strength of the team. This would add value to pitchers like Phil Hughes in 2010 (18 wins, 4.18 ERA), who benefitted from the dominance-personified Yankees offense, and AJ Burnett in 2008 (18 wins, 4.06 ERA), who did much the same as a Blue Jay. The Red Sox of 2007 were able to support Tim Wakefield to an insane degree (17 wins, 4.76 ERA), as did the 2007 Rockies for Jeff Francis (17 wins, 4.22 ERA) and more recently the 2011 Tigers for Ackbar-killing Max Scherzer (15 wins, 4.43 ERA). Conversely, stat-outlier Matt Cain has seen his value diminished by his team's lack of acumen at the game upon which we are currently elocuting, as in 2011 he posted a 2.96 ERA while accruing only 12 wins in 26 outings. Similarly, Jake Peavy (10 wins, 2.85 ERA in 2008, 27 games), Ryan Vogelsong (13 wins, 2.70 ERA in 2011, 30 games), and famously Felix Hernandez (13 wins, 2.27 ERA in 2010, 34 games) all were hurt by their team's lack of wins.
However, the question remains: Do quality starts have a similar association with team wins? Will we be able to pick up 4th starters on the Yankees and count on them to have quality starts? Survey says....

...not nearly as much. Each additional team win was associated with only 0.44 of a quality start, and that association was quite weak, yielding an R-square of only 0.20. While one interpretation of this data would be that when a pitcher puts up a quality start, his team will most likely win, this is not the case: each quality start was associated with only 0.45 extra team wins, an association which is similarly weak at 0.20 R-square (data not shown). Indeed, by and large, it would seem that scoring quality starts rather than wins would take the team out of the equation. This brings up the interesting question, though: Are saves not similarly dependent on the team? I mean, surely, the Yankees with 100+ wins will have more saves than the Buccos. Should we drop saves as a category? Data, do you have something to say on this topic?
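
A handy sanity check on the two slopes above: for a simple straight-line fit, the slope of y on x times the slope of x on y equals R-square. The Python sketch below plugs in the 0.44 and 0.45 quoted in this post, then confirms the identity on a tiny set of invented points (the `xs` and `ys` values are made up purely to exercise the math and are not baseball data). All function names are hypothetical.

```python
def implied_r_squared(slope_y_on_x, slope_x_on_y):
    """For a simple linear fit, the product of the two slopes is R-squared."""
    return slope_y_on_x * slope_x_on_y


def slopes_and_r2(xs, ys):
    """Return (slope of y on x, slope of x on y, R-squared) via least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / syy, sxy ** 2 / (sxx * syy)


# Slopes quoted in the post: quality starts per team win, and team wins
# per quality start.
from_post = implied_r_squared(0.44, 0.45)
print(f"Implied R-square from the post's slopes: {from_post:.3f}")

# Invented points, used only to show the identity holds in general.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
b_yx, b_xy, r2 = slopes_and_r2(xs, ys)
print(f"slope product={b_yx * b_xy:.4f}, R-square={r2:.4f}")
```

The product of the two quoted slopes lands at about 0.198, which squares nicely with the reported R-square of 0.20.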


Strangely enough, each team win yielded only 0.44 of a save, a similar yield to quality starts. The strength of association was also right in the middle of the previous two with an R-square of 0.43. Despite this weak association, it was much stronger than the association of team wins with save opportunities (the yellow guys):


Each additional team win was associated with only 0.26 of an extra save opportunity, and that association was the weakest of the bunch with an R-square of 0.14. Indeed, it appears that teams will stumble into save opportunities and only teams that are able to convert those into wins will be able to have any association at all. In other words, save opportunities will come to every team, but the closer will have to be good in order to convert them.

The bottom line of all of this appears to be that the scoring changes are somewhat valid if the aim is to shift the focus from the team to the individual player. I guess that the armed revolt will wait another day. Perhaps that day will come when FAAB is brought into the equation.....