Mr Baso-ball by the Numbers

Tuesday, March 20, 2012

The Orphan Categories: Saves

As I start to get into level 3 draft preparation, I have been parsing which rounds to target which guys with an eye to having a team that will be able to put up good numbers in each one of the 12 categories. To aid me in this pursuit, I have looked at who is projected to put up the best numbers in each category and when these players are being drafted. Two trends have started to make their presence known: the independence of steals and the differential calculus of relievers.
Looking at relievers in particular: the guys who come up as the tops in the category are noticeably lower in overall ranking than their more potent position-mates. This is not as much of a surprise with closers as with speed guys, but it still makes you scratch your head. The top 5 relievers are Craig Kimbrel, Mariano Rivera, Jonathan Papelbon, John Axford, and Drew Storen. The highest among them is Craig Kimbrel in the 6th round, followed by Mo and Papelbon in the 8th, and then the rest after the 10th. Compare that with the top 5 starters, Justin Verlander, Clayton Kershaw, Roy Halladay, Cliff Lee, and Felix Hernandez, who are all going in the first 3 rounds. Of course, closers will pitch fewer innings and so will be able to contribute less to the pitching staff than starters, but let us consider the implications of that for a second: is a good closer really worth so much less than a starter? What about 2? Perhaps 3? The top closer, Craig Kimbrel, is projected to put up a line of 108 Ks, an ERA of 2.29, and a WHIP of 1.11. Good numbers, to be sure, but only accrued in 71 IP. Compare these numbers to a single elite starter, Cliff Lee for example, who will have 204 Ks, an ERA of 2.89, and a WHIP of 1.07 over more than 3 times those innings, 221 IP. Looking at those two together gives a better idea of how little the reliever would contribute: if you had both of them together, the joint ERA would drop only 0.15 from Lee's 2.89 for a 2.74 ERA, and the K/9 would rise from Lee's 8.31 to 9.62, a far cry from Kimbrel's 13.69. It would take the top three closers to have the same effect as a single elite starter. Craig Kimbrel, Mariano Rivera, and Jonathan Papelbon are together projected to have 240 Ks, an ERA of 2.49, and a WHIP of 1.067 in a total of 195 IP. Indeed, it is clear that closers can contribute pretty exclusively to saves, and you must rely on your starters for the rest of the categories.
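For anyone who wants to check the blending arithmetic, here is a minimal sketch in Python. It uses only the projected lines quoted above; the helper name `combine` and the tuple layout are my own inventions for illustration, and it assumes earned runs can be backed out of each projection as ERA x IP / 9.

```python
# Sketch: blend projected pitching lines, weighting by innings pitched.
# Each line is (strikeouts, ERA, innings pitched), as quoted in the post.
# The helper name `combine` is hypothetical.

def combine(*lines):
    """Return (ERA, K/9) for a staff made up of the given pitchers."""
    total_ip = sum(ip for _, _, ip in lines)
    total_k = sum(k for k, _, _ in lines)
    # Back earned runs out of each projected ERA: ER = ERA * IP / 9.
    total_er = sum(era * ip / 9 for _, era, ip in lines)
    return 9 * total_er / total_ip, 9 * total_k / total_ip

kimbrel = (108, 2.29, 71)
lee = (204, 2.89, 221)

era, k9 = combine(lee, kimbrel)
print(f"Lee + Kimbrel: ERA {era:.2f}, K/9 {k9:.2f}")
```

Because everything is weighted by innings, a 71 IP closer can only nudge a 221 IP starter's line, which is the whole point of the comparison.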

Saturday, February 18, 2012

The Draft Man Cometh Part 1

The Draft.
My heart enters into that elevated and erratic pitter patter that would put less fit people in the hospital. My too-close-together eyes dilate as if thrown into blinding sunlight from a dark lair. My blood boils into my wang like that tsunami that wiped Japan off the map a year ago.
The Draft.
The hours spent preparing, the petaflops spent calculating, the photons generated deep inside Sol travelling through the vacuum of space to hit printed words in Fantasy Baseball magazines and then be deposited into my rods or cones. All wasted when Lenny Dykstra picks Miguel Cabrera 4th and my entire strategy is thrown into disarray.
The Draft.
120 seconds stretches out to infinity as all sounds die away and my world distills down to myself, my computer screen, and a long list of players. Hunter Pence? Brandon Phillips? I could get a top-line starter here. If I don't take a second baseman now, I will have to get someone at the bottom of the barrel. Ichiro will get hits and steals, but without the Ding-a-lings, how can I make up RBIs and OPS? Bill James predicts a big year from Joseph Bats, but that hack comedy writer Matthew Berry doesn't buy in; who to trust! I love numbers but yet Berry has a podcast! What does that mean! What does any of it mean! The timer begins to beep and what had been an eternity is now but 4 seconds. Ichiro? Phillips? Bats? Rios? Help me Jebus! "Jayson Werth!" I hear a disembodied voice call out. And just like that, my fantasy season is destroyed.
The Draft.

But wait a minute here, what does the draft really mean? The year that my beloved Ackbars strode easily to victory, my first pick was solid, but my next 4 were pretty much garbagio. Other than Woo Woo, we pretty much all make our hay on the free agent wire; the drafted players do yeoman work, but they are rarely the decisive factor. Is the draft pointless? Does draft position have anything to do with how strong a player will be after all accounts have been closed? Unclear. Let's take a look at some hard data and see if we can draw any conclusions.
I looked back at the drafts from the past 3 years and then pulled out the ESPN power rankings of the top 500 players for these years as they stood at the end of the year. If a player finished outside of the top 500 for that year, he got a power ranking of 500. All keepers have been excluded in this analysis. Before you look below, think about what you would expect: The guys at the beginning of the draft would be the highest rated at the end of the year, so you would expect pretty much a linear pattern with the power rankings going up as the draft picks do. Is this what we get? Of course it isn't!

Much to my surprise, there are picks all over the place all the time. Early in the draft, there are picks that end up being total duds. Later in the draft, there are picks who end up being studly leaders who hoist their teams on their back and make their owners consider homosexuality or, in Justine's case, heterosexuality. I figured that the data is noisy, sure, but perhaps there were trends lurking that I would need the computer to tease out for me. But no! That is not the case! You can see that the linear regression, such as it is, fits the data insanely poorly, yielding an R-squared of but 0.13. To the un-stat-savvy, or as my wife likes to call them, "normal people," this means more or less that draft position explains only 13% of the variation in where players end up!
"Well," you say, casting a critical eye to my graph, "that is because the late rounds skew the data too much. If you look early, there will be a much stronger regression. True?"

No! Untrue! There is even less of a trend in the early rounds than in the late (Again, looking at R-squared shows that draft position explains only 2% of the variation in where these players end up! 2%! There is more cream in milk than that! Well, only in half and half. It is the exact same amount as is in 2%. But even that is too rich for me!)! I think that the only thing that you can say is that the first 20 guys picked will be pretty good, but really, will sort of be a similar mish-mash of strong players. While there is a good chance that you will get a guy who will carry your team in those first rounds, there is also an equally good chance that you will get a guy who is just OK, or even worse, totally stinks up the joint!
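For the curious, here is a rough sketch of what the R-squared number actually measures: the share of the variation in final ranking that a straight line through draft pick can account for. The function names and the sample numbers are mine and entirely made up for illustration (they are not the league's draft data); the only thing taken from the analysis above is the rule that anyone finishing outside the top 500 gets a ranking of 500.

```python
# Sketch: R-squared of a one-variable least-squares fit, in plain Python.
# Assumes neither list is constant (otherwise the denominator is zero).

def r_squared(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a straight-line fit, R-squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)

# Made-up example: draft pick versus end-of-year power ranking.
toy_picks = [1, 2, 3, 4, 5, 6]
toy_finishes = [12, 300, 45, 8, 650, 90]      # hypothetical values
capped = [min(rank, 500) for rank in toy_finishes]  # outside top 500 -> 500

print(round(r_squared(toy_picks, capped), 2))
```

A value near 1 would mean pick number almost fully determines the finish; a value near 0 means the line tells you next to nothing.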
How can this be! How can the draft mean so little in the final reckoning? There are several notable cases that bear looking at:
-The top player at the end of the season has only come from the top 20 picks in the draft twice (Albert Pujols picked number 2 in 2009 and Matt Kemp picked number 15 last year). The other top player was Carlos Gonzalez in 2010 picked all the way in the 11th round at pick 109.
-The 5th highest rated player has never come from a round earlier than the 4th, with the earliest being Joseph Batthews picked #35 in 2011, followed by Josh Hamilton picked #44 in 2010 and Joe Mauer picked #81 in 2009.
-The average number pick for the top-10 in the final power ranking was #26, as in you would be more likely to find a top-10 guy in the 3rd round or above than in the first round (*nb - I have a more thorough analysis of this type of phenomenon coming up later, but rest assured, this statement is not a wild misinterpretation of the data).
-3 players picked in the top 5 picks have finished ranked #100 or below: Hanley Ramirez picked 2nd in 2011 finished a disastrous #258, Jose Reyes picked 4th in 2009 finished a basically-didn't-matter 456th, and Chase Utley picked #5 in 2010 finished #110. How are those Marlins of Miami's chances looking now, boyee?
-The safest player in the last 3 years: Albert Pujols. He has been picked #2, #2, and #1 and finished #1, #2, and #11 in 2009, 2010, and 2011, respectively. Miguel Cabrera, the first off the board in a lot of the literature I have been reading, was picked #6, #8, and #4 and finished #9, #4, and #4. Two new situations, probably no new outcomes. Another new situation: Ryan Braun. He was picked #5, #4, and #5 and finished #3, #13, and #2 in the past. With these 50 games possibly being levied against him for whatevers, where will he go in the draft?
-Consistently overrated: Captain Crunch Sabathia. Picked #16, #19, and #24, he finished only #48, #56, and #61. With wins out of the scoring this year, I can't imagine CC being on any of my teams, which of course means that I will get nervous and take him waaaaay too early. Also overrated: Jimmy Rollins. Picked #9, #18, and #66, he finished #123, #369, and #78.

Friday, February 17, 2012

Batting Facts

Looking at the teams' batting statistics is a different beast than the pitching. Whereas a good starter will maybe put in 35 starts, a good hitter will easily clear 500 plate appearances. The sheer weight of the data is enough to change the tilt of the Earth I say! The tilt! To start off, let's just look at a few interesting facts from last year:

-Kemporer Mattpaltene (that would be Matt Kemp) was not only an integral cog in his fantasy machines last year, but in his real life team as well. Matt Kemp accounted for a greater part of his team's offense than anyone else in the league last year: 18.4% of his team's runs and fully 21% of his team's RBIs. To give you an idea of how large a proportion of the team's offense was centered around this one guy, the next largest proportion of runs on a team came from Curtis Granderson, who put up only 15.7% of the Yankees' many, many runs. That means that the Kemporer's share of his team's runs was about 17% larger than that of the next guy down the line. Alternately, compare it to Prince Fielder, who accounted for the second highest proportion of his team's RBIs with but 17.7% of his team's total. On the low end of these metrics are the Giants, whose leading scorer was the Kung Fu Panda, who accounted for just under 10% of his team's runs, and the Metropolitans, whose leading RBI guy was Carlos Beltran with just under 10% of his team's RBIs.

-The entire premise of Moneyball was that despite normal metrics used to assess baseball, Jonah Hill is better when he is fat. Also, getting on base is the most important thing, regardless of how you do it. With this in mind, I looked at which teams were most effective in converting men on base into men without pants, hubba hubba! The Rangers brought fully 40.5% of the runners they sent out to the bases back home, a healthy 854 runs from 2109 men on base (hits+walks+hbp). They were followed by the Yankees, who brought home 40.3% of their runners (867 runs, 2152 men on base). This was surprising as it always seemed like the Yankees couldn't get guys in when they most needed it, but perhaps it was simply that they scored so often that the times when they didn't were glaring or, more as I remember it, heart rending, soul crushing, and enough to make believing in anything anymore impossible. The team that stranded the most men on the base paths was the sad Giants, who brought in only 31.2% of their runners.

-For pitchers and their fantasy managers, the unearned run is something to be pleased with. For batters and their fantasy managers, the exact opposite is true. These two are diametrically opposed to what their real life teams feel, interestingly. The team that had to earn the highest percentage of the runs that they scored was the Brewers, who had to earn 97.3% of their runs. They were followed by the Orioles, who had to earn 96.9% of their runs. The Twinkies had to earn only 92.2% of their runs, the lowest in the league. It is interesting that these numbers did not at all mesh with the number of stolen bases that a team has. I had always thought that one of the best things about the speedsters is that they distract the pitcher and cause confusion among the infielders, leading to errors on the basepaths and more unearned runs. This is not the case, however, as the teams with the greatest number of stolen bases, the Rays and the 'Dres (155 and 162, respectively), had middling percentages of earned runs (both 95.2%).
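The percentages in these batting facts are all simple ratios, so here is a small sketch of the two calculations using only the figures quoted above. The function names are hypothetical, and since I am working from rounded inputs, the last decimal may differ slightly from the numbers in the text.

```python
# Sketch: the ratio arithmetic behind the batting facts.
# Function names are made up for illustration.

def conversion_rate(runs, men_on_base):
    """Percent of baserunners (hits + walks + HBP) who came around to score."""
    return 100 * runs / men_on_base

def relative_gap(share_a, share_b):
    """How much larger share_a is than share_b, in percent."""
    return 100 * (share_a / share_b - 1)

# (runs, men on base) as quoted for 2011.
teams = {"Rangers": (854, 2109), "Yankees": (867, 2152)}
for name, (runs, on_base) in teams.items():
    print(f"{name}: {conversion_rate(runs, on_base):.1f}% of runners scored")

# Kemp's 18.4% share of team runs versus Granderson's 15.7%.
print(f"Kemp's share is {relative_gap(18.4, 15.7):.0f}% larger")
```

Note the difference between percentage points (18.4 minus 15.7 is 2.7 points) and a relative gap (18.4 divided by 15.7), which is the comparison being made for the Kemporer.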

Friday, February 3, 2012

Stats Dependent on Teams: Saves vs. Wins

One of the reasons that has been cited for the recent and somewhat common move from wins to quality starts is that pitchers should be rewarded for pitching well rather than being at the whim of their offense or bullpen (Actually, no reasons have even been given for this authoritarian directive that was handed down without notice or discussion. Historically, people have liked having rules imposed on them by a tyrannical regime.). However, saves remain a vital category in this league as in many others. Do saves rely on the strength of a team in the same way as wins do? How do quality starts fit into the mix? Let us examine these questions in absurd detail.
The first question is whether wins do, in fact, rely on the strength of a team. I looked at the amount of wins from a team's starting staff versus the amount of wins that a team put up in total.

Not surprisingly, these two numbers were pretty well associated. From the last 5 years of pitching, each additional team win was associated with 0.84 of a starter win (the slope of the line, the number before the "x" in the equation). Moreover, this association was quite strong as the R-square was up at 0.65. No one would be surprised that when teams win, their starters win, but for fantasy purposes, this statistic becomes somewhat troubling; independently of the other pitching categories, if you pick a pitcher on a team that wins a lot, he will quite likely put up some wins. The true value of the pitcher will be watered down by the strength of the team. This would add value to pitchers like Phil Hughes in 2010 (18 wins, 4.18 ERA), who benefitted from the dominance-personified Yankees offense, and AJ Burnett in 2008 (18 wins, 4.06 ERA), who got the same kind of help from the Blue Jays. The Red Sox of 2007 were able to support Tim Wakefield to an insane degree (17 wins, 4.76 ERA), as did the 2007 Rockies for Jeff Francis (17 wins, 4.22 ERA) and more recently the 2011 Tigers for Ackbar-killing Max Scherzer (15 wins, 4.43 ERA). Conversely, stat-outlier Matt Cain has seen his value diminished by his team's lack of acumen at the game upon which we are currently elocuting: in 2011, he posted a 2.96 ERA while accruing only 12 wins in 26 outings. Similarly, Jake Peavy (10 wins, 2.85 ERA in 2008, 27 games), Ryan Vogelsong (13 wins, 2.70 ERA in 2011, 30 games), and famously Felix Hernandez (13 wins, 2.27 ERA in 2010, 34 games) all were hurt by their team's lack of wins.
However, the question remains: Do quality starts have a similar association with team wins? Will we be able to pick up 4th starters on the Yankees and count on them to have quality starts? Survey says....

...not nearly as much. Each additional team win was associated with only 0.44 of a quality start, and that association was quite weak, yielding an R-square of only 0.20. While one interpretation of this data would be that when a pitcher puts up a quality start, his team will most likely win, this is not the case: for each quality start, a team will only have 0.45 extra wins, an association which is similarly weak at 0.20 R-square (data not shown). Indeed, by and large, it would seem that scoring quality starts rather than wins would take the team out of the equation. This brings up the interesting question, though: Are saves not similarly dependent on the team? I mean, surely, the Yankees with 100+ wins will have more saves than the Buccos. Should we drop saves as a category? Data, do you have something to say on this topic?

Strangely enough, each team win yielded only 0.44 of a save, a similar yield to quality starts. The strength of association was also right in the middle of the previous two with an R-square of 0.43. Despite this weak association, it was much stronger than the association of team wins with save opportunities (the yellow guys):

Each additional team win was associated with only 0.26 of an extra save opportunity, and that association was the weakest of the bunch with an R-square of 0.14. Indeed, it appears that teams will stumble into save opportunities and only teams that are able to convert those into wins will be able to have any association at all. In other words, save opportunities will come to every team, but the closer will have to be good in order to convert them.
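To put the four regressions side by side, here is a small sketch that turns the quoted slopes into "what would 10 more team wins buy you" terms. The slopes and R-square values are the ones reported above; the table layout and the function name `expected_change` are my own, and a slope only describes the average trend, so with R-squares this low any single team can land far from these numbers.

```python
# Sketch: compare the four fits from this post.
# Values are (slope per additional team win, R-square) as quoted above.
FITS = {
    "starter wins": (0.84, 0.65),
    "quality starts": (0.44, 0.20),
    "saves": (0.44, 0.43),
    "save opportunities": (0.26, 0.14),
}

def expected_change(stat, extra_team_wins):
    """Average change in `stat` associated with extra team wins (hypothetical helper)."""
    slope, _ = FITS[stat]
    return slope * extra_team_wins

for stat, (slope, r2) in FITS.items():
    print(f"{stat:>18}: +{expected_change(stat, 10):.1f} per 10 team wins "
          f"(R-square {r2:.2f})")
```

Read this way, starter wins track the team closely, while quality starts, saves, and especially save opportunities barely move with it.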

The bottom line of all of this appears to be that the scoring changes are somewhat valid if the aim is to shift the focus from the team to the individual player. I guess that the armed revolt will wait another day. Perhaps that day will come when FAAB is brought into the equation.....

Monday, January 16, 2012

Interlude - Saves

Some thoughts as I am getting more concrete data on relievers. Prepare to have your mind blown!

-Do you know how many blown saves there were last year? That number would be 575. That is a staggering number. There are fewer stars in the galaxy than that! That is untrue, of course. Still staggering.

-Do you know that there are 8 teams that have 10 or more blown saves from pitchers who never even recorded a single save (Colt 45s, Klue Klays, Bravos, Brewers, Marlins, Natinals, Padres, and Rockies)! 3 guys blew 7 saves each without converting a single one. One of them was Aaron Crow! That guy was on my team for a short period last year! Why do they keep getting the ball in the 9th! 7 saves blown! Not a single one converted! What is this, the '06 Yankees season and we are talking about Kyle Farnsworth? Managers: these guys stink. Don't give them the ball. What are we talking about here!

-In the entire league last season, there were 1818 save opportunities. 31.6% of them were blown. Have I truly been so spoiled by a decade of Mariano Rivera and his delicious steaks outside of New Rock City that this number looks almost certainly incorrect (of course it is correct)?

-How do you think the Colt 45s' number of saves matches up with their number of blown saves? Those two numbers are approximately equivalent, or more precisely, exactly the same. They saved 25 and they blew 25. While Mark Melancon didn't exactly light the world on fire with 20 whole saves, without him, they would have blown 4 times the number of saves that they accrued (20 blown, 5 saved). Wilton Lopez blew 6 saves himself without ever successfully converting a chance. Brandon Lyon, a legit fantasy option the season before last, blew 3 himself. I drafted him in the 20th round. Do you know who else I could have gotten in that round? Albert Pujols, that's who. Well, actually, that's not who. I could have gotten some other trash like Jason Kubel, but the fact remains!
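If you would like to check the blown-save math yourself, here is a quick sketch using only the counts quoted in these bullets. The function name is hypothetical, and it assumes that every save opportunity ends as either a save or a blown save.

```python
# Sketch: blown-save rate from the counts quoted above.
# Assumes opportunities = saves + blown saves.

def blown_rate(blown, opportunities):
    """Percent of save opportunities that were blown."""
    return 100 * blown / opportunities

league = blown_rate(575, 1818)
colt_45s = blown_rate(25, 25 + 25)
# Without Melancon: 5 saves left, 20 blown saves left.
colt_45s_no_melancon = blown_rate(20, 5 + 20)

print(f"League: {league:.1f}% blown")
print(f"Colt 45s: {colt_45s:.0f}% blown")
print(f"Colt 45s without Melancon: {colt_45s_no_melancon:.0f}% blown")
```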

Pitchers: A New Look - HR Rate

So I have this big spreadsheet with all sorts of data from the 2011 campaign for pitchers that I was using for my previous post on the quality starts rule change, and I sit here looking at it and saying to myself, "Is 1 post all that I can really derive from such a wealth of data? So many columns were unused! So many metrics undiscovered and unreported on! There must be something more." Well, it turns out that there is more (yes, I know it is hard to believe that a huge spreadsheet could yield more than a pithy analysis of quality starts). Behold: Your More!

You know what is the worst? When you place a big take out order and you get home and, of the 10 people who are getting food, you are the only one whose order was forgotten. Or perhaps when you are going to sleep and you reach over to turn off the light, and you knock over your glass of water, necessitating a lengthy clean up process. For the porpoises of this post, though, let's consider the case of when your pitcher is throwing a decent game, the bases get loaded up, and rather than giving up a sac fly or getting a strikeout, he gives up a ding-a-ling, thus destroying his outing in one swing of the bat. Indeed, the dong is a complete force-multiplier for both the offense and the defense; one swing can boost up 5 hitting categories and very realistically destroy 3 defensive categories (even more now that quality starts are included in the scoring).
I looked at which pitchers had the highest ratio of HRs per Earned Runs, thinking that the pitchers who had the highest HR/ER ratio would be the guys who would have most been hurt by the long ball. In essence, if these guys had the wind blowing in a little bit more or played with outfielders who timed their jumps a little better, their ERs would go down. The pitcher with the highest HR/ER ratio was the Reds' Bronson Arroyo (Bernie), who had a staggering 46 HRs to a still-pretty-high 112 earned runs. More interesting is the next guy on the list, the Tigers' Justin Verlander (Peachz), who gave up 24 HRs to only 67 earned runs. His last season was Genghis Khan-ian in its murderous ferocity, and there was still room for improvement. Oh, and he is a keeper. Thanks Justine! Another notable name at the top of the range is the Yankees' newest acquisition, Hiroki Kuroda (Fooey), with 24 HRs to only 69 earned runs. Last week, you would have thought that his under-spoken-of 3.07 ERA could get even lower, but in the bandbox that A-Rod built, this number may go up. Others who got stung by the long ball are Josh Beckett (Dykstra), Jeremy Hellickson (Dykstra), Wandy (WooWoo), YoGa (WooWoo), and Theodore Roosevelt Lilly (Ackbar).
On the flip side, the Buccaneer Charlie Morton gave up a paltry 6 HR to 73 earned runs. The extreme ground-ballers will always limit their damage from loaded bases, but they will balance out this benefit with a lack of Ks. As our league favors power pitchers, guys like Morton or Wang will never find permanent homes on any teams. Matt Cain (Ackbar) only gave up 9 HR to 71 earned, followed by Derek Lowe (Bernie) with 14 HR to 105 earned, and Buzzsaw Billingsley (Allah) with 14 HR to 88 earned. These guys could see their ERAs rise this coming season, though they also could not. Who the hell knows.
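Here is a small sketch of the HR/ER ranking for the pitchers whose numbers are quoted above. The dictionary and function name are mine, and keep in mind that this is just homers divided by earned runs: a single homer can score more than one run, so the ratio is a rough indicator and not the share of runs that came on homers.

```python
# Sketch: rank pitchers by home runs allowed per earned run.
# (HR allowed, earned runs) for 2011, as quoted in the post.
PITCHERS = {
    "Arroyo": (46, 112),
    "Verlander": (24, 67),
    "Kuroda": (24, 69),
    "Morton": (6, 73),
    "Cain": (9, 71),
    "Lowe": (14, 105),
    "Billingsley": (14, 88),
}

def hr_per_er(hr, er):
    """Home runs allowed per earned run (hypothetical helper)."""
    return hr / er

# Highest ratio first: the guys most stung by the long ball.
ranked = sorted(PITCHERS, key=lambda name: hr_per_er(*PITCHERS[name]), reverse=True)

for name in ranked:
    print(f"{name:<12} {hr_per_er(*PITCHERS[name]):.3f}")
```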

Pitchers: A New Look - uERA

So I have this big spreadsheet with all sorts of data from the 2011 campaign for pitchers that I was using for my previous post on the quality starts rule change, and I sit here looking at it and saying to myself, "Is 1 post all that I can really derive from such a wealth of data? So many columns were unused! So many metrics undiscovered and unreported on! There must be something more." Well, it turns out that there is more (yes, I know it is hard to believe that a huge spreadsheet could yield more than a pithy analysis of quality starts). Behold: Your More!

There were a number of times last year when I would be playing one of you punks and I would check out the MLB score before bed and see that a team facing an opposing fantasy team's pitcher put up a monster number of runs. Cheered, I would drift off to a troubled sleep with dreams filled with zombies and awkward moments. It would only be in the morning that I checked the box scores and noticed that, despite giving up 7 runs, the offending pitcher was only charged with 1 as his defense failed him. I understand the rules of earned versus unearned runs (actually, I sort of don't, they seem to shift like the desert's sands), but really, when a pitcher loads them up and then an error clears the bases, how can he get so few earned! I feel like those pitchers had their ERAs artificially lowered and are, in fact, worse pitchers than their ERAs say. The pitcher (who played on a team in our league) who gave up the highest percentage of unearned runs was Jaime Garcia (Dykstra), who gave up 100 runs but was charged with only 77 of them, hanging fully 23% on the defense. Had those runs been earned, he would have been saddled with a 4.63 uERA (Unearned Run Average) rather than the 3.55 ERA that he posted. The Cardinals, in fact, were the worst defensive team in terms of supporting their starting pitchers, with only 87.3% of the total runs given up by their starters being charged to them. Next was Johnny Cueto (Peachz), with 21% of runs on the defense (51 given up versus 40 charged), bringing his uERA to a still-pretty-good 2.94 rather than 2.30. Third was Matt Garza (Bernie), who had 18% of his runs caused by the defense (90 given up versus 73 charged), giving him a Mendoza-Linian 4.09 uERA versus his serviceable 3.31 ERA. On the flip side, the pitchers who were big earners were Esmil Rogers (Bernie), Phil Hughes (Ackbar), and Travis Wood (Ackbar), who earned every single one of their runs.
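Here is a rough sketch of the uERA idea: charge the pitcher with every run he gave up, earned or not. Since innings pitched are not quoted above, this version scales the posted ERA by total runs over earned runs, which is an assumption on my part; working from the rounded ERAs, it can land a hundredth or two away from the uERA figures in the text. The function name is made up.

```python
# Sketch: "uERA" approximated as ERA * (total runs / earned runs).
# Inputs are the 2011 figures quoted above; innings pitched are not used.

def uera(era, total_runs, earned_runs):
    """ERA if every run allowed had been charged to the pitcher."""
    return era * total_runs / earned_runs

# (posted ERA, total runs allowed, earned runs)
STARTERS = {
    "Garcia": (3.55, 100, 77),
    "Cueto": (2.30, 51, 40),
    "Garza": (3.31, 90, 73),
}

for name, (era, runs, earned) in STARTERS.items():
    print(f"{name}: ERA {era:.2f} -> uERA {uera(era, runs, earned):.2f}")
```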
In the "actual decent pitchers" category, Cole Hamels' (Ackbar) defense only fudged 1 run for him and Ian Kennedy's (Peachz) only 2. Think about that for a second: those two guys were aces in every sense of the word except for the actual definition, but if 8% of their runs had been charged to the defense (the league average), their ERAs would be 2.60 for Hamels (versus 2.79) and 2.71 for Kennedy (versus 2.87). For Hamels, this is to be expected as the Phillies defense was the most supportive of their starters, with fully 95.1% of the runs that the starters gave up being earned. Kennedy was another story, with the DBack pitchers having a middle-of-the-pack 92.1% of runs charged to their starters. Do you remember when Ian Kennedy came up from the minors? What a mess he was in New York. WTF Ian Kennedy?