Field of Science

Showing posts with label On ESPN. Show all posts

Baseball Models


The Red Sox season opener was delayed yesterday by rain. In honor of Opening Day 2.0 (this afternoon), I point you to an interesting piece in the Times about statistical simulations in baseball. According to the article, the top simulator available to the public is Diamond Mind.

Alzheimer's, Autism & the NCAA: Science News for 3/17

Do vaccines give Somalis autism? Can diabetes give you Alzheimer's? Does losing make you win? Anyone scanning the science news articles this week would know the answers to these questions.

First, Freakonomics has a discussion of a recent paper showing that NCAA basketball teams are more likely to win if they are 1 point behind at halftime than if they are 1 point ahead. It seems that when people are slightly behind in a game at halftime, they work even harder in the second half relative to people who are way behind, slightly ahead or way ahead.

Second, the New York Times (byline: Donald McNeil Jr.) discusses the abnormally high rate of autism among Somali immigrants in Minneapolis. The article gives several explanatory hypotheses (including a statistical fluke), but a lot of time is spent on the "possibility" that these cases of autism are caused by vaccinations. The fact that the article doesn't mention that this is simply absurd is glaring (though it does mention "some children" had autistic tendencies before being vaccinated). More interesting is that many of these kids appear to have had seizures, something which is mentioned only in passing.

Finally, Amanda Schaffer at Slate discusses the possible relationship between insulin and Alzheimer's (Diabetes of the Brain: Is Alzheimer's disease actually a form of diabetes?).

How to win at baseball (Do managers really matter?)

It's a standard observation that when a team does poorly, the coach -- or in the case of baseball, the manager -- is fired, even though it wasn't the manager dropping balls, throwing the wrong direction or striking out.

Of course, there are purported examples of team leaders that seem to produce teams better than the sum of the parts that make them up. Bill Belichick seems to be one, even modulo the cheating scandals. Cito Gaston is credited with transforming the Blue Jays from a sub-.500 team into a powerhouse not once but twice, his best claim to excellence being this season, in which he took over halfway through the year.

But what is it they do that matters?

Even if one accepts that managers matter, the question remains: how do they matter? They don't actually play the game. Perhaps some give very good pep talks, but one would hope that the world's best players would already be trying their hardest, pep talk or no.

In baseball, one thing the manager controls is the lineup: who plays, and the order in which they bat. While managers have their own different strategies, most lineups follow a basic pattern, the core of which is to put one's best players first.

There are two reasons I can think of for doing this. First, players at the top of the lineup tend to bat more times during a game, so it makes sense to have your best players there. The other reason is to string hits together.

The downside of this strategy is that innings in which the bottom of the lineup bats tend to be very boring. Wouldn't it make sense to spread out the best hitters so that in any given inning, there was a decent chance of getting some hits?

How can we answer this question?

To answer this question, I put together a simple model. I created a team of four .300 hitters and five .250 hitters. At every at-bat, a player's chance of reaching base was exactly their batting average (a .300 hitter reached base 30% of the time). All hits were singles. Base-runners always moved up two bases on a hit.

I tested two lineups: one with the best players at the top, and one with them alternating between the poorer hitters.

This model ignores many issues, such as base-stealing, double-plays, walks, etc. It also ignores the obvious fact that you'd rather have your best power-hitting bat behind people who get on base, making those home-runs count for more. But I think if batting order has a strong effect on team performance, it would still show up in the model.
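For anyone who wants to try this at home, here is a minimal Python sketch of a model like the one described above. It is a reconstruction from the rules as stated, not the original program. The function names, the nine-inning game with no extra innings, the batting order carrying over between innings, and the exact ordering of the alternating lineup are all assumptions.

```python
import random

def simulate_game(lineup, rng, innings=9):
    """Play one game and return runs scored.

    lineup is a list of batting averages in batting order.
    Assumed rules: every hit is a single, and every base-runner
    advances exactly two bases on a hit.
    """
    runs = 0
    batter = 0  # index into the lineup; carries over between innings
    for _ in range(innings):
        outs = 0
        first = second = third = False
        while outs < 3:
            avg = lineup[batter % len(lineup)]
            batter += 1
            if rng.random() < avg:
                # Runners on second and third score (two bases each).
                runs += int(second) + int(third)
                # Runner on first moves to third; batter takes first.
                third = first
                second = False
                first = True
            else:
                outs += 1
    return runs

def simulate_season(lineup, rng, games=162):
    """Total runs over one season."""
    return sum(simulate_game(lineup, rng) for _ in range(games))

def average_season(lineup, seasons=20, seed=0):
    """Mean runs per season over several simulated seasons."""
    rng = random.Random(seed)
    total = sum(simulate_season(lineup, rng) for _ in range(seasons))
    return total / seasons

# Four .300 hitters and five .250 hitters.
STACKED = [0.300] * 4 + [0.250] * 5
# Assumed ordering: each good hitter sits between two poorer ones.
ALTERNATING = [0.250, 0.300, 0.250, 0.300, 0.250, 0.300, 0.250, 0.300, 0.250]

if __name__ == "__main__":
    print("stacked:    ", average_season(STACKED))
    print("alternating:", average_season(ALTERNATING))
```

Because the runs come from random draws, the averages will shift with the seed and the number of seasons, so small gaps between lineups should be read with that noise in mind.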

Question Answered

I ran the model on each of the line-ups for twenty full 162-game seasons. The results surprised me. The lineup with the best players interspersed scored nearly as many runs in the average season (302 1/4) as the lineup with the best players stacked at the top of the order (309 1/2). Some may note that the traditional lineup did score on average 7 more runs per season, but the difference was not actually statistically significant, meaning that the two lineups were in a statistical tie.

Thus, it doesn't appear that stringing hits together is any better than spacing them out.

One prediction did come true, however. Putting your best hitters at the front of the lineup is better than putting them at the end (291 1/2 runs per season), presumably because the front end of the lineup bats more times in a season. Although the difference was statistically significant, it still amounted to only 1 run every 9 games, which is less than I would have guessed.

Thus, the decisions a manager makes about the lineup do matter, but perhaps not very much.

Parting thoughts

This was a rather simple model. I'm considering putting together one that does incorporate walks, steals and extra-base hits in time for the World Series in order to pick the best lineup for the Red Sox (still not sure how to handle sacrifice flies or double-plays, though). This brings up an obvious question: do real managers rely on instinct, or do they hire consultants to program models like the one I used here?

In the pre-Billy Beane/Bill James world, I would have said "no chance." But these days management is getting much more sophisticated.

Do ballplayers really hit in the clutch?

If you've been watching the playoffs on FOX, you'll notice that rather than present a given player's regular-season statistics, they've been mostly showing us their statistics either for all playoff games in their career, or just for the 2007 post-season. Is that trivia, or is it an actual statistic? For instance, David Ortiz hits better in the post-season than during the regular season. OK, one number is higher than the other, but that could just be random variation. Does he really hit better during the playoffs?

Why does this even matter? There is conventional wisdom in baseball that certain players hit better in clutch situations -- for instance, when men are on base. This is why RBIs (runs-batted-in) are treated as a statistic, rather than as trivia. Some young Turks (e.g., Billy Beane of the Oakland A's) have argued vigorously that RBIs don't tell you anything about the batter -- they tell you about the people who bat in front of him (that is, they are good at getting on base). Statistically, it is said, few to no ballplayers hit better with men on and 2 outs.

So what about in the post-season?

I couldn't find Ortiz's lifetime post-season stats, so I compared this post-season, during which he's been phenomenally hot (.773 on-base percentage through the weekend -- I did this math last night during the game, so I didn't include last night's game), with the 2007 regular season, during which he was just hot (.445 on-base percentage).

There are probably several ways to do the math. I used a formula to compare two independent proportions (see the math below). I found that his OBP is significantly better this post-season than during the regular season. So that's at least one example...

Here's the math.

You need to calculate a t statistic, which is the difference between the two proportions (.773 and .445) divided by the standard error of that difference. The first part is easy, but the latter part is complicated by the fact that we're dealing with proportions rather than ordinary means. That formula is:

square root of: (P1*(1-P1)/N1 + P2*(1-P2)/N2)
where P1 = .773 and N1 = 22 (post-season at-bats - 1), and P2 = .445 and N2 = 659 (regular season at-bats - 1).

t = 2.99, which gives a p value of less than .01.
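Here is a minimal Python sketch of that calculation, for anyone who wants to check the arithmetic or plug in another player. It pairs each proportion with the sample size from the same set of games (.773 with 22, .445 with 659). The function names are made up for illustration, and the p value uses a normal approximation, which is an assumption on my part rather than an exact t distribution. The statistic it prints will not necessarily match the figure quoted above, since it depends on exactly which counts go in.

```python
import math

def two_proportion_stat(p1, n1, p2, n2):
    """Difference between two independent proportions divided by
    the standard error of that difference."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def two_sided_p_normal(stat):
    """Two-sided p value under a standard normal approximation."""
    return math.erfc(abs(stat) / math.sqrt(2))

# Post-season: OBP .773, N = 22. Regular season: OBP .445, N = 659.
stat = two_proportion_stat(0.773, 22, 0.445, 659)
p_value = two_sided_p_normal(stat)

print(f"statistic = {stat:.2f}, two-sided p = {p_value:.4f}")
```

With only 22 or so post-season trips to the plate, the post-season term dominates the standard error, so a handful of extra outs would move the statistic a lot.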

I was also considering checking just how unusual Colorado's winning streak is, but that's where my knowledge of statistics broke down (maybe we'll learn how to do that next semester). If anybody has comments or corrections on the stats above or can produce other MLB-related math, please post it in the comments.