Monday, September 27, 2010

Rethinking Pythagorean Winning Percentage

Baseball-reference.com describes Pythagorean winning percentage as "an estimate of a team's winning percentage given their runs scored and runs allowed. Developed by Bill James, it can tell you when teams were a bit lucky or unlucky." Click through for the formula.

An example is the 2009 Washington Nationals. The team finished 59-103. However, the Nationals scored 710 runs while allowing 874 runs, so their Pythagorean W-L is 66-96. The Baseball Reference folks would thus argue that bad luck cost the Nats 7 wins, which is a substantial number in comparison to their actual number of wins.
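To make the arithmetic concrete, here is a minimal sketch of the calculation. It assumes an exponent of 1.83, which reproduces the 66-96 figure quoted above; Bill James's original version squares the run totals (exponent 2), which gives 64-98 for the same inputs. The function name and its parameters are made up for illustration.

```python
def pythagorean_record(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Estimate winning percentage and W-L record from run totals.

    The exponent is an assumption: 1.83 matches the record quoted in the
    post, while the classic Bill James formula uses 2.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    pct = rs / (rs + ra)
    wins = round(pct * games)
    return pct, wins, games - wins


# 2009 Washington Nationals: 710 runs scored, 874 allowed, 162 games.
pct, wins, losses = pythagorean_record(710, 874)
print(f"{pct:.3f} {wins}-{losses}")  # 0.406 66-96
```

Comparing that estimate with the actual 59-103 record is where the "7 wins of bad luck" comes from.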

However, while winning is a function of runs scored and runs allowed, runs in the late innings are also partially a function of runs in earlier innings. Your team's run expectations in a given inning are highly sensitive to what's happened earlier in the game.

Consider one game where your team is tied in the top of the seventh, and another where you're losing (or winning) by 10 runs. On average, you'll score more runs in the latter scenario, because both teams will insert inferior relief pitchers, saving their star bullpen guys for another day. Hell, they might even have a position player pitch. Losing by 15 or 5 is just as bad as losing by 10, so it makes sense to conserve baseball resources, even at the expense of the final score in a hopeless game. Any cheap runs you score in such situations won't do much to improve your winning percentage, but Pythagorean winning percentage has no way to account for this.

On the other hand, in a tied game, the other team will do everything possible to prevent you from scoring, because a run or two could mean the ballgame.

Of course, teams also conserve offensive resources in blowouts. Pinch runners and defensive replacements often swap in for the likes of Adam Dunn or Barry Bonds. I suppose it's possible a weaker offense could offset the effect of inferior pitching, or even trump it.

Additionally, batters could be trying less hard in blowouts, just wanting to get the game over with, but I don't think there's any evidence to support this (why not pad your batting average against bad pitchers?).

Thursday, September 23, 2010

Proof that Washington Nationals Tickets Are Overpriced


Around the time that they set another record for lowest attendance, the Washington Nationals announced a "Buy 2 Get 2" promotion (see screen grab above once this link becomes obsolete) in which fans can get two full-season tickets free when they buy two in select sections (read: near or behind the foul poles).

At first, this seems like an arbitrage opportunity, that is, a chance to make a profit with no risk. There are some time and money costs to reselling tickets, such as researching prices on StubHub or haggling with scalpers, but someone could theoretically make easy money just by selling each ticket for half of its face value. They just need to buy four tickets (two at face value and two free) from the Nationals for $60 and sell each ticket for a little more than $15.

The fact that someone couldn't expect to sell all these $30 Nats tickets for as little as $15 each, and thus make a profit, says a lot about the lack of demand and the sorry state of the franchise.
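The break-even arithmetic can be sketched in a few lines. This is a simplified per-game illustration using the $30 face value from above; it ignores resale fees and the time costs mentioned earlier, and the variable and function names are invented for the example.

```python
FACE_VALUE = 30.0   # dollars per ticket, from the post
TICKETS_PAID = 2    # bought at face value
TICKETS_FREE = 2    # thrown in by the promotion

total_cost = FACE_VALUE * TICKETS_PAID        # what the buyer pays
total_tickets = TICKETS_PAID + TICKETS_FREE   # what the buyer can resell
break_even_price = total_cost / total_tickets


def resale_profit(price_per_ticket):
    """Profit from reselling every ticket at one price (fees ignored)."""
    return price_per_ticket * total_tickets - total_cost


print(break_even_price)     # 15.0
print(resale_profit(16.0))  # 4.0
```

Any resale price above the break-even point is profit, which is why the lack of takers is so telling.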

Abstract Thoughts on the Chance of Rain

I have a day in mind, but I'm not telling you. My question is: what is the chance that it rained that day in Washington DC?

You have no knowledge of the cloud formations, temperatures, or time of year. However, suppose that over the last four decades, it rained 15% of the time in DC. So your best guess is 15%, but you're not very confident.

Suppose that I told you that my mystery day was in September. You could look at historical September rainfall records and adjust your guess accordingly, now with slightly more confidence.

Yet when we look at the 10-day forecast, we routinely see that the chance of rain is around 50%. This could mean that, historically, it has rained about 50% of the time during this time of year. More likely, it means that historical patterns mean very little when we want to know if it will rain any particular day; instead, the forecast is primarily influenced by the latest meteorological readings.

If it's for sure not going to rain tomorrow, the chance is 0%. If it's for sure going to rain, the chance is 100%. It's amazing to me that, given modern technology, we so often see chance of rain predictions for the very near future around 50%, essentially a coin flip.

Friday, September 10, 2010

Last Observation Carried Forward: A Dangerous Technique?

The statistical technique last observation carried forward is often used in clinical trials. I find it quite troubling.

One example is this recent obesity drug study from The New England Journal of Medicine. Patients in both the study drug group and the placebo group met with doctors monthly for counseling and to have various vital signs measured.

As with many obesity studies, about half of the patients dropped out before the end of the 2-year study. To account for this, the patient's last recorded weight is carried through for the rest of the study to replace any missing values. If a patient last weighed in at 240 pounds at month 3 and then stopped showing up, the statisticians would assume that her weight remained 240 pounds in months 4, 5, 6, and so on. See here for a graphical example, as well as some concerns that this technique may cause.
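The mechanics are simple enough to sketch in a few lines. This is only an illustration of the general technique, not the study's actual code; the `locf` function name is made up, and the weights for months 1 and 2 are invented (only the 240 pounds at month 3 comes from the example above).

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the most recent observed one."""
    filled = []
    last_seen = None  # stays None until the first real observation
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Monthly weigh-ins; the patient stops showing up after month 3.
weigh_ins = [250.0, 245.0, 240.0, None, None, None]
print(locf(weigh_ins))  # [250.0, 245.0, 240.0, 240.0, 240.0, 240.0]
```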

This methodology paints a much different picture than does the reality that about half of patients will give up on the drug within 2 years. If all of those observations were instead treated as missing (after all, they are missing in reality), that would severely hamper the statistical power of the results. Of course, this is the last thing the researchers want to do.

Additionally, last observation carried forward assumes that every later, unobserved value equals the last recorded one. However, it's easy to imagine some patients who don't find the drug effective, experience additional weight gain, become discouraged, and thus stop participating in the study. This weight gain is not only not observed by the study, but is also being entered into the records as not having happened. Because of this, last observation carried forward creates a clear bias in favor of the study drug.
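A toy example shows how large the gap can get. Every number here is hypothetical: an imagined patient who loses weight for three months, drops out, and then regains it all and more where the study can't see.

```python
# Hypothetical true monthly weights (pounds) for one imagined patient.
true_weights = [250.0, 245.0, 240.0, 244.0, 248.0, 252.0]
last_visit = 3  # she stops showing up after month 3

baseline = true_weights[0]
recorded_final = true_weights[last_visit - 1]  # carried forward to the end
actual_final = true_weights[-1]                # never seen by the study

recorded_loss = baseline - recorded_final
actual_loss = baseline - actual_final

print(recorded_loss)  # 10.0
print(actual_loss)    # -2.0
```

The study books a 10-pound loss for a patient who actually finished 2 pounds heavier than she started.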

Thursday, September 2, 2010

Social Change vs. Generational Change

"Bowling Alone" describes how Americans have become less connected socially in various ways over the past few decades. One interesting point from the book is that there are two ways that we can change in the aggregate.

Suppose that bowling has become less popular in recent times. This could be caused by social change: mostly the same people decided that they would rather do other things instead. Bowling could be less popular among all age groups than it has been in years past, even comparing the attitudes of today's 50-year-olds to how they felt about bowling two decades ago, when they were in their 30s.

Or the shift could be caused by generational change: the bowling fanatics have been growing old and dying off over the years, only to be replaced by their younger, bowling-apathetic counterparts.

Wednesday, September 1, 2010

Annoying Away Hoodlums

The Washington Post reports that a device that emits an annoying noise that only young people can hear has been installed at Gallery Place. It's an effort to drive away deviant youth who often hang out in the area.

I'm surprised that this is legal.

While the area businesses (a good chunk of which are bars) might benefit by driving young people away, plenty of young people who aren't causing trouble and who have a legitimate reason to be in that neighborhood will be harmed for no reason. Young people (yes, even potential hoodlums) should have just as much right to a public street as the businesses do.