Saturday, March 31, 2012

Say What?

I can't figure out if this is the worst statistic I've read this week, or just the most poorly phrased good statistic.  I'm leaning towards that first one:

"....more than half of students earning bachelor’s degrees at public colleges – 56 percent – are graduating with $22,000 of debt, on average."  -Nancy Zimpher on

If I'm reading this correctly, they tossed out everyone at private college, then anyone who didn't graduate, then (most disturbingly) 44% of who was left?  Why did they get the boot?  I mean, if we're just tossing out arbitrary numbers of students, can't we get any average we want?
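To see how much the answer depends on who gets counted, here is a quick back-of-the-envelope sketch. It takes one possible reading of the quote (56% of graduates carry debt, and those debtors average $22,000) and then assumes, purely for illustration, that the other 44% owe nothing. That second part is my guess, not anything the quote actually says.

```python
# Figures taken from the quote.
share_with_debt = 0.56
avg_debt_among_debtors = 22_000

# ASSUMPTION (not in the quote): the remaining 44% graduate with zero debt.
avg_debt_of_rest = 0

# Weighted average across every graduate, debtors and non-debtors alike.
overall_average = (
    share_with_debt * avg_debt_among_debtors
    + (1 - share_with_debt) * avg_debt_of_rest
)

print(f"Average among the 56% counted: ${avg_debt_among_debtors:,.0f}")
print(f"Average across all graduates:  ${overall_average:,.0f}")
```

Change `avg_debt_of_rest` and the "average" moves with it, which is exactly why it matters who was left out.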

Nancy Zimpher, what are you up to??? 

Weekend moment of zen 3-31-12

"In God we trust.  All others must have data."

Thursday, March 29, 2012

Why most nutritional research is useless

Nutrition research is big money these days.  Our national obsession with weight loss is at a fever pitch, and any new or interesting research is sure to make headlines.

Here's some basic guidelines on what to look for in nutritional research (any study, not just this one):
  1. Was the data self reported? Even CNN brought this up in their article.  People, especially those embarrassed about their weight, don't accurately assess what they eat.  My mother, skinny little thing that she is, could eat one peppermint patty and tell you she'd had a serving of chocolate.  I don't think I'd even count it until I had 3 or so.
  2. How much was "more"? I actually can't find this for this study.  Is it the difference between 1 and 2 servings per week?  Or the difference between 1 and 5?  Both could produce statistically significant correlations, but the practical outcome would be different.  In 2005, researchers made news by saying that eating more fruits and veggies did not, in fact, prevent cancer.  The cancer treating establishment (which I work in, btw) promptly responded by pointing out that they compared people who ate half a fruit per day to those who ate 1-2 fruits per day.  It was all reported in grams too, so the data look extra impressive: "Those eating less than 114 grams showed no difference from those eating 367 grams".  The link gives more examples, but 250 grams is one medium apple.  Watch out for this.
  3. Who classified people as "normal weight" or "overweight"?  If this was also self reported (and in this study, it looks like there were clinic visits), then look out.  There's a great study I can't find right now that shows that women tend to lie about weight, and men tend to lie about height.  Both lies will screw up the BMI calculation (the most common metric for assessing "normal").
  4. Were the overweight people actively (or even somewhat) trying to modify their diets to lose weight?  A few years ago, I heard about the study that suggested diet soda was linked to obesity.  I remember my first reaction was "are we sure they're all not just on diets?".  This seemed like a classic correlation/causation issue.  All the analysis seemed to presume they were overweight because they drank diet soda.  I wondered why they never seemed to look at the idea that they could be drinking diet soda because they were overweight.  That's one of the first swaps most people I know make when they try to lose weight.  
  5. Don't even get me started if it's a population study.  That's a big topic for another time, but let's just say they're really, really tricky.
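Point 3 in the list above is easy to see with a little arithmetic. BMI is weight in kilograms divided by height in meters squared, and 25 is the usual cutoff for "overweight". The person below is entirely made up for illustration; the sketch only shows how a small fib on either number can move someone across the line.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

OVERWEIGHT_CUTOFF = 25.0  # standard BMI threshold for "overweight"

# Hypothetical person, invented for this example.
true_bmi = bmi(78.0, 1.75)           # honest weight and height
shaved_weight_bmi = bmi(75.0, 1.75)  # reports 3 kg lighter
added_height_bmi = bmi(78.0, 1.80)   # reports 5 cm taller

for label, value in [
    ("true", true_bmi),
    ("lighter than reality", shaved_weight_bmi),
    ("taller than reality", added_height_bmi),
]:
    status = "overweight" if value >= OVERWEIGHT_CUTOFF else "normal"
    print(f"{label:>22}: BMI {value:.1f} -> {status}")
```

Either lie on its own is enough to reclassify this hypothetical person, so self-reported heights and weights deserve a skeptical look.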
If you ever want a fabulous crash course in how nutrition research can be skewed, pick up two diet books that contradict each other, and read through their parts on research.  Take something like Atkins (high protein, low carb) and Joel Fuhrman (nearly vegan), and watch them rip to shreds the research the other one builds their whole case on.  

He may have his own controversy, but this is why I like Michael Pollan.  The book I linked to has a great crash course in why most nutritional research just sees what it wants to.  He refused to take a strict nutritional stance and instead condensed it down to a few "rules" that he gleaned from quizzing nutritionists on "what they could say for sure".  The answer? Eat real food, not too much, mostly plants.  

Wednesday, March 28, 2012

Blog Rules

Thanks to some links from the kind people at Assistant Village Idiot  and Maxed Out Mama, I have gotten a bit more traffic than I expected in the past two days.  As such, I realized it might be a good moment to spell out some of the rules for this blog I've had bouncing around in my head.  These are rules for me really, not for commenters, as no one can hope to tame the internet:

  1. I will try my best to provide a link for every study I cite, and this link will get as close to source data as possible.  Nothing drives me crazier than reading about "new research" with absolutely no clue as to where to find it.  I spent almost 20 minutes trying to find where the heck Jack Cafferty got his numbers for this article, and it made me mad.  I won't do that to you.  And here are the numbers he reported on, as a sign of good faith.
  2. I will attempt to remain non-partisan. I have political opinions.  Lots of them.  But really, I'm not here to try to go after one party or another.  They both fall victim to bad data, and lots of people do it outside of politics too.  Lots of smart people have political blogs, and I like reading them...I just don't feel I'd be a good person to run one.  My point is not to change people's minds about issues, but to at least trip the warning light that they may be supporting themselves with crap.  That being said, if I start to lean too far to one side, smack me back to center.
  3. I will admit that I will probably fail at #2, and have lots of other biases as well.  What, you thought I was going to claim to be neutral?  No special snowflake here, we humans can't help ourselves.
  4. I will, when I can, declare those biases up front.  When I review a study on changing last names, I think it's relevant that I didn't change mine.  When mentioning healthcare reform, I think it's relevant that I live in the one state in the nation that won't be affected by it either way.  It makes it easier 
  5. I will attempt to explain all stats words that are used.  I am not a stats teacher, I am just someone who uses a lot of data to get a job done.  I would love to do more than just preach to the choir, and thus I will try not to have any prereqs for this class.  For the very smart commenters I have here, this may get tedious, but bear with me.
  6. I will try to improve my use of apostrophe's.  I'm really not good at those.
  7. Suggestions always welcome.  The internet is awesome because I get to learn from smart people I normally wouldn't meet.  

Tuesday, March 27, 2012


Hate's a strong word.  I get that.  I also get that data and survey types are not always the sort of thing that inspires people to strong hatred, but here we are.

In this post I mentioned my annoyance at perception/prediction polls.  The one I referenced was based on women who didn't change their last names and their level of marital commitment.  Commenter Assistant Village Idiot mentioned another example, which I also liked: "'Do you think earthquakes are more likely now because of climate change?' What we think has nothing to do with anything. The earthquakes will happen according to their own rules."

In writing that post, however, I forgot to mention that the same study included an even worse piece of data.  As a rebuttal to the "Midwestern college kids don't think non-name changing women are committed" finding, they included a remark that women who didn't plan on changing their names didn't feel less committed.


I would really love it if someone could tell me if there's a proper name for this sort of thing, but I always think of it as "the embarrassing question debacle".  Basically, researchers ask people questions with a potentially embarrassing answer, and then report it as meaningful when people do not answer embarrassingly.

There are only two types of people I have ever heard of who will admit they went into their marriages less than completely committed:

  1. Those who have been married successfully for quite some time who are now comfortable in admitting they were totally naive when they walked down the aisle.
  2. Those who are already divorced and reflecting on what went wrong.
Level of commitment is best assessed in retrospect, and I look with great skepticism at anyone who says they can gauge it before the fact.  

Getting at the reasons people do things can be brutal.  Your only source for your data also has the biggest motivation to conceal it from you.  Some people are actually doing things for good reasons, some just want to look like they are, and some are lying to themselves.  Unless a study at least attempts to account for all 3 scenarios, I would hold all answers suspect.

Monday, March 26, 2012

It's not the question, it's how you ask it

Data gathering is a lot harder than most people imagine.  It's an interesting exercise to take a study and prior to reading it start asking yourself "how would I, if pressed, get the data they claim to have gotten?".  It's amazing how many fall apart quickly when you realize how bad the source data is.

I face this all the time at work.  The simplest questions...what is our demand for transplants? can be a never ending labyrinth of opinion, observation, anecdote, and data....all completely enmeshed.  I spend much of my day trying to untangle these strings, and I never underestimate how difficult getting a simple answer can be. ran a great piece today illustrating this challenge.  In a post titled "How Many Would Repeal Obamacare?"  they review 4 different surveys that all try to get to the same number: how many people think healthcare reform should be repealed?

It's a great article that covers sampling practices, question phrasing, date of the poll, and history of the polling organization.

If you look at the numbers, it shows up pretty quickly that when given dichotomous choices (repeal/keep), people often look like they gave a strong opinion.  In the polls where more moderate answers are offered ("it may need small modifications, but we should see how it works"), people trend towards that answer.

The phrasing was extremely intriguing though:
“Turning to the health care law passed last year, what is your opinion of the law?”
“If a Republican is elected president in this November’s election, would you strongly favor, favor, oppose, or strongly oppose him repealing the healthcare law when he takes office?”
“Do you think Congress should try to repeal the health care law, or should they let it stand?”

In one, the question focuses on personal opinion, in the next the focus is the presidency, in the third it's Congress.  All of this for a law that most Americans have yet to feel the effects of in any practical way.

Of course this is not to say that a public opinion poll (or 4) makes one side right or wrong. If constitutionality or effectiveness is your concern, nothing here addresses either.  I am enjoying it immensely for the educational value though, and kind of wishing I was teaching a class so I could use this as an example.  Those of us in Massachusetts do have the luxury of sitting back and just sort of pondering all of this, since it has been our world for 7 years now.

That reminds me....were these samples controlled for that????

Sunday, March 25, 2012

Correlation and Causation: the Housework Edition

After yesterday's comic, I was hoping to find a good example of a news story where they equated correlation and causation.  In case you're curious, it took me under 5 minutes.

Headline: Why Being Less of a Control Freak May Make You Happier

To start, let me just mention that correlation implies that two things are moving together: as one goes up, so does the other.  Alternatively, as one goes up, the other goes down, or vice versa.  Either way, their outcomes appear to be tied.

Causation on the other hand, says that one thing is causing another.  What yesterday's post was referring to is the often made mistake that just because two things are correlated, we can infer that one is causing the other.  This is not always true, and believing so may get you drawn as a stick figure.  

Anyway, the article above illustrates that point nicely.  The author set out to find out if being a control freak mom made people unhappy....and lo and behold it appears to.  55% of women who said they delegate to a partner or spouse at least once a week reported themselves as "very satisfied" with their life.  For those who did not delegate that often, the number was 43%.

Now, I'll mostly skip the use of the word "delegate" in this article, though it does bother me.  My husband does plenty around the house, but we mostly just consider that "teamwork" not "delegating".  I don't start the week handing out tasks to him, and he doesn't consider the work he does around the house a favor to me.  It's just what needs to get done.

More important, however, is the article's conclusion that delegating will make people happier.  While delegating and happiness are perhaps correlated, they are not necessarily causally linked.  It's possible that the women who don't delegate do so because their spouse is lazy, hostile, or generally not involved....all things which would also make them less happy overall.  It's also possible that women who don't delegate are controlling, martyrs, passive aggressive, etc., and that makes them unhappy too.

I had a great stats professor once who opened every class with this:

"If you get one thing out of this class, let it be this:

When x and y are correlated, you have 3 possibilities:

  1. X is causing Y
  2. Y is causing X
  3. Something else is causing both X and Y "
Lack of delegating could cause unhappiness.
Unhappiness could cause people to stop delegating.
Something else entirely could cause people to not delegate and to be unhappy.
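
The third possibility is the one that gets forgotten most often, so here is a small simulation of it. Everything in it is invented: a hidden factor (call it "spouse involvement") nudges both delegating and happiness, while delegating has no effect on happiness at all. The variable names and the noise levels are assumptions chosen for illustration, not anything from the article.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden factor that drives BOTH outcomes.
spouse_involvement = [rng.gauss(0, 1) for _ in range(n)]

# Each outcome = hidden factor + its own independent noise.
# Note that happiness is never computed from delegating.
delegating = [z + rng.gauss(0, 1) for z in spouse_involvement]
happiness = [z + rng.gauss(0, 1) for z in spouse_involvement]

r = pearson(delegating, happiness)
print(f"Correlation between delegating and happiness: {r:.2f}")
```

With these settings the correlation should land somewhere around 0.5, even though neither variable causes the other. A survey that only measured delegating and happiness would have no way to tell this world apart from one where delegating really does make people happier.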

Thursday, March 22, 2012

When in Doubt, Blame the Journalist

Within minutes of hitting "publish post" on my mission statement, I found an article that reminded me of one of my worst pet peeves when it comes to data/science/studies of all types.  The headline read  "Keeping Your Name? Midwesterners Are Judging You".  My ears (eyes?) perked up at this headline, as I am among those women who declined to change her name post-nuptial.  Despite knowing that Jezebel is not often the best place for unbiased reporting, I gave it a read.  

The article linked to a much more nuanced article here, but the basics are as follows: students at a small midwestern college feel that women who don't change their last names when they get married are less committed to their relationships than those who do.  This was interesting in part because the number of people who felt negatively about this quadrupled between 1990 and 2006.

For the personal reasons listed above, I find this interesting.  However, when you look at the numbers (2.7% of 256 and 10.1% of 246 which Jezebel did include) and do a little math, you realize that this "jump" is a difference of 18 people.  
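
For anyone who wants to check that "little math", here it is spelled out. The percentages and sample sizes are the ones quoted above; the variable names are mine, and rounding to whole people is my assumption about how the headcounts were derived.

```python
# Percentages and sample sizes as quoted above.
smaller_share, smaller_sample = 0.027, 256
larger_share, larger_sample = 0.101, 246

# Convert percentages back to headcounts (rounded to whole people).
smaller_count = round(smaller_share * smaller_sample)  # 6.912 -> 7
larger_count = round(larger_share * larger_sample)     # 24.846 -> 25

difference = larger_count - smaller_count
ratio = larger_share / smaller_share

print(f"Headcounts: {smaller_count} vs {larger_count}")
print(f"Difference: {difference} people")
print(f"Ratio of percentages: {ratio:.1f}x")
```

So the percentage really did nearly quadruple, and the whole jump still fits in a single classroom.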

A few things to consider about this:
  1. I couldn't find that this was published anywhere.  It seemed to be a sort of "FYI for the headlines".
  2. Apparently there's no data on whether or not this perception is true.  My bias would be that it's not, but I couldn't find data actually saying if the perception was correct.  This happens in many "perception" studies....they quote percentages who believe something with the implication that a certain belief is wrong without ever proving it.
  3. There wasn't a gender breakdown of who those 18 people were.  If most were female, then isn't their perception likely to be based on experience?  As in "well if I didn't do it, it would be because I wasn't committed"?  That's not judgement of others, that's judgement of self.
  4. Have any of their professors (or TV shows, or other media sources) recently made disparaging remarks about this?  18 people who all very well might know each other (the university surveyed was under 1000 students) could easily be influenced in their answer  by even one strong source.
  5. As college students, presumably very few of those polled were actually married.  From my experience in college, I would conjecture that this is a phase of life during which people are very idealistic regarding their future mates without having many real experiences to back it up.  I put much more stock in what people who are actually married use to feel out level of commitment than what someone who's never walked down that aisle thinks.
All that being said, it looked like the study authors were careful to address several of these points (especially the "this is not a representative sample" point).  It was only in the translation that more dubious conclusions were drawn.

Scientists have very little incentive to exaggerate the meaning of their findings.  They are in a profession where that could be very damaging.  Reporters for both old and new media have EVERY incentive to spin things into good headlines.  Remember that.

Wednesday, March 21, 2012

Mission Statement

Numbers never lie.  

Unlike people, who are constantly confused by their own biases and perspectives, numbers behave....if you know how to use them.  

This is what I do for work every day:
First, I get what management thinks is the problem.  Second, I talk to the people involved and find out what they think is the problem.  Third, I get to retreat into the numbers.  I spend time looking at what we're doing, where we are, and where we'd need to be for everyone to be happy.  It's the third part that's my favorite.  No one argues, no political pressure, just puzzles, problems, and unexpected truths.

I use data every day to help improve health care, and I've been pretty successful at it so far.  As I look around though, I realize how few people really understand the importance of good data in our lives.  One needs look no further than election year politics to see bad data, poor interpretations of good data, and blatant misuses that make me cringe.  In the healthcare realm, we don't have this luxury.  I come from a world where you can't take chances, where misrepresenting your stats can result in very real human suffering.  

This is why improper uses of data drive me nuts.  Once you know what to look for, it's hard to stop seeing it. It's everywhere.  Thus, I am giving myself an ambitious goal.  It's no longer enough for me to use good data science for my own purposes.  I want to educate others, and hone my own skills along the way.  I want people to know what research is, how to read it, and how to question it.  I want others to be as passionate as I am, and I want a place to vent about the reporting that annoys me.  

Stay tuned.