Bad Data, Bad!: The (ACS) Devil and Daniel Webster

Wednesday, May 23, 2012

The (ACS) Devil and Daniel Webster

As a New Hampshire native, I am prone to liking people named Daniel Webster.

It is thus with some interest that I realized that the Florida Congressman who is sponsoring the bill to eliminate the American Community Survey happens to share a name with the famous NH statesman. I have been following this situation since I read about it on the pretty cool Civil Statistician blog, run by a guy who runs stats for the Census Bureau.

Clearly there's some interesting debate going on here about data, analysis, the role of government, and the classic "good of the community vs. personal liberty" question.

I'm going to skip over most of that.

So why, then, do I bring up Daniel Webster?

Well, I was intrigued by this comment from him, as reported in the NYT article on the ACS:

“We’re spending $70 per person to fill this out. That’s just not cost effective,” he continued, “especially since in the end this is not a scientific survey. It’s a random survey.”

It was that last part of the sentence that caught my eye.

I was curious, first of all, what the background was of someone making that claim. I took a look at his website, and was pleased to discover that Rep. Webster is an engineer. It's always interesting to see one of my own take something like this on (especially since Congress only has 6 of his kind!).

That being said, is a random survey unscientific?

Well, maybe.

In grad school, we actually had to take a whole class on surveys/testing/evaluations, and the number one principle for polling methods is that there is no one-size-fits-all approach. The most scientifically accurate way to survey a group depends on the group you're trying to capture. All survey methods have pitfalls. One very interesting example our professor gave us was of students who tried to capture a sample of their college by surveying the first 100 students to walk by them in the campus center. What they hadn't realized was that a freshman seminar was just letting out, so their "random" survey turned out to be 85% freshmen. So overall, it's probably worse when your polling methodology isn't random than when it is.
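To make the freshman seminar story concrete, here is a minimal Python sketch. The campus of 4,000 students split evenly across four class years is my own made-up assumption, as is the way the 15 non-freshmen get picked; only the "first 100 students, 85% freshmen" figure comes from the story above.

```python
import random

def freshman_share(sample):
    """Fraction of a sample that is freshmen."""
    return sum(1 for year in sample if year == "freshman") / len(sample)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical campus: 1,000 students in each of four class years.
class_years = ("freshman", "sophomore", "junior", "senior")
campus = [year for year in class_years for _ in range(1000)]

# "First 100 to walk by" just as a freshman seminar lets out:
# 85 freshmen (from the story) plus 15 others (picked at random, an assumption).
others = [year for year in campus if year != "freshman"]
walk_by_sample = ["freshman"] * 85 + rng.sample(others, 15)

# Simple random sample of the same size from the whole campus.
random_sample = rng.sample(campus, 100)

print("walk-by sample:", freshman_share(walk_by_sample))  # 0.85
print("random sample: ", freshman_share(random_sample))
```

On this made-up campus freshmen are a quarter of the student body, so the random sample should land somewhere near 25% while the walk-by sample is stuck at 85%.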

There are all kinds of polling methods that have been created to account for these issues:

simple random sampling - every member of the population has an equal chance of being picked
systematic sampling - picking say, every 5th item on a list
stratified sampling - dividing the population into groups and then picking a certain percentage from each one (in the example above, this would have meant picking 25 random people from each class year)
convenience sampling - grabbing whoever is closest
snowball sampling - allowing sampled parties to refer/lead to other samples
cluster sampling - taking one cluster of participants (one city, one classroom, etc) and presuming that's representative of the whole
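
A few of the methods in this list are easy to sketch in Python. This is just an illustration under my own assumptions: the function names and the 400-student roster are hypothetical, and real survey designs are a good deal more involved.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being picked."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Pick every `step`-th item on the list, starting from a random offset."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, key, per_stratum, rng):
    """Split the population into groups, then sample randomly within each."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def convenience_sample(population, n):
    """Grab whoever is closest: here, simply the first n on the list."""
    return population[:n]

# Hypothetical roster: 400 students as (id, class year), 100 per year.
years = ("freshman", "sophomore", "junior", "senior")
students = [(i, years[i % 4]) for i in range(400)]

rng = random.Random(42)
srs = simple_random_sample(students, 100, rng)
every_fifth = systematic_sample(students, 5, rng)
by_year = stratified_sample(students, lambda s: s[1], 25, rng)
closest = convenience_sample(students, 100)
```

Only the stratified version guarantees exactly 25 students from each class year; the simple random sample will usually come close, but nothing forces it to.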

There are others, though most are subtypes of these (see more here).

So what does the ACS use?

As best I can tell, they use stratified sampling. They compile as comprehensive a list as they can, then they assign geocodes, and select from there. So technically, their sampling is both random and non-random.

Now, NYT analysis aside, I wonder if this is really what Webster was questioning. The other meaning one could take from his statement is that he was challenging the lack of scientific method. As an engineer, he would be more familiar with this than with sampling statistics (presuming his coursework looked like mine). What would a scientific survey look like there? Well, here's the scientific method in a flowchart (via Sciencebuddies.org):

So it seems plausible he was actually criticizing the polling being done, not the specific polling methodology. It's an important distinction, as all data must be analyzed on two levels: integrity of data, and integrity of concept. When discussing "randomness" in surveys, we must remember to acknowledge that there are two different levels going on, and criticisms can potentially have dual meanings.

3 comments:

Dubbahdee, May 24, 2012 at 9:46 AM
Speaking of discussing issues around types of sampling and survey errors, you might find this little piece from The Atlantic of interest:

http://www.theatlantic.com/politics/archive/2012/05/pushback-on-npr-vs-fox/257620/
Anonymous, May 24, 2012 at 9:52 AM
Hello bs king,

I don't know if you are interested, but I worked at the Census Bureau from 1990 to 1996 (on the demographic side) and I can make an educated guess at the type of sample they are using. The surveys I worked with used a stratified cluster sample which, as the name implies, sampled a cluster from each stratum. This was a complicated but very efficient design.

The sampling method was both quite scientific and random. Congressman Webster may have a different complaint with the survey or he may not understand. Engineers don't like randomness.

Glenn