Hold on to your hats: there's math involved!

This teacher isn't allergic to chalk dust...
Credit: University of Michigan

Let's start with a paraphrase of an old lawyer joke:

Q:  How can you tell when someone using statistics is lying?

A:  His lips are moving.

There are, or so we hear, three kinds of lies: lies, damned lies, and statistics. These days, we're bombarded from every direction by people spouting statistics. All you need to do is click on a random blog link to find an astounding array of numbers that people present to inform, persuade, or convince you, depending on the situation. Unfortunately, the sad fact is that quite a lot of the statistics people cite are... well... let's just say "they're not as true as they seem to be." Does that come as a surprise? I sure hope it doesn't.

Here's something you should have learned by now (and I wonder about you if you haven't): every last one of us is lied to every day by people who cite their "statistical analysis." Given that observation, it's up to us to figure out which "facts" are lies and where the truth actually lies.

How do you do that? Well, you start by educating yourself about how the numbers we call statistics are generated, where the information that goes into them comes from, and the methods by which advertisers and "people with an axe to grind" perform what has been called "statisticulation"[1]: the use of statistics to lie. So, here are a few things to watch out for.

You knew there'd be a bar graph, right?

The moving average
Credit: investor / wikimedia commons

How to lie with statistics I

Use a biased sample

One of my personal favorites in the world of bogus statistics is delivered in the form "90% of dentists surveyed use Gump's toothbrushes." That word surveyed is a clue that GumpCo either concocted a pre-biased sample, kept sampling small groups of dentists until it found one in which 90% use Gump's, or used some other means of biasing its sample. Well, of course 90% use their toothbrushes!

This isn't just a phenomenon of advertising, either - an unbiased sample is very difficult to come by, and many so-called "opinion polls" are actually constructed and administered specifically to reinforce a biased sample instead of avoiding one. That would include almost any poll that comes from an organization like the NRA or NORML, for instance, or those nasty "push polls" that political campaigns like so much. Perhaps the most famous would be a "push" against John McCain just before the South Carolina presidential primary in 2000[2], widely attributed to Karl Rove, that almost guaranteed Rove's candidate would win.
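To see why the "keep sampling small groups" trick works, here is a minimal sketch in Python. The numbers are assumptions made up for illustration, not anything GumpCo or a real survey reported: suppose only half of all dentists actually use Gump's, and the surveys are done ten dentists at a time.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance that at least k of n independent respondents say yes,
    when each one says yes with probability p (a simple binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

TRUE_SHARE = 0.5   # hypothetical: half of all dentists really use Gump's
GROUP_SIZE = 10    # hypothetical: dentists per small survey

# Chance that a single honest survey of ten comes back at 90% or better
p_hit = prob_at_least(9, GROUP_SIZE, TRUE_SHARE)

# Average number of surveys to run before one of them "succeeds"
expected_groups = 1 / p_hit

print(f"Chance one survey shows 90%+: {p_hit:.4f}")
print(f"Surveys needed on average:    {expected_groups:.0f}")
```

Under those assumptions, roughly one small survey in a hundred produces the headline number by luck alone, so an advertiser only has to keep surveying and throw away the results it doesn't like.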

How to Lie with Statistics
Even though it was published 60 years ago, the math hasn't changed!

How to lie with statistics II

Don't define your terms

We all think we know what someone means when he talks about an "average" value, right? I hate to tell you, but you're probably wrong. We always think that they mean the arithmetic average, more accurately known as the arithmetic mean. But there are other "averages" (values that statisticians call "measures of central tendency"), too: the geometric mean, the median, even the mode. What that means is that two different people can each cite an "average" value based on the very same set of numbers and end up proving two different points.

Want an example? OK, consider these five hourly wages: $6, $6, $18, $55, $75. The arithmetic average of those five hourly wages is $32 an hour, a fairly decent wage. No doubt it's the one that GumpCo's management uses in its publicity releases to brag about how well its employees are paid. On the other hand, the mode (the most common number) is $6, which is the number on which a union trying to organize at a GumpCo operation might focus. The median wage is $18 (half of the other numbers are more; half are less) and the geometric mean is a tad over $19. So when someone cites an "average," you can see just how important it is to know which measurement he means by "average." Keep this in mind, though: you can almost always assume people use whichever measure of central tendency best proves their point.
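If you'd rather check those four "averages" than take my word for it, here is a short sketch using Python's standard statistics module (the geometric_mean function needs Python 3.8 or later). The variable names are just for illustration.

```python
import statistics

# The five hourly wages from the example above, in dollars
wages = [6, 6, 18, 55, 75]

mean = statistics.mean(wages)                 # arithmetic mean: sum / count
median = statistics.median(wages)             # middle value once sorted
mode = statistics.mode(wages)                 # most common value
geo_mean = statistics.geometric_mean(wages)   # nth root of the product

print(f"Arithmetic mean: ${mean}")
print(f"Median:          ${median}")
print(f"Mode:            ${mode}")
print(f"Geometric mean:  ${geo_mean:.2f}")
```

Same five numbers, four different "averages," which is exactly why you need to ask which one you're being handed.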

I don't need no stinkin' attachment!

Ick! Another formula!
Credit: darnok / morguefile.com

How to lie with statistics III

Use the semi-attached figure

This one's a favorite of statistical liars everywhere. Go right ahead and do your study. Better yet, hire an "independent laboratory," the more high-falutin'-sounding its name the better, to perform the study.

Once it's complete, you can advertise that an "exhaustive" study by the Foggy Bottom Institute of Holistic Research found that the active ingredient in GumpSoLene kills 99% of germs on common household items. We've certainly heard that claim before!

Next, promote GumpSoLene as a means of preventing colds and flu. Never mind that neither a cold nor the flu is caused by bacteria (think virus), and never mind that the researchers tested the active ingredient in concentrations that would probably strip the skin off a human body on contact. Just go ahead and hope that the ginormous universe of germophobes out there will rush out to buy gallons of GumpSoLene - 'cause they will. The statement that the active ingredient kills germs on contact is demonstrably true, but it is not completely "attached" to the purpose for which you're supposedly selling GumpSoLene.

Correlation does not equal Causation

By George, I think he's got it...
Credit: xkcd.com

How to lie with statistics IV

The "post hoc" fallacy

Great googly-moogly, we see this one every day. In the college town where I used to live, a particularly outspoken bar owner (who also happened to be a PhD research scientist) testified at a hearing on a proposed anti-smoking ordinance, leaping to his feet and waving around a 26-page list of bars and restaurants that had gone out of business in various cities during the first six months after passage of local anti-smoking ordinances.

Now if this research scientist had used that sort of logic - the well-known post hoc, ergo propter hoc logical fallacy - in a scientific paper, he'd have been laughed out of the seminar. But what he wanted was for the city council to assume that every last one of those bars went out of business solely because an anti-smoking ordinance had been passed. He didn't do any research into mismanagement, embezzlement, health-code violations, death of owners, natural disasters, economic downturn, or any of hundreds of other reasons businesses fail. No, his message was, "Pass your damned ordinance and this list will grow to twenty-seven pages within six months."

Don't be fooled, people: just because B happens after A does not necessarily mean that A caused B. No matter how fancy the "statistical analysis" someone may employ, you cannot necessarily infer causality from the order in which events occur.

How to lie with statistics V

Prey on the arithmetically challenged

When Mary lost her job, she was forced to take another job at a 30% pay cut. Through diligence and hard work, though, she's been promoted to the head of the department and received a 30% raise. She's back where she started, right? Wrong: she's still 9% short. If she was laid off at $100K and took a 30% cut, she was working for $70K. A 30% increase on $70K is $21K, meaning that she's now making $91K: 9% less than in her original job.

Now although no one would actually claim that Mary was now making the same as before, the matching 30% numbers are prominently displayed in what just might be an effort to fool a less-than-careful reader. It's the same trick as a store advertising that its appliances, which normally sell for 20% off list price, have been reduced another 30%. They're half off now, right? Wrong: they're 44% off...
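Both examples above come down to the same thing: percentages multiply, they don't add. Here is a quick sketch of the arithmetic in Python; the apply_changes helper is a made-up name for illustration, not anything from a library.

```python
def apply_changes(start, *percent_changes):
    """Apply a series of percentage changes one after another.
    Each change is taken on the current value, not the original one."""
    value = start
    for pct in percent_changes:
        value *= 1 + pct / 100
    return value

# Mary: a 30% cut followed by a 30% raise, starting from $100K
mary_now = apply_changes(100_000, -30, +30)
mary_shortfall = 100 * (1 - mary_now / 100_000)

# The store: 20% off list price, then another 30% off
price_now = apply_changes(100, -20, -30)
total_discount = 100 - price_now

print(f"Mary's new salary: ${mary_now:,.0f} ({mary_shortfall:.0f}% short)")
print(f"Sale price per $100 of list: ${price_now:.2f} ({total_discount:.0f}% off)")
```

The second 30% is taken from a smaller number than the first, which is why neither case lands where the matching figures suggest.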

The Boxer
A timeless observation in a great song

Welcome to Spin City

It used to be that most of the liars were advertisers and public relations hacks, errr, flacks. That was before political hacks became so enamored of polls and statistics. These days it seems that most lying with statistics is performed while trying to get into office, stay there, or influence the ones who are already there.

If, however, it's the job of PR flacks, ad execs, politicians, political pollsters, and spin doctors to flog the raw data until they can present a statistic that supports their views, then it's the job of everyday consumers - whether of cold medications or public works - to become educated enough to recognize a statistical lie when they see one. Sadly, most of us never recognize the ones that support our own beliefs...

To quote Paul Simon in "The Boxer":

Still a man hears what he wants to hear

And disregards the rest...

Lie la lie...