The era of rapid fact checking is upon us. On an almost daily basis, websites like The Fact Checker give us in-depth analysis of the facts and how they compare to what politicians say. PolitiFact and The Fact Checker go two steps further by using categories to rank factuality on an intuitive scale while maintaining up-to-date report cards on individuals who have been fact checked.
Categorical ranking systems make it easier to internalize and remember the results of a detailed fact check. Together with individual report cards, the categories give us a sense of someone's overall factuality. Yet these report cards are only a small sample of the statements that individuals (or groups) make. Furthermore, a list of counts in different fact checking categories provides no simple, singular measure of factuality that most people can easily interpret.
I created Malark-O-Meter to solve these two problems by using sophisticated statistical and computational methods, whereby I can make inferences about an individual's factuality from a small sample of statements. Beyond the measurement of factuality, I hope to convince people that they must consider the certainty with which they can make statements about the relative truthfulness of different people, especially political opponents. As you'll see, my analyses also belie the hyperbole spoken by one side against the other.
Malark-O-Meter starts with a simple scoring system. I assign a numeric value to each category in a ranking system, with less truthful categories receiving higher values. Then I multiply those values by the percentage of statements made in that category. Finally, I sum the results. I call the end product "bullpucky". A bullpucky of zero means you are always factual. A bullpucky of 100 means you are 100% full of bullpucky. The bullpucky scale is continuous between those two values.
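As a minimal sketch of the scoring arithmetic described above, the Python below computes a bullpucky score from a single report card. The category names follow PolitiFact's ratings, but the evenly spaced weights and the example counts are assumptions chosen for illustration, not necessarily the values Malark-O-Meter uses.

```python
# Assumed weights: evenly spaced from 0 (fully true) to 1 (fully false).
ASSUMED_WEIGHTS = {
    "True": 0.0,
    "Mostly True": 0.2,
    "Half True": 0.4,
    "Mostly False": 0.6,
    "False": 0.8,
    "Pants on Fire": 1.0,
}


def bullpucky(counts, weights=ASSUMED_WEIGHTS):
    """Return a score from 0 (always factual) to 100 (all bullpucky)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("report card has no rated statements")
    # Multiply each category's weight by its percentage of statements, then sum.
    return sum(weights[cat] * 100.0 * n / total for cat, n in counts.items())


# Hypothetical report card, not real fact-checker data.
example_card = {
    "True": 10,
    "Mostly True": 10,
    "Half True": 10,
    "Mostly False": 5,
    "False": 4,
    "Pants on Fire": 1,
}
print(round(bullpucky(example_card), 1))
```

Under these assumed weights, a report card made up entirely of "Half True" ratings scores 40 rather than 50, so where "half truthful" falls on the scale depends on the weights chosen.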
Sound familiar? That's because Jeremy Kalgreen did something similar with his hilarious and beautifully laid out website. But I go two steps further than Kalgreen, who personally approved of my decision to duplicate his scoring system.
First, Kalgreen only used PolitiFact report cards. In an attempt to account for variation among fact checkers, Malark-O-Meter averages scores based on both PolitiFact and The Fact Checker, and can easily be extended to incorporate any number of fact checkers. Second, Malark-O-Meter doesn't just calculate bullpucky scores. I measure our uncertainty in bullpucky scores due to the small sample sizes, and in our comparisons of one individual or party to another.
I encourage you to navigate this website if you want to learn more details about my scoring and statistical methods, especially if you aren't well-versed in statistics. Then, I urge you to come back here and read my first official analysis.
Tonight, I analyze the comparative bullpucky of the 2012 presidential candidates and their running mates overall, and specific to their performance in the 2012 debates so far. This analysis is especially salient now given that there is one more debate tomorrow, and just 15 days until election day, November 6th.
Let's start by measuring the overall bullpucky of each candidate, and establishing a range of values for the bullpucky score that we can be reasonably certain they have, given the available data. Before we use fancy statistical methods to estimate probability distributions of bullpucky, let's look at the bullpucky scores we observe directly from the report cards. Below are bullpucky scores calculated from the candidates' report cards from PolitiFact and The Fact Checker, respectively.
[Table: PolitiFact bullpucky and The Fact Checker bullpucky for each candidate]
It looks like there are some differences between the fact checkers. Overall, The Fact Checker detects more bullpucky than PolitiFact for everyone except Paul Ryan, for whom The Fact Checker detects less bullpucky. It's differences like this that motivate an average of bullpucky scores across fact checkers. Here are the observed average bullpucky scores for each candidate.
I hope I've convinced you that averaging bullpucky over multiple fact checkers is better than trusting only one fact checker. Whether or not I have, I'm going to focus on the "average" fact checker in all the analyses that follow. Anyway, the observed scores suggest that Obama spews less bullpucky than Romney, and Biden less than Ryan, but that none of the candidates are much better (or worse) than half truthful (bullpucky score of 50). From this, we might conclude that the Republican ticket spews more bullpucky than the Democratic ticket.
Logically, then, we want to compare the two campaign tickets against one another by calculating one bullpucky score for each ticket. I do this in two ways. First, I calculate the collated bullpucky of the ticket. This method adds together the number of statements in each category from each ticket member before calculating a bullpucky score. Collated bullpucky measures the average factuality of the statements made by the members of a party's ticket. Second, I average the bullpucky scores of the politicians on each ticket. This measures the average factuality of the party members on a ticket. Here are the observed collated and average ticket bullpucky for the Republicans and Democrats.
[Table: collated ticket bullpucky and average ticket bullpucky for each party]
Sure enough, the Democratic ticket appears to spew less bullpucky than the Republican ticket, although neither party is much better (or worse) than half truthful (bullpucky score of 50).
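To make the difference between the two ticket-level scores concrete, here is a rough Python sketch. The weights, the two-person ticket, and its counts are all hypothetical; the point is only that pooling statements before scoring (collated) and averaging per-person scores can give different answers when ticket members have different numbers of rated statements.

```python
# Assumed, evenly spaced category weights (0 = fully true, 1 = fully false).
ASSUMED_WEIGHTS = {
    "True": 0.0,
    "Mostly True": 0.2,
    "Half True": 0.4,
    "Mostly False": 0.6,
    "False": 0.8,
    "Pants on Fire": 1.0,
}


def bullpucky(counts):
    """Score one report card on the 0 to 100 scale."""
    total = sum(counts.values())
    return sum(ASSUMED_WEIGHTS[c] * 100.0 * n / total for c, n in counts.items())


def collated_ticket_bullpucky(report_cards):
    """Pool every member's statements by category, then score the pooled card."""
    pooled = {}
    for card in report_cards:
        for category, n in card.items():
            pooled[category] = pooled.get(category, 0) + n
    return bullpucky(pooled)


def average_ticket_bullpucky(report_cards):
    """Score each member separately, then take the plain mean of the scores."""
    scores = [bullpucky(card) for card in report_cards]
    return sum(scores) / len(scores)


# Hypothetical ticket: one member with many rated statements, one with few.
member_a = {"True": 30, "False": 10}
member_b = {"True": 2, "False": 8}
ticket = [member_a, member_b]

print(round(collated_ticket_bullpucky(ticket), 1))
print(round(average_ticket_bullpucky(ticket), 1))
```

In this made-up example the collated score sits closer to member A's score because A contributes most of the statements, while the average score gives both members equal weight.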
The trouble is that making comparisons like this based only on observations from small samples is...well...bullpucky. We also need to measure our statistical confidence in those statements. That is, we must treat each report card like an experiment in which we sample a few among the many statements that politicians make during their political career, or even in a political debate. Then we use a random number generator to virtually repeat that experiment many, many times. This process results in a whole universe of possible bullpucky scores (or comparisons between them). We can calculate the percentage of virtual experiments in this universe that would take on a particular value or range of values. We can also calculate the average bullpucky score (or score comparison) that we would expect. Finally, we can calculate an interval of values that we can be, say, 95% certain would result from such experiments (this is called the 95% confidence interval).
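One simple way to "virtually repeat the experiment" is a bootstrap: resample the rated statements with replacement, rescore each resample, and read an interval off the resulting distribution. The sketch below does that with Python's standard library. It is an illustration under stated assumptions (evenly spaced weights, hypothetical counts, a percentile interval) and may differ from the simulation method Malark-O-Meter actually uses.

```python
import random

# Assumed weights for six categories, ordered from most to least truthful.
ASSUMED_WEIGHTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def score(counts):
    """Bullpucky score (0 to 100) for a list of per-category counts."""
    total = sum(counts)
    return 100.0 * sum(w * n for w, n in zip(ASSUMED_WEIGHTS, counts)) / total


def simulate_scores(counts, n_sims=10_000, seed=0):
    """Resample the report card with replacement and rescore each resample."""
    rng = random.Random(seed)
    n = sum(counts)
    categories = list(range(len(counts)))
    sims = []
    for _ in range(n_sims):
        # Draw n statements; each category is chosen in proportion to its count.
        draw = rng.choices(categories, weights=counts, k=n)
        resampled = [0] * len(counts)
        for category in draw:
            resampled[category] += 1
        sims.append(score(resampled))
    return sims


def percentile_interval(sims, level=0.95):
    """Return the central interval holding `level` of the simulated scores."""
    ordered = sorted(sims)
    tail = (1.0 - level) / 2.0
    low = ordered[int(tail * (len(ordered) - 1))]
    high = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return low, high


# Hypothetical report card, not real fact-checker data.
counts = [10, 10, 10, 5, 4, 1]
sims = simulate_scores(counts)
mean_score = sum(sims) / len(sims)
low, high = percentile_interval(sims)
print(f"mean {mean_score:.1f}, 95% interval {low:.1f} to {high:.1f}")
```

Rerunning the sketch with fewer statements in `counts` widens the interval, which is the small-sample effect the analysis keeps returning to.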
Let's compare the bullpucky scores of the candidates and tickets from this more sophisticated perspective. We'll start by calculating for each candidate the mean bullpucky and its 95% confidence interval, then plotting it on a histogram.
Below are those histograms, labeled with 95% confidence intervals on either side of the candidate's mean bullpucky score. The thick white line marks a half truthful bullpucky score of 50.
Already, we're getting somewhere. See how Obama and Romney's distributions barely overlap? The lack of overlap suggests we can be reasonably confident that the difference in their observed bullpucky is real.
The same is not the case for Biden and Ryan. First, Biden and Ryan now appear to have equal average bullpucky scores. Second, their distributions are very wide compared to the presidential candidates. That's because there are far fewer statements rated for each of the individuals by either of the fact checkers. Third, Biden and Ryan's bullpucky distributions overlap considerably. Together, these findings suggest we shouldn't place much confidence in the observed differences between Biden and Ryan. We just don't have enough evidence to draw a clear distinction.
But how much certainty do we have that Romney spews more bullpucky than Obama, or Ryan more than Biden? Just like we can build a universe of possible bullpucky scores, we can build a universe of possible ratios between bullpucky scores.
Below, I plot comparisons between presidential candidates, vice presidential candidates, collated ticket bullpucky scores, and average ticket bullpucky scores. The red area of the histogram represents the portion of the virtual universe in which the Republican(s) spew(s) more bullpucky than the Democrat(s). The blue area is the opposite. The white line marks the point where the two have equal bullpucky. The scale on the horizontal axis is the ratio of the Republican bullpucky score to the Democrat score.
Indeed, it looks like we can be quite confident that Obama spews less bullpucky than Romney, but not so confident that Biden spews less than Ryan. Moreover, we can be quite confident that the average bullpucky of the statements made by the Democratic ticket is less than that of the Republicans. It also looks like we can be somewhat confident that the members of the Democratic ticket spew less bullpucky on average than the members of the Republican ticket.
We say, "It looks like we can be certain," but how certain can we be? From the virtual universe of comparisons, we can calculate the total percentage of experiments in which, for example, Obama spews less bullpucky than Romney. Doing so results in the following statements associated with the histograms above.
We can be 99.95% certain Romney spews more bullpucky than Obama. So, very certain. Like, almost completely certain.
We can be 55.24% certain Ryan spews more bullpucky than Biden. We're not doing much better than flipping a coin to make our decision about who spews more bullpucky.
We can be 99.93% certain Romney/Ryan spew more (collated) bullpucky than Obama/Biden. Again, almost completely certain.
We can be 91.79% certain Romney/Ryan spew more (average) bullpucky than Obama/Biden. Not completely certain, but pretty certain.
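Percentages like the ones above can be read straight off the simulated universe: pair up simulated scores for the two sides and count how often one exceeds the other. The Python sketch below shows the idea with a bootstrap resampler, assumed weights, and two hypothetical report cards; it is not a reproduction of the figures above.

```python
import random

# Assumed weights for six categories, ordered from most to least truthful.
ASSUMED_WEIGHTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def simulate_scores(counts, n_sims, rng):
    """Bootstrap bullpucky scores from a list of per-category counts."""
    n = sum(counts)
    categories = list(range(len(counts)))
    sims = []
    for _ in range(n_sims):
        draw = rng.choices(categories, weights=counts, k=n)
        sims.append(100.0 * sum(ASSUMED_WEIGHTS[c] for c in draw) / n)
    return sims


def share_greater(sims_a, sims_b):
    """Fraction of paired virtual experiments in which A outscores B.

    Testing a > b is the same as testing a / b > 1 whenever b is positive,
    and it avoids dividing by zero when a simulated score is exactly 0.
    """
    pairs = list(zip(sims_a, sims_b))
    return sum(a > b for a, b in pairs) / len(pairs)


# Hypothetical report cards, not real fact-checker data.
rng = random.Random(42)
candidate_a = simulate_scores([8, 10, 12, 9, 8, 3], 5000, rng)
candidate_b = simulate_scores([12, 14, 12, 6, 5, 1], 5000, rng)

share = share_greater(candidate_a, candidate_b)
print(f"{100 * share:.2f}% of virtual experiments have A above B")
```

The red and blue areas of the comparison histograms correspond to this share and its complement.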
We can examine the comparative bullpucky spewed not only over all of an individual's fact-checked statements, but also over a subset of statements that occurred during a particular event, such as a presidential or vice presidential debate. We can compare the bullpucky scores not only between different candidates, but between a candidate's debate performance and their overall factuality.
Here are the histograms of simulated bullpucky scores for each of the presidential candidates during each of their debates so far, labeled with the 95% confidence interval on either side of the simulated mean bullpucky. The white line represents half truthfulness (bullpucky score of 50).
Notice that the confidence intervals are wider now because the sample size of statements is smaller. At first glance, there are some clear departures in the candidates' debate performance from their usual amount of bullpucky. But are these departures "real"? Let's plot the comparisons, much as we did with the comparisons between candidates. The horizontal scale is the ratio of a candidate's usual bullpucky to the bullpucky spewed during a particular debate. The lighter portion of the plot represents the portion of the virtual universe in which the overall bullpucky is greater than the debate bullpucky. The white line lies at the point where both overall and debate bullpucky are equal.
It looks like we can be somewhat confident that Obama spews less bullpucky normally than he did during the 1st debate, and more bullpucky normally than he did during the 2nd debate. We can't be so certain that Romney spews more bullpucky normally than he did during the 1st debate, but we can be quite certain he spews more bullpucky normally than he did during the 2nd debate.
Again, we say, "It looks like...," but what is the probability of a given comparison? I calculated those probabilities from the virtual universe of comparisons.
We can be 76.19% certain Obama spews less bullpucky normally than he did during the 1st debate. Not certain, but better than three to one odds.
We can be 50.92% certain Romney spews more bullpucky normally than he did during the 1st debate. Less than one percentage point better than the toss of a coin.
We can be 86.29% certain Obama spews more bullpucky normally than he did during the 2nd debate. Not certain, but about six to one odds.
We can be 98.25% certain Romney spews more bullpucky normally than he did during the 2nd debate. Not 100% certain, but pretty certain.
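The "odds" attached to these percentages come from dividing the probability of a statement by the probability of its opposite. As a small sketch (the function name is my own, introduced for illustration), the conversion looks like this in Python:

```python
def odds_in_favor(certainty_percent):
    """Convert a percent certainty into odds in favor (x to 1)."""
    p = certainty_percent / 100.0
    if not 0.0 <= p < 1.0:
        raise ValueError("certainty must be at least 0 and below 100 percent")
    return p / (1.0 - p)


# The four certainties quoted for the presidential debates above.
for pct in (76.19, 50.92, 86.29, 98.25):
    print(f"{pct}% certain is about {odds_in_favor(pct):.1f} to 1")
```

A certainty of 50% gives even odds of 1 to 1, which is why values just above 50% are described as little better than a coin toss.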
The vice presidential debate between Biden and Ryan was particularly heated. We can do a similar analysis for this debate. Here are the debate bullpucky scores of the two vice presidential candidates in their one and only debate.
And here is the plot of the comparison between their overall bullpucky and their performance during the debate, as we saw before with the presidential candidates.
It looks like we can't be too confident that Vice President Joe Biden spewed less bullpucky during the debate than he does normally, but we can be somewhat confident that Ryan spewed more bullpucky during the debate than he does normally. Here are the probabilities that describe our level of certainty in such statements:
We can be 70.37% certain that Biden spews more bullpucky normally than he did during the vice presidential debate. Not completely certain, but better than two to one odds.
We can be 86.35% certain that Ryan spews less bullpucky normally than he did during the vice presidential debate. Not completely certain, but about six to one odds.
What about comparisons between the presidential and vice presidential candidates during the debates? Here are comparison plots analogous to the ones we've constructed before, drawing the histograms of the simulated ratio between the Republican and Democrat bullpucky during a particular debate.
It looks like we can't make heads or tails of which presidential candidate spewed more bullpucky during either of their first two debates. It does look promising that Ryan spewed more bullpucky than Biden during their debate. And here are the probability statements that confirm these graphical hunches:
We can be 64.14% certain Romney spewed more bullpucky than Obama during the 1st debate. That's nearly 2 to 1 odds.
We can be 92.38% certain Ryan spewed more bullpucky than Biden during the vice presidential debate. Not completely certain, but pretty certain.
We can be 53.11% certain Romney spewed more bullpucky than Obama during the 2nd debate. Not much better than a toss up.
Monday is the third and final presidential debate. Can we predict from these analyses who will spew more bullpucky than whom? Unfortunately, not with much precision. But I hope the candidates make a trend out of the second debate's pattern, wherein both candidates appeared to spew less bullpucky than normal. On Tuesday, I will do an analysis of the fourth debate, as well as an analysis of Obama and Romney's overall debate performance, and also an analysis comparing the two tickets overall.
What do we learn from these analyses? First, we learn that there is a lot of uncertainty in the amount of bullpucky that politicians spew compared to one another. Second, despite this uncertainty, we can be reasonably confident that Obama spews less bullpucky than Romney, and the Democratic ticket spews less bullpucky than the Republican, but not as much less as left-leaning pundits would have you believe. Third, all of the candidates have mean bullpucky scores that are within ten points of half truthful, and all of their distributions overlap the half truthful mark. Fourth, we can do similar measurements and comparisons of individual performance during key debates.
Factuality isn't the only factor you should consider when you vote. But I hope that you'll use Malark-O-Meter to inform your decisions in this year's presidential election and beyond.
Welcome to Malark-O-Meter