Exploring the world of software development, technology, data science, and life

Stop Lying With Statistics

I had a different blog entry planned for this week, but recent events have, well, changed things. For those of you living under a rock (and is there room under there for one more?), on Wednesday last week a mob stormed the Capitol Building, where Congress was in the process of formalizing Joe Biden’s victory in last year’s presidential election. Congress was evacuated, five people lost their lives, and the Capitol Building was occupied by enemy forces for the first time since the British did so in 1814.

In a sense I don’t blame those who participated in the insurrection. They honestly believed that the election was being stolen and that there was an ongoing coup against their president. Had Trump’s desperate calls urging governors and secretaries of state in several states he lost to reverse the results succeeded, I’m sure many on the left would be rioting. Democracy is worth saving. The problem is that these people were lied to and were under the mistaken impression that their votes were being stolen.

It’s past time to start correcting those lies. Today I’m going to go over some of the claims that Biden’s win was “statistically impossible,” specifically those made by Charles Cicchetti in an affidavit supplied with the Texas lawsuit (starting at page 20).

What is Statistical Impossibility?

But first, let’s stop for a second and define what we mean when we say something is “statistically impossible.” It means that we have built a mathematical model that we have reason to believe corresponds to the situation we are studying. Maybe we believe a certain feature is normally distributed within the population with a given mean and standard deviation. Maybe we believe a particular event follows a Bernoulli distribution with a given probability. We can then look at the data we have and ask how likely it is that our model would generate that data, or something even more extreme. If that probability (the p-value) is sufficiently low, we can say that the model can be rejected.
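To make the procedure concrete, here is a minimal Python sketch built on a hypothetical coin-flip example that is not from the affidavit: the model says a coin is fair (each flip is a Bernoulli trial with probability 0.5), and we observe 60 heads in 100 flips. It uses the normal approximation to the binomial, an assumption that works reasonably well at this sample size; the function names are my own.

```python
import math


def two_sided_p(z: float) -> float:
    """Probability that a standard normal variable lands at least |z| away from 0."""
    return math.erfc(abs(z) / math.sqrt(2))


def coin_flip_z(heads: int, flips: int, p: float = 0.5) -> float:
    """Z-score of an observed head count under a Bernoulli(p) model (normal approximation)."""
    mean = flips * p
    sd = math.sqrt(flips * p * (1 - p))
    return (heads - mean) / sd


z = coin_flip_z(60, 100)   # (60 - 50) / 5 = 2.0
p_value = two_sided_p(z)   # roughly 0.0455
print(f"z = {z:.2f}, p = {p_value:.4f}")

# At a conventional 5% threshold we reject the fair-coin model,
# but the test alone says nothing about *why* the coin misbehaves.
if p_value < 0.05:
    print("Reject the fair-coin model at the 5% level")
```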

It should be noted that we never confirm a model using this approach; we only reject models. That is why scientists often state a null hypothesis (a hypothesis that there is no effect) and use statistics to reject it, establishing evidence that there is an effect.

https://xkcd.com/892/

One in a quadrillion?

So on to the analysis by Charles Cicchetti. What hypothesis was he testing?

7. I was asked to analyze some of the validity and credibility of the 2020 presidential election in key battleground states. I analyzed two things that seem to raise doubts about the outcome. First, I analyzed the differences in the county votes of former Secretary of State Hillary Clinton (Clinton) compared to former Vice President Joseph Biden (Biden). Second, many Americans went to sleep election night with President Donald Trump (Trump) winning key battleground states, only to learn the next day that Biden surged ahead. Therefore, I compared and tested the significance of the change in tabulated ballots earlier in the reporting to subsequent tabulations. For both comparisons I determined the likelihood that the samples of the outcomes for the two Democrat candidates and two tabulation periods were similar and randomly drawn from the same population. I used a standard statistical test in this comparison.

There is quite a bit there, but the key is the second-to-last sentence: “For both comparisons I determined the likelihood that the samples of the outcomes for the two Democrat candidates and two tabulation periods were similar and randomly drawn from the same population.”

Here is some more detail.

10. In 2016, Trump won Georgia with 51.0% of the vote compared to Clinton’s 45.9% with more than 211,000 votes separating them. In 2016, Clinton received 1,877,963 votes and Trump received 2,089,104. In 2020, Biden’s tabulated votes (2,474,507) were much greater than Clinton’s in 2016. Trump’s votes also increased to 2,461,837. The Biden and Trump percentages of the tabulations were 49.5% and 49.3%, respectively.

11. I tested the hypothesis that the performance of the two Democrat candidates were statistically similar by comparing Clinton to Biden. I use a Z-statistic or score, which measures the number of standard deviations the observation is above the mean value of the comparison being made. I compare the total votes of each candidate, in two elections and test the hypothesis that other things being the same they would have an equal number of votes. I estimate the variance by multiplying the mean times the probability of the candidate not getting a vote. The hypothesis is tested using a Z-score which is the difference between the two candidates’ mean values divided by the square root of the sum of their respective variances. I use the calculated Z-score to determine the p-value, which is the probability of finding a test result at least as extreme as the actual results observed. First, I determine the Z-score comparing the number of votes Clinton received in 2016 to the number of votes Biden received in 2020. The Z-score is 396.3. This value corresponds to a confidence that I can reject the hypothesis many times more than one in a quadrillion times that the two outcomes were similar.
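His description is specific enough to reproduce. The minimal Python sketch below rests on one assumption of mine: that “the probability of the candidate not getting a vote” means one minus the candidate’s statewide vote share (45.9% for Clinton in 2016 and 49.5% for Biden in 2020, from paragraph 10). Under that reading, the formula lands on his 396.3. The function name is hypothetical.

```python
import math

# Figures quoted in paragraph 10 of the affidavit (Georgia).
CLINTON_2016_VOTES = 1_877_963
BIDEN_2020_VOTES = 2_474_507
CLINTON_2016_SHARE = 0.459
BIDEN_2020_SHARE = 0.495


def cicchetti_z(votes_a: int, share_a: float, votes_b: int, share_b: float) -> float:
    """Difference in vote totals divided by the square root of the summed variances.

    Assumption: variance = votes * (1 - share), my reading of "the mean times
    the probability of the candidate not getting a vote".
    """
    var_a = votes_a * (1 - share_a)
    var_b = votes_b * (1 - share_b)
    return (votes_b - votes_a) / math.sqrt(var_a + var_b)


z = cicchetti_z(CLINTON_2016_VOTES, CLINTON_2016_SHARE,
                BIDEN_2020_VOTES, BIDEN_2020_SHARE)
print(f"Z = {z:.1f}")  # Z = 396.3
```

Notice the scale: the denominator works out to roughly 1,500 votes, while the gap between the two totals is about 600,000. Nothing in the formula accounts for four years of new registrations, population change, or voters changing their minds.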

He’s not calculating the likelihood that Biden won Georgia in 2020. He’s calculating the likelihood that Biden won Georgia in 2016. His model assumes that the Georgia electorate in 2020 was approximately the same as the Georgia electorate in 2016, when Trump won by over 5 percentage points. It also assumes that Georgia voters saw no difference between Biden and Clinton.

This is an absurd assumption. And his statistics back that up. It is statistically unlikely that the Georgia electorate in 2016 and 2020 were identical and judged Biden and Clinton equally. One possibility is that the Republican governor and Secretary of State conspired with Democrats to elect Joe Biden. Another is that 2016 and 2020 were different years and Biden and Clinton were different candidates.

I’ll go with the second possibility.
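How identical would the two elections have to be for his test not to reject? A minimal sketch, again assuming variance = votes × (1 − share) and holding the shares at the paragraph 10 values, searches for the largest 2020 Democratic total that would stay under |Z| = 1.96 (the usual 5% threshold). The function names are hypothetical.

```python
import math

CLINTON_2016_VOTES = 1_877_963
CLINTON_2016_SHARE = 0.459
BIDEN_2020_SHARE = 0.495  # held fixed: an assumption for this illustration


def z_vs_clinton(biden_votes: int) -> float:
    """Cicchetti-style Z comparing a hypothetical Biden total to Clinton's 2016 total."""
    var = (CLINTON_2016_VOTES * (1 - CLINTON_2016_SHARE)
           + biden_votes * (1 - BIDEN_2020_SHARE))
    return (biden_votes - CLINTON_2016_VOTES) / math.sqrt(var)


def largest_passing_total(threshold: float = 1.96) -> int:
    """Binary search for the largest vote total whose Z stays below the threshold.

    Z increases with biden_votes above Clinton's total, so the search is valid.
    """
    lo, hi = CLINTON_2016_VOTES, 2 * CLINTON_2016_VOTES
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if z_vs_clinton(mid) < threshold:
            lo = mid
        else:
            hi = mid
    return lo


upper = largest_passing_total()
gain = upper - CLINTON_2016_VOTES
print(f"Largest 'plausible' 2020 total: {upper:,} (+{gain:,} votes, "
      f"{gain / CLINTON_2016_VOTES:.2%})")
```

Under these assumptions, a 2020 Democratic total more than roughly 2,750 votes above Clinton’s (about 0.15%) would already count as “statistically significant” at the 5% level. A test that strict will flag almost any change between two different elections.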

Now he does address this. Kind of.

13. There are many possible reasons why people vote for different candidates. However, I find the increase of Biden over Clinton is statistically incredible if the outcomes were based on similar populations of voters supporting the two Democrat candidates. The statistical differences are so great, this raises important questions about changes in how ballots were accepted in 2020 when they would be found to be invalid and rejected in prior elections.

There are no “statistics” presented about why people might change their voting preferences between 2016 and 2020, which is funny, because such statistics could be found if he looked for them. In 2016 the polls had Trump leading the state by between 1 and 9 percentage points (Trump ended up winning by just over 5 percentage points). In 2020 the polls had the race up in the air, with Biden holding a slight lead. And while, yes, polling has been unreliable in some states (such as Michigan and Wisconsin), in Georgia it has not.

Now let’s see what he makes of the overnight flip.

14. At 3:10 AM EST on November 4 the Georgia reported tabulations were 51.09% for Trump and 48.91% for Biden (eliminating third-party candidates). The total votes reported for the two major candidates were 4,662,328. On November 18 at 2 PM EST, the reported percentages were Trump 49.86% and Biden at 50.14%. The Biden advantage over Trump in the final tabulations reported was less than 14,000 votes, or 0.28%. For this turnaround to occur, the subsequent additional “late” ballots totaling 268,204 votes (5.4% of the votes reported on November 18) had to split 71.60% for Biden and 28.40% for Trump. The two periods report shifts in the percentage favoring Trump from 51.09% to 49.86%, which is a percentage difference of 1.23%.

15. The Georgia reversal in the outcome raises questions because the votes tabulated in the two time periods could not be random samples from the same population of all votes cast. I use a Z-score to test if the votes from the two samples are statistically similar. I estimate a Z-score of 1,891. There is a one in many more than quadrillions of chances that these two tabulation periods are randomly drawn from the same population. Therefore, the tabulations in the early and subsequent periods could not remotely plausibly be random samples from the same population of all Georgia ballots tabulated. This result was not expected because the tabulations reported at 3 AM EST represented almost 95% of the final tally, which makes a finding of similarity for random selections likely and not statistically implausible.
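The 71.60% figure in paragraph 14 follows from the other numbers quoted there. A quick Python check, using only those rounded percentages and totals, gives about 71.5%; the small gap is within what rounding the inputs to two decimal places can produce.

```python
# Figures quoted in paragraph 14 (two-candidate totals, Georgia).
EARLY_TOTAL = 4_662_328     # votes reported at 3:10 AM on November 4
EARLY_BIDEN_SHARE = 0.4891
LATE_BALLOTS = 268_204      # additional ballots counted afterwards
FINAL_BIDEN_SHARE = 0.5014  # as of November 18

final_total = EARLY_TOTAL + LATE_BALLOTS
early_biden = EARLY_TOTAL * EARLY_BIDEN_SHARE
final_biden = final_total * FINAL_BIDEN_SHARE
# Biden's late votes are whatever he gained between the two snapshots.
late_biden_share = (final_biden - early_biden) / LATE_BALLOTS

print(f"Late ballots as share of final: {LATE_BALLOTS / final_total:.1%}")
print(f"Biden share of late ballots: {late_biden_share:.1%}")
```

So the late batch was far more Democratic than the early one. The Z-score in paragraph 15 restates that fact; it cannot say why.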

The model he is testing here is that the results that arrived early come from the same population as the results that arrived late. But we know they don’t. Some counties take longer than others to count, and the different demographic makeups of those counties mean their ballots come from populations with very different voting patterns. To make things worse, this election had an unusually high level of absentee voting due to the coronavirus pandemic. These absentee votes tilted heavily toward Biden, in no small part because the Biden campaign pushed mail-in voting while the Trump campaign shunned it. Those ballots were counted late in many states, including Georgia.
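To see why a huge Z-score here says nothing about fraud, here is a minimal sketch using a standard pooled two-proportion z-test. This is a swapped-in textbook technique, not necessarily the exact calculation behind the Z-score in his paragraph 15 (the affidavit doesn’t spell it out), and it uses the rounded shares from paragraph 14. The second call is hypothetical: a late batch only about three points more Democratic than the early count.

```python
import math

EARLY_N, EARLY_BIDEN = 4_662_328, 0.4891  # counted by 3:10 AM on November 4
LATE_N, LATE_BIDEN = 268_204, 0.7160      # counted afterwards (paragraph 14)


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z-test: are both batches draws from one population?"""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


actual = two_proportion_z(EARLY_BIDEN, EARLY_N, LATE_BIDEN, LATE_N)
# Hypothetical: what if late ballots had been only 52% Biden?
modest = two_proportion_z(EARLY_BIDEN, EARLY_N, 0.52, LATE_N)

print(f"Actual split:   z = {actual:.0f}")
print(f"52% late batch: z = {modest:.0f}")
```

Both results are “impossible” by the affidavit’s standard. With nearly five million ballots the standard error is only about a tenth of a percentage point, so any real difference between early and late batches (mostly in-person votes versus mostly absentee votes, say) produces an enormous z. The test correctly rejects the “same population” model; it just cannot tell you why the populations differ.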

Again, this analysis does not prove there was fraud. It just disproves a population model we already knew was false.

He also opines on ballot rejection rates.

22. In 2016, the rejection rate for mail-in absentee ballots in Georgia was 6.42%.
2016 Mail-in Absentee Ballots
2016 Mail-in Volume 213,033
2016 Mail-in Ballots Rejected 13,677
2016 Mail-in Rejection Rate 6.42%

23. In 2020, many more mail-in absentee ballots were tabulated in Georgia, while the rejection rate dropped to less than 0.37%.

2020 Mail-in Absentee Ballots
2020 Mail-in Volume 1,316,943
2020 Mail-in Ballots Rejected 4,786
2020 Mail-in Rejection Rate 0.3634%

24. There were 1,316,943 absentee mail-in ballots submitted in Georgia in 2020. The Biden and Trump combined absentee mail-in ballots equaled 1,300,886. There were 4,786 absentee ballots rejected in 2020. This is a rejection rate of 0.3634% out of all the absentee mail-in ballots tabulated. This is much smaller than the number of absentee ballots rejected in 2016, when 13,677 absentee mail-in ballots were rejected out of 213,033 submitted. The 2016 rejection rate was 6.42%, which is more than seventeen times greater than 2020. This decrease in rejection rates is very unexpected, since there was more than a six-fold increase in absentee ballot use.
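The arithmetic in paragraphs 22 through 24 checks out. A short Python sketch using only the quoted figures:

```python
# Georgia mail-in absentee ballot figures quoted in paragraphs 22-24.
VOLUME_2016, REJECTED_2016 = 213_033, 13_677
VOLUME_2020, REJECTED_2020 = 1_316_943, 4_786

rate_2016 = REJECTED_2016 / VOLUME_2016
rate_2020 = REJECTED_2020 / VOLUME_2020

print(f"2016 rejection rate: {rate_2016:.2%}")
print(f"2020 rejection rate: {rate_2020:.4%}")
print(f"Ratio of rates: {rate_2016 / rate_2020:.1f}x")
print(f"Growth in mail-in volume: {VOLUME_2020 / VOLUME_2016:.1f}x")
```

The numbers are right. What they cannot tell you is why the rejection rate fell.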

Again, all this is arguing is that 2020 was a different year than 2016, which we already knew. In particular, absentee voting looked very different in 2020 than in 2016. There were many public campaigns designed to educate voters on how to vote absentee. There were changes to the laws to make it easier to vote absentee. And the people voting absentee were doing so much more deliberately than in 2016.

The fact that there were fewer rejections in 2020 is not the result of some fiendish plot to count votes that should have been thrown out due to technicalities. It is the intended result of efforts across the state to reduce the number of voters disenfranchised by errors.

In Conclusion

Charles J. Cicchetti is a smart man. He has a PhD in economics. I’m certain he knows this analysis is false. Yet he signed his name to it and helped convince many people their democracy was being stolen from them. The most charitable explanation is that he is honestly ignorant of how voting works in the United States or of how the political landscape has changed over the past four years. But in that case he had no business signing an affidavit that depends not only on knowledge of statistics but also on knowledge of voting procedures and patterns.

Most people don’t have the knowledge to understand statistics on their own. They are dependent on others to do that work and interpret the results for them. That’s perfectly fine. We live in a specialized world and we are dependent on experts to help us. But that is why it is critically important that people who are trusted to provide expert guidance be fully honest with those relying on them. Wednesday is an example of what happens when they fail.
