Geopolitics Review — 2nd September 2024

Geopolitics Explained

Published in

Areas & Producers

9 min read · Sep 2, 2024

Hypothesis Testing Trends In US Elections

Bitesize Edition

Last week, I provided a large amount of data on the 2016 and 2020 elections. Underlying this data, there are many trends in voting behaviour that we can explore. I’m going to do that today through a process called hypothesis testing.
I’ll explore the behaviour of female voters in 2016 and 2020. Donald Trump has a history of misogyny, so do female voters prefer to vote for any other candidate but him? Also, did Clinton or Biden gain a significant amount of support from female voters?
Turning to a topic with more available data, how does education level align with voting patterns? My hypothesis is that the more educated a voter is, the more likely they are to vote Democrat. Does the data support this?

Introduction

Many subsets of the data we explored last week show some evidence of underlying trends. This week, I’m going to explore two of those trends via a process called hypothesis testing. I’ll explore whether women are more likely to vote Democrat than Republican, and whether higher-educated voters show greater support for Democratic candidates than for Republican candidates.

Hypothesis Testing of Female Voting

A hypothesis is a predicted answer to a research question based on existing knowledge. Importantly, this hypothesis can be tested based on sample data, which I will do today using the data from last week’s Geopolitics Review. The process is as follows:

Define the hypothesis/prediction
State the error percentage we are willing to tolerate when deciding whether the data supports our prediction. This is known as the significance level.
Collect data.
Analyse the data via a suitable test. Today, I’ll be using the chi-squared test.
Calculate your degrees of freedom. I’ll explain how we do this below, but it is based on the number of rows and columns in our dataset.
From this, we can calculate our p-value. We want a p-value lower than the significance level because this means there is evidence to support our prediction.
Interpret these results.

Let’s start with an example.

Hypothesis: Women are more likely to vote Democrat than Republican.

Null Hypothesis H(0) = Women voters show no preference towards Democrats over Republicans.
Alternate Hypothesis H(1) = Women are significantly more likely to vote Democrat than Republican.

The null hypothesis H(0) usually states a level of equality. In this case, that equality refers to no preference between female voters for Democrats and Republicans. The alternate hypothesis H(1) is the opposite of this, with women being more likely to vote Democrat than Republican. The hypotheses are opposites so only one hypothesis can be true at a time.

Significance Level = 5%. This is the probability of rejecting the null hypothesis, H(0), when it is actually true. It is the risk of a false positive that we are willing to accept.

Note that we don’t accept H(0); we fail to reject it. This is because our hypothesis test doesn’t establish complete truth. Failing to reject only means that the data does not provide enough evidence against H(0).

We will later use our significance level of 5% to reject or fail to reject H(0).

Type 1 and Type 2 Errors

A Type 1 error occurs when we reject H(0) even though it is actually true. In this case, we would reject the claim that women have no preference between Democrats and Republicans, when in reality women show no such preference.

A Type 2 error occurs when we fail to reject H(0) even though it is false. In this case, we would fail to reject the claim that women have no preference for Democrats over Republicans, even though in reality women are significantly more likely to vote Democrat than Republican.

It’s important we’re aware of these errors so we stand a better chance of recognising them when they occur. With these potential errors in mind, let’s continue with the process.
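One way to make the Type 1 error concrete is to simulate it. The sketch below is an illustration, not part of the analysis in this post. It assumes a world where H(0) is true by construction (each simulated voter picks either party with probability 0.5), draws many samples of 100 voters, and counts how often a chi-squared test at the 5% level rejects H(0) anyway. The function names, the number of trials, and the random seed are all illustrative choices. The cut-off of 3.841 is the standard 5% critical value for one degree of freedom.

```python
import random

# Chi-squared cut-off for a 5% significance level with 1 degree of freedom.
CRITICAL_VALUE_DF1 = 3.841


def chi_squared_two_way_split(democrat_votes, sample_size):
    """Chi-squared statistic for a sample split between two parties,
    compared against an expected 50/50 split (the null hypothesis)."""
    expected = sample_size / 2
    republican_votes = sample_size - democrat_votes
    return ((democrat_votes - expected) ** 2 / expected
            + (republican_votes - expected) ** 2 / expected)


def type_1_error_rate(trials=5000, sample_size=100, seed=2024):
    """Share of simulated samples in which we wrongly reject a true H(0)."""
    rng = random.Random(seed)
    false_rejections = 0
    for _ in range(trials):
        # H(0) is true here: every voter is equally likely to pick either party.
        democrat_votes = sum(rng.random() < 0.5 for _ in range(sample_size))
        if chi_squared_two_way_split(democrat_votes, sample_size) > CRITICAL_VALUE_DF1:
            false_rejections += 1
    return false_rejections / trials


rate = type_1_error_rate()
print(f"Rejected a true H(0) in {rate:.1%} of samples")
```

The printed rate should land near the 5% significance level, though not exactly on it, because vote counts are whole numbers and the simulation is random.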

In our hypothesis test, we will be using the Chi-squared test. The choice of test depends on the nature of the data, the sample size, and the characteristics of the hypothesis being tested. Other tests include Student’s t-test and the z-test.

The Chi-squared test may only be carried out on counts rather than percentages, and hence we will assume a sample of 100 voters and treat the percentages in the datasets as numbers of voters for Democrats and Republicans.

The Chi-squared formula is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = Observed Frequency (Our Dataset)

E = Expected Frequency (Calculated)

Each expected frequency is the row total multiplied by the column total, divided by the grand total.

Woman Votes Democrat In 2016: (93*109)/192 = 52.80
Woman Votes Republican In 2016: (93*83)/192 = 40.20
Woman Votes Democrat In 2020: (99*109)/192 = 56.20
Woman Votes Republican In 2020: (99*83)/192 = 42.80
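The expected-frequency arithmetic above can be reproduced in a few lines. This is a minimal sketch: the totals are taken from the calculations above, while reading 93 and 99 as the per-election row totals and 109 and 83 as the per-party column totals is my interpretation of those calculations. The function and variable names are illustrative.

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count for one cell if the row and column were unrelated."""
    return row_total * column_total / grand_total


# Totals taken from the calculations above (assumed reading of the figures).
row_totals = {"2016": 93, "2020": 99}                # per election
column_totals = {"Democrat": 109, "Republican": 83}  # per party, both elections
grand_total = sum(row_totals.values())               # 93 + 99 = 192

expected = {
    (year, party): round(expected_frequency(row, column, grand_total), 2)
    for year, row in row_totals.items()
    for party, column in column_totals.items()
}

for (year, party), value in expected.items():
    print(f"Woman votes {party} in {year}: {value:.2f}")
```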

We hence fail to reject H(0): the data does not provide sufficient evidence that female voters prefer Democrats over Republicans. We made this decision using the significance level of 5% we chose earlier. A p-value of less than the significance level of 5% would have shown there is sufficient evidence to reject our null hypothesis.

It’s worth noting that if we have more data, we can have more confidence in our results. If our prediction is correct, more data could also help to confirm it. I’ll now explore a hypothesis involving voter tendencies based on their education. In this test, we have more data.

Hypothesis Testing of Voter Education

Hypothesis: More educated means more Democrat-aligned.

H(0) = Higher education does not affect the likelihood of voting for the Democratic candidate.
H(1) = Higher education increases the likelihood of voting for the Democratic candidate.
Significance Level = 5%

For the observed frequencies, we take the percentage figures as raw numbers, assuming a sample size of 100 voters. This is exploring the 2016 election only.

To find the expected frequencies, we use the proportion of Democratic and Republican voters across all education levels.

For Expected Frequencies:

High School or Lower Democrats: (95*162)/283 = 54.38
High School or Lower Republicans: (95*121)/283 = 40.62
Bachelor’s Degree Democrats: (93*162)/283 = 53.24
Bachelor’s Degree Republicans: (93*121)/283 = 39.76
Postgraduate Democrats: (95*162)/283 = 54.38
Postgraduate Republicans: (95*121)/283 = 40.62

We then calculate the degrees of freedom:

Degrees of Freedom = (3–1)*(2–1) = 2. This is calculated by taking one away from the number of rows and one away from the number of columns in your dataset, then multiplying the two results together. We have three rows for high school or lower educated, Bachelor’s Degree, and Postgraduate Degree. We have two columns for Democrat and Republican.
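As a small sketch of the degrees-of-freedom rule, the helper below applies (rows - 1) * (columns - 1) and pairs it with the standard 5% critical values of the chi-squared distribution for one and two degrees of freedom. Comparing the statistic against a critical value is an alternative to comparing a p-value against the significance level, and both lead to the same decision. The function names are hypothetical.

```python
# Standard 5% critical values of the chi-squared distribution,
# listed only for the degrees of freedom used in this post.
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991}


def degrees_of_freedom(rows, columns):
    """(rows - 1) * (columns - 1) for a table of counts."""
    return (rows - 1) * (columns - 1)


def reject_null(chi_squared_statistic, rows, columns):
    """True when the statistic exceeds the 5% critical value."""
    df = degrees_of_freedom(rows, columns)
    return chi_squared_statistic > CRITICAL_VALUES_5_PERCENT[df]


print(degrees_of_freedom(rows=3, columns=2))  # education table: 3 levels x 2 parties
print(degrees_of_freedom(rows=2, columns=2))  # female-voter table: 2 elections x 2 parties
```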

From this, we use our formula to calculate the Chi-squared statistic. Using the degrees of freedom of 2, we then calculate the p-value from the chi-squared distribution. Typically, Excel, R, or Python can be used to calculate these values. Our achieved p-value here is 0.0052. This is below the 5% significance level, so we reject the null hypothesis and conclude there is significant evidence that higher-educated voters are more inclined to vote for Democratic candidates.
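For readers who want to follow the 2016 education test end to end in Python, here is a minimal sketch using only the standard library. The observed counts are an assumption: they are not quoted in the text, and were chosen so that they add up to the row totals (95, 93, 95) and column totals (162, 121) used in the expected-frequency calculations. The p-value step relies on a property of the chi-squared distribution that holds for exactly 2 degrees of freedom, where the upper-tail probability simplifies to exp(-x / 2).

```python
import math

# ASSUMED observed counts (Democrat, Republican) per 100 voters in 2016.
# They are not quoted in the text; they were chosen to match the row and
# column totals used in the expected-frequency calculations.
observed = {
    "High School or Lower": (44, 51),
    "Bachelor's Degree": (52, 41),
    "Postgraduate": (66, 29),
}

row_totals = {level: sum(counts) for level, counts in observed.items()}
column_totals = [sum(counts[i] for counts in observed.values()) for i in range(2)]
grand_total = sum(column_totals)

# Sum (O - E)^2 / E over every cell of the table.
chi_squared = 0.0
for level, counts in observed.items():
    for i, observed_count in enumerate(counts):
        expected_count = row_totals[level] * column_totals[i] / grand_total
        chi_squared += (observed_count - expected_count) ** 2 / expected_count

degrees_of_freedom = (len(observed) - 1) * (len(column_totals) - 1)

# The closed form below is only valid for 2 degrees of freedom.
assert degrees_of_freedom == 2
p_value = math.exp(-chi_squared / 2)

print(f"chi-squared = {chi_squared:.2f}, p-value = {p_value:.4f}")
```

Under these assumed counts the p-value comes out at roughly 0.005, in line with the figure quoted above. For tables of other sizes, a statistics library would be the more practical route.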

For the 2020 election, and a combined dataset of the 2016 and 2020 elections, I followed the same process as above and achieved the following results.

For the 2020 Election:

For Expected Frequencies:

High School or Lower Democrats: (97*164)/294 = 54.11
High School or Lower Republicans: (97*130)/294 = 42.89
Bachelor’s Degree Democrats: (98*164)/294 = 54.67
Bachelor’s Degree Republicans: (98*130)/294 = 43.33
Postgraduate Democrats: (99*164)/294 = 55.22
Postgraduate Republicans: (99*130)/294 = 43.78
Chi-Squared Test Statistic (2020) = 12.94
P-value = 0.0015

With this p-value being below the 5% significance level, we reject H(0) in favour of H(1). We again see evidence that highly educated voters are more likely to vote for Democratic candidates.

For A Combined Dataset of the 2016 and 2020 Elections:

For Expected Frequencies:

High School or Lower Democrats: (192*326)/577 = 108.48
High School or Lower Republicans: (192*251)/577 = 83.52
Bachelor’s Degree Democrats: (191*326)/577 = 107.91
Bachelor’s Degree Republicans: (191*251)/577 = 83.09
Postgraduate Democrats: (194*326)/577 = 109.61
Postgraduate Republicans: (194*251)/577 = 84.39
Chi-Squared Test Statistic (Combined 2016 and 2020) = 23.16
P-value = Less than 0.0001

All the p-values are below the 5% threshold, so we reject the null hypothesis H(0) in favour of the alternate hypothesis H(1). There is significant evidence in 2016, 2020, and the combined dataset that higher education is associated with voting for Democratic candidates.
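The reported p-values can be checked from the reported test statistics. The sketch below assumes 2 degrees of freedom, as calculated earlier, in which case the chi-squared upper-tail probability reduces to exp(-x / 2). The function name is illustrative.

```python
import math


def p_value_two_degrees_of_freedom(chi_squared_statistic):
    """Upper-tail probability of a chi-squared distribution with 2 degrees
    of freedom. For this special case it simplifies to exp(-x / 2)."""
    return math.exp(-chi_squared_statistic / 2)


SIGNIFICANCE_LEVEL = 0.05

# Test statistics as reported above.
reported_statistics = {"2020": 12.94, "Combined 2016 and 2020": 23.16}

for label, statistic in reported_statistics.items():
    p_value = p_value_two_degrees_of_freedom(statistic)
    decision = "reject H(0)" if p_value < SIGNIFICANCE_LEVEL else "fail to reject H(0)"
    print(f"{label}: p-value = {p_value:.6f} -> {decision}")
```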

Key Questions For 2024

These are the questions I want answered when it comes to these demographic trends and the 2024 election. Hypothesis testing can give us an idea of how far previous election data supports these questions and predictions. But the true answers won’t be confirmed until after November has unfolded.

If you want some insight into these questions, you can collect your own data and go through the hypothesis testing process we went through above. If not, we’ll return to these after November and see if our hypothesis tests stand up. For now, consider these potential trends and questions below. If you have any thoughts on any of them, let me know!

Will a larger proportion of women, who generally vote Democrat more than Republican, vote for Harris because she herself is female? Or will a potential trend of women becoming more open to voting for Trump continue?
Will male voters support Trump because a male Democrat won’t pull them away as Biden potentially did? Or will the selection of Tim Walz as Harris’ VP maintain a similar proportion of male voters for the Democrats?
Will independents vote for Harris at similar levels to Biden or Clinton?
Will voters with higher education once again give Trump a larger proportion of their votes? Even if that proportion increases, will it still be dramatically lower than the proportion of educated voters who vote for Democratic candidates?

Will urbanisation in the United States bring a prolonged period of Democratic leadership, or does urbanisation do little to sway elections?
How will religious beliefs play into this election, and elections in the long term? Are people becoming less religious in the United States, and as a result will Democrats gain vote share in the long term?
Will 18–29 voters still largely align with Democrats?
Will the White population continue to lean towards Republicans?
Will Harris gain the votes of the Black, African-American, and multiracial communities more than Clinton or Biden did previously?
Will the proportion of Hispanic voters selecting Trump continue to grow?
Will Harris be a more popular candidate than 2020 Biden? Will Harris be more popular than 2016 Clinton?

Concluding Remarks

In analysing demographics, we can see this election in the United States on a micro-level. Next week, we’ll zoom out and cover the macro with a state-level analysis of the upcoming election.
