20+ Politics Related Data Sets In A Single Line of Code
Dec 11, 2022
TLDR: Here are 20+ interesting and easily loadable politics-related datasets for you to begin exploring right away.
Introduction
One of the biggest, and most often overlooked, challenges in politics-related data analysis is finding trustworthy and unbiased data sources. Spoiler alert: this article doesn't necessarily solve that problem, but it can at least point you in the right direction.
After all, as with anything in politics, people often try to twist opposing points of view to suit their own interests, so we had better be cautious with the data we use.
So with that in mind, in this article, I am going to present you with a few sources for finding politics-related datasets.
What's more, I also took into account the ease of loading the datasets. In most cases you shouldn't need more than one line of code to both download and load them.
At the very bottom of this article, you will find a few caveats and limitations brought by this single-line constraint.
Standard Data Science Python Imports
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import requests
1 - Supreme Court Database
The Supreme Court Database is run by Washington University in St. Louis, and it proclaims itself to be the "definitive source for researchers, students, journalists, and citizens interested in the U.S. Supreme Court."
There are a few things that make this data one of the most interesting political datasets out there.
First, its records include every case ever decided by the Supreme Court, even those dating back to the late 1700s.
Second, it tracks a wide range of nuances for each case, allowing us to analyze even the smallest minutiae and capture insights that could otherwise easily be overlooked. There are 55 columns of data in this one.
Below, we will load the data for Brown v. Board of Education of Topeka, from the 1950s. For context, this is the famous case in which the Supreme Court ruled against racial segregation in public schools, and it became one of the most decisive moments for the civil rights movement.
Important note: you can load any Supreme Court case by passing its citation to .loc, or load the entire database by removing the .loc call.
US Supreme Court Case by Case Data
# Originally sourced from http://supremecourtdatabase.org/data.php
pd.read_csv('https://corgis-edu.github.io/'\
+'corgis/datasets/csv/supreme_court/'\
+'supreme_court.csv').set_index(
'citation.us').loc[
['347 U.S. 483']]
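The note about .loc also applies to several citations at once. Here is a minimal sketch, assuming the frame is indexed by citation.us as in the one-liner above. The tiny in-memory table and its case_name column are hypothetical stand-ins for the real 55-column data, and select_cases is an illustrative helper rather than part of pandas.

```python
import pandas as pd

# Hypothetical stand-in for the real table: only `citation.us` mirrors the
# actual dataset; `case_name` is an illustrative column.
toy = pd.DataFrame(
    {
        "citation.us": ["5 U.S. 137", "347 U.S. 483", "410 U.S. 113"],
        "case_name": [
            "Marbury v. Madison",
            "Brown v. Board of Education",
            "Roe v. Wade",
        ],
    }
).set_index("citation.us")


def select_cases(df, citations):
    """Return the rows whose index matches any requested citation.

    Citations that are absent from the index are skipped instead of
    raising a KeyError.
    """
    wanted = pd.Index(citations)
    return df.loc[df.index.isin(wanted)]


# The last citation does not exist in the toy table and is simply ignored.
subset = select_cases(toy, ["347 U.S. 483", "410 U.S. 113", "999 U.S. 999"])
print(subset)
```

Filtering with a boolean mask keeps the rows in their original order and tolerates citations that are not in the table, which a plain .loc lookup with a list would not.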
2-8 FiveThirtyEight
The news website FiveThirtyEight, named in an allusion to the total number of electors in the US Electoral College, focuses its work mostly on opinion polls about politics and sports. They frequently run opinion polls on important national topics and, on top of that, publicly share the data they gather so that people can explore it on their own.
So with that in mind, let's load some of their most interesting political datasets.
US House of Representatives Polls of the Current Election Cycle
pd.read_csv('https://projects.fivethirtyeight.com/'\
+'polls/data/house_polls.csv')
For the result (excerpted):
United States Hate Crimes
There are
pd.read_csv('https://raw.githubusercontent.com/'\
+'fivethirtyeight/data/master/'\
+'hate-crimes/hate_crimes.csv').head(
).transpose()
For the result (excerpted and transposed to show all columns):
United States 2020 Election Campaign Endorsements
pd.read_csv('https://projects.fivethirtyeight.com/'\
+'endorsements-2020-data/'\
+'endorsements-2020.csv')
For the result (excerpted):
Special Prosecutor Investigations Since 1973
pd.read_csv('https://raw.githubusercontent.com/'\
+'fivethirtyeight/data/master/'\
+'russia-investigation/'\
+'russia-investigation.csv')
For the result (excerpted):
State Legislative District (SLD) Statistics
pd.read_csv('https://github.com/'\
+'fivethirtyeight/data/raw/master/'\
+'redistricting-2022-state-legislatures/'\
+'fivethirtyeight_state_legislative_'\
+'district_analysis.csv')
For the result (excerpted):
9-14 DataUSA
Data USA's mission is to aggregate and make publicly available data on key issues regarding US politics, health, the economy, and more. What's interesting about this platform is that it also allows you to create in-browser data visualizations.
So if you are in a hurry and don't have time to spin up an IDE, you don't have to go through the trouble of downloading their data and analyzing it on your own; you can do it all in-browser and share your work with others.
On the other hand, if you do want to download their data programmatically, you will first have to call requests.get (to download the API response) and only then will you be able to load the data into a pandas DataFrame.
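The two steps described above can be sketched separately, which makes it easier to see where a failure happens. This is a rough sketch, not Data USA's official client: payload_to_frame and fetch_frame are hypothetical helper names, the sample payload is made up, and the only assumption about the API is that the records sit under a top-level "data" key, as in this article's one-liners.

```python
import pandas as pd
import requests


def payload_to_frame(payload):
    """Turn a parsed JSON response into a DataFrame.

    Assumes the records sit under a top-level "data" key.
    """
    return pd.DataFrame(payload["data"])


def fetch_frame(url, timeout=30):
    """Download a JSON endpoint and load its records (needs network access)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return payload_to_frame(response.json())


# Offline demonstration with a made-up payload (illustrative values only).
sample = {
    "data": [
        {"Year": 2000, "Party": "A", "Votes": 1},
        {"Year": 2004, "Party": "B", "Votes": 2},
    ]
}
frame = payload_to_frame(sample)
print(frame)
```

With network access, fetch_frame can be called with any of the full URLs used in this section.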
Below, I will present you with a few of their datasets and show you how to download them with a single line of code.
Electoral College Votes Since 1992
pd.DataFrame(requests.get('https://api-ts-vibranium.datausa.io/'\
+'tesseract/data?'\
+'cube=Data_USA_Electoral_College_'\
+'president&drilldowns=State,Party,Year'\
+'&measures=Electoral+College+Votes').json()['data'])
For the result (excerpted):
Presidential Popular Vote Since 1976
pd.DataFrame(requests.get('https://api-ts-vibranium.datausa.io/'\
+'tesseract/data?Nation=01000US&cube=Data_USA_'\
+'President_election&debug=false&drilldowns=Year,'\
+'Candidate,Party,Nation&measures=Candidate+Votes,'\
+'Total+Votes&parents=false&sparse=false').json()['data'])
For the result (excerpted):
Household Income Statistics
pd.DataFrame(requests.get('https://datausa.io/api/'\
+'data?measure=Household%20Income,'\
+'Household%20Income%20'\
+'Moe&geo=01000US,01000US&drilldowns='\
+'Household%20Income%20Bucket').json()['data'])
For the result (excerpted):
Real Estate Taxes
pd.DataFrame(requests.get('https://datausa.io/api/'\
+'data?measure=Real%20Estate%20Taxes%20by%20Mortgage,'\
+'Real%20Estate%20Taxes%20by%20Mortgage%20Moe&geo=01000US,'\
+'01000US&drilldowns=Real%20Estate%20Taxes%20Paid').json(
)['data'])
For the result (excerpted):
Household Income By Race
pd.DataFrame(requests.get('https://datausa.io/'\
+'api/data?measure=Household%20Income%20by%20Race,'\
+'Household%20Income%20by%20Race%20Moe&Geography='\
+'01000US,01000US:similar').json()['data'])
For the result (excerpted):
Workforce Status + Occupations By Race
pd.DataFrame(requests.get('https://datausa.io/api/'\
+'data?Geography=01000US&measure=Average%20Wage,'\
+'Average%20Wage%20Appx%20MOE,Total%20Population,'\
+'Total%20Population%20MOE%20Appx,'\
+'Record%20Count&drilldowns=Race&Workforce%20'\
+'Status=true&Detailed%20Occupation='\
+'1191XX,533030,252020,291141,412010&'\
+'Record%20Count>=5').json()['data'])
For the result (excerpted):
15-17 Washington Post
Founded in 1877, the Washington Post remains one of the most famous and widely read newspapers in the world. The Post boasts more than 10,000 staff writers, 60+ million monthly readers, and more than $300 million in yearly revenue.
Luckily for us, they (sometimes) make the data behind their compelling stories publicly available for the sake of transparency.
So with that in mind, let's load some of their most thought-provoking datasets in a single line of code.
Fatal Police Shootings In the US
pd.read_csv('https://raw.githubusercontent.com/'\
+'washingtonpost/data-police-shootings/'\
+'master/fatal-police-shootings-data.csv')
For the result (excerpted):
Homicide Victims
pd.read_csv('https://github.com/washingtonpost/'\
+'data-homicides/raw/master/'\
+'homicide-data.csv',
encoding='latin-1')
For the result (excerpted):
School Shootings
pd.read_csv('https://raw.githubusercontent.com/'\
+'washingtonpost/'\
+'data-school-shootings/'\
+'master/school-shootings-data.csv',
encoding='latin-1')
For the result (excerpted):
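The two loads above pass encoding='latin-1', presumably because the files contain bytes that are not valid UTF-8, which is the default that pandas assumes. Here is a small offline sketch of what that argument does, using a made-up two-row CSV rather than the Washington Post data:

```python
import io

import pandas as pd

# Made-up CSV encoded as Latin-1; the byte 0xE9 is "é" in that encoding.
raw = "name,city\nJosé,Baltimore\n".encode("latin-1")

# Decoding with the matching encoding recovers the accented character.
decoded = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(decoded)

# Under the default UTF-8 decoding, the lone 0xE9 byte is not valid input.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print("UTF-8 decoding failed:", utf8_failed)
```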
18-21 Federal Reserve Bank of Saint Louis
The Federal Reserve Bank of Saint Louis is one of the most important institutions within the US central banking system. And as such, they are responsible for key economic decisions that affect not only the US, but also the entire world.
And as any serious decision maker ought to be, they are data-driven in their approach, using hundreds of economic indicators and algorithms to arrive at (hopefully) the best outcome for a particular problem.
On top of that, they also aggregate and make economic data publicly available for government officials, business leaders, and everyday people to analyze and come to their own conclusions on key matters for the economy.
Below, let's load some of their publicly available data in a single line of code. Keep in mind that I've had to use lots of line continuations and string concatenations in order to make the code easily readable.
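Most of the query parameters in this section's URLs look like chart styling (colors, fonts, sizes). As a hedged alternative, the URL could be assembled from just the series id and the date range. The parameter names id, cosd, and coed are taken from the URLs in this article; that the server accepts a request without the remaining parameters is an assumption, and fred_csv_url is a hypothetical helper.

```python
from urllib.parse import urlencode

FRED_CSV = "https://fred.stlouisfed.org/graph/fredgraph.csv"


def fred_csv_url(series_id, start=None, end=None):
    """Build a download URL from a series id and optional date bounds.

    `id`, `cosd` and `coed` come from the URLs used in this article;
    dropping the styling parameters is an assumption about the endpoint.
    """
    params = {"id": series_id}
    if start is not None:
        params["cosd"] = start
    if end is not None:
        params["coed"] = end
    return FRED_CSV + "?" + urlencode(params)


url = fred_csv_url("UNRATE", start="1948-01-01", end="2022-09-01")
print(url)
# With network access, the result can be passed to pd.read_csv(url).
```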
US Consumer Price Index (minus food and energy) by month.
pd.read_csv('https://fred.stlouisfed.org/graph/'\
+'fredgraph.csv?bgcolor=%23e1e9f0&chart_type='\
+'line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&'\
+'height=450&mode=fred&recession_bars='\
+'on&txtcolor=%23444444&ts=12&tts=12&width='\
+'1168&nt=0&thu=0&trc=0&show_legend=yes&'\
+'show_axis_titles=yes&show_tooltip=yes&id='\
+'CORESTICKM159SFRBATL&scale=left&cosd=1967-12-01'\
+'&coed=2022-09-01&line_color=%234572a7&link_values='\
+'false&line_style=solid&mark_type=none&'\
+'mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq='\
+'Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&'\
+'line_index=1&transformation=lin&vintage_date='\
+'2022-10-14&revision_date=2022-10-14&nd=1967-12-01')
For the result (excerpted and transposed to better show more columns):
Consumer Price Index for Urban Consumers
pd.read_csv('https://fred.stlouisfed.org/graph/'\
+'fredgraph.csv?bgcolor=%23e1e9f0&chart_type='\
+'line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&'\
+'height=450&mode=fred&recession_bars='\
+'on&txtcolor=%23444444&ts=12&tts=12&width='\
+'1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles'\
+'=yes&show_tooltip=yes&id=CPIAUCSL&scale=left&'\
+'cosd=1947-01-01&coed=2022-09-01&line_color=%234572a7'\
+'&link_values=false&line_style=solid&mark_type='\
+'none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq='\
+'Monthly&fam=avg&fgst=lin&fgsnd=2020-02-01&'\
+'line_index=1&transformation=lin&vintage_date=2022-10-14&'\
+'revision_date=2022-10-14&nd=1947-01-01')
For the result (excerpted and transposed to better show more columns):
Unemployment Rate Month by Month
pd.read_csv('https://fred.stlouisfed.org/graph/'\
+'fredgraph.csv?bgcolor=%23e1e9f0&'\
+'chart_type=line&drp=0&fo=open'\
+'%20sans&graph_bgcolor=%23ffffff&'\
+'height=450&mode=fred&recession_bars='\
+'on&txtcolor=%23444444&ts=12&'\
+'tts=12&width=1168&nt=0&thu=0&trc=0&'\
+'show_legend=yes&show_axis_titles='\
+'yes&show_tooltip=yes&id=UNRATE&'\
+'scale=left&cosd=1948-01-01&coed='\
+'2022-09-01&line_color=%234572a7&'\
+'link_values=false&line_style=solid'\
+'&mark_type=none&mw=3&lw=2&ost=-99999&'\
+'oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd='\
+'2020-02-01&line_index=1&transformation='\
+'lin&vintage_date=2022-10-14&revision_date='\
+'2022-10-14&nd=1948-01-01')
For the result (excerpted and transposed to better show more columns):
Limitations
This article has at least a few limitations and caveats.
Style
The code examples in this article break many coding style conventions. Do not use this article as a model for style.
Line Continuation
This article often uses line continuation. In Python the backslash \ accomplishes line continuation. Line wrap (or line continuation), for purposes of this article, does not count as a second (or third... or more) line of code.
String Concatenation
In order to make the code look better and to avoid odd line-wrapping the article also uses string concatenation.
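For comparison, Python also joins adjacent string literals automatically when they sit inside parentheses, so the same long URL can be written without + or backslashes. A short sketch using one of the FiveThirtyEight URLs from this article:

```python
# Adjacent string literals inside parentheses are joined by the parser,
# so neither "+" nor a trailing backslash is needed.
url = (
    "https://raw.githubusercontent.com/"
    "fivethirtyeight/data/master/"
    "hate-crimes/hate_crimes.csv"
)

# The same URL written in this article's style, for comparison.
url_article_style = 'https://raw.githubusercontent.com/'\
    +'fivethirtyeight/data/master/'\
    +'hate-crimes/hate_crimes.csv'

print(url == url_article_style)
```

Both spellings produce the same string; the parenthesized form is usually easier to edit because no line depends on a trailing backslash.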
Indexing Lists of Tables
As you can read in the documentation for pd.read_html(), this function returns a list of pandas DataFrames. The index in square brackets in pd.read_html()[i], where i is a position in that list, selects the DataFrame of interest.
This article also uses requests.get().json() or requests.get().json()['data'] to grab JSON data from online sources. In the examples above, the square-bracket notation isolates the data of interest.