05 November 2020 / Data Science

Building Covid-19 Dashboard with Python and Tableau

Minh-Anh Nguyen
Machine Learning Instructor / Cat Addict

Introduction

Months have passed since Covid-19 first made the headlines of major publishers. This deadly virus has only grown since, and caused tremendous damage to global health and economy. Attempts across the globe are being made to bring a stop to this pandemic and recover from its impacts. Being a tech-savvy and enthusiast in data science, I also want to join these efforts by creating an interactive dashboard to visualize the Covid-19 situation in the world.

In this article, I want to share my experiences in making this project. I will discuss the steps in preprocessing data with Python, creating the Tableau dashboard as well as share the codes/workbook so that you can also create one on your own!

The dataset that I used in this project is the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (Source: Github). This is probably one of the most comprehensive datasets publicly available about the Novel Coronavirus. It reports the number of confirmed, deaths and recoveries cases in 188 countries/regions around the world.

Data Preprocessing

Note: This section assumes you have some basic understandings of Python and Pandas. If you are not familiar with these, check out my previous article about Data Manipulation with Pandas to get started!

Transforming data from wide format to long format

The CSSE Covid-19 dataset consists of three tables about daily confirmed, deaths and recoveries cases per country/region. Each table presents the data in wide (crosstab) format, with each day in a column. This format is very difficult to work with in Tableau so the first major preprocessing step is to pivot the data in these columns into rows (Long format).

Note that this operation can be done in Tableau with the Pivot function. However, Python is often my go-to choice for most preprocessing tasks because of its flexibility and speed when working large datasets.

Transforming from wide to long format can be done quite simple with Pandas function melt

confirmed = pd.melt(confirmed, id_vars=confirmed.columns[:4],
value_vars = confirmed.columns[4:],
var_name = 'date',
value_name = 'confirmed')

Combining Tables

For more convenient analysis, the next step is to combine confirmed, deaths and recoveries tables into a single one. One problem emerges, however, with the data of Canada: The confirmed & deaths tables show Canada data by Province/State, while the recoveries table only displays the total number of cases in the whole country. This conflict will need to be addressed first before combining the tables together as unmatched values will be omitted.

The approach that I took was to aggregate the confirmed and deaths data of Canada to country-wise to match with recoveries table.

# Summarize confirmed and deaths data by date
confirmed_canada = confirmed[confirmed['Country/Region'] == 'Canada'].groupby('date').sum()[['confirmed']]
deaths_canada = deaths[deaths['Country/Region'] == 'Canada'].groupby('date').sum()[['deaths']]
# Extract columns from recoveries table
recoveries_canada = recoveries[recoveries['Country/Region'] == 'Canada']
canada_template = recoveries_canada[recoveries_canada.columns[:-1]].reset_index(drop=True)
# Join aggrregated confirmed and deaths data with extracted columns
confirmed_canada = canada_template.merge(confirmed_canada, how='inner', left_on='date', right_index=True)
deaths_canada = canada_template.merge(deaths_canada, how='inner', left_on='date', right_index=True)
# Add the agrregated data for Canada back to confirmed and deaths table
confirmed = confirmed[confirmed['Country/Region'] != 'Canada'].append(confirmed_canada)
deaths = deaths[deaths['Country/Region'] != 'Canada'].append(deaths_canada)

Now the three tables can be combined using Pandas function merge .

data = confirmed.merge(deaths, how='inner', on=['Country/Region','Province/State','date'])
data = data.merge(recoveries, how='inner', on=['Country/Region','Province/State','date'])

Note: Similar to the previous step, Tableau can also handle data joining in its Data Source section.

Population Data

One metric that I wanted to use is the infection rate: confirmed / population. Countries’ population is, however, not available in the CSSE’s Covid-19 dataset so we will need to combine with another source.

One common problem when combining different sources of data is unmatched value names, and in our case: country names. These values can be identified simply by comparing the Country columns of the datasets.

for c in data['Country/Region'].unique():
if c not in population['Country (or dependency)'].unique():
print(c)

There are 13 unmatched countries names. Unfortunately, there is no quick way to replace them, so let’s just use brute force!

country_mapper = {
'Congo (Brazzaville)': 'Congo',
'Congo (Kinshasa)': 'Congo',
"Cote d'Ivoire": "Côte d'Ivoire",
'Czechia': 'Czech Republic (Czechia)',
'Korea, South': 'South Korea',
'Saint Vincent and the Grenadines': 'St. Vincent & Grenadines',
'Taiwan*': 'Taiwan',
'US': 'United States',
'West Bank and Gaza': 'Israel',
'Saint Kitts and Nevis': 'Saint Kitts & Nevis',
'Burma': 'Myanmar',
'Sao Tome and Principe': 'Sao Tome & Principe'
}
data['Country/Region'] = data['Country/Region'].replace(country_mapper)


Making Tableau Dashboard

Now that our data is cleaned, let’s build a dashboard!

Tableau is my tool of choice, mainly because it can make a wide variety of charts and is highly customizable while still relatively fast and convenient.

The complete Covid-19 dashboard has plenty of components. Instead of discussing each of them, I will highlight the more important or complex charts/features. You can also find my Tableau workbook at the end of this article to see how each component is made.

Note: If you are new to Tableau, I would highly recommend the following articles:

Tableau Tutorial — The Basics

Data Visualisation with Tableau

Filtering Latest Date

Our data is a Time Series, with each entry showing cumulative cases at a particular date in a location. For such reason, simply aggregating the data would be meaningless. For example, there are 2 cases in Day A, 3 cases on Day B. The sum of the two days (5 cases) does not tell any meaningful information.

Instead, we might have a question like: What is the number of confirmed cases today or at any given day in the past? A solution is to create a calculated field that returns the latest date in the selected date range, then use that field as a filter for every chart in the dashboard.

Note: Simply using the existing date column in the dataset would not be sufficient as it will always find the latest day in the whole dataset, regardless of what date range is being selected. This is due to the Order of Operation in Tableau. Read more about this concept here.

Daily New Cases

One might be interested in knowing the number of daily new cases to see which period of time is more critical in a country. This can be done by applying the Difference Tableau Calculation.

Tooltips

This is probably one of my favorite features in Tableau. Tooltips allow us to add more levels of information to a chart without taking up any additional space! Tooltips in Tableau can contain all sorts of things, from simple texts, numbers to a whole chart!

Simply create a chart that you want to use as tooltip then add it to another chart through the Tooltip section in the Mark card.

Actions

Another key feature of this dashboard is that most elements in the dashboard are interactable i.e. users can select any date or country to apply the relevant filter to the whole dashboard.

These behaviours are can be added and customized by the Actions feature (Worksheet > Actions). Here you can add new actions or edit existing ones in the worksheet. The below example is an action that applies a country filter to the Target Sheets when a country is selected on the Source Sheets.

Conclusion

You can find the complete dashboard HERE. If you are interested in recreating the dashboard on your own, you can find the data, preprocessing codes as well as the Tableau workbook in this Google Drive folder.

If you have any question, let me know in the comments below! Stay safe everyone!

,