Making the Most of Mobility

I became aware of the Community Mobility Reports data courtesy of a tweet from Mike Schussler.

The data are available for download from the Google COVID-19 Community Mobility Reports. There’s a global CSV file (data for all regions) as well as individual CSV files (one for each region, packaged in a ZIP archive). There are also automated PDF reports for each region per day. For example, this is the report for South Africa on 28 March 2021.


The mobility data records how visits to various place categories have changed in frequency and duration. The place categories are:

  • Workplaces
  • Residential
  • Grocery & Pharmacy
  • Retail & Recreation
  • Parks and
  • Transit Stations.

The data for a specific day are relative to a baseline for that day of the week. The baseline is the median value for the corresponding day of the week during the 5 week period from 3 January 2020 to 6 February 2020. There is generally a delay of 2 to 3 days before new data appear, which is the time required to process and validate them.
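
To make the baseline idea concrete, here is a minimal sketch in base R. It uses made-up daily visit counts (the `visits` data frame and its `count` column are my own inventions, not anything Google publishes) and simply computes a per-weekday median over the baseline window, then the percentage change relative to it. Google's actual processing is certainly more involved than this.

```r
# Synthetic daily visit counts (invented numbers, for illustration only).
set.seed(1)
visits <- data.frame(
  date = seq(as.Date("2020-01-03"), as.Date("2020-03-31"), by = "day")
)
visits$count <- round(1000 + rnorm(nrow(visits), sd = 50))

# ISO day of the week as "1" (Monday) to "7" (Sunday); locale independent.
visits$weekday <- format(visits$date, "%u")

# Baseline window: the 5 weeks from 3 January 2020 to 6 February 2020.
in_baseline <- visits$date >= as.Date("2020-01-03") &
  visits$date <= as.Date("2020-02-06")

# Baseline is the median count for each day of the week within that window.
baseline <- tapply(
  visits$count[in_baseline],
  visits$weekday[in_baseline],
  median
)

# Percentage change of each day relative to the baseline for its weekday.
visits$percent_change <- 100 * (
  visits$count / as.numeric(baseline[visits$weekday]) - 1
)

head(visits)
```

Because each weekday gets its own baseline, a quiet Sunday is compared with a typical pre-pandemic Sunday rather than with a busy Monday.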

The data are gathered from users who opted in to Google’s Location History. As such they are only a sample of the wider population, and that sample may or may not be representative.

Load the Data

I’m focusing on the data for South Africa (ZA) and using both years of available data.

YEARS <- c(2020, 2021)

Use purrr::map_df() to iterate over each year, load the corresponding CSV and concatenate into a single data frame. A few of the columns are empty, so apply janitor::remove_empty() to drop those.

library(tidyverse)
library(glue)
library(janitor)

# COUNTRY ("ZA") and FOLDER_REGIONAL_CSV (the folder holding the unzipped
# regional CSV files) are assumed to be defined already.
mobility <- map_df(
  YEARS,
  function(year) {
    filename <- glue("{year}_{COUNTRY}_Region_Mobility_Report.csv")
    filepath <- file.path(FOLDER_REGIONAL_CSV, filename)
    read_csv(filepath)
  }
) %>%
  # Remove any empty columns.
  remove_empty(which = "cols") %>%
  # Remove columns with just one value.
  remove_constant() %>%
  # Rename specific columns.
  rename(
    region = sub_region_1,
    region_iso = iso_3166_2_code
  ) %>%
  # Strip "_percent_change_from_baseline" from column names.
  rename_with(
    ~ str_replace(.x, "_percent_change_from_baseline$", "")
  )
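
To see what those cleaning steps do without needing the real CSV files, here is a small sketch on a toy tibble. The data frame `toy` and its values are invented; only the column names mimic the style of the mobility files.

```r
library(dplyr)
library(janitor)
library(stringr)

# Toy stand-in for the raw data (values are invented).
toy <- tibble(
  country_region_code = c("ZA", "ZA"),  # constant column
  sub_region_1 = c(NA, "Gauteng"),
  metro_area = c(NA, NA),               # completely empty column
  parks_percent_change_from_baseline = c(-10, -25)
)

cleaned <- toy %>%
  # Drops metro_area because every value is missing.
  remove_empty(which = "cols") %>%
  # Drops country_region_code because it holds a single value.
  remove_constant() %>%
  rename(region = sub_region_1) %>%
  # Shortens parks_percent_change_from_baseline to parks.
  rename_with(~ str_replace(.x, "_percent_change_from_baseline$", ""))

names(cleaned)
```

Only `region` and `parks` survive: the empty and constant columns carry no information for a single-country analysis, and the shorter names are easier to work with downstream.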

How much data?

dim(mobility)
[1] 4080   10

There are 4080 records, each with 10 columns. The data span the period from 15 February 2020 to 28 March 2021.

What are the (revised) column names?

names(mobility)
 [1] "region"                "region_iso"            "place_id"             
 [4] "date"                  "retail_and_recreation" "grocery_and_pharmacy" 
 [7] "parks"                 "transit_stations"      "workplaces"           
[10] "residential"          

What are the unique place identifiers?

mobility %>% select(region, region_iso, place_id) %>% unique()
# A tibble: 10 × 3
   region        region_iso place_id                   
   <chr>         <chr>      <chr>                      
 1 <NA>          <NA>       ChIJURLu2YmmNBwRoOikHwxjXeg
 2 Eastern Cape  ZA-EC      ChIJu5znKjRWYh4RkqxyqdKUajo
 3 Free State    ZA-FS      ChIJGRTWM2HFjx4RRwqiTVWK9e0
 4 Gauteng       ZA-GT      ChIJn3cRVJUSlR4R4jhUy8fnnm0
 5 KwaZulu-Natal ZA-NL      ChIJVQ7iWQ4Q8R4Rjdnka6d4YYI
 6 Limpopo       ZA-LP      ChIJwTDNNhTJxh4RStzIZh49iWI
 7 Mpumalanga    ZA-MP      ChIJPSAvTvpg6h4RhGvk9A3foGQ
 8 North West    ZA-NW      ChIJ612A6EIKmB4R_5BkMf6qLUc
 9 Northern Cape ZA-NC      ChIJbUtwf_UhJBwRkEyPkNb4AAM
10 Western Cape  ZA-WC      ChIJ841peohdzB0Ri6I2IY95juk

So there are data for each province as well as for the country as a whole (the row with no region name). We’ll confine our attention to the country as a whole.
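
One way to pull out just the country-level records is to keep the rows where the region is missing. This is a sketch on a tiny invented data frame (`mobility_toy`) that borrows the cleaned column names; I'm assuming, based on the table above, that a missing region always means the national aggregate. Reshaping to long format is optional, but it makes faceted plots easier.

```r
library(dplyr)
library(tidyr)

# Tiny stand-in for the cleaned data (values are invented).
mobility_toy <- tibble(
  region = c(NA, "Gauteng", NA, "Gauteng"),
  region_iso = c(NA, "ZA-GT", NA, "ZA-GT"),
  date = as.Date(c("2020-03-27", "2020-03-27", "2020-03-28", "2020-03-28")),
  workplaces = c(-60, -65, -55, -58),
  residential = c(30, 33, 28, 31)
)

national <- mobility_toy %>%
  # Country-level rows have no province attached.
  filter(is.na(region)) %>%
  select(-region, -region_iso) %>%
  # One row per date and place category.
  pivot_longer(-date, names_to = "category", values_to = "percent_change")

national
```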

Visualise the Data

We’ve got daily observations of various mobility metrics for each of the provinces as well as the country as a whole. Making sense of this is going to require pictures!

Work & Home

Below are two plots showing the percentage change in mobility for workplaces and residential areas. Superimposed solid vertical lines indicate the onset of each lockdown level, starting with Level 5 (L5) on 27 March 2020. On that date there was a precipitous drop in the number of people going to their workplaces and a simultaneous increase in the number of people staying at home. Dashed vertical lines indicate public holidays, which also appear to have a significant effect on mobility.
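
For anyone wanting to build a similar plot, here is a rough ggplot2 sketch. It is not the code behind the figures in this post: the data are synthetic, only the Level 5 date is included as a lockdown marker (dates for other levels would need to be added to `lockdown`), and the two holidays (Freedom Day and Workers' Day) are just examples.

```r
library(ggplot2)

# Synthetic series with a step change on 27 March 2020 (invented values).
set.seed(42)
dates <- seq(as.Date("2020-02-15"), as.Date("2020-05-31"), by = "day")
after_l5 <- dates >= as.Date("2020-03-27")
toy <- data.frame(
  date = rep(dates, 2),
  category = rep(c("workplaces", "residential"), each = length(dates)),
  percent_change = c(
    ifelse(after_l5, -60, 0) + rnorm(length(dates), sd = 5),
    ifelse(after_l5, 30, 0) + rnorm(length(dates), sd = 3)
  )
)

# Lockdown onsets (solid lines) and example public holidays (dashed lines).
lockdown <- data.frame(date = as.Date("2020-03-27"), label = "L5")
holidays <- data.frame(date = as.Date(c("2020-04-27", "2020-05-01")))

p <- ggplot(toy, aes(date, percent_change)) +
  geom_line() +
  geom_vline(data = lockdown, aes(xintercept = date)) +
  geom_vline(data = holidays, aes(xintercept = date), linetype = "dashed") +
  geom_text(
    data = lockdown,
    aes(x = date, y = Inf, label = label),
    vjust = 1.5, hjust = -0.2
  ) +
  facet_wrap(~ category, ncol = 1, scales = "free_y") +
  labs(x = NULL, y = "Change from baseline (%)")

print(p)
```

Because the `lockdown` and `holidays` data frames have no `category` column, their lines are drawn in every facet.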

There’s a clear weekly variation in these data, indicating that, despite the lockdown, people’s behaviour differs between weekends and weekdays.
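
Two simple ways to look at that weekly cycle are to average by day of the week and to smooth it away with a centred 7-day moving average. The sketch below does both in base R on a synthetic workplace series; the numbers are invented and only meant to show the mechanics.

```r
# Eight full weeks of synthetic data starting on Monday 1 June 2020.
set.seed(7)
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 56)

# ISO day of the week: 1 is Monday, 6 and 7 are the weekend.
iso_day <- as.integer(format(dates, "%u"))
is_weekend <- iso_day >= 6

# Invented pattern: a smaller drop on weekends than on weekdays.
workplaces <- ifelse(is_weekend, -15, -35) + rnorm(length(dates), sd = 2)

# Average change for each day of the week exposes the weekly cycle.
by_weekday <- tapply(workplaces, iso_day, mean)

# A centred 7-day moving average removes it, leaving the underlying trend.
smoothed <- stats::filter(workplaces, rep(1 / 7, 7), sides = 2)

round(by_weekday, 1)
```

The moving average is undefined for the first and last three days, since a centred 7-day window does not fit there.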

Shopping & Recreation

What about shopping and recreation habits?

The data indicate that following the initial lockdown there was a substantial reduction in visits to supermarkets and pharmacies, but that this has largely recovered.

Retail and recreation, which covers restaurants, cafés, shopping centres, libraries, cinemas and other recreational venues, paints a different picture. The impact on the entertainment industry of restrictions on the sale of alcohol has no doubt played a role in this.

Interestingly, it seems that public holidays do not have a major effect on people going shopping or visiting recreational venues.

Out & About

Public spaces, like beaches, parks and gardens, have also been impacted. There’s some interesting variation here which I don’t fully understand right now. For instance, why was there less use of public spaces following the transition to Level 1 in September 2020?

Public holidays cause major spikes in the use of public spaces, at least under lockdown levels 3, 2 and 1.

Finally, transport hubs, which include train and bus stations as well as airports, were practically empty following the initial lockdown. However, they gradually became busier during the course of 2020. Travel activity dropped again after Christmas 2020.

This is a very rich data set with lots of opportunities for interesting analyses. Time permitting I’ll be back to look at it again.