Workshop: Web Scraping with R

Join Andrew Collier and Hanjo Odendaal for a workshop on using R for Web Scraping.

Who should attend?

This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.

What will you learn?

You will learn:

- data manipulation with dplyr, tidyr and purrr;
- tools for accessing the DOM;
- scraping static sites with rvest;
- scraping dynamic sites with RSelenium; and
- setting up an automated scraper in the cloud.

See the programme below for further details.

Where	-	Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
When	-	14-15 June 2018
Who	-	Andrew Collier and Hanjo Odendaal

A 10% discount is available for groups of 4 or more people from a single organisation attending both days.

Email training@exegetic.biz if you have any questions about the workshop.

Programme

Day 1

Motivating Example
R and the tidyverse
- Vectors, Lists and Data Frames
- Loading data from a file
- Manipulating Data Frames with dplyr
- Pivoting with tidyr
- Functional programming with purrr
Introduction to scraping
- Ethics
- DOM
- Developer Tools
- CSS and XPath
- robots.txt and site map
Scraping a static site with rvest
- What happens under the hood
- What the hell is curl?
- Assisted Assignment: Movie information from IMDb

Day 2

Case Study: Investigating drug tests using rvest
Interacting with APIs
- Using XHR to find an API
- Building wrappers around APIs
Scraping a dynamic site with RSelenium
- Why RSelenium is needed
- Navigating web pages
- Combining RSelenium with rvest
- Useful JavaScript tools
- Case Study
Deploying a Scraper in the Cloud
- Launching and connecting to an EC2 instance
- Headless browsers
- Automation with cron