Join Andrew Collier and Hanjo Odendaal for a workshop on using R for web scraping.
Who should attend?
This workshop is aimed at beginner and intermediate R users who want to learn more about using R for data acquisition and management, with a specific focus on web scraping.
What will you learn?
You will learn:
- data manipulation with dplyr, tidyr and purrr;
- tools for accessing the DOM;
- scraping static sites with rvest;
- scraping dynamic sites with RSelenium; and
- setting up an automated scraper in the cloud.
See the programme below for further details.
Where | Rise, Floor 5, Woodstock Exchange, 66 Albert Road, Woodstock, Cape Town
---|---
When | 14-15 June 2018
Who | Andrew Collier, Hanjo Odendaal
A 10% discount is available for groups of 4 or more people from a single organisation attending both days.
Email training@exegetic.biz if you have any questions about the workshop.
Programme
Day 1
- Motivating Example
- R and the tidyverse
- Vectors, Lists and Data Frames
- Loading data from a file
- Manipulating Data Frames with dplyr
- Pivoting with tidyr
- Functional programming with purrr
- Introduction to scraping
- Ethics
- DOM
- Developer Tools
- CSS and XPath
- robots.txt and site map
- Scraping a static site with rvest
- What happens under the hood
- What the hell is curl?
- Assisted Assignment: Movie information from IMDb
Day 2
- Case Study: Investigating drug tests using rvest
- Interacting with APIs
- Using XHR to find an API
- Building wrappers around APIs
- Scraping a dynamic site with RSelenium
- Why RSelenium is needed
- Navigating around web pages
- Combining RSelenium with rvest
- Useful JavaScript tools
- Case Study
- Why
- Deploying a Scraper in the Cloud
- Launching and connecting to an EC2 instance
- Headless browsers
- Automation with cron