Using {pagedown} in Docker

I’m building an automated reporting system which generates PDF reports. My approach is to use R Markdown to write the report and render to PDF using the excellent {pagedown} package.
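Inside the container the render step itself is small. Here’s a minimal sketch, assuming the report lives in report.Rmd and that headless Chrome or Chromium is installed in the image (the file name and flags are illustrative, not the exact setup from this post):

```r
# Minimal sketch: render the R Markdown report and print it to PDF with
# headless Chrome via {pagedown}. Assumes report.Rmd exists in the working
# directory and Chrome/Chromium is installed in the Docker image.
library(pagedown)

chrome_print(
  input = "report.Rmd",
  output = "report.pdf",
  # --no-sandbox is commonly required when Chrome runs as root in a container.
  extra_args = c("--disable-gpu", "--no-sandbox")
)
```

If Chrome isn’t on the PATH inside the container, chrome_print() also lets you point at the binary explicitly via its browser argument.
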
An Application Load Balancer receives requests and distributes them across a selection of processing resources. These processing resources are divided into Target Groups (see previous post for how to set one up).
We’re setting up a Flask API which is deployed as a Docker image and running on ECS. We’re going to create a load balancer which will accept requests on port 80 and route them to port 5000 on the API container.

If we want to have an ECS service which is visible to the public, then we need to set up an Application Load Balancer. There are a couple of steps to this process, the first of which is creating a Target Group.

In this post we’ll look at setting up an ECS Service. A service is just a persistent set of tasks running on a cluster.

A security group is a collection of rules which control the traffic into and out of an AWS resource.

We saw in a previous post that it’s important to ensure that the Selenium container is running and accepting requests before the crawler actually gets started. This is because the crawler depends on Selenium being available. We can use ECS task dependencies to assert this dependency.

Can we create a health check that will confirm whether the Selenium service is available? Yes! We will need to do two things: add a health check to the Selenium container so that it reports a HEALTHY status, and make the crawler container depend on that status. Let’s do it!

In the previous post we saw how to deploy a simple Selenium crawler on ECS. However, the Docker image for the crawler was stored in a Docker Hub repository. Now we’re going to see how to use the AWS Elastic Container Registry (ECR) instead.

In the last few posts we’ve looked at a few ways to set up the infrastructure for a Selenium crawler using Docker to run both the crawler and Selenium. In this post we’ll launch this setup in the cloud using AWS Elastic Container Service (ECS).

In two previous posts we’ve looked at how to set up a simple scraper which uses Selenium in Docker, communicating via the host network and bridge network. Both of those setups have involved launching separate containers for the scraper and Selenium. In this post we’ll see how to wrap everything up in a single entity using Docker Compose.

In the previous post we set up a scraper template which used Selenium on Docker via the host network. Now we’re going to do essentially the same thing but using a bridge network.

This post will show you how to set up the following: a Selenium instance and a simple scraper that uses it. Both of these will run in Docker containers and will communicate over the host network.
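The crawler code itself isn’t reproduced here, but as a rough illustration of what “communicating over the host network” looks like from the scraper’s side, here’s a hedged R sketch using {RSelenium} (the browser, port and target URL are assumptions):

```r
# Sketch: connect to a Selenium container exposed on the host network.
# Assumes something like: docker run --network host selenium/standalone-firefox
library(RSelenium)

driver <- remoteDriver(
  remoteServerAddr = "localhost",  # Host network, so Selenium is reachable on localhost.
  port = 4444L,
  browserName = "firefox"
)

driver$open()
driver$navigate("https://www.google.com")
print(driver$getTitle())
driver$close()
```

Because the containers share the host network, the scraper reaches Selenium on localhost:4444 rather than via a container hostname.
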

I’ve taken another look at the {hagr} data, which I wrote about previously. This time I’m focusing on the hierarchy of creatures.
The Linnaean Taxonomy is a hierarchical classification system for organisms devised by Carl Linnaeus. An organism is assigned to the following levels in the hierarchy (in increasing order of granularity): kingdom, phylum, class, order, family, genus and species.
The relative level of a group of organisms in this hierarchy determines its taxonomic rank.

I came across the Human Ageing Genomic Resources. They are doing some fascinating work and expose some engrossing data. I wanted to make the data easier for me to work with, and an R package seemed to be the natural vehicle to do this.
For more information on these data, take a look at this article: Tacutu, Craig, Budovsky, Wuttke, Lehmann, Taranukha, Costa, Fraifeld and de Magalhaes, “Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing,” Nucleic Acids Research 41(D1):D1027-D1033, 2013.

Has the price of Easter Eggs shot up since last year? Let’s use data from Trundler to investigate. I’ll do the analysis in R using the {trundler} package.
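The post itself walks through the Trundler API; as a hedged sketch of just the year-on-year comparison (assuming a prices data frame with product, date and price columns has already been retrieved, and treating the years as placeholders), it might look something like this:

```r
# Sketch: compare median Easter egg prices year-on-year.
# Assumes `prices` has columns product, date and price, already retrieved
# via the {trundler} package. The years below are placeholders.
library(dplyr)
library(tidyr)
library(lubridate)

prices %>%
  mutate(year = year(date)) %>%
  filter(year %in% c(2020, 2021)) %>%
  group_by(product, year) %>%
  summarise(median_price = median(price), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = median_price, names_prefix = "y") %>%
  mutate(percent_change = 100 * (y2021 - y2020) / y2020)
```
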
The Google Mobility Data (or Community Mobility Reports) refers to the datasets provided by Google which track how people move and congregate in various locations during specific time periods. The data is based on anonymised location information from users who have opted into Location History on their Google accounts.
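To make that concrete, here’s a hedged sketch of loading the reports in R (the URL is where Google published the global CSV at the time, and the country and column names are illustrative and may change):

```r
# Sketch: load the Google Community Mobility Reports and pull one series.
# The URL and column names reflect the published global CSV; treat them as
# assumptions since the reports may move or be retired.
library(readr)
library(dplyr)

mobility <- read_csv("https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv")

mobility %>%
  filter(country_region == "South Africa") %>%
  select(date, retail_and_recreation_percent_change_from_baseline)
```
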

Fathom Data is working on a project to reproduce the figures from the CORE textbook The Economy using R and {ggplot2}. There’s a strict style guide which specifies the figure aesthetics, including colours and font. We’re a team of seven people working on as many different setups. The principal challenges have been package versions and fonts.
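The post describes how the team actually handled this; as a hedged sketch of one common approach (the font, family name and choice of {renv}/{showtext} are assumptions, not necessarily what was used), you could pin package versions and register the font like this:

```r
# Sketch: one way to keep package versions and fonts consistent across setups.
# The specific font and packages here are illustrative assumptions.

# Pin package versions with {renv} so everyone restores the same library.
renv::init()       # Once, in the project root.
renv::snapshot()   # After installing or updating packages.
# Collaborators then run renv::restore().

# Register the required font with {showtext} so it renders the same everywhere.
library(ggplot2)
sysfonts::font_add_google("Source Sans Pro", family = "core-font")
showtext::showtext_auto()

theme_set(theme_minimal(base_family = "core-font"))
```
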

I’ve been following an excellent tutorial for deploying a Docker image on an EC2 instance via GitLab CI/CD. It covers every step in the process in great detail. If you follow the steps then you’ll definitely end up with a working pipeline.
However, I still wasn’t quite sure how to handle the environment variables and credentials that I wanted to bake into the image, and which varied between my local development environment and the final deployed image.

📢 An updated version of this post reflecting recent changes in GitLab Runner can be found here.
I’ve got a project which takes a long time to build. And I rebuild it regularly. I’ve been using the shared runners on GitLab. However, the total time constraint has become a limitation. I’m going to install GitLab Runner as a Docker service on an underutilised EC2 instance.