I’m building a crawler which I’m going to wrap up in a Docker image. The crawler writes data to a remote MySQL database. However, there’s a catch: the database connection is via an SSH tunnel. Another wrinkle: the crawler is going to be run on ECS, so the whole thing (including setting up the SSH tunnel) needs to be baked into the Docker image.
This post illustrates the process of connecting to a remote MySQL database via an SSH tunnel from Docker. I’m not sure how secure this is, and there are probably better ways to do it. But it’s a start, and it works!
First we’ll need to generate a new SSH keypair.
```shell
ssh-keygen -N "" -t rsa -f id_rsa
```
This will not prompt for a passphrase and will generate two files: `id_rsa` (private key) and `id_rsa.pub` (public key).
Copy the contents of the public key, `id_rsa.pub`, into `~/.ssh/authorized_keys` on the remote host. 🚨 Make sure that the whole public key is on a single line.
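If password login to the remote host is available, `ssh-copy-id -i id_rsa.pub user@host` will do this for you. Otherwise, here’s a minimal sketch of doing it by hand on the remote host. `append_public_key` is a hypothetical helper, and it assumes the default `~/.ssh/authorized_keys` location. It refuses a key that has been wrapped over several lines and tightens permissions, because sshd’s default `StrictModes` check ignores an `authorized_keys` file that is too permissive.

```shell
# Hypothetical helper: append a public key to authorized_keys on the remote host.
# Usage: append_public_key PUBKEY_FILE [AUTHORIZED_KEYS_FILE]
append_public_key() {
  local pubkey="$1"
  local auth_keys="${2:-$HOME/.ssh/authorized_keys}"
  local lines

  # A public key is a single line: "<type> <base64 data> [comment]".
  # grep -c '' counts lines, including a last line without a newline;
  # it fails for an empty or missing file.
  lines=$(grep -c '' "$pubkey") || return 1
  if [ "$lines" -ne 1 ]; then
    echo "error: $pubkey has $lines lines; the key must be on a single line" >&2
    return 1
  fi

  # sshd (StrictModes) ignores authorized_keys if it or its directory is too permissive.
  mkdir -p "$(dirname "$auth_keys")" || return 1
  chmod 700 "$(dirname "$auth_keys")"
  # $(cat ...) drops any trailing newline; printf adds exactly one back,
  # so the next key appended later starts on its own line.
  printf '%s\n' "$(cat "$pubkey")" >> "$auth_keys"
  chmod 600 "$auth_keys"
}
```

For example, after copying `id_rsa.pub` to the remote host, running `append_public_key id_rsa.pub` there would install it for the current user.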
Next we’ll set up the `Dockerfile`.
```dockerfile
FROM ubuntu:20.04

ARG PRIVATE_KEY

RUN apt-get update -qq && \
    apt-get install -y -qq openssh-client mysql-client && \
    rm -rf /var/lib/apt/lists/*

RUN mkdir ~/.ssh && \
    echo "Host *" > ~/.ssh/config && \
    echo "  StrictHostKeyChecking accept-new" >> ~/.ssh/config && \
    echo "  ControlMaster auto" >> ~/.ssh/config && \
    echo "  ControlPath ~/.ssh/%r@%h:%p" >> ~/.ssh/config

COPY $PRIVATE_KEY /root/.ssh/id_rsa
COPY tunnel-mysql.sh .

CMD ./tunnel-mysql.sh
```
This does the following:
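- installs the SSH and MySQL clients;
- configures SSH not to prompt for confirmation when connecting to a new host, and to share a single connection through a control socket (`ControlMaster`/`ControlPath`);
- copies an SSH private key across onto the image; and
- copies a Bash script onto the image.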
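The four `echo` commands in the second `RUN` step build up `~/.ssh/config` one line at a time. As an illustration of what that step (plus the `COPY` of the key) produces, here’s a sketch of the same setup as a shell function using a heredoc. `setup_ssh_dir` is a hypothetical name and not part of the original image. It also installs the private key with mode 600, since ssh refuses to use a private key that other users can read.

```shell
# Hypothetical helper: write the SSH client config and install the private key.
# Usage: setup_ssh_dir SSH_DIR PRIVATE_KEY_FILE
setup_ssh_dir() {
  local ssh_dir="$1" key="$2"

  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir"

  # The quoted 'EOF' stops the shell expanding anything in the body;
  # ssh itself expands ~ and the %r (user), %h (host), %p (port) tokens.
  cat > "$ssh_dir/config" <<'EOF'
Host *
  StrictHostKeyChecking accept-new
  ControlMaster auto
  ControlPath ~/.ssh/%r@%h:%p
EOF

  # ssh ignores a private key that is readable by group or others.
  install -m 600 "$key" "$ssh_dir/id_rsa"
}
```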
Now for the Bash script, `tunnel-mysql.sh`.
```bash
#!/bin/bash

echo -n "Creating SSH tunnel to $HOST... "
ssh -4 -q -N -f -T -M -L 3306:127.0.0.1:3306 "$HOST"
echo "Done!"

export MYSQL_PWD=$PASSWORD
mysql -e 'SHOW DATABASES;' --user "$USERNAME" -h 127.0.0.1

echo -n "Closing SSH tunnel... "
ssh -q -T -O "exit" "$HOST"
echo "Done!"
```
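This sets up an SSH tunnel to the remote host, connects through it to the MySQL server and executes a simple SQL query, then closes the SSH tunnel. Note that the client connects to `127.0.0.1` rather than `localhost`: given `localhost`, the `mysql` client would try a local Unix socket and bypass the tunnel entirely.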
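The script above carries on even if the tunnel fails to come up, and its exit status is that of the final `echo`, so a failed query still looks like success to whatever runs the container (ECS, in my case). Here’s a sketch of a more defensive variant, wrapped in a hypothetical `with_mysql_tunnel` function. It adds `-o ExitOnForwardFailure=yes` so that ssh exits with an error if the port forward can’t be set up (for example, if local port 3306 is already in use), skips the command when the tunnel didn’t start, and always closes the tunnel while passing the command’s exit status back.

```shell
# Hypothetical wrapper: open the tunnel, run a command through it, then close it.
# Assumes HOST is set, as in tunnel-mysql.sh.
# Usage: with_mysql_tunnel COMMAND [ARGS...]
with_mysql_tunnel() {
  local status=0

  printf 'Creating SSH tunnel to %s... ' "$HOST"
  # ExitOnForwardFailure=yes: fail instead of running without the port forward.
  if ! ssh -4 -q -N -f -T -M -o ExitOnForwardFailure=yes \
      -L 3306:127.0.0.1:3306 "$HOST"; then
    echo "failed!" >&2
    return 1
  fi
  echo "Done!"

  # Run the command, remembering its exit status instead of discarding it.
  "$@" || status=$?

  printf 'Closing SSH tunnel... '
  ssh -q -T -O exit "$HOST"
  echo "Done!"

  # Propagate the command's status so the caller can see a failure.
  return "$status"
}
```

With `MYSQL_PWD` exported as before, `with_mysql_tunnel mysql -e 'SHOW DATABASES;' --user "$USERNAME" -h 127.0.0.1` would run the same placeholder query.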
There are an awful lot of `ssh` options being used. Let’s quickly unpack those:
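- `-4`: only use IPv4 addresses;
- `-f`: go to the background once authenticated;
- `-L 3306:127.0.0.1:3306`: forward local port 3306 to port 3306 on `127.0.0.1` as seen from the remote host (i.e. the remote MySQL server);
- `-M`: run in master mode, so that the later `ssh -O "exit"` can find and close this connection via the control socket;
- `-N`: don’t run a remote command;
- `-q`: run in quiet mode; and
- `-T`: don’t allocate a pseudo-terminal.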
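The `-M` option is what makes the final `ssh -O "exit"` work: the backgrounded ssh becomes a control master listening on the socket named by `ControlPath` in `~/.ssh/config`, and later ssh invocations for the same user, host and port can send it commands with `-O`. Here’s a sketch of two helpers built on that, using the `check` and `exit` control commands. `tunnel_is_up` and `close_tunnel` are hypothetical names.

```shell
# Hypothetical helpers built on the control master started by "ssh -M".
# Both rely on the ControlPath in ~/.ssh/config to find the master's socket.

# Succeeds if the master connection for HOST is still running.
# Usage: tunnel_is_up HOST
tunnel_is_up() {
  ssh -q -O check "$1" 2>/dev/null
}

# Asks the master connection for HOST to exit, but only if it is running.
# Usage: close_tunnel HOST
close_tunnel() {
  if tunnel_is_up "$1"; then
    ssh -q -T -O exit "$1"
  fi
}
```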
The SQL query is just a placeholder. Whatever database interactions you need to do would go here. So, in my case, this is where the crawler would kick in.
Let’s build the image, passing `id_rsa` as the value of the `PRIVATE_KEY` build argument declared in the `Dockerfile`. This private key is going to be baked into the image. 💡 It’s equally possible to provide the private key at run time; however, I ultimately opted to supply it at build time.
```shell
docker build --build-arg PRIVATE_KEY=id_rsa -t docker-ssh-tunnel .
```
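If you’d rather supply the key at run time, one option is to drop the `COPY $PRIVATE_KEY` line and have the container write the key out when it starts. Here’s a minimal sketch, assuming a hypothetical `SSH_PRIVATE_KEY` environment variable holding the full key text. `install_runtime_key` is also a hypothetical name.

```shell
# Hypothetical start-up step: write a private key supplied at run time.
# Assumes SSH_PRIVATE_KEY (a hypothetical variable name) holds the full key text.
# Usage: install_runtime_key [SSH_DIR]
install_runtime_key() {
  local ssh_dir="${1:-$HOME/.ssh}"

  if [ -z "${SSH_PRIVATE_KEY:-}" ]; then
    echo "error: SSH_PRIVATE_KEY is not set" >&2
    return 1
  fi

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  # umask 077 in a subshell: a newly created key file is never readable by others.
  (umask 077 && printf '%s\n' "$SSH_PRIVATE_KEY" > "$ssh_dir/id_rsa") || return 1
  # umask doesn't change an existing file's mode, so set it explicitly as well.
  chmod 600 "$ssh_dir/id_rsa"
}
```

Because a `.env` file holds one `VAR=value` per line, a multi-line key would instead be passed with `-e`, e.g. `docker run --rm -t --env-file .env -e SSH_PRIVATE_KEY="$(cat id_rsa)" docker-ssh-tunnel`, with `install_runtime_key` called at the top of `tunnel-mysql.sh`. The trade-off is that the key is no longer in the image, but it is visible to anyone who can `docker inspect` the running container.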
Once it’s built, run it.
```shell
docker run --rm -t --env-file .env docker-ssh-tunnel
```
We’re passing through some environment variables from a `.env` file, which looks like this (all values fictitious!):
```
HOST=email@example.com
USERNAME=wookie
PASSWORD=04cmRXCPJ111coQpuqmHH6Uc
```
And the fruits of our labour:
```
Creating SSH tunnel to firstname.lastname@example.org... Done!
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| mysql               |
| performance_schema  |
| sys                 |
+---------------------+
Closing SSH tunnel... Done!
```
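That run relies on `HOST`, `USERNAME` and `PASSWORD` all being present in `.env`; if one is missing, ssh or mysql fails with a less obvious error. Also note that Docker reads each `VAR=value` line literally, so quotes around a value become part of it. Here’s a sketch of a fail-fast check that could go at the top of `tunnel-mysql.sh` (`require_env` is a hypothetical name). It uses `printenv`, so it only sees exported variables, which is how `--env-file` passes them into the container.

```shell
# Hypothetical fail-fast check for the variables the tunnel script needs.
# printenv only sees exported variables, which is how --env-file passes them in.
# Usage: require_env NAME [NAME...]
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not set (check the .env file)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env HOST USERNAME PASSWORD || exit 1` would report every missing variable before any connection is attempted.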
The final component is wrapping this up so that it will build using GitLab CI. Here’s the content of `.gitlab-ci.yml`:
```yaml
stages:
  - build

variables:
  IMAGE_NAME: docker-ssh-tunnel
  TAG_LATEST: $CI_REGISTRY_IMAGE/$IMAGE_NAME:latest
  DOCKER_TLS_CERTDIR: ""

build:
  image: docker:stable
  stage: build
  only:
    - master
  services:
    - docker:dind
  script:
    - cp $PRIVATE_KEY id_rsa
    - docker build --build-arg PRIVATE_KEY=id_rsa -t $TAG_LATEST .
    # CI_JOB_TOKEN replaces the deprecated CI_BUILD_TOKEN.
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - docker push $TAG_LATEST
```
The content of `id_rsa` is stored in a CI/CD file variable called `PRIVATE_KEY`. This file is copied into the Docker build context before the image is built.
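For a File-type CI/CD variable, GitLab writes the value to a temporary file and sets the variable to that file’s path, which is why `cp $PRIVATE_KEY id_rsa` works. Here’s a sketch of a slightly more careful version of that step (`stage_ci_key` is a hypothetical name). It checks that the variable really names a file, writes the copy with mode 600, and adds a trailing newline if the pasted key lacks one, since some OpenSSH versions reject such a key with an "invalid format" error. Whether mode 600 survives into the image is an assumption: it holds if the build runs on Linux, where `COPY` keeps the file’s permission bits.

```shell
# Hypothetical CI step: stage the private key from a GitLab File-type variable.
# For File-type variables, $PRIVATE_KEY holds the path to a temporary file.
# Usage: stage_ci_key [DEST]
stage_ci_key() {
  local dest="${1:-id_rsa}"

  if [ -z "${PRIVATE_KEY:-}" ] || [ ! -f "$PRIVATE_KEY" ]; then
    echo "error: PRIVATE_KEY must be the path to a file (is it a File-type variable?)" >&2
    return 1
  fi

  # Copy with mode 600 so the key isn't readable by other users.
  install -m 600 "$PRIVATE_KEY" "$dest" || return 1

  # tail -c 1 prints the last byte; $(...) strips it if it is a newline,
  # so a non-empty result means the final newline is missing.
  if [ -n "$(tail -c 1 "$dest")" ]; then
    echo >> "$dest"
  fi
}
```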
So now the image is stored in a registry accessible from ECS. We can set environment variables on the ECS task to hold the values of `HOST`, `USERNAME` and `PASSWORD`.
🚨 One important caveat to this approach is that anybody who has access to the Docker image also has access to the SSH private key. You can (and should) mitigate the risk by (i) ensuring that the image is stored in a secure, private registry and (ii) using non-trivial database credentials.
If you’re doing this on a Mac then you might need to replace occurrences of