I’m building a crawler which I’m going to wrap up in a Docker image. The crawler writes data to a remote MySQL database. However, there’s a catch: the database connection is via an SSH tunnel. Another wrinkle: the crawler is going to be run on ECS, so the whole thing (including setting up the SSH tunnel) needs to be baked into the Docker image.
This post illustrates the process of connecting to a remote MySQL database via an SSH tunnel from Docker. I’m not sure how secure this is. And there are probably better ways to do this. But it’s a start and it works!
SSH Keypair
First we’ll need to generate a new SSH keypair.
ssh-keygen -N "" -t rsa -f id_rsa
This will not prompt for a passphrase and will generate two files: id_rsa (private key) and id_rsa.pub (public key).
Copy the contents of the public key, id_rsa.pub, into ~/.ssh/authorized_keys on the remote host. 🚨 Make sure that the whole public key is on a single line.
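The copy step can be sketched as a small shell helper to run on the remote host. This is only an illustration: install_public_key is a hypothetical name, and the check simply counts non-empty lines to catch a key that was wrapped when pasted.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file,
# refusing key files that span more than one non-empty line.
install_public_key() {
    pubkey_file=$1
    auth_file=$2

    # A valid entry is exactly one line, so count the non-empty lines.
    line_count=$(grep -c . "$pubkey_file" || true)
    if [ "$line_count" != 1 ]; then
        echo "expected a single-line public key in $pubkey_file" >&2
        return 1
    fi

    mkdir -p "$(dirname "$auth_file")"
    # grep emits the key terminated by a newline, so entries never run together.
    grep . "$pubkey_file" >> "$auth_file"
    # sshd may ignore authorized_keys if other users can write to it (StrictModes).
    chmod 600 "$auth_file"
}
```

Assuming id_rsa.pub has already been transferred to the remote host, the call would be install_public_key id_rsa.pub ~/.ssh/authorized_keys. Where it is available, the ssh-copy-id utility does a similar job from the local side.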
Dockerfile
Next we’ll set up the Dockerfile.
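FROM ubuntu:20.04
# Path (within the build context) of the private key to bake into the image.
ARG PRIVATE_KEY
# Install the SSH and MySQL clients.
RUN apt-get update -qq && \
    apt-get install -y -qq openssh-client mysql-client && \
    rm -rf /var/lib/apt/lists/*
# Accept new host keys without prompting and enable a shared control socket.
RUN mkdir ~/.ssh && \
    echo "Host *" > ~/.ssh/config && \
    echo "    StrictHostKeyChecking accept-new" >> ~/.ssh/config && \
    echo "    ControlMaster auto" >> ~/.ssh/config && \
    echo "    ControlPath ~/.ssh/%r@%h:%p" >> ~/.ssh/config
COPY $PRIVATE_KEY /root/.ssh/id_rsa
# ssh refuses private keys that other users can read.
RUN chmod 600 /root/.ssh/id_rsa
COPY tunnel-mysql.sh .
CMD ./tunnel-mysql.sh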
This does the following:
- copies an SSH private key across onto the image;
- configures SSH not to prompt for confirmation when connecting to a new host; and
- copies a BASH script onto the image.
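To make the SSH configuration easier to read, here is a sketch of the same settings written with a here-document rather than a chain of echo commands. The write_ssh_config name is hypothetical; the four settings are the ones from the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: write the SSH client config used in the image.
write_ssh_config() {
    ssh_dir=$1
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    # The quoted 'EOF' keeps ~ and % literal; ssh expands them itself.
    cat > "$ssh_dir/config" <<'EOF'
Host *
    StrictHostKeyChecking accept-new
    ControlMaster auto
    ControlPath ~/.ssh/%r@%h:%p
EOF
}
```

StrictHostKeyChecking accept-new adds the keys of previously unseen hosts without prompting, but still refuses a host whose key has changed. The ControlMaster and ControlPath settings create the control socket which the script later uses to close the tunnel.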
Script
Now for the BASH script.
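#!/bin/bash
# Open a background SSH tunnel: local port 3306 -> port 3306 on the remote host.
echo -n "Creating SSH tunnel to $HOST... "
ssh -4 -q -N -f -T -M -L 3306:127.0.0.1:3306 "$HOST"
echo "Done!"
# The mysql client reads its password from MYSQL_PWD.
export MYSQL_PWD="$PASSWORD"
mysql -e 'SHOW DATABASES;' --user "$USERNAME" -h 127.0.0.1
# Ask the master connection to exit, which closes the tunnel.
echo -n "Closing SSH tunnel... "
ssh -q -T -O "exit" "$HOST"
echo "Done!"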
This sets up an SSH tunnel to the remote host, connects to the MySQL database on the remote host and executes a simple SQL query, then closes the SSH tunnel.
There are an awful lot of ssh options being used. Let’s quickly unpack those:
- -4: only use IPv4 addresses;
- -f: run in background;
- -L 3306:127.0.0.1:3306: bind local port to remote port;
- -M: run in master mode;
- -N: don’t run a remote command;
- -q: run in quiet mode; and
- -T: don’t allocate a pseudo-terminal.
The SQL query is just a placeholder. Whatever database interactions you need to do would go here. In my case, this is where the crawler would kick in.
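The script relies on HOST, USERNAME and PASSWORD being present in the environment. A minimal sketch of a guard that fails early when one of them is missing might look like this (check_env is a hypothetical name, not part of the original script):

```shell
#!/bin/sh
# Hypothetical guard: abort with a message if a required variable is unset or empty.
check_env() {
    # ${VAR:?message} prints the message and exits a non-interactive shell
    # when VAR is unset or empty.
    : "${HOST:?HOST must be set, for example user@host}"
    : "${USERNAME:?USERNAME must be set}"
    : "${PASSWORD:?PASSWORD must be set}"
}
```

Calling check_env at the top of tunnel-mysql.sh, before the tunnel is opened, would turn a confusing ssh or mysql error into a message naming the missing variable.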
Building & Running
Let’s build the image, passing id_rsa as the value for the PRIVATE_KEY build argument specified in the Dockerfile. This private key is going to be baked into the image. 💡 It’s equally possible to provide the private key at run time; however, I ultimately opted to supply it at build time.
docker build --build-arg PRIVATE_KEY=id_rsa -t docker-ssh-tunnel .
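For comparison, the run-time alternative could be sketched roughly as follows. This is not what the image above does: install_private_key is a hypothetical helper, and the mount path used in the note below is an assumption.

```shell
#!/bin/sh
# Hypothetical entrypoint helper: copy a key supplied at run time into place.
install_private_key() {
    src=$1
    dest=$2
    if [ ! -r "$src" ]; then
        echo "private key not found at $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
    # ssh refuses private keys that other users can read.
    chmod 600 "$dest"
}
```

With this approach the COPY of the key would be dropped from the Dockerfile, the key would be mounted into the container (for example with -v "$PWD/id_rsa:/run/secrets/id_rsa:ro" on docker run), and the script would call install_private_key /run/secrets/id_rsa /root/.ssh/id_rsa before opening the tunnel.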
Once it’s built, run it.
docker run --rm -t --env-file .env docker-ssh-tunnel
We’re passing through some environment variables from an .env file which looks like this (all values fictitious!):
HOST=wookie@63.129.24.53
USERNAME=wookie
PASSWORD=04cmRXCPJ111coQpuqmHH6Uc
And the fruits of our labour:
Creating SSH tunnel to wookie@63.129.24.53... Done!
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| mysql               |
| performance_schema  |
| sys                 |
+---------------------+
Closing SSH tunnel... Done!
Nice!
Automation
The final component is wrapping this up so that it will build using GitLab CI. Here’s the content of .gitlab-ci.yml:
stages:
  - build
variables:
  IMAGE_NAME: docker-ssh-tunnel
  TAG_LATEST: $CI_REGISTRY_IMAGE/$IMAGE_NAME:latest
  DOCKER_TLS_CERTDIR: ""
build:
  image: docker:stable
  stage: build
  only:
    - master
  services:
    - docker:dind
  script:
    - cp $PRIVATE_KEY id_rsa
    - docker build --build-arg PRIVATE_KEY=id_rsa -t $TAG_LATEST .
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
    - docker push $TAG_LATEST
The content of id_rsa is stored in a CI/CD file variable called PRIVATE_KEY. This file is copied across into the Docker build context before the image is built.
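One thing that can go wrong when a key is pasted into a CI/CD variable is that the final newline gets lost, and OpenSSH may then refuse to load the key. A defensive sketch, assuming that is the failure mode being guarded against (ensure_trailing_newline is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: append a newline to a file if its last byte is not one.
ensure_trailing_newline() {
    file=$1
    # tail -c 1 prints the last byte. Command substitution strips a trailing
    # newline, so a non-empty result means the file does not end with one.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
}
```

In the pipeline this would run as ensure_trailing_newline id_rsa, between the cp and docker build steps. It leaves a file that already ends with a newline untouched.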
Now the image is stored in a registry accessible from ECS. We can set environment variables on the ECS task to hold the values of HOST, USERNAME and PASSWORD.
🚨 One important caveat to this approach is that anybody who has access to the Docker image also has access to the SSH private key. You can (and should) mitigate the risk by (i) ensuring that the image is stored in a secure, private registry and (ii) using non-trivial database credentials.
127.0.0.1 with host.docker.internal.