Creating Git Commits in CI

I use Continuous Integration (CI) extensively across almost all of my remote Git repositories. These are the typical jobs which it’s used for:

  • running tests
  • building documentation and
  • acquiring data.

This post addresses the last item, acquiring data.

The workflow is typically something like this:

  1. build project
  2. run a script (or scripts) to gather data
  3. do some cleaning or other data preparation
  4. commit the new data and push it back to the remote repository.

This approach only works for data of modest size, small enough to be usefully stored in a Git repository.

In the sections below I dig into the details of the final step in the workflow: adding the data to the repository and pushing it back to the remote.

GitLab

Most of my projects are hosted on GitLab, where I make extensive use of the ability to create project hierarchies using groups and sub-groups. These are generally private repositories.

Create a Personal Access Token

To be able to push back to the remote repository you’ll need to create a Personal Access Token (PAT) with write_repository scope. I will generally create one PAT per project because this gives me more granular control over access (as opposed to having a single PAT which is used across multiple projects).

Under Settings → CI/CD → Variables in the repository, add an environment variable, GITLAB_PAT, which contains the PAT.

Setup CI

Now you need to add or amend a .gitlab-ci.yml file.

If you want to add content directly to the main branch, use something like this:

gather:
  interruptible: false
  only:
    - main
  before_script:
  - git config --global user.email "${GITLAB_USER_EMAIL}"
  - git config --global user.name "${GITLAB_USER_NAME}"
  script:
  - date +"%Y%m%d-%H%M%S" >>main-times.txt
  after_script:
  - git add -f main-times.txt
  - git commit -m "Record date & time [$(date +'%Y-%m-%d %H:%M')]"
  - git push -o ci.skip https://gitlab-ci-token:$GITLAB_PAT@$CI_SERVER_HOST/$CI_PROJECT_PATH.git HEAD:main

💡 If you have a master branch, just substitute master for all occurrences of main.

A couple of notes about what’s going on there:

  • Instead of generating a timestamp using the date command you could use the CI_JOB_STARTED_AT environment variable.
  • The -o ci.skip option is important because it prevents GitLab from immediately trying to run the CI workflow on the resulting commit.
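
As a rough sketch of the first note, the script step could write the predefined CI_JOB_STARTED_AT variable instead of calling date. One caveat: GitLab documents this variable as an ISO 8601 timestamp (for example 2022-01-31T16:47:55Z), so the recorded lines would look different from the %Y%m%d-%H%M%S format used above. Only the script and commit lines change; the rest of the job is repeated for context, and the quoting of the push URL is my own addition.

```yaml
gather:
  interruptible: false
  only:
    - main
  before_script:
    - git config --global user.email "${GITLAB_USER_EMAIL}"
    - git config --global user.name "${GITLAB_USER_NAME}"
  script:
    # CI_JOB_STARTED_AT is set by GitLab (ISO 8601), so no call to date is needed.
    - echo "${CI_JOB_STARTED_AT}" >>main-times.txt
  after_script:
    - git add -f main-times.txt
    # Reuse the same variable in the commit message.
    - git commit -m "Record job start time [${CI_JOB_STARTED_AT}]"
    - git push -o ci.skip "https://gitlab-ci-token:${GITLAB_PAT}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git" HEAD:main
```

A side effect of this choice is that the file and the commit message carry exactly the same timestamp, since both come from the job start time rather than from two separate calls to date.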

It might make more sense to commit the new data onto a separate branch, in which case try something like this:

gather-branch:
  interruptible: false
  variables:
    DATA_BRANCH: collect
  only:
    - main
  before_script:
  - git config --global user.email "${GITLAB_USER_EMAIL}"
  - git config --global user.name "${GITLAB_USER_NAME}"
  - git checkout -B $DATA_BRANCH || true
  script:
  - date +"%Y%m%d-%H%M%S" >>branch-times.txt
  after_script:
  - git add -f branch-times.txt
  - git commit -m "Record date & time [$(date +'%Y-%m-%d %H:%M')]"
  - git push -o ci.skip https://gitlab-ci-token:$GITLAB_PAT@$CI_SERVER_HOST/$CI_PROJECT_PATH.git $DATA_BRANCH

🚨 If you get an authentication failure with the above approach, check that the URL in the git push command uses https rather than http.

GitHub

My public repositories are most often hosted on GitHub.

Setup Action

Actions are configured via YAML files in the project’s .github/workflows directory. Create an appropriately named .yml file in that directory and copy in the configuration below. This workflow has two jobs:

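  • gather, which commits content to the main (default) branch; and
  • gather-branch, which commits content to the collect branch (this branch must already exist on the remote, because the checkout step checks it out by name).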

In practice you’d choose whichever of these approaches suits your needs and delete the configuration for the other.

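on:
  workflow_dispatch:
  push:
    branches:
      - main

# Allow the workflow's GITHUB_TOKEN to push commits. This is needed when the
# repository's default workflow permissions are read-only.
permissions:
  contents: write
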
jobs:
  gather:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Git
        run: |
          git config --local user.email "actions@github.com"
          git config --local user.name "GitHub Actions"
      - name: Update file
        run: date +"%Y%m%d-%H%M%S" >>main-times.txt
      - name: Commit & push data
        run: |
          git add -f main-times.txt
          git commit -m "Record date & time [$(date +'%Y-%m-%d %H:%M')]"
          git push
  gather-branch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          ref: collect
      - name: Setup Git
        run: |
          git config --local user.email "actions@github.com"
          git config --local user.name "GitHub Actions"
      - name: Update file
        run: date +"%Y%m%d-%H%M%S" >>branch-times.txt
      - name: Commit & push data
        run: |
          git add -f branch-times.txt
          git commit -m "Record date & time [$(date +'%Y-%m-%d %H:%M')]"
          git push
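
The commit steps above always have something to commit because a new timestamp is appended on every run. With real data that may not hold, and git commit exits with a non-zero status when nothing is staged, which would fail the step. Below is a minimal sketch of a guarded version of the gather job. It relies only on standard Git behaviour (git diff --cached --quiet exits with 0 when nothing is staged). The echo message is purely illustrative, and the permissions block is an assumption for repositories whose default token is read-only.

```yaml
on:
  workflow_dispatch:

# Assumption: the default GITHUB_TOKEN may be read-only, so request write access.
permissions:
  contents: write

jobs:
  gather:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Git
        run: |
          git config --local user.email "actions@github.com"
          git config --local user.name "GitHub Actions"
      - name: Update file
        run: date +"%Y%m%d-%H%M%S" >>main-times.txt
      - name: Commit & push data (only if something changed)
        run: |
          git add -f main-times.txt
          # git diff --cached --quiet exits with 0 when nothing is staged.
          if git diff --cached --quiet; then
            echo "No new data; skipping commit."
          else
            git commit -m "Record date & time [$(date +'%Y-%m-%d %H:%M')]"
            git push
          fi
```

The same if/else guard should work unchanged in the after_script of the GitLab jobs, since it is plain shell.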