Automate Your Code with GitHub Actions #6: Real-World GitHub Actions: Data Automation, Linting, and Package Publishing

In the previous article, we explored how to use GitHub Actions to pre-build DevContainers, deploy static sites to GitHub Pages, and ship Cloudflare Workers.

In this article, we continue our cookbook with more practical recipes that demonstrate the versatility of GitHub Actions beyond traditional CI/CD.

Who is Bas Steins?

Bas is a software developer and technology trainer based in the triangle between the Netherlands, Germany, and Belgium. For almost two decades, he has been helping people turn their ideas into software and helping other developers write better code. This series is based on his mini-guide "Automate Your Code with GitHub Actions".

Using GitHub Actions for Data Entry Automation

In this example, we're building an open source podcast directory!

We're using GitHub Issues to collect new candidates for the podcast directory. When an issue is labeled add-podcast, we want to automatically add the podcast to our directory and then update the static site built from it.

What you'll learn

Using a GitHub Action on the issues event
Parsing issue data to JSON
Using step outputs and inputs to pass data between steps
Updating a static site with new data

Series: "Automate Your Code with GitHub Actions"

This is part 6 of our series titled "Automate Your Code with GitHub Actions", based on the mini-guide by Bas Steins. Be sure to check out the rest of the series:

Fundamentals of GitHub Actions
Events and Triggers in GitHub Actions
Jobs, Actions, and the Marketplace
Creating Custom GitHub Actions
Examples: DevContainers, GitHub Pages, and Cloudflare Workers
Examples: Data Automation, Linting, and Package Publishing ← you are here!
Repository Splitting and Quick Reference

Setting Up an Issue Template

Before we create the GitHub Actions workflow, we need to set up an issue template for adding new podcasts to the directory. This template will help contributors provide the necessary information for adding a podcast.

Create a new file named add-podcast.yml in the .github/ISSUE_TEMPLATE directory of your repository.

name: New Podcast
description: Add a podcast to the list
title: "[Add Podcast]: "
labels: ["add-podcast"]

body:
  - type: input
    id: podcastIndexId
    attributes:
      label: PodcastIndex.org ID
      description: The ID of the podcast on PodcastIndex.org
      placeholder: ex. 522889
    validations:
      required: true
  - type: input
    id: name
    attributes:
      label: Title of the podcast
      description: The name of the podcast
      placeholder: Syntax.fm
    validations:
      required: true
  - type: textarea
    id: tags
    attributes:
      label: Tags for the podcast
      description: One per line

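Once this file is on the repository's default branch, contributors will see a "New Podcast" form when they open a new issue. Issues created from this form automatically get the add-podcast label.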

Within a GitHub Actions workflow, we can access the data from the issue template and use it to update our podcast directory. GitHub stores a submitted form as Markdown in the issue body, so to get structured data we parse it into JSON with the edumserrano/github-issue-forms-parser action.
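To see what the parser action works with: GitHub renders each form field as a "### " heading containing the field's label, followed by the submitted value, and writes "_No response_" for optional fields left empty. The stdlib-only Python sketch below turns such a body into a dictionary keyed by the field ids from add-podcast.yml. It only approximates the action (which reads the template file rather than a hard-coded LABEL_TO_ID mapping, and whose exact output format may differ), and parse_issue_form is a hypothetical name.

```python
import json
import re

# Label -> id mapping copied by hand from add-podcast.yml (illustrative only;
# the real action derives this from the template file).
LABEL_TO_ID = {
    "PodcastIndex.org ID": "podcastIndexId",
    "Title of the podcast": "name",
    "Tags for the podcast": "tags",
}


def parse_issue_form(body: str) -> dict:
    """Turn the Markdown body of an issue-form issue into {field_id: value}."""
    result = {}
    # Every field starts with a "### <label>" line; split the body on those.
    sections = re.split(r"^### ", body, flags=re.MULTILINE)
    for section in sections[1:]:
        label, _, value = section.partition("\n")
        value = value.strip()
        if value == "_No response_":  # GitHub's placeholder for empty fields
            value = ""
        field_id = LABEL_TO_ID.get(label.strip())
        if field_id is not None:
            result[field_id] = value
    return result


body = (
    "### PodcastIndex.org ID\n\n522889\n\n"
    "### Title of the podcast\n\nSyntax.fm\n\n"
    "### Tags for the podcast\n\nwebdev\njavascript\n"
)
print(json.dumps(parse_issue_form(body)))
```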

The next step is to create a workflow that triggers on the issues event and uses the parsed data to update the podcast directory. Our podcast directory is stored as YAML files in the podcasts directory; from these, the workflow generates JSON files (in the generated directory) and the README.

Creating the GitHub Actions Workflow

name: Add Podcast

on:
  issues:
    types: [labeled]

jobs:
  add-podcast:
    if: contains(github.event.issue.labels.*.name, 'add-podcast')
    runs-on: ubuntu-24.04
    # The job pushes a commit and comments on/closes the issue
    permissions:
      contents: write
      issues: write
    defaults:
      run:
        working-directory: app
    steps:
      - uses: actions/checkout@v4.1.7
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run GitHub issue forms parser
        id: issue-parser
        uses: edumserrano/github-issue-forms-parser@v1
        with:
          template-filepath: '.github/ISSUE_TEMPLATE/add-podcast.yml'
          issue-form-body: '${{ github.event.issue.body }}'
      - name: Generate Podcast YAML from issue
        # Pass untrusted issue content through env vars instead of inlining
        # ${{ }} in the script, so quotes in a podcast name can't break the shell
        env:
          PARSED_ISSUE: ${{ steps.issue-parser.outputs.parsed-issue }}
          PODCASTINDEX_API_KEY: ${{ secrets.PODCASTINDEX_API_KEY }}
          PODCASTINDEX_API_SECRET: ${{ secrets.PODCASTINDEX_API_SECRET }}
        run: |
          python add_podcast_from_issue.py \
            --yaml-directory ../podcasts \
            --json-issue "$PARSED_ISSUE" \
            --api-key "$PODCASTINDEX_API_KEY" \
            --api-secret "$PODCASTINDEX_API_SECRET"
      - name: Generate Podcast JSON files
        env:
          PODCASTINDEX_API_KEY: ${{ secrets.PODCASTINDEX_API_KEY }}
          PODCASTINDEX_API_SECRET: ${{ secrets.PODCASTINDEX_API_SECRET }}
        run: |
          python generate_podcast_json.py \
            --yaml-directory ../podcasts \
            --json-directory ../generated \
            --api-key "$PODCASTINDEX_API_KEY" \
            --api-secret "$PODCASTINDEX_API_SECRET"
      - name: Generate README
        run: |
          python generate_readme.py \
            --json-directory ../generated
      # Commit results back to the repository before telling the contributor we're done
      - uses: stefanzweifel/git-auto-commit-action@v5.0.1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          commit_message: Generate podcast data
          branch: main
          commit_user_name: Podcast Bot
          commit_user_email: podcastbot@bascodes.com
          commit_author: Podcast Bot <podcastbot@bascodes.com>
      - uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: 'Thanks for reporting! Our robots already did their work!'
            })
      - uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.update({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              state: 'closed'
            })
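The job-level if: uses an object filter: github.event.issue.labels.*.name collects the name of every label object on the issue into an array, and contains() checks whether add-podcast is among them. The Python sketch below mirrors that logic against a trimmed-down issues event payload (the helper names are hypothetical). It also shows a design choice worth knowing: because the filter looks at all labels on the issue, the job runs again whenever any other label is added to an issue that already carries add-podcast, whereas comparing github.event.label.name would only match the label that triggered this particular event.

```python
def has_label(event: dict, label: str) -> bool:
    """Mirror contains(github.event.issue.labels.*.name, label)."""
    # labels.*.name: take the "name" of every label object on the issue
    names = [entry.get("name", "") for entry in event.get("issue", {}).get("labels", [])]
    # String comparisons in GitHub Actions expressions ignore case
    return any(name.casefold() == label.casefold() for name in names)


def label_just_added(event: dict, label: str) -> bool:
    """Mirror github.event.label.name == label (only the label that fired this event)."""
    return event.get("label", {}).get("name", "").casefold() == label.casefold()


# Trimmed example of a "labeled" payload: someone added "triage" to an
# issue that already had "add-podcast".
event = {
    "action": "labeled",
    "label": {"name": "triage"},
    "issue": {"number": 42, "labels": [{"name": "add-podcast"}, {"name": "triage"}]},
}
print(has_label(event, "add-podcast"), label_just_added(event, "add-podcast"))
```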

Note that you will need to store PODCASTINDEX_API_KEY and PODCASTINDEX_API_SECRET as repository secrets so the scripts can access the PodcastIndex API. You can get an API key and secret for free at PodcastIndex.org. This enables us to fetch additional metadata, along with cover images, for the podcasts.

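You'll also need the Python scripts the workflow calls: add_podcast_from_issue.py, generate_podcast_json.py, and generate_readme.py. The first takes the parsed issue data and adds the new podcast to the podcasts directory; the other two regenerate the JSON files and the README from that directory.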

Using GitHub Actions to Periodically Scrape a Website

This example is taken from a blog post by Simon Willison with the title "Git scraping: track changes over time by scraping to a Git repository". The blog post explains how to use GitHub Actions to periodically scrape a website and store the scraped data in a Git repository.

Using a storage format that is both human-readable and version-controlled allows you to track changes over time and easily compare different versions of the scraped data.

What you'll learn

Using a GitHub Action on a schedule
Using jq to parse JSON data
Committing the scraped data as a JSON file and pushing it back to the repository

name: Scrape latest data

on:
  push:
  workflow_dispatch:
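  schedule:
    # At minutes 6, 26 and 46 of every hour (UTC), i.e. every 20 minutes
    - cron: '6,26,46 * * * *'

jobs:
  scheduled:
    runs-on: ubuntu-latest
    # Allow the job to push the scraped data back with GITHUB_TOKEN
    permissions:
      contents: write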
    steps:
    - name: Check out this repo
      uses: actions/checkout@v4
    - name: Fetch latest data
      run: |-
        curl https://www.fire.ca.gov/umbraco/Api/IncidentApi/GetIncidents | jq . > incidents.json
    - name: Commit and push if it changed
      run: |-
        git config user.name "Automated"
        git config user.email "actions@users.noreply.github.com"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push
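The Fetch latest data step pipes curl into "jq .", which pretty-prints the JSON so that Git diffs show individual changed values instead of one long line. If you'd rather do that step in Python, for example to get explicit error handling, here is a minimal stdlib-only sketch (the function names are hypothetical). urlopen raises on HTTP error statuses and json.loads raises on an empty or malformed body, so a bad response fails the step before anything is written; the file is only rewritten when its content actually changes.

```python
import json
import urllib.request
from pathlib import Path


def normalize(raw: bytes) -> str:
    """Pretty-print JSON with 2-space indentation, similar to `jq .`."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False) + "\n"


def write_if_changed(path: Path, text: str) -> bool:
    """Write text to path only if it differs; return True when the file changed."""
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True


def scrape(url: str, path: Path) -> bool:
    """Fetch url, normalize the JSON, and store it at path (network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return write_if_changed(path, normalize(response.read()))
```

The existing "git commit ... || exit 0" step then takes care of committing only when the file differs from the last snapshot.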

Using Biome to Lint Code on a Pull Request

You can use GitHub Actions to automatically fix lint issues with Biome on every pull request. This workflow runs on the pull_request event and applies Biome's automatic fixes to the pull request branch. If the fixes change any files, it opens a new pull request with those changes against the original pull request's branch.

What you'll learn

Using a GitHub Action on a pull_request event
Linting code with Biome
Automatically opening a pull request with the lint fixes

name: Biome Lint Fix

on:
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest
    # Needed to push the fix branch and open a pull request with GITHUB_TOKEN.
    # The repository setting "Allow GitHub Actions to create and approve pull requests"
    # must also be enabled.
    permissions:
      contents: write
      pull-requests: write
    steps:
      - name: Checkout pull request branch
        uses: actions/checkout@v4
        with:
          # Assumes the pull request branch lives in this repository, not in a fork
          ref: ${{ github.event.pull_request.head.ref }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install dependencies
        run: npm install

      - name: Run Biome lint fix
        # Assumes package.json defines a "biome:fix" script, e.g. "biome check --write ."
        run: npm run biome:fix

      - name: Check for changes after biome fix
        id: git-check
        run: |
          if [ -n "$(git status --porcelain)" ]; then
            echo "changed=true" >> "$GITHUB_OUTPUT"
          else
            echo "changed=false" >> "$GITHUB_OUTPUT"
          fi

      # create-pull-request commits the working-tree changes to a new branch,
      # pushes it, and opens a PR against the original pull request branch,
      # so no manual git commit/push step is needed.
      - name: Create pull request for biome fixes
        if: steps.git-check.outputs.changed == 'true'
        uses: peter-evans/create-pull-request@v7
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          # Use the GitHub run ID for a unique branch name
          branch: biome-lint-fix-${{ github.run_id }}
          base: ${{ github.event.pull_request.head.ref }}
          commit-message: Apply biome lint fixes
          title: "Biome Lint Fixes"
          body: "This PR applies auto-corrected biome lint fixes."
Building and Publishing a Python Package

What you'll learn

Using a GitHub Action on the push event
Building a Python package
Publishing the package to PyPI

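name: Publish Python Package

on:
  push:
    # Publish only when a version tag such as v1.2.3 is pushed. PyPI rejects
    # re-uploading files for a version that already exists, so publishing on
    # every push to main would fail unless the version changed.
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine

      # python -m build replaces the deprecated "python setup.py sdist bdist_wheel"
      # and produces both an sdist and a wheel in dist/
      - name: Build package
        run: python -m build

      - name: Publish package
        run: twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}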

In the next and final article, we'll tackle one more advanced recipe — repository splitting — and wrap up with a comprehensive quick reference you can bookmark. See you there!
