Sensible CI for a Google Cloud Function monorepo
For the past few months I’ve been working for a client that has been around long enough to have amassed a fair bit of technical debt. After observing the team dynamics for quite some time, I believe the main root causes can be summarised as:
- High employee turnover.
- Non-technical Product Managers/Owners.
- Lack of documentation on multiple levels (both technical and business).
- A layer of custom-built tools and abstractions on top of GCP services.
- Focus on quantity over quality (no time to stop and reflect on tooling improvements).
Without dwelling too much on organizational theory, it’s quite clear how the combination of the points above can negatively affect the quality of tooling and the development experience in general. What has become particularly frustrating (hence the inspiration for this post) is that Jenkins builds take up to 9 hours (!!!) to propagate infrastructural changes to production.
I am not a software engineer by trade and I am not particularly sure how typical such a scenario is in large enterprises. I just feel that, compared to the idyllic articles describing DevOps best practices, these wait times are way too long by any standard. This obviously hampers the ability to work iteratively and to throw different approaches at a problem. It is common to settle for a good-enough solution just to avoid the deployment times of a potentially better approach.
Context and Goal
Data infrastructure for the data platform (GCS buckets, Pub/Sub topics, etc.) is set up in a declarative way through .yaml files that get parsed by deployment scripts running as Jenkins builds. In a nutshell, every time a commit makes it to `main`, the script parses and interacts with the entire infrastructure.
Now, I am a big fan of Google Cloud Functions and I consider them a good fit for lots of data ingestion use cases. They are particularly well suited to scenarios where the workload doesn’t justify a full-blown Dataflow pipeline, such as:
- Pinging an API endpoint on a schedule and dropping the results in a BigQuery table or a GCS bucket (see the sketch after this list).
- Producing or consuming Pub/Sub messages based on certain events.
- Retrieving model predictions from a model deployed on Google AI Platform.
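For the first use case, the scheduling side is usually just a Cloud Scheduler job invoking the function over HTTP. A minimal sketch, where the job name, schedule, and function URL are all placeholders for illustration:

```sh
# Hypothetical: hit an HTTP-triggered function every hour via Cloud Scheduler.
gcloud scheduler jobs create http ping-api-job \
  --schedule="0 * * * *" \
  --uri="https://us-central1-my-project.cloudfunctions.net/func_1" \
  --http-method=GET
```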
Goal
This example should be considered an exercise (there are probably better-suited approaches for production environments). Its goals can be summarised in the following points:
- Set up CI (using GitHub Actions and Google Cloud Build) to deploy Cloud Functions to our GCP project any time a change lands on `main`.
- Add logic to make sure `gcloud builds submit` is launched only for functions that have changed.
- Implement CI to run unit tests for all Cloud Functions every time a change is committed. This is more of a safety measure (we could probably test only the functions that have actually changed), but given that test suites tend to be fairly lightweight for Cloud Functions, I see no harm in doing that.
- Make sure the CI pipeline can handle different runtimes (e.g. Go- and Python-based functions will live in different subfolders).
Project structure
Most of the time, Cloud Functions are so specific in scope (and implementation) that I feel they don’t deserve a repository each, as this would probably create a proliferation of many small repositories. The way I usually organise projects is to group functions by scope (e.g. data ingestion) or by project.
.
├── func_1
│ ├── cloudbuild.yaml
│ ├── main.py
│ ├── requirements.txt
│ └── test_func_1.py
├── func_2
│ ├── cloudbuild.yaml
│ ├── main.py
│ ├── requirements.txt
│ └── test_func_2.py
└── func_4
├── cloudbuild.yaml
├── go.mod
├── main.go
└── main_test.go
I’ve found this folder structure to be handy for a few different reasons:
- Each function is logically separated in its own subfolder (`func_1`, `func_2`, etc.).
- By having a `cloudbuild.yaml` file in each subfolder, it is easy to accommodate function-specific needs using deployment arguments (see the sketch after this list).
- Clean separation of dependencies.
- I can mix runtimes if I want to write something in Go or Node.js.
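To make the second point concrete, here is the kind of per-function variation those deployment arguments allow; the flags below are illustrative rather than taken from a real project:

```sh
# func_1: HTTP-triggered Python function
gcloud functions deploy func_1 --runtime=python39 --trigger-http --source=.

# func_4: Pub/Sub-triggered Go function ("my-topic" is a placeholder)
gcloud functions deploy func_4 --runtime=go116 --trigger-topic=my-topic --source=.
```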
Test everything, deploy what you need
Given we’re mixing runtimes across subfolders, we’ll need to traverse all folders and check file extensions before invoking the correct test suite (e.g. `go test` or `pytest`). This has been achieved with the following shell script:
#!/bin/bash
# For every entry in the repo root
for i in $(ls); do
  # Only descend into directories
  if [[ -d "$i" ]]; then
    cd "$i"
    # Count Python files to decide whether this is a Python function
    is_py=0
    for f in $(ls); do
      if [[ $f == *.py ]]; then
        is_py=$((is_py+1))
      fi
    done
    # Run the test suite only if Python files were found
    if ((is_py == 0)); then
      cd ..
      continue
    else
      pip install -r requirements.txt
      pytest
      cd ..
    fi
  fi
done
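As written, the script only dispatches to `pytest`. A sketch of how the same traversal could also pick up the Go function in `func_4` (kept separate here to avoid cluttering the Python-only version):

```sh
#!/bin/bash
# Sketch: dispatch to the right test suite based on file extensions.
for dir in */; do
  cd "$dir"
  if ls ./*.py >/dev/null 2>&1; then
    # Python function: install dependencies, then run pytest
    pip install -r requirements.txt
    pytest
  elif ls ./*.go >/dev/null 2>&1; then
    # Go function: run the standard toolchain's test runner
    go test ./...
  fi
  cd ..
done
```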
This script will be executed through the following GitHub Actions workflow file on every PR raised against the target branches:
on:
  pull_request:
    branches: [main, master, dev]
name: pytest
jobs:
  run_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: install_pytest
        shell: bash
        run: |
          pip install pytest
      - name: pytest
        shell: bash
        run: |
          chmod +x "${GITHUB_WORKSPACE}/.github/pytest_run.sh"
          "${GITHUB_WORKSPACE}/.github/pytest_run.sh"
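Nothing in the script is CI-specific, so the same traversal can be run locally from the repository root before raising a PR:

```sh
# Run the test traversal locally (assumes pytest is installed)
chmod +x .github/pytest_run.sh
./.github/pytest_run.sh
```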
Deployment
Deployment will be achieved using Cloud Build: each subfolder contains a `cloudbuild.yaml` file that defines how the function should be deployed. This file holds all the function-specific deployment parameters (e.g. runtime, invocation method, etc.). To be extra cautious, we will re-run tests before deploying:
steps:
  - name: "docker.io/library/python:3.9"
    id: Test
    entrypoint: /bin/sh
    args:
      - -c
      - "pip install -r requirements.txt && pip install pytest && pytest"
  - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
    args:
      [
        "gcloud",
        "functions",
        "deploy",
        "func_1",
        "--entry-point=hello",
        "--region=us-central1",
        "--source=.",
        "--trigger-http",
        "--runtime=python39",
      ]
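Because the config uses Cloud Build’s default filename, a single function’s build can also be submitted by hand from its folder, which is handy for debugging before the workflow is wired up (the project ID below is a placeholder):

```sh
# Manually submit one function's build; gcloud picks up ./cloudbuild.yaml
# and uploads the current directory as the build source by default.
cd func_1
gcloud builds submit --project=my-gcp-project
```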
In order to deploy only the functions that have changed between the current and the previous revision, we’ll need to:
- Authenticate `gcloud`. This can be done through this action. We’ll also need to pass GCP auth credentials in JSON format as a repository secret.
- Check out the repository.
- Get a `git diff` between the current and previous revision and extract the folder names.
- Traverse all the folders containing changed files and run `gcloud builds submit`.
on:
  push:
    branches: [main, master]
name: deploy
jobs:
  deploy_functions:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/setup-gcloud@master
        with:
          project_id: ${{ secrets.GCP_PROJECT_ID }}
          service_account_key: ${{ secrets.GCP_SA_KEY }}
          export_default_credentials: true
      - uses: actions/checkout@v2
        with:
          fetch-depth: 2
      - name: Get git diff
        id: git_diff
        shell: bash
        run: |
          echo "diff=$(git diff --name-only --diff-filter=AMDR @~..@ | grep "/" | cut -d"/" -f1 | uniq | tr '\n' ' ')" >> "$GITHUB_OUTPUT"
      - name: Run Cloud Build
        shell: bash
        run: |
          for i in ${{ steps.git_diff.outputs.diff }}; do
            # Skip the CI folder and anything that no longer exists (deleted folders)
            if [[ "$i" == ".github" || ! -d "$i" ]]; then
              continue
            fi
            cd "$i"
            gcloud builds submit
            cd ..
          done
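The folder-extraction pipeline in the `git_diff` step can be sanity-checked locally against the last commit; with some hypothetical changed files it would behave like this:

```sh
# Which top-level folders were touched by the last commit?
git diff --name-only --diff-filter=AMDR @~..@ | grep "/" | cut -d"/" -f1 | uniq
# Hypothetical output:
#   func_1
#   func_4
```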
Limitations
This approach is working just fine for my current needs. The main limitation I see is around mismatches between the Python runtime version used by `ubuntu-latest` and the one used by a given function. I am not sure how likely this is to manifest as an actual problem, but it might be good to match the two runtimes.
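One cheap way to spot such a mismatch is to compare the runner’s interpreter with the runtime each function is configured with; a sketch, assuming a deployed `func_1`:

```sh
# Local interpreter on the runner
python3 --version
# Runtime configured on the deployed function (e.g. "python39")
gcloud functions describe func_1 --region=us-central1 --format="value(runtime)"
```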
References
- This post is based on adapting this SO answer to GitHub Actions. The answer explains a very similar approach on Google Cloud Build.