If your work requires that you use an on-prem Docker registry, you may find it difficult to use open source Helm Charts that refer to publicly hosted Docker images from
places like Docker Hub or quay.io.
I’ve run into this problem myself a couple of times. Sometimes I just browse through the values.yaml files and figure out what images are being used,
then pull them to my local machine before retagging them and pushing to our registry.
If there are a lot of images, I’ll sometimes write a quick bash script to do this. But so far, that bash script would just pull and push one image at a time.
To speed this process up, and to make it easier to repeat for any Helm Chart, I revisited this today. Here’s what I came up with.
helm template . \
    | yq '..|.image? | select(.)' \
    | sort \
    | uniq \
    | xargs -I % -n 1 -P 4 bash -c "docker pull % && docker tag % my.registry.com/% && docker push my.registry.com/%"
Let’s break this down. We’ll look at each command one at a time.
helm template .
This assumes you are in a directory containing a Helm chart. The template command prints out all of the Kubernetes objects that would normally be created via helm install. Its output is
formatted in yaml.
yq '..|.image? | select(.)'
yq is a variant of the popular jq command line tool.
Whereas jq expects to deal with JSON streams, yq allows you to do the same with yaml.
The magic is in the filters that yq provides. A filter acts on some input and performs some operation to produce an output. In this case,
we’re using the recursive descent operator to visit every value in the document and look up a
field named image on each one. As in jq, the ? in .image? suppresses the errors that would otherwise be raised when the lookup hits a value that is not a map (a string or a list, for example); maps without an image field produce null. Piping that to select(.) drops all null values, finally
returning a newline-delimited string of the images we’re after.
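To get a feel for what this step extracts without having helm or yq at hand, here is a rough text-based stand-in. It is not the yq filter itself: the function name list_images and the manifest fragment are made up for illustration, and the sed expression assumes each image is written as a simple scalar on a line of its own.

```shell
# Text-based stand-in for the yq filter (an assumption, not the article's method):
# print the value of every line whose key is exactly `image`.
list_images() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' | tr -d "\"'"
}

# Hypothetical rendered manifest fragment, similar to what `helm template` prints.
list_images <<'EOF'
spec:
  containers:
    - name: web
      image: nginx:latest
      imagePullPolicy: Always
    - name: metrics
      image: "prom/prometheus:latest"
  initContainers:
    - name: init
      image: nginx:latest
EOF
```

Unlike yq, this approach does not parse YAML, so it is only useful as a quick illustration or sanity check.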
At this point in the chain, our output looks something like this:
nginx:latest
prom/prometheus:latest
nginx:latest
Notice that nginx:latest is listed twice. That’s expected. Many charts reference the same image in several places, so we need to remove the duplicates before pulling anything.
sort | uniq
We’ll discuss sort and uniq together. As you can probably guess, sort will sort the output of the previous step alphabetically. This step is required because uniq only collapses duplicate lines that are adjacent to each other, so the duplicates have to be grouped together first.
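Here is a small sketch of this step in isolation, using the sample image names from above as input. The function name dedupe_images is just for illustration.

```shell
# Collapse repeated image names: sort groups identical lines together,
# then uniq drops the adjacent repeats.
dedupe_images() {
  sort | uniq
}

# Sample input mirroring the output of the yq step.
printf '%s\n' nginx:latest prom/prometheus:latest nginx:latest | dedupe_images
```

This prints nginx:latest and prom/prometheus:latest once each. If you prefer a single command, sort -u gives the same result.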
Finally, we get to push some images! xargs is a nifty tool. With it you can construct and execute arbitrary unix commands based on standard input. Another awesome feature is that it allows you to easily parallelize tasks. Let’s dissect
the arguments I’m passing to it.
-I % indicates that we would like to use the % character as a substitution character when building up our command. Without this, xargs defaults to appending input to the end of the command.
-n 1 forces xargs to provide one line of input at a time to each command execution. This is useful for us, as each image from the previous step is separated by \n.
-P 4 is the max-procs flag. This is where our parallelism comes from. xargs will run 4 executions in parallel.
The remaining part of this is the command we want xargs to invoke. For each Docker image, xargs will call bash -c "docker pull % && docker tag % my.registry.com/% && docker push my.registry.com/%".
The actual docker pull, docker tag, and docker push commands we want to run are wrapped in bash -c. xargs runs a single command and does not interpret && itself, so the wrapper makes sure that each command after && is part of the xargs execution rather than being picked up by the outer shell.
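The flags are easier to see in action with a harmless command. The sketch below swaps docker for echo and feeds in two placeholder image names; the function name preview_mirror is made up for illustration, and my.registry.com is the same stand-in registry used above.

```shell
# Print, rather than run, a command line for each image read from stdin.
# -I % replaces every % in the command with the current input line,
# -n 1 hands over one image per invocation, and -P 4 allows up to four
# invocations to run at the same time.
preview_mirror() {
  xargs -I % -n 1 -P 4 echo "docker pull % && docker push my.registry.com/%"
}

# Because the invocations run in parallel, the order of the output lines is
# not guaranteed; sort is only here to make the output stable.
printf '%s\n' nginx:latest prom/prometheus:latest | preview_mirror | sort
```

Since echo receives the whole string as one argument, the && is printed literally here instead of being executed.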
So there we have it. A one-line bash script for quickly copying the images required by a Helm chart to your own Docker registry.
One More Thing
Sometimes it’s helpful to do a ‘dry-run’ and see which commands xargs will execute, without actually executing them. To do this, just add echo to the start of the xargs command. Like this:
xargs -I % -n 1 -P 4 echo 'bash -c "docker pull % && docker tag % my.registry.com/% && docker push my.registry.com/%"'