Professional docker images

Osher El-Netanany
Published in Israeli Tech Radar
9 min read · May 10, 2023

Containers have conquered the world with vigor. As “copy-pastable” getting-started snippets washed over the internet, what was once a fringe technology became a cornerstone of the industry.

However, getting-started is not enough any more.

The title “Mind the gap” print on a railway platform
Mind the gap (image from here)

This article addresses all teams that ship containerized applications, regardless of their tech stack.

While I try to keep the explanations of the principles tech-agnostic, the examples here use NodeJS. Each tech stack has its own equivalents, and the same goes for other containerization platforms.

Why the gap exists

Two groups of people in two sides of a broken bridge
Because. (image from here, augmented)

Most examples available aim to get one started as quickly as possible. This is true for Stack Overflow, CodeWhisperer, Copilot, Bard, and ChatGPT.

The results usually work, but are far from optimal, for three reasons:

  1. They aim for the lowest common denominator of readers, and therefore skip over professional details.
  2. The best practices are still in opinion phase and are yet to be formalized.
  3. Some optimizations depend on the team’s capabilities, culture, and concrete case.

IMHO, the most significant optimization to start with is: Slim Images.
Why? Resource footprint and Attack surface.

Resource footprint

An apple wrapped with a measuring tape
LOSE WEIGHT NOW — ASK ME HOW! (image from here)

Container images have a life-cycle.
First, they are created in CI. Then they are uploaded to the container registry (CR). From there they are pulled to target environments, and to PCs of developers that develop against them.

Thus, the implications of bigger images are:

  1. Network Bandwidth
    Pushed to the registry once, downloaded to every docker host that needs to run it.
  2. Speed
    Downloading big images takes longer, which is sometimes visible in service start time.
    The same goes for developers who need to work with them locally and download the latest and greatest daily over a mediocre VPN.
  3. Host Disk-space
    More disk required for docker hosts in cloud and for developers that need them locally.
  4. Registry Disk-space
    Expert ops teams define retention policies by which images are discarded when they get old and are no longer in use.
    Alas, this is something I often see forgotten, especially on managed clouds where storage is cheaper.

Light Pack = Light Pay.

Over time all these costs add up, together with the CO2 emissions they represent.

Attack surface

US Abrams tank
A container (image from here)

This is not a discussion about security, so let’s just agree that when a hacker gets in, they can use everything they find.

  1. If they find very little, there’s little they can use.
  2. The more things inside, the higher the chance that any of them includes a weakness that can be exploited.

Pack less = less risk.

The more modern attacks are the sophisticated ones that use weaknesses that look insignificant on their own, but when found together amount to a fatal flaw.

No crack in a fortress may be considered small.
Rev. John Hale, The Crucible / Arthur Miller

Where can we cut weight?

A diagram that shows that most of the weight in a container is the OS and the project dependencies
OS, Runtime, Dependencies, User-Code (image from here)

Looking at the breakdown of a typical container of a NodeJS micro-service, we can identify these 4 parts. Most of them are in our control.

Let’s dig in, starting with the lowest level, and work our way up from there.

Level -2 — The naïve way

A mocking picture of a person trying to hit with a hammer a wasp that is sitting on his nose
Not funny. (image from here)

Here’s the current spirit of the common simple and getting-started practice for NodeJS containers.
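A minimal sketch of that pattern (the exact snippet varies; this one assumes an index.js entrypoint and port 3000):

```dockerfile
# the naïve way - do not use this in production
FROM node
COPY . .
RUN npm install
EXPOSE 3000
CMD npm start
```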

God forbid.

This snippet uses an official main NodeJS image as a base, copies the project sources into it, runs npm install inside, declares exposed ports, and sets the command.

What’s wrong with this way?

  1. It ships to production the means to download, compile, and modify dependencies.
  2. It packs all project source files — including files that are not meant to be used in production.
  3. It uses a base image of a fat OS equipped with all the tools a hacker needs to wreak havoc.
  4. The default entrypoint is a shell.
    ...
    — and the worst —
  5. The default user in this image is the privileged root.

This answer already fails job interviews.

Level 0 — A slim base

As of today, the default node image weighs over 300MB.
By choosing a different tag, you can reduce your base OS to less than 55MB.

A picture of an ultra-thin mobile phone
So light! Made by the elves. (image from here)

But even if the base image were a thinner official image — say,
FROM node:lts-alpine or FROM node:18-alpine,
or a private image on your private CR that extends such an image — you’re still in trouble,

because it still ships to production all the tools and means to compile and manage dependencies!

This answer fails job interviews just as well.

The issue is that we need OS packages and tools in order to install and compile our dependencies, but once that’s done — we don’t need them any more.
Moreover — we must not allow them in production!

Level 1 — Multi-stage build to the rescue!

Leeloo from the iconic movie “The 5th Element” presenting her multipass
Yeah, she knows it’s a multipass… (image from here)

Docker multi-stage (phased) builds happen whenever a Dockerfile has more than one FROM statement. Each time a FROM statement appears, the stage starts over from a fresh base image, but preserves a build-time link to the FS trees of its previous stages.

This Dockerfile uses a two-phase build to produce a slimmer and secure result:
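A sketch of such a two-phase Dockerfile (the entrypoint file, port, and alpine version are illustrative assumptions):

```dockerfile
# ---- phase 1: install dependencies using the full node image ----
FROM node:lts-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# --production leaves out devDependencies
RUN npm install --production

# ---- phase 2: assemble the runtime image on a bare OS ----
FROM alpine:3.18
# install only the node runtime; --no-cache prevents residual files
RUN apk update && apk add --no-cache nodejs
WORKDIR /app
# take the installed dependencies from the previous phase
COPY --from=0 /app/node_modules ./node_modules
COPY index.js ./index.js
COPY lib ./lib
# prepare for secure execution: a non-privileged user-group and user
RUN addgroup -S app && adduser -S app -G app
USER app
EXPOSE 3000
ENTRYPOINT ["node", "index.js"]
```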

The first phase uses an official node:lts-alpine image to install all the required dependencies, as specified in the project’s package.json. Note the use of the --production switch, which directs the package manager to leave out dependencies that are used only for development (e.g. utilities for lint, build, pack, test, coverage, local development, debug helpers, etc.).
However, this will be effective only if the team makes sure to save these dependencies as devDependencies.

The second phase uses a pure alpine OS as its base — the base OS and platform architecture of both phases must match.

  • First, it updates the OS packages and installs on it only the node runtime. Since service images are meant to be immutable, it does so with --no-cache to prevent any residual files from bloating the image. You may also close the door behind you — by removing apk.
  • Second, it copies all the dependencies installed by the previous phase (using COPY --from=0), and adds to that the project files.
    👉Read more about Dockerfile FROM and COPY .
  • Third, it prepares for secure production execution by creating a non-privileged user-group and user.
    You could also consider adding iptables rules, but this could require some concrete assumptions — just saying, there are more custom improvements to be had.
  • Lastly, it sets the port and the entrypoint.

The resulting image that is shipped to production has a significantly slimmer OS.
(❗) Note:

  • It does not even have npm!
    The protocol between the project and the container here is
    node index.js.
    Some projects may require a different entrypoint than index.js, like server.js, app.js, bin/www, run.js, etc.
  • Any additional CLI arguments passed to the container (if any) — will be passed to this command.
    👉Read more about Dockerfile CMD and ENTRYPOINT.

Level 2 — Optimize CI using custom base-images

a visual paraphrase of the infinity symbol with the titles continuous integration and continuous delivery
CI = Continuous Integration (image from here)

Consider that most of the setups today deploy a group of services. Micro, Nano or plain SOA — there will be a repeating common base that runs over and over, per-service per-build.

Docker is built smartly with a layering system, which helps it detect stages that should be considered deterministic, and not rebuild them unless the underlying Dockerfile commands or resources have changed. However, many CI setups add build arguments to the docker build that embed in the resulting image information about the build (e.g. the commit from which it came, and more). This negates the benefit of the layer cache, resulting in building the same layers over and over, despite having nothing new in them.
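For example, one way to keep the cache effective (assuming the build info is only needed as image labels) is to declare such volatile arguments after the heavy layers:

```dockerfile
FROM node:lts-alpine
WORKDIR /app
# heavy, deterministic layers first - these stay cached between builds
COPY package.json package-lock.json ./
RUN npm install --production
COPY index.js ./index.js
# volatile build metadata last - only these cheap layers rebuild every time
ARG GIT_COMMIT=unknown
LABEL git-commit=${GIT_COMMIT}
```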

If you inspect the phases above, both have steps that will be the same for every service in the setup. There is an optimization we can apply that brings two benefits:

  1. Reuse a common set-up
    if any change is added to this setup, all services can benefit from it.
  2. Improve build time
    Run in a service-build only the delta that matters to the current service.

This is done with two base images, and a Dockerfile for actual service images.

Node-Builder

Bob The-Builder — a character from a TV show for children
Like Bob, but with node. (image from here)
  • Includes all the tools to compile and build the project
  • Published as my-private-registry/node-builder:lts

The content of this base image is totally dependent on the system and the tools it needs to build its services.

This example adds some OS packages. None of them are necessary in the context of our limited example; I just show how it’s done — real life might require some OS packages.
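A sketch of such a node-builder base image (the specific packages, git, python3, make and g++, are illustrative assumptions, typical for compiling native dependencies):

```dockerfile
# published as my-private-registry/node-builder:lts
FROM node:lts-alpine
# OS packages commonly needed to compile native dependencies;
# none are required by our limited example - adjust to your real needs
RUN apk add --no-cache git python3 make g++
WORKDIR /app
```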

This image does not get to production.
If your CI jobs run in containers — you may add to it also CI tools, and use it as a base for your CI jobs as well.

Node-Runner

A hamster running on a rodent-size treadmill
I wanted a gopher, but that’s what I found… (image from here)
  • Includes the bare minimum to run code securely on production
  • Published as my-private-registry/node-runner:lts
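A sketch of such a node-runner base image (the alpine version and user name are assumptions):

```dockerfile
# published as my-private-registry/node-runner:lts
FROM alpine:3.18
# only the node runtime - no npm, no compilers
RUN apk add --no-cache nodejs
# a non-privileged user for services to switch to as their last step
RUN addgroup -S app && adduser -S app -G app
WORKDIR /app
ENTRYPOINT ["node", "index.js"]
```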

Dockerfile for services

A blueprint of a 3 story house
The blueprint for your services (image from here)

What’s left in the two-phase build:
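A sketch of what such a service Dockerfile could look like (the registry name, paths, and user name are assumptions):

```dockerfile
# parameterized tags, defaulting to latest
ARG BUILDER_TAG=latest
ARG RUNNER_TAG=latest

# phase 1: install dependencies using the shared builder base
FROM my-private-registry/node-builder:${BUILDER_TAG}
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install --production

# phase 2: assemble the service on the shared runner base
FROM my-private-registry/node-runner:${RUNNER_TAG}
# still pull the latest OS packages to close known security holes
RUN apk update && apk upgrade --no-cache
WORKDIR /app
COPY --from=0 /app/node_modules ./node_modules
COPY index.js ./index.js
COPY lib ./lib
# switch to the non-privileged user last, so files stay owned by root
USER app
```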

Few important things to note here.

First — we do not build node-builder and node-runner in every build of every service. They are built only when we need to change this basic setup — e.g. to upgrade the node version or OS packages — so we do not need to build them frequently.
However, every service build still needs to pull the latest & greatest OS packages to close down known security holes using apk update && apk upgrade.

Second — the tags are parameterized, defaulting to latest. This gives better control over which base images you use, in case some projects have to be frozen with older bases.

Last — moving to the non-privileged user as the last step keeps all files owned by root.

Level 3 — Slim application distribution

A progress bar of a copying a heavy file on windows OS
Preparing to copy… (image from here)

Optimizing the containerization level is optimizing the wrap around your application, which in the world of micro/nano services — is a great leap. Yet, sometimes the application itself is unjustifiably fat.

First — on top of OS-level packages, applications bring with them dependency libraries. Some platforms require more awareness than others, but this is true for all languages and platforms.

Second — in many platforms a project can be delivered either as final compilation, or as sources. Compiled distributions are notably smaller and safer, but are a bit less comfortable to troubleshoot. Transpiled distributions could get close to that.

Third — do not copy all your project files naïvely (e.g. COPY . .).
Lint, build & test configuration, test files, test fixtures, docs, readme, CI/CD files, deploy scripts — none of them are required on production.

Instead, use explicit operations that copy only the parts that are necessary to run on production:

# this assumes all the production files are index.js and lib/**/*
# if there are more - collect them explicitly
COPY index.js /app/index.js
COPY lib /app/lib

Last — sometimes services get heavy because they really involve a lot of logic. But sometimes, the size is a sheer result of neglect.

When services evolve over time while trying to maintain unjustified full backward compatibility, or when future-compatibility relies heavily on feature-flag scaffold code, keeping them slim requires discipline.
The pitfall is when the Definition-of-Done of a feature does not include the cleanup of previous versions or of the feature-flag scaffold code.

Messy project = heavy pack.

Conclusion

A hiker carrying a heavy back-pack
Maybe you don’t carry that pack. But somebody does. (image from here)

The getting-started snippets are not good enough. Every step matters, the numbers add up.

  1. Use a slim image base.
  2. Ship to production only what is necessary to run your code, and run it with a non-privileged user.
  3. Optimize your containerization process using custom base images and multi-stage builds.
  4. Cleanup unnecessary dependencies and code continuously.

Now you: WDYT, what else can be cut off?

What’s faster, a long press or a multi-click?



Coding since 99, LARPing since 94, loving since 76. I write fast but read slow, so I learnt to make things simple for me to read later. You’re invited too.