What exactly is CI/CD?

Published in

Israeli Tech Radar

Jan 10, 2022

It has become a software-development essential that comes with a set of buzzwords, so many people kinda picked them up from context. But how much of CI/CD is really understood? This is a breakdown in simple words.

This post is for technical and non-technical people alike, meant to help software development teams identify what they have, what they’re missing, and what they should expect from CI/CD. More technical details — in future posts.

The acronym stands for Continuous Integration / Continuous Delivery, and it breaks down into many finer terms that differ from culture to culture, many of them highly opinionated. I’ll do my best to stay as generic as I can and still help you get them without nitpicking over their names.

Let’s dive in.

What’s Integration?

In software — a process whose end product is a distribution that is as proofed as possible, and made available — usually by placing it in a repository.

The key terms in that definition are:

distribution: one or more files that can be shipped to a target environment. AKA a build (noun), a package, artifacts or binaries.
Commonly, it’s named and tagged with a version and an indication of which sources it was produced from, so it’s also called a Version.
When a version only fixes things without adding functionality, it’s called a Patch.
A repository: any accessible point, usually over a network, that can act as a source of truth for distributed versions (e.g. Artifactory, npm, a Docker registry, an object store / bucket of archive files, a releases page, a git repo, and arguably a marketplace / store).
There could be alternative ways, but today these are by far the dominant forms.
As proofed as possible: the distribution is proofed with as many tests as the budget allows, to make sure no bugs leak out. Tests like code linting, static code analysis, security scans, dependency scans, unit tests, a minimum coverage bar, end-to-end functional tests, load tests, platform-compatibility tests, and more. I’ll explain all of these in later posts.
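To make “Version” and “Patch” concrete, here is a small Python sketch that assumes one common convention, Semantic Versioning (MAJOR.MINOR.PATCH, with build metadata after a `+`). The article does not prescribe a scheme, and the function names are hypothetical.

```python
def next_patch(version: str) -> str:
    """Bump PATCH in a MAJOR.MINOR.PATCH version: a fix-only release.

    Assumes a plain version with optional "+build" metadata and no
    pre-release suffix such as "-rc.1".
    """
    core = version.split("+", 1)[0]  # drop the previous build's metadata
    major, minor, patch = (int(part) for part in core.split("."))
    return f"{major}.{minor}.{patch + 1}"


def tag_with_source(version: str, commit_sha: str) -> str:
    """Record which sources the distribution was built from, as build metadata."""
    return f"{version}+{commit_sha[:7]}"


if __name__ == "__main__":
    print(tag_with_source(next_patch("1.4.2+9f2c1ab"), "3e7d0c95b1a24f"))  # 1.4.3+3e7d0c9
```

The source indication is kept out of the version number itself, so two builds of the same version from different commits remain distinguishable without changing how versions compare.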

Each organization customizes its own quality policy according to business and budget.

What do you mean by Process?

AKA — a workflow, a procedure.

The process covers the trigger, the Definition of Done, the Order-of-Operations, and the Human Interactions involved in it.

Trigger (aka start point, entry, fire point): the event that starts the whole process.
Definition of Done (DoD): the sum of all the objectives that must be met for the process to complete successfully.
Order of Operations (OoO): the steps involved and the control points between them. It goes hand in hand with the DoD, elaborating on it where the order matters.
Human Interactions: what glues the process to the other processes in the organization and allows supervision of, and visibility into, what’s going on.

Obviously the details vary across teams and cultures, but for Integration the trigger is usually the submission of new code, and the DoD is the built artifacts. The OoO is the steps required to produce and proof the build, and to decide whether or not it should proceed to Delivery. Human Interactions are the sum of the notifications that communicate and document the process, plus any human response expected to any of them.
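The four parts of an Integration process can be sketched in a few lines of Python. This is an illustration of the idea under assumptions, not any particular CI tool’s API: steps are hypothetical callables stubbed as lambdas, and a `notify` callback stands in for the Human Interactions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A step takes the commit id and returns True if its check passed.
# Real steps (build, lint, unit tests, publish, ...) are stubbed below.
Step = Callable[[str], bool]


@dataclass
class IntegrationResult:
    commit: str
    passed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None

    @property
    def done(self) -> bool:
        # Definition of Done: every step in the Order of Operations passed
        return self.failed_step is None


def run_integration(
    commit: str,
    steps: List[Tuple[str, Step]],
    notify: Callable[[str], None],
) -> IntegrationResult:
    """Trigger: called once per code submission, identified by its commit id."""
    result = IntegrationResult(commit)
    # Order of Operations: run the steps in sequence; the first failure stops the flow
    for name, step in steps:
        if not step(commit):
            result.failed_step = name
            # Human Interactions: tell people why the build stopped
            notify(f"{commit}: '{name}' failed; not proceeding to Delivery")
            return result
        result.passed.append(name)
    notify(f"{commit}: all {len(steps)} steps passed; ready for Delivery")
    return result


if __name__ == "__main__":
    steps = [
        ("build", lambda c: True),
        ("unit tests", lambda c: True),
        ("security scan", lambda c: False),  # pretend the scan found an issue
        ("publish to repository", lambda c: True),  # never reached in this run
    ]
    result = run_integration("abc123", steps, print)
    print(result.done, result.passed, result.failed_step)
```

Placing the publish step last means a build that fails any check never reaches the repository, which is what keeps the repository a source of truth.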

What’s Delivery?

A process whose goal is to bring one or more distributions to one or more target environments, usually with a best effort to mitigate risk to the SLO.

target environment: where the distribution should run. A company server? A cloud cluster? Embedded in product hardware? A customer’s device?
A common semantic is to call a process that continuously and automatically delivers all the way to production continuous deployment, while continuous delivery stops short of it: every build is kept releasable, often deployed to preview environments, and the step to production waits for a human decision. This is confusing, because both are abbreviated CD. However, a process that continuously delivers to production is usually thought to face higher risks, and therefore deserves its own name.
best effort: capped by time, resources and budget.
The more allocated (and efficiently used) → the less risk.
mitigate risk: any action that results in less risk in the delivery and/or its consequences, e.g.
— exercise the deployment of the distribution to a sandbox environment and run more tests against it (e.g. e2e functional tests, but not only).
— transfer the workload to the new version gradually, and transfer more of it only while the new version produces healthy metrics.
— keep track to ensure quick & easy rollback to the previous version in case of show stoppers.
SLO: Service Level Objective, i.e. a target level of service (e.g. availability or response time) for whatever this software is there to provide (information, communication, access, shopping, etc.).
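The gradual-transfer and rollback mitigations can be sketched as a small control loop. `set_traffic` and `is_healthy` are hypothetical hooks you would wire to your own load balancer and monitoring, and the traffic percentages are arbitrary examples.

```python
from typing import Callable, Sequence


def gradual_rollout(
    set_traffic: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: Sequence[int] = (5, 25, 50, 100),
) -> bool:
    """Shift workload to the new version in steps, rolling back on bad metrics.

    set_traffic(p): hypothetical hook routing p% of the workload to the new version.
    is_healthy():   hypothetical hook reading the new version's health metrics.
    Returns True if every step passed its health check.
    """
    for percent in steps:
        set_traffic(percent)
        # More workload is transferred only while the new version stays healthy
        if not is_healthy():
            set_traffic(0)  # rollback: all workload goes back to the previous version
            return False
    return True


if __name__ == "__main__":
    history = []
    verdicts = iter([True, True, False])  # metrics turn bad at the 50% step
    ok = gradual_rollout(history.append, lambda: next(verdicts))
    print(ok, history)  # False [5, 25, 50, 0]
```

A real rollout would also wait and observe the metrics for a while at each step; the sketch leaves timing out to keep the control flow visible.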

Like the checks during Integration, it’s highly customized.

Mind that so far, there has been no mention of continuousness.

So what’s Continuous?

Continuous is when a process runs whenever it’s needed and without delay — and arguably — without breaks.

Stretching it, one can claim that continuous can still be manual: physical production lines running 24/7 shifts are continuous, and 24/7 call centers accept calls continuously. Even in high-tech, continuous delivery was manual for a long while.

But realistically, the more manual the process is, the less practical it is to keep it continuous.

Continuous-manual means shift after shift of expensive humans.

The case for Automation

When a process is executed by programs — we say it’s automated.

When the rollout to production is continuous and automated — we have CD: code ships to production without breaks.

For a long time, when managers heard this, they got a mental image of a downhill rider with no brakes, which is ridiculous:

(image from: https://www.youtube.com/watch?v=Kx_u4ALWQsc)

The brakes are DEFINITELY there, and they WILL stop bad shipments!
In the same breath, the downhill slope also makes it easy to ship mitigations to problems faster and more reliably.

Given that programs don’t take coffee breaks, don’t get sick, do not forget, do not require a learning curve, are not distracted by phones, are not affected by moods, and are happy to execute the same thing over again — they have an inherent advantage in doing that kind of meticulous repetitive work.

On the other hand, first: programs require power, hardware, and network connectivity to keep running. Luckily, nowadays that is the easy part, and it is much cheaper than salaries.

Second: programs cannot solve problems they are not programmed to handle, and cannot yet detect anomalies reliably. Given an issue, they cannot think of a creative solution for it.
Although they are getting better at all that too, they are not quite there. Yet.

So until they do, it’s up to us to describe to them, in a language they understand, how to execute every step and how to mitigate every edge case in a workflow. Which, by the way, is called programming.

Wait, what? Isn’t programming the work of developers?

The rise of DevOps

Given that Devs and Ops have conflicting interests, for years they were kept apart. As more and more operational tasks came to be solved by programming, the rift between them started to close.

This has led to the rise of the term DevOps, which is found in two flavors:

Operation-Developers: a team of developers, usually working across the organization, that owns the integration and delivery processes.
Shift-left: the product developers themselves own more of the process, meaning the integration & delivery concerns and, increasingly, the on-call duty as well.

Which would you prefer: raising the bar for developers by asking them to learn the domains of integration and delivery, or outsourcing these domains to a specialized team and handling the inter-team handshakes? It depends on whether taking that learning curve off the developers pays off more than the cost of inter-team communication during real-time operations.

Obviously, this is very much a matter of culture.

So now I need …more developers?

Sounds like replacing expensive human workers with pieces of software and hardware that must be maintained by even more expensive humans.

Why would I want to do that?

The bottom line is taking the delivery flow as seriously as your actual product.

When you do — this is what you can get:

Detect Problems Early

Let automation do the meticulous tests. Over and over.
Let it do so on every code submission, without even asking.
Let it pull the brakes and inform you whenever something is misaligned.

Ship Faster

Let automation tell whether a distribution is good enough by your standards the minute the tests are over!
If it’s good enough — don’t wait! Let the machine roll it out without delay.

Stay Stable

Let a machine rehearse your deploys. Every time. Even if it just did so with the previous version 5 minutes ago. It won’t complain.
Let machines recover from failed delivery automatically.
Let them inform you whenever something is misaligned, or the stakeholders when delivery is complete.

Now, with machines having your back — you gain additional benefits:

Respond Faster

Being able to ship fast also means being able to respond quickly. To opportunities. To market changes. To outages.
This makes errors that get to production a lot less frightening — because the mitigation can also ship fast.

So what’s CI/CD?

It’s the case where the entire flow, from code submission to deployment, runs automatically and continuously, i.e. with maximal automation, reducing human involvement to the bare minimum: a human needs only to decide on the “go”; the rest rolls down by automation and escalates to humans only when necessary.

So now I need more developers.

Maybe not to get started. Go score. Get rolling. But remember that everything you accomplish during your kickstart without CI/CD coverage is a form of debt you’ll have to pay off before you can move quickly enough to stay relevant in the industry.

The market will change, your product will change, the workload you’ll face will need to scale. Automating the operational effort earlier will help you focus on the real business challenges as they appear.

The Elephant in the Room

The important thing to admit is the challenge: getting automation to produce a reliable indicator of success. Especially of whether the distribution it is presented with qualifies for production, and whether a deployment succeeded or requires mitigation. The same goes for how to mitigate every anticipated issue in the process, and how to handle unexpected problems.

The answer and the make or break of it is:

Whenever an issue is met — even if it was handled by humans quickly — it is not over until it is also programmed into the CI/CD.
Thus, the CI/CD is an accumulation of knowledge in the form of automatic responses to more and more cases: more checks, more failovers, more efficient resource spending, more safety.
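One way to picture this accumulation is a registry of checks that every build runs through, where the follow-up to an incident is registering a check that would have caught it. The registry, check names, and artifact fields below are hypothetical, chosen only to illustrate the idea.

```python
from typing import Callable, Dict, List

Artifact = Dict[str, object]  # hypothetical: metadata describing one build
Check = Callable[[Artifact], bool]

CHECKS: Dict[str, Check] = {}


def check(name: str) -> Callable[[Check], Check]:
    """Register a check under a name; every registered check runs on every build."""
    def register(fn: Check) -> Check:
        CHECKS[name] = fn
        return fn
    return register


@check("has version tag")
def has_version(artifact: Artifact) -> bool:
    return bool(artifact.get("version"))


# Added after a (hypothetical) incident where a debug build reached production:
# humans fixed it quickly, and then the lesson was programmed in.
@check("not a debug build")
def not_debug(artifact: Artifact) -> bool:
    return not artifact.get("debug", False)


def failed_checks(artifact: Artifact) -> List[str]:
    return [name for name, fn in CHECKS.items() if not fn(artifact)]


if __name__ == "__main__":
    print(failed_checks({"version": "1.4.2", "debug": True}))  # ['not a debug build']
```

The pipeline code itself never changes when a lesson is learned; only the registry grows, which keeps each new check cheap to add.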

Build Confidence

Although the industry already has good know-how on all of these, the culprit is fear at the management level, which dictates a procedure that halts the process for human inspection. Each of these fears can be reasoned out.

I would like humans to see the UI

There are tools today that inspect the UI automatically and detect failures better, tirelessly, over and over (e.g. Applitools, Percy, and more).

I would like humans to explore it

There are tools today that can perform a routine tour and methodically cover more than a human can explore (e.g. testim.io, perfecto.io, Selenium / Appium, Cypress, Nightwatch, and more).

I would like to see the report before I approve the go

Would you like to see it at 23:00 as well? And if you see it at 02:00, would you let the hour push you into compromising the quality bar? I daresay you would rather discuss the quality bar with a clear head and, once it’s decided, stick to it. Automated checks are better at that. What you do want is the ability to override, and that can be built into the process to let you handle the extreme outliers that require it.
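A quality bar agreed on with a clear head can be written down as data, with the override built in as an explicit, named exception rather than a late-night judgment call. Here is a minimal Python sketch; the metric names and threshold values are made-up examples, not recommendations.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Decided in advance, in daylight (hypothetical metrics and values)
QUALITY_BAR: Dict[str, float] = {
    "min_coverage": 0.80,
    "max_critical_vulns": 0,
    "max_p95_latency_ms": 300,
}


@dataclass
class Override:
    approved_by: str
    reason: str


def gate(report: Dict[str, float], override: Optional[Override] = None) -> bool:
    """Return True if the build may ship; every blocked or overridden build is logged."""
    violations: List[str] = []
    if report["coverage"] < QUALITY_BAR["min_coverage"]:
        violations.append("coverage below bar")
    if report["critical_vulns"] > QUALITY_BAR["max_critical_vulns"]:
        violations.append("critical vulnerabilities found")
    if report["p95_latency_ms"] > QUALITY_BAR["max_p95_latency_ms"]:
        violations.append("p95 latency above bar")
    if not violations:
        return True
    if override is not None:
        # The rare outlier: shipping anyway is a named person's documented decision
        print(f"OVERRIDE by {override.approved_by}: {override.reason}; ignoring {violations}")
        return True
    print(f"BLOCKED: {violations}")
    return False


if __name__ == "__main__":
    report = {"coverage": 0.78, "critical_vulns": 0, "p95_latency_ms": 250}
    gate(report)  # prints BLOCKED: ['coverage below bar'] and returns False
    gate(report, Override("on-call lead", "hotfix for an outage"))  # returns True
```

The bar itself never moves at 02:00; what changes is only whether someone takes explicit, recorded responsibility for an exception.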

The industry has come a long way. It has become standard practice to strip the human factor down to a bare minimum.

Humans are kept at the higher conceptual level of “am I done, so I can submit my code?”, while the automated flows are fully capable of deciding “does the submitted code qualify for shipping?”, and roll on downhill from there, pulling the brakes by themselves when necessary.

Conclusion

A full CI/CD solution is when the entire flow from code submission to production runs continuously, fully automated.

You can be continuous and not automated, i.e. people do most of the work.
You can be automated and not continuous, i.e. it takes a few human decisions to move the flow forward.

But hitting both is the cornerstone of modern software practice: detecting problems early, responding to problems early, eliminating gaps between production and source, and reducing the friction of code contributions.

The bottom line: you can say you have CI/CD

when the team does not need to do anything to get a feature to production besides submitting the new code.