Did you get the message? (part 1)

Osher El-Netanany · Published in Israeli Tech Radar
11 min read · Jan 31, 2023

Quite early on, software got so big that it had to be organized in parts. While each part does its thing, how do the parts communicate? How does one part of the system get a message from another?

One person delivering an envelope to another
Getting the message (img from here)

This question recurs across technology, time and scale.

Let’s take a drive through history and see how much progress has been made, and yet, how very little has changed.

Disclaimers: I will 1) simplify. 2) trade accuracy for clarity, keeping terms relatable. My goal is to nourish ingenuity, not to preach history. 3) NOT cover everything in one article — expect more.

The simplest form of passing messages between program units commonly recognized today is passing arguments. Let’s explore.

Look back to the source

The early programs of the 40s and 50s may have been composed of different units, but when assembled together they were still a single program which had the entire hardware to itself: all the memory, all the IO devices, and the CPU.

The Z3 computer from the 40s.
Yes, it is cased in wood. It wasn’t fast enough to make heat dissipation a problem... (img from here)

A CPU core has a set of registers — the closest physical thing to what programmers have in mind when they think of variables: They are named, sized, and numbered.
For a CPU everything is represented in numbers, so that’s all that these registers can hold. Each CPU computation can read from one or more registers, perform some arithmetic, and output the result to one or more registers.

CPUs also have access to a data-bus and an address-bus — the CPU’s gate to the world it lives in. A CPU can copy numbers between its registers and these buses, and this is how it communicates with the rest of the computer.

An abacus
Abacus — the grand-grand-grandpa of registers (img from here)

Program units at this level do not have the comfort of parameters and methods. Their communication is based on an agreement: place the arguments in a specific set of registers before handing control over to the called unit, and expect it to leave its result in an agreed set of registers, to be collected the same way.
Sometimes, when the data is too big for the registers, the registers would hold the address of where in memory the data itself resides — but that’s about it, as far as machine code goes.

In simple words — CPUs used (and STILL ARE using) the most basic form of shared memory: globals.

Buzz Lightyear tells Woody — “Globals, Globals everywhere”
It’s only frightening if you look inside (img from here)

When a programmer calls a function or a method in a modern language, this translates to machine code that organizes the arguments in these said registers, and wires the result registers to the returned value.

Oh, there’s also a lot of fuss about scope — which makes sure that units do not step on each other’s toes with all this global access party, but that’s not related to our discussion here.
What IS relevant is:

The reason it works so well is because all these repetitive low-level mechanics are organized consistently by a compiler in a years-old battle-tested way.

Cutting into stages

When a workflow became too much for one program — it was broken down into separate programs. Each program owned a step of the flow: step-programs could read their input from storage, and save their results into storage to be used as the input for the next program in the flow. This was the first form of batch processing.

A conveyor belt production line, where at every stage a different arm adds something to the product moving on the belt
Every program is like an arm on a conveyor production line (img from here)

However, this was very cumbersome and error-prone, involved a lot of temporary files that had to be managed, and the industry had to evolve past it quite soon.

Also note that this form of communication is asynchronous: when the program of one step prepared a message and left it for the next step, it had no guarantee (and actually did not care) if and when the message would be picked up.
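As a rough sketch of this staged style, here are two “step programs” communicating only through files — plain Python functions stand in for separate executables, and the file names and the squaring/incrementing steps are made up for illustration:

```python
import tempfile
from pathlib import Path

# A scratch directory stands in for the shared storage (path is illustrative).
work = Path(tempfile.mkdtemp())

# Step 1: one program reads its input file and leaves its result in a file...
def step_square(in_file, out_file):
    n = int(Path(in_file).read_text())
    Path(out_file).write_text(str(n * n))

# Step 2: ...a later program picks that file up as its own input.
def step_add_one(in_file, out_file):
    n = int(Path(in_file).read_text())
    Path(out_file).write_text(str(n + 1))

(work / "input.txt").write_text("7")
step_square(work / "input.txt", work / "stage1.txt")    # could run hours before...
step_add_one(work / "stage1.txt", work / "result.txt")  # ...this step is ever launched
print((work / "result.txt").read_text())   # → 50
```

Nothing forces the second step to run right away — the intermediate file just waits on storage, which is exactly the asynchronous nature described above.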

There had to be a higher-order orchestration of the entire flow that involves all these programs — a controller, if you will — telling each program when to run, passing input and output between them, watching them all as they go until their end, and handling errors.
(ℹ️) Just mind that in the early days — this was done by human operators.

A picture of a computer from the 50s, with two operators, who are young women.
Also mind that the operators are young ladies! Where did they go? (Image from here)

The Process and Multitasking

In the late 50s computers were going commercial; however, they were still huge, expensive machines owned by organizations and governments, so their economic viability mandated that they be able to handle multiple programs at the same time — i.e. multitasking.

So far, a program launched (possibly with some arguments), performed some calculations, gave its output and died away.

But now, a few running programs had to share the same hardware, and most importantly — the same CPU.
The result is time-sharing: each of the running programs gets the CPU core in a rotation which the operating system manages — its multitasking.

A picture of a person multitasking as if he has 6 arms.
Maybe many arms, but still one brain (img from here)

Multitasking made the computer juggle a few tasks at once. This gives the illusion that everything happens in parallel; however, at any given point in time, a CPU core is handling a single task, continuously rotating between them.

When such a rotation happens, the OS saves the values from all the registers the paused process is using, and loads into them the values saved when the next process was last paused — what is called context switching. The more registers a process uses — the more costly this operation is.

The first models were cooperative — meaning programs would use CPU time responsibly: when a program was waiting for data, it would cede its turn. Since IO devices were always slower than CPUs — this happens a lot.
Programs also made sure to divide their work thinly and advance one step at a time, so they could give others a go at the CPU between chunks, expecting other processes to do the same.

The obvious weakness of this model is that when a process does not release the CPU — for whatever reason — the entire system could become unstable.

The Signal

A hand of a soccer referee waving a yellow card
Not just in soccer (img from here)

To cope with that problem, signals were introduced in the 70s — a solution that evolved and took time to mature. By the 80s the preemptive time-sharing model appeared, which could make sure the CPU-time allocation is more efficient and fair.

This required processes to respond to signals that take precedence over their usual work — signals such as pause, resume, abort, terminate, break, etc., together with fault signals like illegal access or division by zero. These operational signals were designed to mean the same thing to all processes, and helped the OS and the process play together.

Mind that the process is not the program — a part of it comes from the operating system, with mechanisms like the one that accepts signals. While the process always gets the message, the program inside is sometimes informed about the signal (e.g. SIGINT), and sometimes it is not (e.g. SIGKILL).
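A minimal sketch of that difference, using Python’s standard `signal` module — for brevity the process sends a signal to itself:

```python
import os
import signal

caught = []

def on_sigint(signum, frame):
    # the program inside the process is informed, and may react
    caught.append(signum)

# Register a handler: when the OS delivers SIGINT, our code gets a say.
signal.signal(signal.SIGINT, on_sigint)

# Send SIGINT to ourselves — the handler runs instead of the default abort.
os.kill(os.getpid(), signal.SIGINT)
print(caught)

# SIGKILL, in contrast, cannot be caught — registering a handler fails:
try:
    signal.signal(signal.SIGKILL, on_sigint)
except (OSError, ValueError) as err:
    print("cannot handle SIGKILL:", err)
```

The OS simply refuses to let a program intercept SIGKILL — the process gets the message, the program inside never does.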

IPC — Inter-Process Communication (yet local)

A big disadvantage of a flow wrapped around separate programs is the cost of processes. A process wraps the program with additional mechanisms, takes time to spawn, corresponds with the OS, and takes time to clean up after.

If processes can be kept alive and ready, the time it takes to spawn a new process and wait for it to be ready is saved. The cost is the memory the process occupies, and the context-switching of keeping it in the time-sharing rotation even when it’s idle. The bigger and slower to initiate a program is — the more sense this trade makes when we need a fast result.

For this model to work, we need a way to tell the process there’s a job it can do.

A worker looks at a well-organized office pin-board
Checking for new messages (img from here — what is it more: amazing or creepy?)

Polling a Shared FS

Polling, a.k.a. pull — is where the receiver recurrently checks if it got new messages.

If programs agreed on a directory on a shared file-system to act as a pin-board, they could scan this place every now and then, and handle any new messages. If each process gets its own directory, this directory can act as an inbox — and when programs were coded to take care of messages in the order they arrive — the simplest form of computerized queue was born. 😏

However, this requires the program to manage these inboxes — not only in the logical sense, but down to what the FS provides. E.g., the logic of remembering which messages were already seen could be implemented by marking files as seen, deleting them, or moving them to another directory.
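A minimal sketch of such a file-system inbox in Python — the directory layout, the `*.msg` naming and the “move to seen/” convention are illustrative choices, not a standard:

```python
import tempfile
from pathlib import Path

# A hypothetical inbox directory on a shared file-system (names are made up).
INBOX = Path(tempfile.mkdtemp())
SEEN = INBOX / "seen"
SEEN.mkdir()

def poll_inbox():
    """One polling pass: handle new *.msg files oldest-first, then move them aside."""
    handled = []
    for msg in sorted(INBOX.glob("*.msg"), key=lambda p: p.stat().st_mtime):
        handled.append(msg.read_text())
        msg.rename(SEEN / msg.name)   # "remember what we saw" by moving the file
    return handled

# Producer side: drop a message file into the shared directory.
(INBOX / "001.msg").write_text("build report ready")

# Consumer side: a later polling pass finds and handles it exactly once.
print(poll_inbox())   # the message is picked up...
print(poll_inbox())   # ...and the next pass finds nothing new
```

Processing files oldest-first gives the queue semantics; moving handled files into `seen/` is one of the bookkeeping strategies mentioned above.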

British people display amazing queue etiquette in a concert from 2017
British people display amazing queue etiquette without being told (img from here)

Mind that in fact, this is still a form of globals.

Moving from program units to processes, we also moved from shared registers to shared file-system.

This means that coordination around files became crucial, so OSs started to provide features to manage that.

Also note that this form of communication is as asynchronous as in batch processing.

Semaphores

A sailor communicating with semaphore flags
(img from here)

Given a directory that acts as a pegboard, and given many processes that need to access this directory — we could end up with a process trying to read a message while another is still writing it, or worse — two processes trying to write to the same place, overwriting each other’s work.

Semaphores overcome that. This mechanism allows processes to tell the OS to wake them up only when the resource they are waiting for is available to them exclusively — i.e. when no other process is using it — managing the queue between them.

This also helps divide processes into those that are ready to utilize the CPU and those that are waiting for resources and can be skipped by the time-sharing loop.
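Semaphores themselves are an OS primitive; a close, file-system-flavoured relative that is easy to demonstrate is advisory file locking. A sketch in Python using POSIX `flock` — the lock file is arbitrary, and this assumes a POSIX system:

```python
import fcntl
import tempfile

# A lock file guarding some shared resource (the file itself is arbitrary).
lock_path = tempfile.NamedTemporaryFile(delete=False).name

writer = open(lock_path)
fcntl.flock(writer, fcntl.LOCK_EX)   # the writer holds the resource exclusively

reader = open(lock_path)             # a second, independent open file description
blocked = False
try:
    # a would-be reader asks for the lock without blocking...
    fcntl.flock(reader, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    blocked = True                   # ...and is told the resource is busy

fcntl.flock(writer, fcntl.LOCK_UN)   # the writer releases the lock
fcntl.flock(reader, fcntl.LOCK_EX)   # now the reader acquires it immediately
print("reader was blocked while writer held the lock:", blocked)
```

In the blocking form (without `LOCK_NB`), the second process simply sleeps until the lock is free — which is exactly the “wake me only when it’s exclusively mine” behavior described above.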

Using Signals between processes

A ship waving celebratory communication flags
Did you know ships have a flag variation of Morse code? (img from here)

The operational-signals mechanism was expanded to include a few signals applications could use for their own logic (e.g. SIGUSR1, SIGUSR2).

On most OSs, all signals under the hood are numeric codes, which programs could map to specific intents. This means a signal could be used to invoke a specific part of the code — but there are no parameters to pass with it.

With some ingenuity, some teams found that prior to sending the signal, they can prepare a message for the target process in a predefined directory. When the target process gets that custom signal — the program inside reads the actual message from the FS, effectively making the program able to get any kind of message, with almost any kind of payload, using the same signal mechanism.

Hooking FS events

Captain hook, from Peter-Pan
Get the hook? (img from here)

Events, a.k.a. push — is a later, more efficient mechanism (2001) that delegates the work of managing the watch to the OS, keeping in the application only the logic of what should happen when a change in a directory or a file is observed.

This flips the asynchronous nature: on one hand — processes can now respond to events as they come — if they’re up and listening.
On the other hand, an application that relies on events still needs to do much of what an application that relies on polling does: if the process was down for a while, it has to check as soon as it wakes up what’s new on the FS, and respond just like a polling application would.

Sockets

socket of an amplifier sound system
Plug in to get the signal (img from here)

Eventually, OSs provided sockets as a way to pass a message, sparing the whole business with the files.

This already borders on distributed systems — which are the topic of a future post — because if we can pass messages between independent processes on the same machine, it’s one leap away from doing that between two processes across the network.
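Just for a taste, here is a connected pair of local sockets in Python — `socket.socketpair()` hands back the two ends that would normally live in two different processes:

```python
import socket

# A connected pair of local sockets; normally each end lives in a different process.
parent, child = socket.socketpair()

parent.sendall(b"is the report ready?")
print(child.recv(1024))     # the other end receives the bytes

child.sendall(b"yes - rendering now")
print(parent.recv(1024))    # ...and can answer back over the same channel

parent.close()
child.close()
```

Swap `socketpair()` for a TCP socket and the same `sendall`/`recv` calls work across a network — which is precisely why this borders on distributed systems.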

So sockets get just a mention in this part.
The next thing to cover here is very crucial to local work:

The design for Pipes and the STDIO

This evolution is about reuse: it aspired to make each program specialized in its own specific thing, and let end-users use these programs as generic building blocks to express concrete flows.

The result is pipes.

a mesh of shiny copper pipes
if STDIO was physical (img from here)

The design of pipes meant that on top of their command-line arguments and the signals mechanism, programs get 3 basic communication channels:

  • one for input
  • one for output
  • one for notifications

When reusable programs are composed into pipes, the output of each link in the pipe serves as the input of its successor, saving the need to manage all these temp files passed between them. These are the standard input and output, called stdin and stdout.

And the 3rd?
When a program needs to make a notification that must not be a part of the output — it emits it to the 3rd channel. Since most notifications are about errors — this channel was poorly named stderr, even though it’s also used for warnings, notices, or anything the process has to address to operators rather than to the next program in the pipe.

More benefits of pipes — they can exist mostly in memory, saving a lot of IO, and the links can run simultaneously, passing data-chunks between them as soon as they are ready.
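This is also how pipes look from a program that spawns other programs. A sketch in Python, wiring the stdout of one child into the stdin of another — assuming the ubiquitous `printf` and `sort` binaries are available on the machine:

```python
import subprocess

# Wire the stdout of one child into the stdin of the next, like `printf ... | sort`.
producer = subprocess.Popen(["printf", "pear\napple\npear\n"],
                            stdout=subprocess.PIPE)
consumer = subprocess.Popen(["sort"],
                            stdin=producer.stdout, stdout=subprocess.PIPE)
producer.stdout.close()      # our copy of the pipe end is no longer needed
out, _ = consumer.communicate()
producer.wait()
print(out.decode())
```

Both children run simultaneously, the data flows through memory, and no temp files are involved — the benefits listed above, in a dozen lines.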

Pipe Power

Given programs that are generic enough and support sufficiently articulated parameters — one could build very expressive flows without getting into programming.

This shell snippet prints a markdown list of the top 10 tags found on articles.

this is a snippet of a shell script describing a pipe with 6 links using the binaries of jq, sed, head, uniq and sort.

This example assumes that the directory articles contains JSON documents in files, where each document may have a tags property — a list of strings. It uses jq to extract them, uniq to count them, sort to sort them in reverse order, head to chop the top 10 results, and sed to format them as markdown bullets with a heading.

If you’re a programmer, how much code would you need to express that using your language-of-choice?
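For comparison, here is a rough Python equivalent of that flow — the directory layout is as described above (JSON files, each optionally carrying a `tags` list of strings), and the function name is mine:

```python
import json
from collections import Counter
from pathlib import Path

def top_tags_markdown(articles_dir, n=10):
    """Markdown list of the n most common tags across the JSON articles."""
    tags = Counter()
    for path in Path(articles_dir).glob("*.json"):
        doc = json.loads(path.read_text())
        tags.update(doc.get("tags", []))   # the tags property is optional
    lines = [f"# Top {n} tags"]
    lines += [f"- {tag} ({count})" for tag, count in tags.most_common(n)]
    return "\n".join(lines)
```

Not terrible — but the shell pipe expresses the same flow without writing (or deploying) any program at all.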

Intermediate Conclusion

a diagram that shows all the ways to pass a message to application code, divided roughly to three: the process, the FS and the network.
If there are more ways — let me know in a comment! (I drew that, here).

We started a journey to see how different software units could pass messages between them, focusing this time on communication on the local machine, leaving distributed systems for future posts.

We saw that locally it’s all globals under the hood — shared registers, shared memory, and a shared file-system. Despite that — we do not treat them as globals, because they are accessed mostly through battle-tested mechanisms with safeguards built in. However, sometimes software has to explicitly use special mechanisms to safeguard against colliding with other processes.
Last — we saw the power of the pipes design.

In the next part we’ll approach networking.



Coding since 99, LARPing since 94, loving since 76. I write fast but read slow, so I learnt to make things simple for me to read later. You’re invited too.