May 2025 – Secure Design

The term microservice architecture is often used to describe modern software development and state-of-the-art software architecture. But what is it? The answer is simple: There is no general definition of the term. An application built with “microservice” in mind has some properties that can be helpful for your codebase. The services part does not necessarily mean code that needs to answer networked API calls. You can choose to couple your code loosely and implement the parts with independent operation in mind. Think of the classic module approach. Modules can use the network to communicate, but you can use any technique that relays messages between your modules. This is reminiscent of the Unix philosophy. The first two rules are quoted from the Wikipedia article (and the Bell System Technical Journal from 1978):

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
Expect the output of every program to become the input to another, as yet unknown, program.

This is essentially the minimal version of thinking in microservices. The rest, such as scalability, flexibility, network APIs, containers or cloud services, are consequences of these two rules. Do not fall into the trap of only linking microservices to web applications. Even single-binary applications can follow this approach. To give you an example: I had the opportunity to inspect the code of a voice-over-IP telephony server system. In essence, the application functioned as an IP-only telephone switchboard. The customer was looking to improve the memory management, because the application usually runs for long periods of time. The code itself was divided into modules which could be enabled or disabled at will. There were even dummy functions that gave missing modules a minimal level of functionality, so you could test the whole application. The calling convention on the architecture level regulated what every module needed to receive and to return. Basically, this is a microservice architecture in a big binary.
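The module idea can be sketched in a few lines of C++. This is only an illustration, not the code of the telephony server: the names Message, Module, TaggingModule, DummyModule and run_pipeline are made up for this example. It shows a fixed calling convention (every module receives a message and returns a message) and a dummy module that stands in for a disabled one.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical calling convention: every module receives a Message
// and returns a Message. All names are illustrative only.
struct Message {
    std::string payload;
};

class Module {
public:
    virtual ~Module() = default;
    virtual Message handle(const Message& in) = 0;
};

// A "real" module that does one small job: it prefixes the payload.
class TaggingModule : public Module {
public:
    explicit TaggingModule(std::string tag) : tag_(std::move(tag)) {}
    Message handle(const Message& in) override {
        return Message{tag_ + ":" + in.payload};
    }
private:
    std::string tag_;
};

// A dummy module that replaces a disabled module and passes data through.
class DummyModule : public Module {
public:
    Message handle(const Message& in) override { return in; }
};

// The output of each module becomes the input of the next one.
Message run_pipeline(const std::vector<std::unique_ptr<Module>>& modules,
                     Message msg) {
    for (const auto& m : modules) {
        msg = m->handle(msg);
    }
    return msg;
}
```

Because every module obeys the same interface, swapping a TaggingModule for a DummyModule does not require changes anywhere else, which is the property that makes the code easy to replace, break, and repair.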

You can walk through many lists of advantages and disadvantages, but the key to thinking “microservice” is not to create code that you are afraid to change. Sometimes prototypes work well, so there is some reluctance to make changes that might break something. Don’t hesitate! Replace, break, and repair code. This is the way.

I did some refactoring of old code that started out as a Bash and Perl script. The idea was to fill a filesystem with random files of random content. The use case was to test filesystems and storage media. My code started out as C++ with some C to interface with the Linux® kernel’s syscalls. Essentially, it was an exercise in the wonderful world of the virtual filesystem layer (VFS), I/O options, memory attributes, and getting lots of random data faster than any disk could write. The random generator in the code is the SSE-accelerated Mersenne Twister algorithm from Hiroshima University. Clearly, this is a sign of pre-C++11 code. Modern code can use the C++ <random> library, which can also make use of SIMD instructions when compiled for the native architecture. The code works fine and has been used to test I/O performance and storage media. The problem: The SIMD instructions are for the Intel®/AMD™ platform only. What about ARM and RISC-V processors? Alternative processors use less power and make it possible to run the code on smaller platforms.
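A portable replacement for the SSE-specific generator could look like the following minimal sketch. It is not the original code: the function name fill_random is an assumption for this example, and it simply copies the 64-bit output of std::mt19937_64 into a byte buffer. The Mersenne Twister is not a cryptographic generator, which is fine for test data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Fill a buffer with pseudo-random bytes from the standard library's
// 64-bit Mersenne Twister. No platform-specific intrinsics are involved.
// fill_random is a hypothetical helper name.
void fill_random(std::vector<unsigned char>& buffer, std::mt19937_64& engine) {
    std::size_t offset = 0;
    while (offset < buffer.size()) {
        const std::uint64_t word = engine();
        // The last chunk may be shorter than eight bytes.
        const std::size_t n = std::min(sizeof(word), buffer.size() - offset);
        std::memcpy(buffer.data() + offset, &word, n);
        offset += n;
    }
}
```

A fixed seed gives reproducible file content; seeding the engine from std::random_device gives different content on every run.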

The code itself is mostly portable. When compiling for Microsoft® Windows® or Apple OS X, you can safely remove the Linux® syscalls. Basically, it is just madvise(), posix_memalign() and posix_fallocate64() to give the VFS layer some hints. C++11 and later standards allow for replacing the SSE MT random generator with the standard library. The first stage of refactoring was a hack, because I just redefined the missing functions as empty functions. Don’t do that in production code, because it produces source code that you just don’t need on a specific platform. The next step is to collect all the platform-dependent code in classes that handle the interface and the random data generation. Alternatively, the random file could be its own class and instance, but this would cause more memory allocations. The current code uses a single memory buffer to prepare the file content before the data is written to the file. The memory buffer size can be increased to 2 GB, which shows nicely alternating processor and I/O cycles. The code does not run anything in parallel, so one memory buffer is fine.
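One way to collect the platform-dependent parts in a class is sketched below. The class name AlignedBuffer and its methods are assumptions for this example, not the names used in the original code. On Linux it uses posix_memalign() and madvise(); on other platforms it falls back to malloc() and treats the hint as a no-op.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(__linux__)
#include <stdlib.h>     // posix_memalign()
#include <sys/mman.h>   // madvise()
#include <unistd.h>     // sysconf()
#endif

// Hypothetical wrapper: all platform checks live in this one class.
class AlignedBuffer {
public:
    explicit AlignedBuffer(std::size_t size) : size_(size) {
#if defined(__linux__)
        // Page alignment is required for madvise() on the buffer.
        const long page = sysconf(_SC_PAGESIZE);
        const std::size_t alignment =
            page > 0 ? static_cast<std::size_t>(page) : 4096;
        if (posix_memalign(&data_, alignment, size) != 0) {
            data_ = nullptr;
        }
#else
        data_ = std::malloc(size);
#endif
    }
    ~AlignedBuffer() { std::free(data_); }
    AlignedBuffer(const AlignedBuffer&) = delete;
    AlignedBuffer& operator=(const AlignedBuffer&) = delete;

    // Hint that the buffer is accessed sequentially.
    // Returns true if the hint was accepted or is not available here.
    bool advise_sequential() {
#if defined(__linux__)
        return data_ != nullptr &&
               madvise(data_, size_, MADV_SEQUENTIAL) == 0;
#else
        return data_ != nullptr;
#endif
    }

    unsigned char* data() { return static_cast<unsigned char*>(data_); }
    std::size_t size() const { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The rest of the program only sees data(), size() and advise_sequential(), so no #ifdef has to appear outside this class.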

The code’s file I/O layer is very close to the Linux® kernel. open(), write(), and close() are called directly. madvise() is used to tell the kernel that the data will be written sequentially and that it will not be needed for reading soon. This helps the memory management. When combined with O_DIRECT, the code can run on desktops without filling up file and block buffers with write-only random data. This part is hard to refactor to C++’s stream library. I recommend that everyone study the parameters of the syscalls I mentioned. Using the lower layers of the I/O subsystem can actually be useful on low-powered systems or when you need to save memory or improve performance. The uses of the code since 2008 have shown that the storage media and the I/O bus path to the device are the bottleneck. Both the SSE MT random generator and the C++11 <random> code can create more than enough data to saturate the I/O system.
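Calling the syscalls directly could look like this minimal sketch for POSIX systems. The function name write_file_raw is an assumption for this example. It handles short writes and interrupted calls, which the stream library normally hides. O_DIRECT is left out on purpose, because it requires aligned buffers and sizes and is not supported by every filesystem.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>      // open()
#include <unistd.h>     // write(), close()

// Write a whole buffer with raw syscalls. write() may store fewer bytes
// than requested, so the loop continues until everything is written.
// write_file_raw is a hypothetical helper name. Returns true on success.
bool write_file_raw(const char* path, const unsigned char* data,
                    std::size_t size) {
    const int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return false;
    }
    std::size_t done = 0;
    while (done < size) {
        const ssize_t n = write(fd, data + done, size - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;   // interrupted by a signal, try again
            }
            close(fd);
            return false;
        }
        done += static_cast<std::size_t>(n);
    }
    return close(fd) == 0;
}
```

Adding O_DIRECT to the open() flags would be the next step on Linux, together with a page-aligned buffer and a write size that matches the device's block size.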

The takeaway from writing cross-platform code is not surprising. Contain all the platform-dependent code in classes or functions. Do this early in your process, even in prototypes. Empty functions that stand in for non-existent syscalls might be optimised away by the compiler, but they make the source code messy. You will need some #ifdef / #ifndef directives, though. It is best to collect them in your include files. If you are interested in the code, let me know. You can also use it for fuzzing, because there is a switch to create file and directory names from fully random bytes.

Secure Software doesn't develop itself.

Microservice Architecture revisited

Platform-dependent Code, SIMD instructions, and I/O Layers