Secure Design – Opinion

Secure software doesn’t develop itself.

The picture shows the top layer of the Linux kernel's API subsystems. Source: https://www.linux.org/attachments/kernel-jpeg.6497/

Microservice Architecture revisited

The term microservice architecture is often used to describe modern software development and state-of-the-art software architecture. But what is it? The answer is simple: there is no general definition of the term. An application built with “microservices” in mind has some properties that can be helpful for your codebase. The “service” part does not necessarily mean code that answers networked API calls. You can choose to couple your code loosely and implement the parts with independent operation in mind. Think of the classic module approach. Modules can use the network to communicate, but you can use any technique that relays messages between your modules. This is a reminder of the Unix philosophy of code. The first two rules are quotes from the Wikipedia article (and the Bell System Technical Journal from 1978):

  1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
  2. Expect the output of every program to become the input to another, as yet unknown, program.

This is essentially the minimal version of thinking in microservices. The rest, such as scalability, flexibility, network APIs, containers, or cloud services, are consequences of these two rules. Do not fall into the trap of linking microservices only to web applications. Even single-binary applications can follow this approach. To give you an example: I had the opportunity to inspect the code of a voice-over-IP telephony server system. In essence, the application functioned as an IP-only telephone switchboard. The customer was looking to improve the memory management, because the application usually runs for long periods of time. The code itself was divided into modules which could be enabled or disabled at will. There were even dummy functions that stood in for missing modules with a minimal level of functionality, so you could test the whole application. The calling convention at the architecture level regulated what every module needs to receive and to return. Basically, this is a microservice architecture in a big binary.
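To illustrate the idea, here is a minimal C++ sketch of such a module contract. The interface and the dummy module are hypothetical names for illustration, not the original vendor code:

#include <string>
#include <vector>

// Hypothetical architecture-level calling convention: every module
// receives a message and must return a reply, even if it does nothing.
struct Message {
    std::string topic;
    std::vector<unsigned char> payload;
};

class Module {
public:
    virtual ~Module() = default;
    virtual Message handle(const Message& in) = 0;  // what every module receives and returns
};

// Dummy module: stands in for a disabled or missing component so the
// rest of the application can still be wired up and tested.
class DummyModule : public Module {
public:
    Message handle(const Message& in) override {
        return Message{ in.topic, {} };  // minimal, well-defined reply
    }
};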

You can walk through many lists of advantages and disadvantages, but the key to thinking “microservice” is not to create code that you are afraid to change. Sometimes prototypes work well, so there is some reluctance to touch them for fear of breaking anything. Don’t hesitate! Replace, break, and repair code. This is the way.

Platform-dependent Code, SIMD instructions, and I/O Layers

I did some refactoring of old code that started out as a Bash and Perl script. The idea was to fill a filesystem with random files of random content. The use case was to test filesystems and storage media. My code started out as C++ with some C to interface with the Linux® kernel’s syscalls. Essentially, it was an exercise in the wonderful world of the virtual filesystem layer (VFS), I/O options, memory attributes, and getting lots of random data faster than any disk could write it. The random generator in the code is the SSE-accelerated Mersenne Twister algorithm from Hiroshima University. Clearly, this is a sign of pre-C++11 code. Modern code can use the C++ <random> library, which also uses SIMD instructions heavily when compiled in native mode. The code works fine and has been used to test I/O performance and storage media. The problem: the SIMD instructions are for the Intel®/AMD™ platform only. What about ARM and RISC-V processors? Alternative processors use less power and allow the code to run on smaller platforms.
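Replacing the SFMT code with the standard library is straightforward. Here is a minimal sketch of a portable generator filling a buffer, assuming std::mt19937_64 is an acceptable substitute for the SSE Mersenne Twister:

#include <cstdint>
#include <random>
#include <vector>

// Fill a buffer with pseudo-random 64-bit words using the portable
// C++11 Mersenne Twister instead of the platform-specific SFMT code.
void fill_random(std::vector<std::uint64_t>& buffer) {
    std::random_device rd;
    std::mt19937_64 gen( (static_cast<std::uint64_t>(rd()) << 32) ^ rd() );
    for (auto& word : buffer) {
        word = gen();
    }
}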

The code itself is mostly portable. When compiling for Microsoft® Windows® or Apple OS X, you can safely remove the Linux® syscalls. Basically, it is just madvise(), posix_memalign(), and posix_fallocate64() to give the VFS layer some hints. C++11 and later standards allow for replacing the SSE MT random generator with the standard library. The first stage of refactoring was a hack, because I just redefined the missing functions as empty declarations. Don’t do that in production code, because it leaves source code in your build that a specific platform just doesn’t need. The next step is to collect all the platform-dependent code in classes that handle the interface and the random data generation. Alternatively, the random file could be its own class and instance, but this would cause more memory allocations. The current version uses a single memory buffer to prepare the file content before the data is written to the file. The memory buffer size can be increased to 2 GB, which shows nice alternating processor/I/O cycles. The code does not use any parallel code, so one memory buffer is fine.

The code’s file I/O layer is very close to the Linux® kernel. open(), write(), and close() are called directly. madvise() is used to tell the kernel that the data will be written sequentially and that it will not be needed for reading soon. This helps the memory management. When combined with O_DIRECT, the code can run on desktops without filling up file and block buffers with write-only random data. This part is hard to refactor to C++’s stream library. I recommend studying the parameters of the syscalls I mentioned. Using the lower layers of the I/O subsystem can be genuinely useful on low-powered systems or when memory and performance matter. Use of the code since 2008 has shown that the storage medium and the I/O bus path to the device are the bottleneck. Both the SSE MT random generator and the C++11 <random> code can create more than enough data to saturate the I/O system.
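For illustration, here is a minimal sketch of this low-level write path, assuming Linux® and omitting most error handling. It is C-style, as the original code interfaces with the syscalls in C; the buffer alignment and flags are examples, not the original tool’s values:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int write_random_file(const char* path, size_t size) {
    // O_DIRECT bypasses the page cache; it requires aligned buffers and
    // a write size that is a multiple of the block size.
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) return -1;

    // Give the VFS layer a hint about the final file size.
    posix_fallocate64(fd, 0, size);

    // O_DIRECT needs alignment, hence posix_memalign() instead of malloc().
    void* buffer = NULL;
    if (posix_memalign(&buffer, 4096, size) != 0) { close(fd); return -1; }

    // Tell the kernel that this memory will be touched sequentially.
    madvise(buffer, size, MADV_SEQUENTIAL);

    // ... fill the buffer with random data here ...

    ssize_t written = write(fd, buffer, size);
    free(buffer);
    close(fd);
    return (written == (ssize_t)size) ? 0 : -1;
}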

The takeaway from writing cross-platform code is not surprising. Contain all the platform-dependent code in classes or functions. Do this early in your process, even in prototypes. Using empty functions to disable non-existent syscalls might be optimised away by the compiler, but it makes the source code messy. You will need some #if/#endif preprocessor directives, though. It is best to collect them in your include files. If you are interested in the code, let me know. You can also use it for fuzzing, because there is a switch to create file and directory names from fully random bytes.
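As a sketch of that advice, a single include file can concentrate the platform switches; the function name here is hypothetical:

// platform_hints.h - collect all platform-dependent hints in one place.
#pragma once
#include <cstddef>

#if defined(__linux__)
    #include <sys/mman.h>
    inline void hint_sequential(void* buf, std::size_t len) {
        madvise(buf, len, MADV_SEQUENTIAL);  // real hint on Linux
    }
#else
    inline void hint_sequential(void*, std::size_t) {
        // No-op on platforms without madvise(); the single #if above
        // keeps the rest of the codebase free of preprocessor clutter.
    }
#endif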

Getting random Seeds for Pseudo-Random-Number-Generators

Pseudo-random-number generators (PRNGs) require a good seed if the sequence of random numbers should be as unpredictable as possible. The C++ Standard Template Library (STL) provides std::random_device for using a local entropy source. You cannot easily see what the compiler uses on the target platform. Potential sources of entropy are processor instructions (RDRAND), keyboard/mouse timings, network events, jitter entropy, or other implementation details. Most operating systems maintain a pool of entropy and feed it into a stream cipher with a random key. The STL random device may return 0 or data not generated by an entropy source. You should check for entropy sources by calling the entropy() member function of the random device instance. If you get a 0 value, then you need to create your own seed value(s).
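The check itself is cheap. A minimal sketch:

#include <iostream>
#include <random>

int main() {
    std::random_device rd;
    // entropy() returns an estimate of the device's entropy in bits.
    // A value of 0 signals a deterministic (or unknown) implementation.
    if (rd.entropy() == 0.0) {
        std::cout << "No entropy source reported - use fallback seeding.\n";
    } else {
        std::cout << "Entropy estimate: " << rd.entropy() << " bits\n";
    }
    return 0;
}

So where do the fallback seed values come from?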

You may be tempted to use the C++11 std::seed_seq to create a seed sequence, but the sequence generator first needs integer input values. Furthermore, there are some properties of the seed sequence container you should check out before relying on it. The usual approach is to use counters, timestamps from clocks, memory addresses of storage areas (if the operating system’s address randomisation is enabled), or other local states that change and are hard to predict. If you do that, then you should use multiple sources and mix them further by feeding them into a second PRNG. Splitmix64 or the Xoroshiro128+ family of algorithms are useful here because of their low complexity. The Linux® kernel can always help you out with its /dev/urandom pool of entropy. The Microsoft® Windows platform offers the CryptGenRandom() function.
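As an example of such a mixer, here is splitmix64 with the algorithm’s published constants; the weakly random input is whatever local state you collected:

#include <cstdint>

// Splitmix64: mixes a 64-bit state into a well-distributed output.
// Feed it counters, timestamps, or addresses as the starting state.
uint64_t splitmix64(uint64_t& state) {
    uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}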

Most of the time the compilers can find a suitable source. Make sure that your compiler does not have bugs in the implementation (see MinGW-w64 bug 338 for an example). I did some checks and found a recent version of GCC on Microsoft® Windows providing a random device with constant output. Calling the entropy() member function and having fallback code for generating the seed is not expensive and does not add much code. Here is an example (needs further testing, of course):

#include <cstdint>
#include <ctime>
#include <fstream>

#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#include <wincrypt.h>
#endif

uint32_t get_seed() {
    uint32_t s = static_cast<uint32_t>( time(nullptr) );

#if defined(__unix__) || defined(__APPLE__)
    uint32_t a;
    uint32_t b;
    uint32_t c;
    uint32_t d;
    std::ifstream urandom( "/dev/urandom", std::ios::in | std::ios::binary );
    do {
        urandom.read( reinterpret_cast<char*>(&a), sizeof(a) );
        urandom.read( reinterpret_cast<char*>(&b), sizeof(b) );
        urandom.read( reinterpret_cast<char*>(&c), sizeof(c) );
        urandom.read( reinterpret_cast<char*>(&d), sizeof(d) );
    } while ( (a*b*c*d) == 0 );  // retry if any value is zero
    urandom.close();
    s = a * b ^ c * d;  // parses as (a*b) ^ (c*d)
#endif

#if defined(_WIN32) || defined(_WIN64)
    // Here could be a way to extract / read a random value from
    // the Microsoft® Windows run-time environment.
    //
    // See for example:
    // https://learn.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom
    //
    HCRYPTPROV hCryptProv;
    union {
        BYTE pbData[16];
        struct {
            uint32_t a, b, c, d;
        } v;  // a struct inside the union, so the four values do not overlap
    } u;

    // NULL selects the default provider for PROV_RSA_FULL.
    if ( CryptAcquireContextW( &hCryptProv, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT ) ) {
        do {
            if ( ! CryptGenRandom( hCryptProv, sizeof(u.pbData), u.pbData ) ) {
                break;  // keep the time-based seed on failure
            }
            s = u.v.a * u.v.b * u.v.c * u.v.d;
        } while ( s == 0 );
        CryptReleaseContext( hCryptProv, 0 );
    }
#endif

    return s;
}

When to use Math Library Functions

Most programming languages have libraries with mathematical functions. Usually, these functions provide more complex calculations, such as trigonometric functions, numerical methods, or special mathematical functions. The libraries also provide simpler functions; the power function is one example. Sometimes developers can implement such calculations directly in the application code. The best practice is not to do that, but there are exceptions. You can calculate integer powers by performing multiplications, while the pow() function computes the result for arbitrary parameters, including fractional powers. The runtime difference between the two can be vast. Consider the following code snippet.


#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

const uint32_t hit_max = 5000, rnd_max = 5000;
std::mt19937_64 gen64;
std::uniform_int_distribution<uint64_t> dist;
uint32_t hit = 0;
double x, y;

for( uint32_t j=0; j<hit_max; j++ ) {
    hit = 0;
    x = 0;
    y = 0;
    for( uint32_t i=0; i<rnd_max; i++ ) {
        x = static_cast<double>( dist(gen64) ) / static_cast<double>( std::numeric_limits<uint64_t>::max() );
        y = static_cast<double>( dist(gen64) ) / static_cast<double>( std::numeric_limits<uint64_t>::max() );
        if ( y < std::sqrt( 1.0 - x*x ) ) {
            hit++;
        }
    }
}

The if() condition performs a calculation. To square a number, you just need to multiply it by itself. If you run the code with hit_max=rnd_max=5000, then the two loops take about 5.4 seconds. Replacing the multiplication with pow(x,2) leads to an execution time of 8.6 seconds. That’s a lot, especially for loops running more often. So when calling a function from the math library, first consider whether direct numerical operations can replace the call in your code. The compiler cannot know what you want to calculate, so the optimiser cannot replace the function call. Deciding this is easy for the pow() function with integer exponents. When faced with more complex calculations, it is not so easy. You can still try dividing the calculation into parts and check whether those parts are directly computable. For numerical simulations or approximations, this is effort well spent. Focus on the things you do in loops. You should also get to know the functions of your math library. The templates in C++ enable the compiler to select different data types. The size of the chosen data types also affects the performance, because floating-point values have different precisions and sizes. Some calculations might even be pre-computed with integers alone and then converted to floating-point data types. Inspect these sections in your code. Use integers with fitting sizes wherever you can. Use floating-point data types only if necessary.
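If you want to measure this on your own system, a minimal timing sketch with std::chrono looks like the following; the loop count is an arbitrary example:

#include <chrono>
#include <cmath>
#include <cstdio>

int main() {
    const long n = 100000000L;   // arbitrary example loop count
    volatile double sink = 0.0;  // volatile prevents optimising the loop away

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; i++) {
        double x = static_cast<double>(i) * 1e-9;
        sink = x * x;            // compare against: sink = std::pow(x, 2);
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    std::printf("elapsed: %f s\n", elapsed.count());
    return 0;
}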

Researching Code Examples for Secure Coding

The image shows shredded paper strips from a shredded document. Source: http://securology.blogspot.com/2012/09/destroying-paper-documents.html

Learning by doing means you spend a lot of time reading documentation and exploring example code that illustrates the features of your favourite development toolchain. Finding well-written example code has become substantially more difficult in the past years. Once upon a time, Google offered a search engine just for source code. It was active between 2006 and 2012. Now you are stuck with general search engines and their deteriorating quality. The amount of AI-generated content, copy-&-paste from documentation, and hyperlinks to gigantic forum discussions filled with errors and even more copy-&-paste snippets destroys classical Internet research. You have to select your sources carefully. So what is a good strategy here? I have compiled a short checklist that helps you avoid wasting time.

  • Start with the tutorials and documentation of your development tools/languages. Some have sections with examples and a well-written explanation. It depends on the developers, because writing didactically valuable explanations takes some effort.
  • Actively look for content from schools, colleges, or universities. Sometimes courses are online and contain the information you need. Try to prefer this source category.
  • When using search engines, keep the following in mind:
    • Skip results pushed by Search Engine Optimization (SEO); SEO is basically a way to push results to the top by adding noise and following the search engine company’s policy of the day. You can recognise this content by summary texts that don’t state the facts briefly, the obnoxious “Top N” phrase in the title, and even more variations of copy-&-paste text fragments.
    • Do not “AI-enhance” the results! While Large Language Model (LLM) algorithms may have used actual sources relevant to your research during training, their results are merely a statistical remix subtly altered by hallucinations. Go directly to software/coding forums and look for relevant threads. LLM-generated code tends to contain bugs more frequently.
    • Do not use content sponsored by companies pushing their development products. Research is all about good examples, good explanations, and facts, not marketing.
    • Mind the date of the results. AI spammers and companies following the AI hype have changed dates of published articles to sell them as new or updated. Don’t fall for that.
  • Inspect secure coding standards and policy documents. Some contain useful sections with examples. You can also use these documents to verify search results by recognising outdated advice (deprecated algorithms, old standards, etc.).
  • Inspect version control repositories and look for example code. A lot of projects have samples and test code that is part of the release.
  • Write your own test code and explore! Add the created test code to your personal/project toolbox. You can later turn this code into unit tests or use it to check if major version changes broke something.

Unfortunately, these hints won’t change the degrading quality of the current search engines. They will, however, help you filter out the noise.

Filtering Unicode Strings in C++

The image shows a screenshot of the "iconv -l" command, listing all character encodings that the iconv tool can convert.

Dealing with text is a major task for code. Writing text means stringing characters in a row. Characters are the symbols; the encoding determines how these characters are represented in memory. There are single-byte and multi-byte encodings. The Unicode family aims to represent all characters and symbols of all writing systems. If you specify Unicode, you still need to select a specific encoding. Unicode can be expressed in UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-7-IMAP, UTF-7, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. The numbers indicate the size of the code units in bits (or, for the UCS variants, in bytes). LE and BE indicate the endianness of the encoding. So if you see a software specification saying “let’s use Unicode”, then this is not a specification. The Universal Coded Character Set (UCS) is an early representation of Unicode, but it is still kept up to date alongside the Unicode standard.

C++ has multiple string classes. The string container follows the C behaviour and has no encoding per se. You can store byte sequences in a string, but you have to take care of the encoding yourself. Wide strings can be stored in the wstring container. Wide strings can accommodate multi-byte characters as used in UTF-16 or UTF-32. The disadvantage is that the width differs between platforms (just as with the int data type). C++11 introduced the u16string and u32string containers, and C++20 added u8string, to address this. You still need to track the encoding of the data. A good choice is to stick with the standard string container and handle the encoding issues yourself. However, the C++ standard library lacks some functionality that is frequently needed. The following libraries can help you out:

  • simdutf for Unicode validation and transformation; the library has SIMD support
  • pcrecpp for regular expressions with Unicode
  • UTF8-CPP for Unicode string operations with UTF-8 and conversions to UTF-16 / UTF-32

The native string encoding on Microsoft® Windows® is UTF-16LE. GNU/Linux® systems usually use UTF-8, as does the World Wide Web. Web servers can also serve UTF-16 content. Web standards do not allow UTF-32 for text content.

You must validate all strings entering your code. Both simdutf and UTF8-CPP have validation functions. You can store the text in the standard string container. Using Unicode adds a lot of extra characters and codes that you need to track. Whitespace alone is no longer simple: Unicode has 25 characters with the whitespace property. Filtering is easiest with regular expressions, but there are some caveats. The extended ASCII and ISO-8859 non-breaking space has the code 0xA0. In UTF-8, the same character is encoded as the byte sequence 0xC2 0xA0. Filtering may remove only the 0xA0 byte, and this leaves you with an invalid lead byte 0xC2. The pcrecpp library will do this if you remove all Unicode whitespace. It’s helpful to explore how Unicode encodes characters. Focus on the additional control and modification characters, because they can even reverse the writing order (see the Unicode bidirectional formatting characters for more information). The best way to avoid trouble is to use allow lists and remove everything else, if possible. Some special cases will require looking for byte sequences that never occur and for the markers of the two-, three-, and four-byte sequences (in UTF-8; other encodings also have markers for extended character sequences and modifiers).
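A minimal validation sketch with simdutf; the call follows the library’s documented API, but treat the exact signature as an assumption to verify against your version:

#include <string>
#include "simdutf.h"

// Reject invalid byte sequences before any further string processing.
bool accept_input(const std::string& input) {
    // simdutf::validate_utf8 checks the whole buffer with SIMD support.
    return simdutf::validate_utf8(input.data(), input.size());
}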

Transformations will also be a frequent issue. The in-memory representation of the C++ string classes is independent of the representation on storage subsystems or the network. Make sure to handle this and all localization aspects. The language settings require extra conversions.

Parallel Operations on numerical Values

Everyone knows the vector container of C++’s Standard Template Library (STL). It is useful, versatile, and stores the data of all elements in a contiguous memory location. There is another, less widely known container for array data named std::valarray. It has been part of the STL for a long time (since well before C++11). Its use case is performing operations on all array elements in parallel. You can even multiply two valarray containers element by element without using loops or other special code. While it has no iterator members of its own, you can easily create a valarray from a vector, perform calculations in parallel, and push the results into a vector again. The C++ reference has example code showing how to do this. Creation from a vector requires access to the memory location of the vector’s data.

std::vector<double> values;
// Put some values into the vector here …
// Convert vector to valarray
std::valarray<double> val_values( values.data(), values.size() );

Now you can perform operations on all elements at once. Calculating cos() of all elements simply looks like this:

auto val_result = cos(val_values);

If you take the time and compare it to a loop over a vector where the function is called for every element, you will notice that valarray is much faster, although the gain depends on your compiler. GCC and Clang are quite fast. The apply() member function allows you to run arbitrary functions on every element. If you only need a subset of the elements, then you can create slices with the required values.
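A short sketch of both features; note that apply() takes a plain function (a capture-less lambda works), and the slice parameters here are arbitrary examples:

#include <cstddef>
#include <valarray>
#include <vector>

int main() {
    std::valarray<double> v = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };

    // apply(): run a function on every element in one call.
    std::valarray<double> squared = v.apply( [](double d) { return d * d; } );

    // slice(start, size, stride): every second element, starting at index 0.
    std::valarray<double> evens = v[ std::slice(0, 3, 2) ];

    // Push the results back into a vector for the rest of the program.
    std::vector<double> out( std::begin(squared), std::end(squared) );
    return 0;
}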

Static Tests and Code Coverage

The picture shows a warning sign indicating that a laser beam is operating in the area. Source: https://commons.wikimedia.org/wiki/File:Laser-symbol-text.svg

Testing software and measuring the code coverage is a critical ritual for most software development teams. The more code lines you cover, the better the results. Right? Well, yes and no. Testing is fine, but you should not get excited about maximising the code coverage. Measuring code coverage can turn into a game and a quest for the highest score. Applying statistics to computer science can show you how many code paths your tests need to cover. Imagine that you have a piece of code containing 32 if()/else() statements. Testing all branch combinations means you will have to run through 2^32 = 4,294,967,296 different cases. Now add some loops, function calls, and additional if() statements (because 32 comparisons are quite low for a sufficiently big code base). This will increase the number of paths considerably. Multiply the number by the time needed to complete a test run. This shows that tests are limited by physics and mathematics.

Static analysis is a standard tool which helps you detect bugs and problems in your code. Remember that all testing tries to determine the behaviour of your application. Mathematics has more bad news for you. Rice’s Theorem states that all non-trivial semantic properties of programs are undecidable. An undecidable decision problem cannot be solved by any algorithm. Rice published the theorem with a proof in 1951, and it relates to the halting problem. It implies that you cannot decide in general whether an application is correct. You also cannot decide whether the code executes without errors. The theorem sounds odd, because clearly you can run code and see if it shows any errors given a specific set of input data. But this is a special case. Rice’s theorem is a generalisation and applies to all possible input data. So your successful tests basically work with special cases that do not cause harm. Security testing checks for dangerous behaviour or signs of weaknesses. Increasing the input data variations can cover more cases, but Rice’s theorem still holds, no matter how much effort you put into your testing pipeline.

Let’s get back to the code coverage metric. Of course, you should test all of your code. The major goals for your code are to handle errors correctly, fail safely (i.e. without creating damage), and keep control of the code execution. You can achieve these goals with any code coverage per test above 0%. Don’t fall prey to gamification!

Mixing Secure Coding with Programming Lessons

The picture shows a fantasy battle where a witch attacks a wizard with spells. Source: https://wiki.alexissmolensk.com/index.php/File:Spellcasting.jpg

Learning about programming first and learning secure coding afterwards is a mistake. Even if you are new to a programming language or its concepts, you need to know what can go wrong. You need to know how to handle errors. You need to do some basic checks of received data, no matter what your toolchain looks like. This is part of the learning process. So instead of learning how to use code constructs or language features twice, take the shortcut and address security together with the understanding of the concepts. An example is methods of classes and their behaviour. If you think in instances, then you will have to deal with the occasional exception. No one would learn the methods first, ignore all error conditions, and then get back to learn about errors.

Another example is variables with numerical values. Numbers are notorious. Integer errors have stayed in the CWE Top 25 list since 2019. Integer overflow or underflow simply happens with the standard arithmetic operators. There is no fancy bug involved, just basic counting. You have to implement range checks; there is no way around this. Even Rust requires you to do extra bounds checks by using the checked_add() methods. Secure coding always means more code, not less. This starts with basic data types and operators. You can add these logical pitfalls to exercises and examples. By using this approach, you can convey new techniques and show how a security mindset improves the code. There is also the possibility of switching between “normal” exercises and security lessons with a focus on how things go wrong. It’s not helpful to pretend that code won’t run into bugs or security weaknesses. Put the examples of failure and how to deal with them right into your course from the start.
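A minimal C++ sketch of such a range check, mirroring what Rust’s checked_add() does; the function name is my own:

#include <cstdint>
#include <limits>
#include <optional>

// Checked addition: returns std::nullopt instead of overflowing.
std::optional<int32_t> checked_add(int32_t a, int32_t b) {
    if (b > 0 && a > std::numeric_limits<int32_t>::max() - b) {
        return std::nullopt;  // would overflow
    }
    if (b < 0 && a < std::numeric_limits<int32_t>::min() - b) {
        return std::nullopt;  // would underflow
    }
    return a + b;
}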

If you don’t know where to start, then consult secure coding guidelines and the top lists of well-known vulnerabilities, such as the CWE Top 25 or the OWASP Top 10.

The Ghost of Legacy Code and its Relation to Security

The picture shows a spade and the wall of a pit dug into the earth. The wall shows the different layers created by sedimentation over time. Source: http://www.thesubversivearchaeologist.com/2014/11/back-to-basics-stratigraphy-101.html

The words legacy and old carry a negative meaning when used with code or software development. Marketing has ingrained in us the belief that everything new is good and everything old should be replaced, to ensure people spend money and time. Let me tell you that this is not the case and that age is not always a suitable metric. Would you rather have your brain surgery from a surgeon with 20+ years of experience or from a freshly graduated surgeon on his or her first day at the hospital?

So what is old code? In my dictionary, the label “not maintained anymore” is what defines legacy and old code. This is where the mainstream definition fails: you can have legacy code which is still maintained. There is a sound reason for using code like this: stability, and fewer errors than you would introduce by creating code from scratch. Reimplementing code always means that you start from nothing. Basic computer science courses teach everyone to reuse code in order to avoid these situations. Basically, reusing code means that you allow code to age. Just don’t forget to maintain the parts of your application that work and experience few changes. This is the sane version of old code. There is another one.

An old codebase can also act as a showstopper for changes. If you made some poor design decisions in the past, then parts of your code will resist fresh development and features. Prototypes often exhibit this behaviour (a prototype usually never reaches the production phase unaltered). When you see this in your application, then it is time to think about refactoring. Refactoring has fewer restrictions if you can do it in your own code. Once components or your platform become part of the legacy code, you are in for a major upgrade. Operating systems and run-time environments can push changes to your application by requiring a refactoring. Certifications can do the same. Certain environments only allow certified components. Your configuration becomes frozen once the applications or the run-time get the certification. All changes may require a re-certification. Voilà, here is your stasis, and your code ages.

Legacy code is not a burden per se. It all depends on whether the code is still subject to maintenance, patches, and security checks. Besides, older code usually has fewer bugs.
