Secure Software doesn't develop itself.


Category: Testing

All things testing and software quality.

Static Tests and Code Coverage

The picture shows a warning sign indicating that a laser beam is operating in the area. Source: https://commons.wikimedia.org/wiki/File:Laser-symbol-text.svg

Testing software and measuring code coverage is a standard ritual for most software development teams. The more lines of code you cover, the better the results. Right? Well, yes and no. Testing is fine, but you should not get carried away with maximising code coverage. Measuring coverage can turn into a game and a quest for the highest score. A little combinatorics shows how many code paths your tests would need to cover. Imagine a piece of code containing 32 independent if()/else() statements. Testing all branch combinations means running through 2^32 = 4,294,967,296 different cases. Now add some loops, function calls, and additional if() statements (32 comparisons are quite modest for a sufficiently large code base), and the number of paths grows considerably. Multiply that number by the time needed to complete a single test run, and you see that exhaustive testing is limited by physics and mathematics.
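To get a feeling for the numbers, here is a minimal back-of-the-envelope sketch. The branch count and the test throughput are illustrative assumptions, not measurements:

// Combinations of n independent branches and the wall-clock time for an
// exhaustive run. All values are illustrative assumptions.
#include <cstdint>
#include <iostream>

int main() {
    const unsigned branches = 32;                // independent if()/else() pairs
    const double tests_per_second = 1000.0;      // assumed throughput of the test rig

    const std::uint64_t combinations = 1ULL << branches;   // 2^32
    const double seconds = combinations / tests_per_second;

    std::cout << combinations << " combinations, roughly "
              << seconds / 86400.0 << " days at "
              << tests_per_second << " tests per second\n";
    return 0;
}

At 1,000 tests per second, covering all 2^32 combinations already takes about 50 days, and that is before loops and nested calls enter the picture.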

Static analysis is a standard tool which helps you detect bugs and problems in your code. Remember that all testing tries to determine the behaviour of your application. Mathematics has more bad news for you. Rice’s Theorem states that all non-trivial semantic properties of programs are undecidable. An undecidable problem is a decision problem that no algorithm can solve for all possible inputs. Rice proved the theorem in 1951, and it relates to the halting problem. It implies that you cannot decide, in general, whether an application is correct. You also cannot decide whether the code executes without errors. The theorem sounds odd, because clearly you can run code and see if it shows any errors given a specific set of input data. But that is a special case. Rice’s theorem is a generalisation and applies to all possible input data. So your successful tests basically work with special cases that do not cause harm. Security testing checks for dangerous behaviour or signs of weaknesses. Increasing the variety of input data can cover more cases, but Rice’s theorem still holds, no matter how much effort you put into your testing pipeline.

Let’s get back to the code coverage metric. Of course, you should test all of your code. The major goals for your code are to handle errors correctly, fail safely (i.e. without causing damage), and stay in control of the code execution. You can achieve these goals with any code coverage per test above 0%. Don’t fall prey to gamification!

CrowdStrike and how not to write OS Drivers

The image shows a screenshot of a null pointer exception in Microsoft Windows. Source: Zach Vorhies

Yesterday the CrowdStrike update disabled thousands of servers and clients all across the world. The affected systems crashed when booting. A first analysis by Zach Vorhies (careful, the link goes to the right-wing social media network X) contains some not very surprising news about the cause of the problem. Apparently, the system driver from CrowdStrike hit a null pointer access violation. Of course, people immediately started bashing C++, but this is too shallow. There are different layers where code is executed. User space is usually safe ground where you can use standard techniques for catching errors. Even a crash might be safer for a user space application than continuing and doing more harm. Once your code runs as a system driver, you are part of the operating system and have to observe a couple of constraints. OS code can’t just exit or crash (even an exception without a matching catch{} block counts as a crash). So a null pointer dereference in mission-critical code is something that should never happen. It should have been caught in the testing phase. Furthermore, modern C++ has no use for raw null pointers. Use smart pointers, and you no longer need to juggle null pointers by hand. There is nothing more to it.
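As a minimal sketch of that style (the Channel type and the function names are made up for illustration), smart pointers express ownership while an explicit check handles the failure path without crashing:

#include <iostream>
#include <memory>
#include <string>

struct Channel {                       // hypothetical driver-like resource
    std::string name;
};

// Factory returns an owning pointer; a null result signals failure and must
// be checked by the caller before use.
std::unique_ptr<Channel> open_channel(const std::string& name) {
    if (name.empty()) {
        return nullptr;
    }
    return std::make_unique<Channel>(Channel{name});
}

int main() {
    auto chan = open_channel("sensor0");
    if (!chan) {                       // explicit check, graceful degradation instead of a crash
        std::cerr << "channel unavailable, degrading gracefully\n";
        return 1;
    }
    std::cout << "using channel " << chan->name << '\n';
    return 0;
}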

You cannot ignore certain error conditions when running within the operating system. Memory allocation failures, I/O errors, and everything concerning memory operations are critical. There must be checks in place, and there is no excuse for omitting these checks.
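A minimal sketch of what such checks look like in ordinary C++ (the file name and buffer size are arbitrary examples): every allocation and I/O call gets an explicit failure path instead of an assumption that it worked.

#include <cstddef>
#include <cstdio>
#include <iostream>
#include <new>

int main() {
    // Allocation: std::nothrow makes the failure visible as a null result
    // that we check, instead of letting a bad pointer propagate.
    const std::size_t buffer_size = 4096;
    char* buffer = new (std::nothrow) char[buffer_size];
    if (buffer == nullptr) {
        std::cerr << "allocation failed, aborting cleanly\n";
        return 1;
    }

    // I/O: fopen() and fread() report errors through their return values,
    // so both are checked before the data is used.
    std::FILE* fp = std::fopen("config.dat", "rb");   // example file name
    if (fp == nullptr) {
        std::cerr << "cannot open input file\n";
        delete[] buffer;
        return 1;
    }
    const std::size_t got = std::fread(buffer, 1, buffer_size, fp);
    if (got == 0 && std::ferror(fp)) {
        std::cerr << "read error\n";
    }
    std::fclose(fp);
    delete[] buffer;
    return 0;
}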

Finding 0-Days with Large Language Models exclusive-or Fuzzing

The picture shows all the different Python environments installed on a system. The graphical overview is very confusing. Source: https://xkcd.com/1987/

If all you have is a Large Language Model (LLM), then you will apply it to all of your problems. People are now trying to find 0-days with the might of LLMs. While it is no surprise that this works, there is a better way of pushing your code to the limit: just use random data! The term fuzzing was coined in 1988; people had been using defective punch cards as input for much longer. When testing input filters, you want to eliminate as much bias as possible. This is exactly why the input data is generated randomly. Human testers think too much, too little, or are too constrained. (Pseudo-)random number generators rarely have a bias. LLMs do. This means that the publication about finding 0-days by using LLMs is not actually good news. Just like human Markov chains, LLMs only "look" in a specific direction when creating input data. The model is a slave to its vectors and training data. The process might use the source code as an "inspiration", but so does a compiler combined with a fuzzing engine. The key point is that LLMs do not possess any cognitive capabilities. You cannot ask an LLM what it thinks of the code in combination with certain input data. You are basically using a fancy data generator that consumes more energy and is too complex for the task at hand.
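For comparison, a random fuzzer is embarrassingly simple. The following sketch (parse_request() is a stand-in for whatever input filter you want to stress, and the simulated defect inside it is invented) hammers a function with random byte strings and reports inputs that trigger an exception:

#include <cstdint>
#include <iostream>
#include <random>
#include <stdexcept>
#include <string>

// Stand-in for the real input filter under test.
void parse_request(const std::string& data) {
    if (data.size() > 3 && data[0] == 'P' && data[1] == 'W') {
        throw std::runtime_error("unexpected state");   // simulated defect
    }
}

int main() {
    std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<int> byte(0, 255);
    std::uniform_int_distribution<std::size_t> length(0, 512);

    for (std::uint64_t run = 0; run < 1000000; ++run) {
        std::string input(length(rng), '\0');
        for (auto& c : input) {
            c = static_cast<char>(byte(rng));
        }
        try {
            parse_request(input);
        } catch (const std::exception& e) {
            std::cerr << "run " << run << " triggered: " << e.what() << '\n';
            // A real harness would save this input for the regression library.
        }
    }
    return 0;
}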

Comparing LLMs with fuzzing engines does not work well. Each approach serves its own purpose. Always remember that the input data in security tests should push your filters to the limit and create situations that you did not expect. Randomness will do this much more efficiently than a more complex algorithm. If you are fond of complexity or have too much powerful hardware at hand, there are other things you can do with it.

URL Validation, Unit Tests and the Lost Constructor

I have some code that requests URLs, looks for session identifiers or tokens, extracts them, and calculates some indicators of randomness. The tool works, but I decided to add some unit tests in order to play with the Catch2 framework. Unit tests require some easy-to-check conditions, so validating HTTP/HTTPS URLs sounded like a good way to get started. The code uses the curl library for requests, so URLs can be checked either by libcurl or before feeding the URL string to it. Therefore I added some regular expressions. RFC 3986 has a very good description of Uniform Resource Identifiers (URIs). The full regular expression is quite large and matches far too many variations of URI strings. You can inspect it on the regex101 web site. Shortening the regex to match only URLs beginning with “http” or “https” requires defining what you want to match. Should there be only domain names? Are IP addresses allowed? If so, what about IPv4 and IPv6? Experimenting with the filter variations took a bit of time. The problem was that no regex matched the pattern. Even patterns that worked fine in other programming languages did not work in the unit test code. The error was hidden in a constructor.
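As an aside before the constructor story continues: a simplified check of the kind described above might look like this sketch. The pattern is illustrative, deliberately accepts only http/https with a host part, and is nowhere near the full RFC 3986 expression:

#include <iostream>
#include <regex>
#include <string>

// Simplified check: scheme http or https, a host (domain name or IPv4
// address), an optional port, and an optional path. Not RFC 3986 complete.
bool looks_like_http_url(const std::string& url) {
    static const std::regex pattern(
        R"(^https?://([A-Za-z0-9.-]+)(:[0-9]{1,5})?(/[^\s]*)?$)",
        std::regex::icase);
    return std::regex_match(url, pattern);
}

int main() {
    for (const std::string candidate :
         {"https://example.org/login", "http://192.0.2.1:8080/", "ftp://example.org"}) {
        std::cout << candidate << " -> "
                  << (looks_like_http_url(candidate) ? "accepted" : "rejected") << '\n';
    }
    return 0;
}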

Class definitions in C++ often have multiple variations of constructors. The web interface code can create a default instance where you set the target URL later by using setters. You can also create instances with parameters such as the target or the number of requests. The initialisation code sits in one member function which also initialises the libcurl data structures. So the constructors look like this:

http::http() {
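    // Note: init_data_structures() is never called here, so the flag that
    // marks the libcurl subsystem as ready stays false.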
}

http::http( unsigned int nreq ) {
    init_data_structures();
    set_max_requests( nreq );
    return;
}

The function init_data_structures() sets a flag that tells the instance whether the libcurl subsystem is working or not. The first constructor does not call the function, so the flag is always false. The missing function call is easy to miss. The reason the call was missing is that the code started out with only a default constructor. The other constructors were added later, and the default constructor was never used afterwards, because the rest of the code never creates instances without a URL. This brings me back to the unit tests. The Catch2 framework does not need a full program with its own main() function. You can directly create instances in your test code snippets and use them. That’s why the error got noticed. Unit tests are not security tests. The missing initialisation call is most probably not a security weakness, because the code never runs with the web request subsystem flag set to false. It is still a good way to catch omissions and logic errors. So please write lots of unit tests, all of the time.
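A Catch2 test for this situation could look roughly like the following sketch. The header name http.h and the accessor is_ready() are assumptions for illustration; only the CATCH_CONFIG_MAIN/TEST_CASE/REQUIRE machinery is Catch2 (v2 single-header layout; with Catch2 v3 you would include catch2/catch_test_macros.hpp and link against the bundled main instead):

// CATCH_CONFIG_MAIN lets Catch2 supply main(), so the test file can
// instantiate the class directly without a full program around it.
#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>

#include "http.h"   // assumed header of the class discussed above

TEST_CASE("default constructor initialises the libcurl subsystem") {
    http h;                      // exercises the default constructor
    REQUIRE(h.is_ready());       // is_ready() is an assumed accessor for the flag
}

TEST_CASE("constructor with request limit initialises the subsystem") {
    http h(10);
    REQUIRE(h.is_ready());
}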

Continuous Integration is no excuse for a complex Build System

The picture shows a computer screen with keyboard in cartoon style. The screen shows a diagram of code flows with red squares as markers for errors.

Continuous Integration (CI) is a standard in software development. A lot of companies use it in their development process. It basically means using automation tools to test new code more frequently. Instead of continuous, you could also say automated, because CI can’t work manually. Modern build systems comprise scripts and declarative configurations that invoke components of the toolchain in order to produce executable code. Applications built with different programming languages can invoke a lot of tools with individual configurations. The build system is also a part of the code development process. What does this mean for CI in terms of secure coding?

First, if you use CI methods in your development cycle, make sure you understand the build system. When working with external consultants who audit your code, the review must be possible without the CI pipeline. In theory, this is always the case, but I have seen code collections that cannot be built easily because of the many configuration parameters hidden in hundreds of directories. Some configurations are old and use environment variables to control how the toolchain has to translate the source. Cross-platform code in particular is difficult to analyse because of the mixture of tools. Often it is only possible to study the source. This is a problem, because a code review also needs to rebuild the code with changing parameters (for example, changing compiler flags, replacing compilers, adding analyzers, etc.). If the build process doesn’t allow this, then you have a problem: switching to different tools becomes impossible, and that switch is also necessary when you need to test new versions of your programming language or migrate old parts of your code to a newer standard.

Furthermore, if your code cannot be built outside your CI pipeline, reviews are basically impossible. Just reading the source means that a lot of testing cannot be done. Ensure that your build system does not grow into a complex creation no one wants to touch any more. The rules of secure and clean coding also apply to your CI pipeline. Create individual packages. Divide the build into modules, so that you can assemble the final application from independent building blocks. Also, refactor your build configuration: make it simpler and remove all workarounds. Once the review gets stuck and auditors have to read your code like a newspaper, it is too late.

Implementing basic Tests during Software Development

The recent GnuPG bugs have sparked a discussion about standard tests during software development. The case was a buffer in the code which could be overwritten by a decryption operation. Overflow bugs can easily be avoided by defensive programming, but also by standard tests during the development phase. Modern compilers have features to test for stack/heap overflows, memory leaks, undefined behaviour, and many more conditions you don’t want in your code. Clang offers the Clang Static Analyzer tools. GCC 10 offers similar features in the form of its static code analysis options. Valgrind celebrated its 20th anniversary last year, so there are no excuses. Since every project written in C/C++ needs a set of build options anyway, why not add some scripts or configurations for your tests?
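As a hedged illustration, a deliberately broken snippet like the one below is caught immediately when built with the sanitizer options of GCC or Clang. The compile commands in the comments are the commonly used flags; adjust them for your toolchain:

// Deliberate heap overflow used to demonstrate the compiler tooling.
// Example build commands (typical invocations, adjust as needed):
//   g++ -g -fsanitize=address,undefined overflow_demo.cpp
//   clang++ -g -fsanitize=address,undefined overflow_demo.cpp
// Running the binary makes AddressSanitizer report the out-of-bounds write.
#include <cstddef>
#include <vector>

int main() {
    std::vector<int> buffer(16, 0);
    for (std::size_t i = 0; i <= buffer.size(); ++i) {   // off-by-one: writes one element too far
        buffer.data()[i] = static_cast<int>(i);          // bypasses at() so the bug reaches ASan
    }
    return 0;
}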

First of all, adding something to your tests requires that you already test your code systematically. Most projects have a collection of regression tests to make sure the code behaves as expected after changes. Additionally, there might be test cases stemming from bug reports, to check for errors that have been fixed and should never return. Furthermore, some projects have stress tests, load tests, and even fuzzing tests which can easily be activated. All of this requires a testing platform and processes to define, develop, test, and deploy tests. Not having a test infrastructure is no excuse for not testing code. This is especially true for code bases like libgcrypt or other widely used libraries and tools. The lack of continuous integration (CI) pipelines is no excuse either. Ideally tests are automated, but they don’t have to be. They just need to be runnable with few changes to the code and the build instructions. Code often has debug flags or other parameters which influence the run-time behaviour or the generated code. That’s the way to start. Once your configuration (and maybe some scripts) is in place, you can go forth and automate everything else.

Collecting test cases should be your first step. Harvest the bug tracker and the change history. Try to extract cases and data that triggered a bug. Build a library of tests, then extend your build system with a test mode that uses this library and performs the tests. Don’t forget the benefits of your toolchain! Use the static analysers when testing or running code. You can even do what you usually do with your code before shipping, just make sure the analysers are in place (i.e. compiled in or supervising the code execution). Using different compilers in the pre-shipping phase is a very good idea, too. All in all, this should not add an enormous amount of time to your development cycle. You have to test your code anyway, so why not let computers do it?
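A minimal sketch of such a test-library runner follows. The tests/regressions directory layout and the parse_input() entry point are assumptions; the idea is simply to replay every stored trigger input against the code under test and report failures:

#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Stand-in for the function that originally misbehaved on the collected inputs.
bool parse_input(const std::string& data) {
    return !data.empty();            // real code would do the actual work
}

int main() {
    namespace fs = std::filesystem;
    const fs::path library = "tests/regressions";   // assumed location of stored inputs
    if (!fs::exists(library)) {
        std::cerr << "no regression library found at " << library << '\n';
        return 1;
    }

    int failures = 0;
    // Each file holds one input that once triggered a bug.
    for (const auto& entry : fs::directory_iterator(library)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path(), std::ios::binary);
        std::ostringstream content;
        content << in.rdbuf();

        if (!parse_input(content.str())) {
            std::cerr << "regression failed: " << entry.path() << '\n';
            ++failures;
        }
    }
    std::cout << failures << " failing regression inputs\n";
    return failures == 0 ? 0 : 1;
}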

Using the built-in features of your compiler (or your favourite run-time framework) to detect bugs is a basic task for developers. Don’t wait until security researchers or penetration testers do this for you. And if they do, please don’t treat the bug reports as yet another rant. If it is a real bug report, then fix it and blame your code. The alternative is not to accept bug reports at all, and that doesn’t help anyone.
