Secure Design – Opinion

Secure Software doesn't develop itself.

The picture shows the top layer of the Linux kernel's API subsystems. Source: https://www.linux.org/attachments/kernel-jpeg.6497/

Recommendations for using Exceptions in Code

Exceptions can be useful for handling error conditions. They help to structure code and to avoid if/else cascades. However, exceptions disturb the control flow and can make your program skip sections of your code. This can affect cleaning up after errors and the management of resources. Another downside is the performance cost if exceptions are triggered often. If you need to catch errors, you have to decide carefully when to use exceptions and when to use error flags. The article Exception dangers and downsides in the C++ tutorial has some hints on how to use exceptions:

  • Use exceptions for errors that trigger infrequently.
  • Use exceptions for errors that are critical for the code execution (i.e. error conditions that make subsequent operations impossible).
  • Use exceptions when an error cannot be handled right at the point where it occurs.
  • Use exceptions when returning an error code is not possible or not an option.

The actual cost of exceptions is influenced by the compiler, the hardware, and the operating system. There are signs that exception handling is improving on the x86_64 platform. Having multiple cores can pose a problem, because unwinding exceptions is effectively a single-threaded operation that scales poorly when many threads throw at the same time. This can be a problem for locked resources and similar synchronisation techniques. The proposal P2544R0 describes the background of these problems, and it proposes some alternatives to error handling by exceptions. The proposal also has some measurements that show the impact of exceptions. My recommendation is to investigate how frequent errors are and to explore handling non-critical errors by using flags or return codes. When in doubt, use the instrumentation of your compiler and measure the actual cost of exceptions.
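The recommendations from the list above can be sketched in code. This is only an illustration and not taken from the cited article or the proposal. The names parse_port and load_config are hypothetical. A frequent and expected error (malformed input) is reported through the return value, while a rare error that makes all subsequent operations impossible is reported with an exception.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Frequent, non-critical error: malformed input is expected, so the
// result is reported through the return value instead of an exception.
// (parse_port is a hypothetical helper for this sketch.)
std::optional<int> parse_port(const std::string &text) {
    if (text.empty() || text.size() > 5) {
        return std::nullopt;
    }
    int port = 0;
    for (char c : text) {
        if (c < '0' || c > '9') {
            return std::nullopt;
        }
        port = port * 10 + (c - '0');
    }
    if (port < 1 || port > 65535) {
        return std::nullopt;
    }
    return port;
}

// Rare, critical error: without a valid configuration nothing else can
// run, and the decision what to do belongs to a caller further up.
// (load_config is hypothetical as well.)
int load_config(const std::string &port_setting) {
    const std::optional<int> port = parse_port(port_setting);
    if (!port) {
        throw std::runtime_error("configuration has no valid port");
    }
    return *port;
}
```

Code that validates many inputs in a loop would call parse_port() and check the result. Only the start-up code would call load_config() and let the exception travel to a place where the program can react to it.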

Software Bill of Materials (SBOM)

No software project can do without components. Software libraries are the most common ingredient. Some programming languages use the concept of modules or plugins. Then there are frameworks typically used in web applications. All of these parts can feature different versions, because some requirement fixed the version to a major release. If one component exhibits a security vulnerability, then the code in question must be updated. Given the size of modern applications, this can be a challenge. Remember the Log4j vulnerability from last year. Many teams didn’t even know that they were running applications containing Log4j. Attackers know about this and target the supply chain of the code. Finding a critical bug in a component can open up a wide variety of code for attacks. This is where the Software Bill of Materials (SBOM) comes into play.

An SBOM is simply a list of components that your application contains. Most teams using build tools have this already implicitly. Basically, SBOM is a standard for the manifest of components your software brings with it. The National Telecommunications and Information Administration (NTIA) has compiled a set of resources describing how to use the standard. The idea is not new. There already exist Software Package Data Exchange® (SPDX®), OWASP CycloneDX, and ISO/IEC 19770-2:2015. Everything boils down to a specification for a machine-readable document (JSON, for example) that contains all relevant metadata of all components. The key is to catch all the dependencies. Some libraries depend on others, so package managers resolve these links and install additional components. Build systems do this automatically, but the SBOM needs to be complete. Another problem is the categorisation. Some parts of the code belong to the runtime, some are managed by the operating system, and others are shipped within the application itself. A full SBOM needs to combine these sources. Your toolchain can help you there. C/C++ compilers can create a list of all headers used during compilation (the Clang/GCC flags -MD and -MF name.out.d take care of it). This is just a part of the development environment. You have to create a manifest for container images and other deployment methods as well. Be as accurate as possible, and make sure that SBOM creation is an automated part of your build pipeline.
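The dependency files written by -MD and -MF use the syntax of make rules, so they are easy to harvest in a build pipeline. The following is a minimal sketch, assuming the common output layout (a target followed by a colon, then the prerequisites, with backslashes as line continuations). The function name read_dependencies is hypothetical, and paths with escaped spaces are not handled.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Collects the prerequisites (sources and headers) listed in a
// make-style dependency file as written by the -MD/-MF flags.
// Simplification: paths containing escaped spaces are not handled.
std::set<std::string> read_dependencies(std::istream &depfile) {
    std::set<std::string> files;
    std::string token;
    while (depfile >> token) {
        if (token == "\\") {
            continue; // line continuation
        }
        if (token.back() == ':') {
            continue; // rule target, e.g. "main.o:"
        }
        files.insert(token);
    }
    return files;
}
```

The resulting set is only raw material. A real SBOM entry still needs the component name, version, and origin of each file, which have to come from the package manager or the build system.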

Go, Go Carbon, Go++, Carbon++, C++, or Go Rust?

I break my rule of not writing titles with question marks. The exception is because of the new programming language, Carbon. I saw the announcement a couple of days ago. The article mentioned that Google engineers are working on a new programming language called Carbon. The author of the article added the tagline “A Hopeful Successor To C++”. The immediate question about the endeavour of creating a new programming language was: Why? There are a lot of programming languages and dialects around. All the languages have their own background. It is easy to bash a specific programming language, but if you take the history of its creation into account, then it gets easier to understand why specific design choices were made. It is easy to recommend different choices in retrospect. Given the existence of Go and the periodic C++ standard updates, I wonder what the design goals of Carbon are. Luckily, the article mentioned them:

  • C++ performance
  • Seamless bidirectional interoperability with C++
  • Easier learning curve for C++ developers
  • Comparable expressivity
  • Scalable migration

If you read the actual Carbon language description from the repository, then there are some additional goals:

  • Safer fundamentals, and an incremental path towards a memory-safe subset
  • Language evolution (Carbon wants to be C++’s TypeScript/Kotlin)

This doesn’t look too bad. Clearly, safer fundamentals and memory-safe features are a good idea. After checking some code examples, the syntax of Carbon looks a lot like Rust. For my taste, the easier learning curve is in the eye of the beholder. Personally, I dislike Rust’s syntax. Carbon adds some grammar and differences in special characters that will probably hinder anyone with C++ experience (well, at least myself). The interoperability goal implies that you can keep using your tried and tested build system for your projects. The Carbon project clearly states that it is presenting a prototype for exploring the desired language features. Apparently, the designers want to use C++ as the foundation and add their own vision.

C++ has evolved a lot in the past decade. The new C++ standards have implemented a lot of the missing features that had to be supported by third-party libraries. Because a programming language needs time to reach a stable version, Carbon will have to catch up with C++’s head start. Judging from the project vision, it seems to be yet another use case for LLVM. Let’s see if we find out the real reason why Google wants to replace C++.

Using C++ Threads or OpenMP for parallel Processing

Having easy access to parallel processing is a pleasant feature in programming languages. The thread APIs of operating systems have been notoriously difficult to use, especially in C. The Open Multi-Processing (OpenMP) API started in 1997 to make things easier. It helps to mark sections of code and loops that can be parallelized. It works well for C, C++, and Fortran code. It is easy to implement. Plus, the same source code can be compiled with or without OpenMP support. The downside is that code built with OpenMP requires the OpenMP runtime on the target. I recently had a case where C++ code needed to be installed on different platforms (i.e. systems with different major version levels). OpenMP is tied to the C/C++ standard library and the compiler. The code is compiled by Clang, so in this particular case you need different OpenMP libraries. In order to reduce the dependency on OpenMP, the code was refactored to use C++ threads.
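To illustrate the point about compiling with or without OpenMP, here is a minimal sketch of a marked loop. The function sum_values is a hypothetical example and not part of the refactored project. With OpenMP enabled (for example by the -fopenmp flag of GCC and Clang) the loop is split across threads. Without it, the compiler ignores the pragma and the loop runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a vector. The reduction clause gives every thread a private
// copy of total and adds the copies up at the end of the loop.
long long sum_values(const std::vector<int> &values) {
    long long total = 0;
    const long long count = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < count; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

The result is the same in both builds. Only the build with OpenMP enabled links against the OpenMP runtime, which is the dependency described above.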

C++11 threads are easy to use. When switching from OpenMP, you only have to convert your #pragma statements to function calls. When using member functions, you have to code around a peculiarity of std::async and std::thread. A member function cannot be passed in the form object->function, because that expression is a call without its argument list and not a callable object. If you try to do this, then you will get an error message. Consider the following class:

#include <kchashdb.h>

class hash_list {
private:
    kyotocabinet::HashDB HDB;
    kyotocabinet::HashDB::Cursor *pos;

public:
    bool walkthrough( string directory ) {
        // ... heavy I/O work that updates the database file (body omitted)
    }
};

The class is used to access different Kyoto Cabinet databases. The function walkthrough() does heavy I/O work and updates the database file. Passing the member function of an object to std::async in the form object->walkthrough will not work. I fought with many compiler errors and was tempted to convert the member function to static, but this would have required a full rewrite of the class. Static member variables and functions change the access to encapsulated data. Instead, you can use a wrapper function to call the member.

#ifndef USE_OPENMP
bool wrap_walkthrough( hash_list *h, string d ) {
    return( h->walkthrough(d) );
}
#endif

The function wrap_walkthrough() works fine, and it can be called with different dynamically allocated objects. The section calling the functions looks like this:

#ifndef USE_OPENMP
future<bool> f_rc_path = std::async( std::launch::async, wrap_walkthrough, path_orig, opt_path );
future<bool> f_rc_prfx = std::async( std::launch::async, wrap_walkthrough, path_prefix, prefix_path );
const bool rc_path = f_rc_path.get();
const bool rc_prfx = f_rc_prfx.get();
if ( ! rc_path ) {
    cerr << "Walkthrough for " << opt_path << " failed!" << endl;
    rc += 23;
}
if ( ! rc_prfx ) {
    cerr << "Walkthrough for " << prefix_path << " failed!" << endl;
    rc += 23;
}
#endif

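Remember to write wrapper functions when you encounter the error message “reference to non-static member function must be called”. Passing a pointer to the member function together with the object pointer, or using a lambda, works as well.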
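For comparison, here is a minimal sketch of two alternatives to a free wrapper function that the standard library supports. The class path_checker is a hypothetical stand-in, because the original class depends on the database library. The helper function names are made up for this example, too.

```cpp
#include <cassert>
#include <future>
#include <string>

// Hypothetical stand-in for a class with a long-running member function.
class path_checker {
public:
    bool walkthrough(std::string directory) {
        return !directory.empty();
    }
};

// Variant 1: pointer to member function plus object pointer.
// std::async invokes (p->*&path_checker::walkthrough)(d) in the new thread.
std::future<bool> start_with_member_pointer(path_checker *p, const std::string &d) {
    return std::async(std::launch::async, &path_checker::walkthrough, p, d);
}

// Variant 2: a lambda that captures the pointer and a copy of the argument.
std::future<bool> start_with_lambda(path_checker *p, const std::string &d) {
    return std::async(std::launch::async, [p, d]() { return p->walkthrough(d); });
}
```

In all variants, including the wrapper function, the object has to stay alive until get() has returned, because the thread only holds a pointer to it.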

Anatomy of a Buffer Overflow in Python 3.x

The bug tracking system of Python was notified of a buffer overflow in Python. Affected versions were 3.10, 3.9, 3.8, 3.7, and 3.6. The code in question is part of the PyCArg_repr() function. This function is called when Python has to create the string representation of parameters from the ctypes module (i.e. when you are using C type variables in your Python code). The overflow can be triggered by using extreme values and letting Python expand the content into a buffer:

case 'd':
    sprintf(buffer, "<cparam '%c' (%f)>",
            self->tag, self->value.d);
    break;

The %f place-holder is interesting. It prints the full decimal expansion, so a very large value produces hundreds of digits. Using 1.79769e+308 (maximum for the double data type) or 1.18973e+4932 (maximum for the long double data type) will trigger a buffer overflow. This can be detected by the runtime and lead to an error message aborting the interpreter. In any case, it is a good reminder to always validate input data before processing it. Some applications use components from different programming languages. Whenever data is handed around between functions implemented in different run-time environments, you have to be extra careful about the data types. Sometimes implicit conversions occur. If conversions between numerical data and strings are performed, then always check the limits on both ends.
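A bounded variant of the formatting call can be sketched as follows. This is not the fix that was applied to Python. It only shows the general technique of using snprintf() and checking its return value. The function name format_cparam and the buffer size of 256 bytes in the usage note are assumptions for this example.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the value like the snippet above, but never writes past the
// end of the buffer. snprintf() returns the length the complete output
// would have, so a result >= size means the output was truncated.
// (format_cparam is a hypothetical name for this sketch.)
bool format_cparam(char *buffer, std::size_t size, char tag, double value) {
    const int needed = std::snprintf(buffer, size, "<cparam '%c' (%f)>", tag, value);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

With a buffer of 256 bytes, the call succeeds for a value like 1.5 and reports a truncation for DBL_MAX, because the expanded number alone has more than 300 digits. The caller can then reject the value or switch to a shorter format.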

You can do the bounds checks even with arbitrary-precision arithmetic (also called bignum, multiple-precision, or infinite-precision arithmetic). Converting such values to common fixed-size data types means cutting off the values and losing precision. Arbitrary-precision arithmetic can often export the data to string representations or other serialisation formats. This means that you have to estimate the size of the result. Java™ offers the BigDecimal and BigInteger classes. The buffer estimate looks like this:

byte[] storedUnscaledBytes = bigDecimal.unscaledValue().toByteArray();
int storedScale = bigDecimal.scale();
int storedLength = storedUnscaledBytes.length;

This gives you the exported value and its size. The latter needs to be used when passing the export to functions that use size-limited buffers. The object and its size need to be processed together and must not be separated in any further processing step. Check your code for conversions near APIs to external libraries or other components. There might be potential for overflows or conversion errors.
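The opposite direction, from strings to numbers, needs the same care. The following is a minimal sketch in C++17 that uses std::from_chars, which reports values outside the range of the target type instead of silently wrapping around. The function name to_int32 is hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Converts decimal text to a 32-bit integer. Rejects empty input,
// trailing garbage, and values outside the range of std::int32_t.
std::optional<std::int32_t> to_int32(std::string_view text) {
    std::int32_t value = 0;
    const char *first = text.data();
    const char *last = text.data() + text.size();
    const std::from_chars_result result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

A caller at an API boundary can use the empty result to reject the input before it reaches any fixed-size field or buffer.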

Implementing basic Tests during Software Development

The recent GnuPG bugs have sparked a discussion about standard tests during software development. The case was a buffer in the code which could be overwritten by a decryption operation. Overflow bugs can be easily avoided by defensive programming, but also by standard tests during the development phase. Modern compilers have features to test for stack/heap overflows, memory leaks, undefined behaviour, and many more cases you don’t want in your code. Clang offers the Clang Static Analyzer tools. GCC 10 offers similar features in the form of its static code analysis options as well. Valgrind celebrated its 20th anniversary last year, so there are no excuses. Since every project written in C/C++ needs a set of build options anyway, why not add some scripts or configurations to your tests?

First of all, adding something to your tests requires that you already do systematic testing of your code. Most projects have a collection of regression tests to make sure the code behaves as expected after changes. Additionally, there might be test cases stemming from bug reports to check for errors which should have been fixed and should never return. Furthermore, some projects have stress tests, load tests, and even fuzzing tests which can be easily activated. All of this requires a testing platform and processes to define, develop, test, and deploy tests. Not having a test infrastructure is no excuse for not testing code. This is especially true for code bases like libgcrypt or other widely used libraries/tools. The lack of continuous integration (CI) pipelines is also no excuse. Ideally tests are automated, but they don’t have to be. They just need to be runnable with few changes to the code and the build instructions. Often code has debug flags or other parameters which influence the run-time behaviour or the generated code. That’s the way to start. Once your configuration (and maybe scripts) is in place, you can go forth and automate everything else.

Collecting test cases should be your first step. Harvest the bug tracker and the change history. Try to extract cases and data that triggered a bug. Build a library of tests, then start extending your build system by a test mode that utilises this library and performs the tests. Don’t forget the benefits of your toolchain! Use the analysers when building, testing, and running code. You can even do what you usually do with your code before shipping, just make sure the analysers are in place (i.e. compiled in or supervising the code execution). Using different compilers in the pre-shipping phase is a very good idea, too. All in all, this should not add an enormous amount of time to your development cycle. You have to test your code anyway, so why not let computers do this?
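A library of test cases does not need a framework to get started. The following is a minimal sketch with a hypothetical function copy_label and a hypothetical table of cases. In a real project the entries would come from the bug tracker and the change history. Building the test with instrumentation such as -fsanitize=address,undefined (GCC and Clang) lets the toolchain supervise the run, and -fanalyzer (GCC 10 or later) or clang --analyze checks the code at build time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical function under test: copies a label into a fixed-size
// buffer and reports whether it fits (including the terminator).
bool copy_label(char *dest, std::size_t dest_size, const std::string &label) {
    if (dest_size == 0 || label.size() >= dest_size) {
        return false;
    }
    std::memcpy(dest, label.c_str(), label.size() + 1);
    return true;
}

// One entry per bug report or finding from the change history.
struct regression_case {
    std::string input;
    bool expected_ok;
};

// Runs all collected cases and returns the number of failures.
int run_regression_cases() {
    const std::vector<regression_case> cases = {
        {"", true},
        {"short", true},
        {std::string(7, 'a'), true},     // exactly fills the buffer
        {std::string(8, 'a'), false},    // one byte too long
        {std::string(4096, 'a'), false}  // far too long
    };
    int failures = 0;
    for (const regression_case &c : cases) {
        char buffer[8];
        if (copy_label(buffer, sizeof buffer, c.input) != c.expected_ok) {
            ++failures;
        }
    }
    return failures;
}
```

A test mode of the build system only has to call run_regression_cases() and fail the build if the result is not zero.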

Using the built-in features of your compiler (or your favourite run-time framework) in order to detect bugs is a basic task for developers. Don’t wait until security researchers or penetration testers do this for you. And if they do, please don’t treat bug reports as yet another rant. If it is a real bug report, then you should fix it and blame your code. The alternative is not to accept bug reports, but doing this doesn’t help anyone.

Everyone has an Opinion – so do we

This is a new blog. Its purpose is to serve as a companion to the secure coding / secure design curriculum I have been developing for years. The “we” in the title refers to the partners who help to educate software developers about how to protect their code against hostile environments and malicious attackers. The blog itself is embedded into the wiki that holds a collection of secure coding/design patterns, taxonomy, and examples. So much for the background.

The blog carries opinion in its title. The reason is simple. While we have a lot of information security standards, information technology itself is driven by opinions. Agile software development, the use of containers versus processes versus virtual machines, the programming language of the day, the/your/our operating system of choice, code platforms, frameworks, and many more aspects of the modern digital tools available to software developers are based on opinion. This doesn’t mean that there is no right answer. The problem is just that there are many of them. What works for your organisation, your team, your project is highly dependent on the context you are working in. Furthermore, it depends on how your customers use your application. One size fits all might work for hats, but it doesn’t work for most other things. This is why opinion is presented in this blog. All articles will have the intent to connect to the wonderful world of software development, secure coding, and secure design.

If you want to engage, then you can leave comments. However, the time window for adding comments will close after a few weeks. This is purely for administrative reasons, because neither I nor my partners have the time to watch the comments and manually approve them. Be quick, or write an email!

Enjoy the articles!
