Secure Software doesn't develop itself.

The picture shows the top layer of the Linux kernel's API subsystems. Source: https://www.linux.org/attachments/kernel-jpeg.6497/

Category: C/C++ Page 1 of 2

C/C++ specifics.

Parallel Operations on numerical Values

Everyone knows the vector container of C++’s Standard Template Library (STL). It is useful, versatile, and stores the data of all elements in a contiguous memory location. There is another container named std::valarray for array data that is not widely known. It has been part of the C++ standard library for a long time (since C++98, i.e. way before C++11). Its use case is performing operations on all array elements in parallel, i.e. element by element in a single expression, which gives the compiler room for vectorisation. You can even multiply two valarray containers element by element without using loops or other special code. valarray has no member iterators, but C++11 added std::begin() and std::end() overloads for it, so you can easily create a valarray container from a vector, perform calculations on all elements, and copy the results into a vector again. The C++ reference has example code to show how to do this. Creation from a vector requires access to the memory location of the vector’s data.

#include <valarray>
#include <vector>

std::vector<double> values;
// Put some values into the vector here …
// Convert vector to valarray (copies the elements)
std::valarray<double> val_values( values.data(), values.size() );

Now you can perform operations on all elements at once. Calculating cos() of all elements simply looks like this:

std::valarray<double> val_result = std::cos(val_values);

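If you take the time to compare this with a loop over a vector that calls the function for every element, you will notice that valarray can be much faster. How much depends on your compiler; GCC and Clang are quite fast. Store the result in a std::valarray rather than an auto variable, because some standard library implementations return a lazily evaluated expression object from these functions. The apply() member function allows you to run arbitrary functions on every element. If you only need a subset of the elements, then you can create slices (std::slice) with the required values.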
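The following sketch puts the pieces together: converting a vector to a valarray, applying cos() to all elements, multiplying element by element, using apply(), selecting a slice, and copying the result back into a vector. The function names cos_all(), multiply(), squares() and every_other() are hypothetical and only serve as illustration.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>
#include <vector>

// Round trip: vector -> valarray -> cos() on every element -> vector.
std::vector<double> cos_all(const std::vector<double>& values) {
    // The (pointer, count) constructor copies the elements from the vector's buffer.
    const std::valarray<double> val_values(values.data(), values.size());
    // Store the result in a concrete valarray; some implementations return a
    // lazily evaluated expression object from std::cos().
    const std::valarray<double> val_result = std::cos(val_values);
    // C++11 added std::begin()/std::end() overloads for valarray.
    return std::vector<double>(std::begin(val_result), std::end(val_result));
}

// Element-wise product without an explicit loop.
std::valarray<double> multiply(const std::valarray<double>& a,
                               const std::valarray<double>& b) {
    assert(a.size() == b.size());  // differing sizes are undefined behaviour
    return a * b;
}

// apply() runs a function on every element and returns a new valarray.
std::valarray<double> squares(const std::valarray<double>& v) {
    return v.apply([](double x) { return x * x; });
}

// std::slice(start, size, stride) selects the elements 0, 2, 4, ...
std::valarray<double> every_other(const std::valarray<double>& v) {
    return v[std::slice(0, (v.size() + 1) / 2, 2)];
}
```

Whether the element-wise operations actually run faster than a hand-written loop depends on the compiler and optimisation level, so measuring with your own data is worthwhile.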

CrowdStrike and how not to write OS Drivers

The image shows a screenshot of a null pointer exception in Microsoft Windows. Source: Zach Vorhies

Yesterday the CrowdStrike update disabled thousands of servers and clients all across the world. The affected systems crashed when booting. A first analysis by Zach Vorhies (careful, the link goes to the right-wing social media network X) has some not very surprising news about the cause of the problem. Apparently, the system driver from CrowdStrike hit a null pointer access violation. Of course, people immediately started bashing C++, but this is too shallow. There are different layers where code is executed. The user space is usually safe ground where you can use standard techniques for catching errors. For user space applications, even a crash might be safer than continuing and doing more harm. Once your code runs as a system driver, you are part of the operating system and have to observe a couple of constraints. OS code can’t just exit or crash (even an exception without a matching catch block counts as a crash). So a null pointer in mission-critical code is something that should never happen. This should have been caught in the testing phase. Furthermore, Modern C++ has little use for raw owning pointers. Smart pointers such as std::unique_ptr manage ownership and lifetime, but they can still be empty, so they do not remove the need to check for null before accessing data. Where a value must always be present, a reference expresses this directly in the type.

You cannot ignore certain error conditions when running within the operating system. Memory allocation, I/O errors, and everything concerning memory operations are critical. There must be checks in place, and there is no excuse for omitting these checks.
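A small user-space sketch of such checks (this is not driver code; kernel drivers use their own allocation APIs). It allocates with std::nothrow so that a failure shows up as an empty pointer instead of an exception, and it checks the smart pointer before using it. The function names allocate_buffer() and fill_buffer() are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Allocation that reports failure with an empty pointer instead of throwing
// std::bad_alloc. The caller still has to check the result.
std::unique_ptr<int[]> allocate_buffer(std::size_t count) {
    return std::unique_ptr<int[]>(new (std::nothrow) int[count]());
}

// A smart pointer manages the lifetime, but it can still be empty, so the
// null check remains necessary before any access.
// count must not exceed the number of allocated elements.
bool fill_buffer(const std::unique_ptr<int[]>& buffer, std::size_t count, int value) {
    if (!buffer) {
        return false;  // report the error instead of dereferencing null
    }
    for (std::size_t i = 0; i < count; ++i) {
        buffer[i] = value;
    }
    return true;
}
```

Returning a status instead of throwing keeps the error visible to the caller, which matters in code that must not terminate.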

URL Validation, Unit Tests and the Lost Constructor

I have some code that requests URLs, looks for session identifiers or tokens, extracts them, and calculates some indicators of randomness. The tool works, but I decided to add some unit tests in order to play with the Catch2 framework. Unit tests require some easy-to-check conditions, so validating HTTP/HTTPS URLs sounds like a good idea to get started. The code uses the curl library for requests, so URLs can be checked by libcurl or before the URL string is fed to it. Therefore I added some regular expressions. RFC 3986 has a very good description of Uniform Resource Identifiers (URIs). The full regular expression is quite large and matches too many variations of URI strings. You can inspect it on the regex101 web site. Shortening the regex to match only URLs beginning with “http” or “https” requires you to define what you want to match. Should there be only domain names? Are IP addresses allowed? If so, what about IPv4 and IPv6? Experimenting with the filter variations took a bit of time. The problem was that no regex matched any of the test URLs. Even patterns that worked fine in other programming languages did not work in the unit test code. The error was hidden in a constructor.
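A shortened filter might look like the sketch below. It is not the regular expression used in the tool; the choices are assumptions: the scheme is http or https, the host consists of letters, digits, dots and hyphens (covering domain names and dotted IPv4 addresses without checking the octet ranges), an optional port follows, and an optional path starts with a slash. IPv6 literals and user information are rejected. The function name is_http_url() is hypothetical.

```cpp
#include <regex>
#include <string>

// Simplified http/https URL filter built with std::regex (ECMAScript grammar).
bool is_http_url(const std::string& url) {
    // Compiled once; std::regex construction is comparatively expensive.
    static const std::regex pattern(
        R"re(^https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?(/\S*)?$)re");
    // regex_match requires the whole string to match the pattern.
    return std::regex_match(url, pattern);
}
```

Such a function is a convenient target for unit tests, because each accepted or rejected variant can be written down as a single test case.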

Class definitions in C++ often have multiple variations of constructors. The web interface code can create a default instance where you set the target URL later by using setters. You can also create instances with parameters such as the target or the number of requests. The initialisation code sits in one member function which also initialises the libcurl data structures. So the constructors look like this:

http::http() {
}

http::http( unsigned int nreq ) {
    init_data_structures();
    set_max_requests( nreq );
    return;
}

The function init_data_structures() sets a flag that tells the instance whether the libcurl subsystem is working or not. The first constructor does not call the function, so the flag is always false. The missing function call is easy to overlook. The line was missing because the code had only a default constructor at first. The other constructors were added later, and the default constructor was never used afterwards, because the test code never creates instances without a URL. This brings me back to the unit tests. The Catch2 framework does not need a full program with a main() function. You can directly create instances in your test code snippets and use them. That is how the error got noticed. Unit tests are not security tests. The missing initialisation call is most probably not a security weakness, because the code does not perform web requests while the subsystem flag is set to false. Unit tests are still a good way to catch omissions or logic errors. So please write lots of unit tests and run them all of the time.
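One way to make this kind of omission impossible is a delegating constructor (C++11), so that every constructor runs through the same initialisation path. The sketch uses a hypothetical class http_client; its init_data_structures() is only a stand-in that sets the flag without touching libcurl, and the default of one request is an assumption.

```cpp
// Hypothetical class modelled on the http class described above (no libcurl involved).
class http_client {
public:
    // Delegating constructor: the default constructor reuses the full
    // initialisation path. The default of one request is an assumption.
    http_client() : http_client(1) {}

    explicit http_client(unsigned int nreq) {
        init_data_structures();
        set_max_requests(nreq);
    }

    bool ready() const { return curl_ready_; }
    unsigned int max_requests() const { return max_requests_; }

private:
    // Stand-in for the real set-up code; a real version would initialise
    // libcurl and set the flag only on success.
    void init_data_structures() { curl_ready_ = true; }
    void set_max_requests(unsigned int n) { max_requests_ = n; }

    // Default member initialisers give every member a defined value.
    bool curl_ready_ = false;
    unsigned int max_requests_ = 0;
};
```

A unit test that creates a default instance and checks ready() would have caught the missing call immediately.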

Floating Point Data Types and Computations

The picture shows how real numbers fit into the IEEE 754 floating point data type representation. Source: https://en.wikibooks.org/wiki/Introduction_to_Numerical_Methods/Rounding_Off_Errors

Floating point data types are available in most programming languages. C++ knows about the float, double, and long double data types. Other programming languages feature longer (256 bit) and shorter (16 bit and lower) representations. The common formats are specified in the IEEE Standard for Floating-Point Arithmetic (IEEE 754), and most implementations follow it: float and double are usually the IEEE 754 32 bit and 64 bit formats, while the format of long double varies between platforms. Most hardware supports storage and operations for these formats directly. Floating point data storage is usually used in numerical calculations. Since the use case is to represent real numbers, accuracy is a problem. Mathematically, there are infinitely many real numbers between any two distinct real numbers. Computers are notoriously bad at storing an infinite amount of data. For the purposes of programming, this means that every choice of a floating point data type has to deal with error conditions and how to handle them. Obvious errors include division by zero. Less obvious conditions are rounding errors, special numbers (infinity, not a number, signed zeroes, subnormal numbers), and overflows.

Not all of these error conditions pose a threat to your applications. It depends on what type of numerical calculations your code performs or consumes. Comparisons have to be implemented in a thoughtful way. Tests for equality may fail because of rounding errors. Comparing against an exact zero can backfire. The C and C++ standard libraries supply you with a list of constants. Among them is the machine epsilon, the difference between 1.0 and the next larger value representable in a floating point data type. Epsilon (ε) is often used to denote very small values. cfloat or float.h defines FLT_EPSILON (for float), DBL_EPSILON (for double), and LDBL_EPSILON (for long double). Since the gap between neighbouring values grows with their magnitude, the epsilon value should be scaled by the size of the compared values rather than used as a fixed smallest difference. There is another method for finding neighbouring floating point numbers. C++11 has introduced functions such as std::nextafter() to find the next neighbour value. The spacing between neighbours is measured in units in the last place (ULP), also called units of least precision. One ULP is the value of the least significant bit of the significand. Using ULPs or the epsilon values are different approaches. ULP checking requires reinterpreting the bit patterns of the values as integers. Both methods work well away from zero. If you are near zero, then consider using multiples of the epsilon value as an absolute comparison value.
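Both comparison approaches can be sketched in a few lines. nearly_equal() scales DBL_EPSILON by the larger operand and falls back to an absolute floor of a few epsilons near zero; the factor of 4 is an arbitrary choice for illustration. ulp_distance() reinterprets the bit patterns as integers with std::memcpy. Both function names are hypothetical, and the ULP version deliberately ignores NaN and operands of different sign.

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Relative comparison: the allowed difference scales with the magnitude of
// the operands. Near zero, an absolute floor of a few epsilons takes over.
// The factor 4 is an arbitrary choice for illustration.
bool nearly_equal(double a, double b,
                  double rel_factor = 4.0,
                  double abs_floor = 4.0 * DBL_EPSILON) {
    const double diff = std::fabs(a - b);
    if (diff <= abs_floor) {
        return true;
    }
    const double largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * DBL_EPSILON * rel_factor;
}

// ULP distance: for finite doubles of the same sign, the integer views of
// the bit patterns are ordered like the values, so their difference counts
// the representable values in between.
// Sketch only: NaN and operands of different sign are not handled.
std::int64_t ulp_distance(double a, double b) {
    std::int64_t ia = 0;
    std::int64_t ib = 0;
    std::memcpy(&ia, &a, sizeof a);
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}
```

The classic example 0.1 + 0.2 fails an exact comparison with 0.3, while both helpers report the two values as direct neighbours.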

There is another overlooked fact. The float data type has 32 bits of storage. This means there are only about 4.3 billion (2^32) different bit patterns, which is not a lot. Looping through all values and stress testing a numerical function can be done in minutes. There is a blog post covering this technique, complete with example code.
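A minimal sketch of the exhaustive approach, assuming float is the IEEE 754 32 bit format: every bit pattern in a range is copied into a float with std::memcpy, and a candidate function is compared with a reference implementation. candidate_abs() and count_mismatches() are hypothetical names; the candidate clears the sign bit and is checked against std::fabs().

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical function under test: absolute value by clearing the sign bit.
float candidate_abs(float x) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Visits every float bit pattern in [first, last] and counts results that
// differ from the reference std::fabs(). Calling it with 0 and 0xFFFFFFFF
// covers all 2^32 values. The 64-bit loop counter avoids wrap-around.
std::uint64_t count_mismatches(std::uint32_t first, std::uint32_t last) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float expected");
    std::uint64_t mismatches = 0;
    for (std::uint64_t i = first; i <= last; ++i) {
        const std::uint32_t bits = static_cast<std::uint32_t>(i);
        float x = 0.0f;
        std::memcpy(&x, &bits, sizeof x);
        const float expected = std::fabs(x);
        const float actual = candidate_abs(x);
        // NaN never compares equal to itself, so treat "both NaN" as a match.
        const bool match = std::isnan(expected) ? std::isnan(actual) : expected == actual;
        if (!match) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

The range parameters make it easy to test interesting regions (subnormals, infinities, NaNs) quickly before running the full sweep.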

I have compiled some useful resources for this topic.

Linking against the Threading Building Blocks (oneTBB) library with g++ and clang++

A couple of weeks ago, I created a repository of example code for C and C++ courses. The examples include a source file that uses the Threading Building Blocks (oneTBB) library. Since the examples are rather small, I included no Makefile. Instead, I wrote a Bash script using clang++ and a Ninja build file using g++. Strangely, the clang++ build worked, but the g++ version complained about symbols in the linking phase. The linking errors look horrible. Here is the top of the long list of errors (added here for search engines to find the error):

/usr/bin/ld: /tmp/cc4i6mNS.o: in function `tbb::detail::d1::wait_context::add_reference(long)':
parallel_algorithms.cpp:(.text._ZN3tbb6detail2d112wait_context13add_referenceEl[_ZN3tbb6detail2d112wait_context13add_referenceEl]+0x6c): undefined reference to `tbb::detail::r1::notify_waiters(unsigned long)'
/usr/bin/ld: /tmp/cc4i6mNS.o: in function `tbb::detail::d1::execution_slot(tbb::detail::d1::execution_data const&)':
parallel_algorithms.cpp:(.text._ZN3tbb6detail2d114execution_slotERKNS1_14execution_dataE[_ZN3tbb6detail2d114execution_slotERKNS1_14execution_dataE]+0x14): undefined reference to `tbb::detail::r1::execution_slot(tbb::detail::d1::execution_data const*)'
/usr/bin/ld: /tmp/cc4i6mNS.o: in function `tbb::detail::d1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&)':
parallel_algorithms.cpp:(.text._ZN3tbb6detail2d15spawnERNS1_4taskERNS1_18task_group_contextE[_ZN3tbb6detail2d15spawnERNS1_4taskERNS1_18task_group_contextE]+0x30): undefined reference to `tbb::detail::r1::spawn(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&)'
/usr/bin/ld: /tmp/cc4i6mNS.o: in function `tbb::detail::d1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&)':
parallel_algorithms.cpp:(.text._ZN3tbb6detail2d116execute_and_waitERNS1_4taskERNS1_18task_group_contextERNS1_12wait_contextES5_[_ZN3tbb6detail2d116execute_and_waitERNS1_4taskERNS1_18task_group_contextERNS1_12wait_contextES5_]+0x2c): undefined reference to `tbb::detail::r1::execute_and_wait(tbb::detail::d1::task&, tbb::detail::d1::task_group_context&, tbb::detail::d1::wait_context&, tbb::detail::d1::task_group_context&)'…

There was no obvious difference between the compiler calls.

g++ -Wall -Werror -Wpedantic -std=c++20 -ltbb -g -O0 -o parallel_algorithms parallel_algorithms.cpp
clang++ -Wall -Werror -Wpedantic -std=c++20 -ltbb -march=native -o parallel_algorithms parallel_algorithms.cpp

After browsing through a lot of search results that didn’t explain the problem, I tried putting -ltbb at the end of the command. If you do this, then everything works fine with g++:

g++ -Wall -Werror -Wpedantic -std=c++20 -g -O0 -o parallel_algorithms parallel_algorithms.cpp -ltbb

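🥳 oneTBB doesn’t require much, but its link option has to come after the source or object files that use it. The GNU linker processes its inputs in command-line order, so a library listed before the files that need its symbols may not be used to resolve them. A likely explanation for the different behaviour is that the two compiler drivers pass different default options to the linker: some distributions (Ubuntu, for example) configure GCC to pass --as-needed, which drops a shared library that no preceding object file needs. Placing libraries at the end works with both compilers. Good to know.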

Data Conversion in Code – Casting Data Types

A lot of code deals with transforming data from one data type to another. There are many reasons for doing this. The different sizes of integer numbers and changes of sign are one issue. In the past, casting was easy and error-prone. C++ has different ways to deal with casting data types. The options are:

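  • static_cast<T> is for related data types, e.g. numeric conversions or conversions within a class hierarchy. The checks are performed at compile time, not at run-time.
  • dynamic_cast<T> is useful for pointers and references to polymorphic classes. It checks the data type at run-time and ensures that unsafe downcasts are avoided. A failed cast to a pointer type returns nullptr; a failed cast to a reference type throws std::bad_cast.
  • const_cast<T> does not change the data type. It adds or removes const (or volatile) qualifiers, typically to pass data to legacy APIs that lack const in their signatures. Writing through the result to an object that was originally declared const is undefined behaviour.
  • reinterpret_cast<T> is the worst option for converting/casting data. The compiler treats the bit pattern or pointer as the new type without any conversion or check. So this should only be an option if you know exactly what you are doing.
  • std::move was introduced in C++11. It doesn’t change the data type either; it casts its argument to an rvalue reference, which allows the resources of values and instances (e.g. the buffer of a std::vector) to be transferred instead of copied. The main reason for moving things is to avoid expensive copies and to express a transfer of ownership.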

While you can still use the old-fashioned C-style cast operation, you should never do this in C++. The named casts document much more clearly what you intend to do in your code. For more complex operations, it is recommended to create a function for performing the transformation. This function can also enforce range checks or other tests before converting data. There is a blog article that illustrates the conversion methods by using Han Solo from Star Wars.
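The sketch below shows several of these options in one place: a run-time checked downcast with dynamic_cast, const_cast for a legacy API that only reads its argument, std::move to hand over a buffer, and a dedicated conversion function that combines static_cast with a range check. All names (shape, circle, legacy_length(), checked_narrow(), etc.) are hypothetical; checked_narrow() follows the same round-trip idea as gsl::narrow from the Guidelines Support Library.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

struct shape { virtual ~shape() = default; };
struct circle : shape { double radius = 1.0; };
struct square : shape {};

// dynamic_cast: run-time checked downcast; yields nullptr if s is not a circle.
const circle* as_circle(const shape* s) {
    return dynamic_cast<const circle*>(s);
}

// Hypothetical legacy C API: takes char* but only reads the text.
std::size_t legacy_length(char* text) { return std::strlen(text); }

// const_cast: removes const to satisfy the legacy signature. This is only
// acceptable because legacy_length() never writes through the pointer.
std::size_t length_of(const std::string& s) {
    return legacy_length(const_cast<char*>(s.c_str()));
}

// std::move: casts to an rvalue reference, so the vector's buffer is handed
// over instead of copied. The source is left valid but unspecified.
std::vector<int> take_buffer(std::vector<int>& source) {
    return std::move(source);
}

// Conversion function: static_cast plus a range check. Throws instead of
// silently truncating the value or flipping its sign.
template <typename To, typename From>
To checked_narrow(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "checked_narrow() handles integer types only");
    const To converted = static_cast<To>(value);
    const bool round_trip_ok = static_cast<From>(converted) == value;
    const bool sign_ok = (converted < To{}) == (value < From{});
    if (!round_trip_ok || !sign_ok) {
        throw std::out_of_range("value does not fit into the target type");
    }
    return converted;
}
```

Each cast now states its intent in the source code, and the one risky conversion is funnelled through a single function that can be tested on its own.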

Using constants and constant expressions (const and constexpr) is something you absolutely should adopt in your coding style. Marking data storage as read-only can allow the compiler to create faster code. In addition, you will get compile errors when other parts of your code modify data that should not be changed.
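A short sketch of both keywords: constexpr values and functions are evaluated at compile time and can be checked with static_assert, while const parameters document that a function only reads its input. The names kib(), buffer_size and count_zero_bytes() are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Compile-time constants: any attempt to modify them is a compile error.
constexpr std::size_t kib(std::size_t n) { return n * 1024; }
constexpr std::size_t buffer_size = kib(4);
static_assert(buffer_size == 4096, "evaluated at compile time");

// A const reference parameter promises that the function only reads the data;
// writing to data[i] inside the function would not compile.
std::size_t count_zero_bytes(const std::array<unsigned char, 16>& data) {
    std::size_t zeros = 0;
    for (const unsigned char byte : data) {
        if (byte == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```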

C++ Style Improvements

The international tidy person symbol. Source: https://commons.wikimedia.org/wiki/File:International_tidyman.svg

Recently I experimented with Clang Tidy. I let it analyse some of my code and looked at the results. It was really helpful, because it suggested a lot of slight improvements that do not radically change your code. If you started with C and switched to C++ at some point, you will definitely still have some C habits in your code. Especially APIs expecting pointers to character buffers often use C-style arrays. There are ways to get rid of these language constructs, but you have to work on your habits. Avoiding C-style arrays is straightforward. You just have to do it. There is a reason why C and C++ are different programming languages.
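As an example of replacing a C-style array while still talking to a C API, std::array carries its size with it, and data() provides the pointer the C function expects. format_id() and the format string are hypothetical.

```cpp
#include <array>
#include <cstdio>
#include <string>

// Before (C habit):
//   char buffer[32];
//   snprintf(buffer, sizeof(buffer), "id-%05d", id);
// After: std::array knows its own size, and data() still feeds the C API.
std::string format_id(int id) {
    std::array<char, 32> buffer{};
    const int written = std::snprintf(buffer.data(), buffer.size(), "id-%05d", id);
    if (written < 0) {
        return {};  // encoding error reported by snprintf
    }
    // snprintf null-terminates the output whenever the buffer size is non-zero.
    return std::string(buffer.data());
}
```

Because buffer.size() replaces a separately maintained length, the size and the buffer cannot drift apart during later edits.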

Then there are the new C++ standards. In terms of security and convenience, the C++11 and C++17 standards are milestones. You should update your code to at least the C++11 standard. Clang Tidy can help you with its modernize-* checks. There is also a class of checks (cppcoreguidelines-*) covering the C++ Core Guidelines. The guidelines are a compilation of recommendations to refactor your code into a version that is less error-prone and more maintainable. Keep in mind that you can never apply all rules and recommendations to your code base. Pick a set of five to seven issues and try to implement them. Repeat when you have time or need to change something in the code anyway.

Error message “undefined reference to `Mcuda_compiled’ in nvhpc”

Using different compilers for either production or testing purposes can be challenging. While the GNU Compiler Collection is the standard on GNU/Linux systems, using more than one compiler toolchain helps to identify bugs in the code. The project I am working on uses the Clang compiler. I added command-line options for NVIDIA’s nvc and nvc++ from the high-performance computing (HPC) compiler collection. The first compilation runs yielded the error message “undefined reference to `Mcuda_compiled'” during the linker phase. Inspecting the object files and the libraries did not help, because several libraries define the Mcuda_compiled symbol. The key is to use consistent -M options for compiling and linking. The compile step used the flags -Mcuda -Mvect. Adding the same options to the linker phase solves the problem. The output of ldd then shows that the binary links against libcudanvhpc.so, which contains the symbol.

Recommendations for using Exceptions in Code

Exceptions can be useful for handling error conditions. They are suited for better structuring code and avoiding if/else cascades. However, exceptions can disturb the control flow and can make your program skip sections of your code. Cleaning up after errors and the management of resources can be affected by this. Another downside is the performance cost if exceptions are triggered often. If you need to catch errors, you have to decide carefully when to use exceptions and when to use error flags or return codes. The article Exception dangers and downsides in the C++ tutorial has some hints on how to use exceptions:

  • Use exceptions for errors that trigger infrequently.
  • Use exceptions for errors that are critical for the code execution (i.e. error conditions that make subsequent operations impossible).
  • Use exceptions when the errors cannot be handled right at the point where they occur.
  • Use exceptions when returning an error code is not possible or not an option.

The actual cost of exceptions is influenced by the compiler, the hardware, and the operating system. There are signs that exception handling performance has improved on the x86_64 platform. Having multiple cores can pose a problem, because unwinding is effectively serialised in common implementations: the unwinder protects shared data with a global lock, so exceptions thrown concurrently on many threads slow each other down. This can be a problem for locked resources and similar synchronisation techniques. The proposal P2544R0 describes the background of these problems, and it discusses some alternatives for error handling by exceptions. The proposal also has some measurements that show the impact of exceptions. My recommendation is to investigate how frequent errors are and to explore handling non-critical errors by using flags or return codes. When in doubt, use the instrumentation of your compiler and measure the actual cost of exceptions.
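The split between return codes and exceptions can be sketched with a port number parser. Malformed input is expected to occur often, so parse_port() reports it through std::optional without throwing (std::from_chars itself never throws). A missing valid port in the configuration makes further work impossible, so require_port() throws. Both function names and the configuration scenario are hypothetical.

```cpp
#include <charconv>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Frequent, non-critical failure (e.g. malformed user input):
// report it through the return value instead of throwing.
std::optional<int> parse_port(std::string_view text) {
    int port = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, port);
    // Reject parse errors, trailing characters, and out-of-range ports.
    if (ec != std::errc{} || ptr != last || port < 1 || port > 65535) {
        return std::nullopt;
    }
    return port;
}

// Rare, critical failure: without a valid port the program cannot continue,
// and the error is handled far away from where it occurs.
int require_port(std::string_view text) {
    if (const auto port = parse_port(text)) {
        return *port;
    }
    throw std::runtime_error("invalid port in configuration: " + std::string(text));
}
```

Callers validating many inputs can use parse_port() in a loop without paying for exceptions, while start-up code can rely on require_port() to stop with a clear message.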

Using C++ Threads or OpenMP for parallel Processing

Having easy access to parallel processing is a pleasant feature in programming languages. The thread APIs of operating systems have notoriously been difficult to use, especially in C. The Open Multi-Processing (OpenMP) API started in 1997 to make things easier. It helps to mark sections of parallel code and loops that can be parallelized. It works well for C, C++, and FORTRAN code, and it is easy to implement. Plus, your code can be compiled with or without OpenMP support, because compilers ignore the OpenMP pragmas when OpenMP is not enabled. The downside is that a binary compiled with OpenMP requires the OpenMP runtime library on the target. I recently had a case where C++ code needs to be installed on different platforms (i.e. systems with different major version levels). The OpenMP runtime is tied to the compiler (GCC uses libgomp, Clang uses libomp). The code is compiled by Clang, so in this particular case you need different OpenMP libraries. In order to reduce the dependency on OpenMP, the code was refactored to use C++ threads.
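A minimal sketch of such a refactoring for a simple loop: the first function uses an OpenMP reduction guarded by the _OPENMP macro, so it also builds without OpenMP support, and the second function does the same work with std::async. The function names are hypothetical, and splitting the range into exactly two halves is an arbitrary choice. Because the summation order changes, floating point results can differ in the last bits between the versions.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// OpenMP version: without OpenMP enabled, the macro _OPENMP is not defined,
// the pragma is skipped, and the loop simply runs sequentially.
double sum_of_squares_omp(const std::vector<double>& v) {
    const std::size_t n = v.size();
    double sum = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : sum)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += v[i] * v[i];
    }
    return sum;
}

// Helper for the thread version: sum of squares over [begin, end).
double sum_squares_range(const std::vector<double>& v, std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        sum += v[i] * v[i];
    }
    return sum;
}

// C++11 version without OpenMP: the pragma becomes a function call. The lower
// half runs in a second thread via std::async, the upper half in the caller.
double sum_of_squares_async(const std::vector<double>& v) {
    const std::size_t mid = v.size() / 2;
    std::future<double> lower = std::async(std::launch::async, sum_squares_range,
                                           std::cref(v), std::size_t{0}, mid);
    const double upper = sum_squares_range(v, mid, v.size());
    return lower.get() + upper;
}
```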

C++11 threads are easy to use. When switching from OpenMP, you mainly have to convert your #pragma statements into function calls. When using member functions, you have to code around a peculiarity of std::async and std::thread: a member function can only be called together with its object. Passing an expression such as path_orig->walkthrough as the function argument does not compile, even though the object was allocated dynamically and you hold a pointer to it. If you try to do this, then you will get an error message. Consider the following object:

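#include <string>
#include <kchashdb.h>   // Kyoto Cabinet hash database

class hash_list {
private:
    kyotocabinet::HashDB HDB;
    kyotocabinet::HashDB::Cursor *pos;

public:
    bool walkthrough( std::string directory ) {
        // … scan the directory and update the database (body omitted) …
        return true;
    }
};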
The class is used to access different Kyoto Cabinet databases. The function walkthrough() does heavy I/O work and updates the database file. Passing the member function to std::async as if it were a free function will not work. I fought with many compiler errors and was tempted to convert the member function to static, but this would have required a full rewrite of the class, because static member functions have no access to the data of an instance. Instead, you can use a wrapper function to call the members.

#ifndef USE_OPENMP
bool wrap_walkthrough( hash_list *h, string d ) {
    return( h->walkthrough(d) );
}
#endif

The function wrap_walkthrough() works fine, and it can be called with different dynamically allocated objects. The section calling the functions looks like this:

#ifndef USE_OPENMP
    future<bool> f_rc_path = std::async( std::launch::async, wrap_walkthrough, path_orig, opt_path );
    future<bool> f_rc_prfx = std::async( std::launch::async, wrap_walkthrough, path_prefix, prefix_path );
    const bool rc_path = f_rc_path.get();
    const bool rc_prfx = f_rc_prfx.get();
    if ( ! rc_path ) {
        cerr << "Walkthrough for " << opt_path << " failed!" << endl;
        rc += 23;
    }
    if ( ! rc_prfx ) {
        cerr << "Walkthrough for " << prefix_path << " failed!" << endl;
        rc += 23;
    }
#endif

Remember to write a wrapper function (or to pass the member function pointer together with the object) when you encounter the error message “reference to non-static member function must be called”.
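Besides a wrapper function, std::async (like std::thread) also accepts a pointer to a member function followed by the object pointer, and a lambda capturing the object pointer works as well. The sketch compares all three styles on a hypothetical class scanner that stands in for hash_list; its walkthrough() only records the directory and does no I/O, and the directory names are placeholders.

```cpp
#include <future>
#include <string>

// Hypothetical stand-in for hash_list: walkthrough() only records the directory.
class scanner {
public:
    bool walkthrough(const std::string& directory) {
        last_directory_ = directory;
        return !directory.empty();
    }
    const std::string& last_directory() const { return last_directory_; }

private:
    std::string last_directory_;
};

// Option 1: a free wrapper function, as in the article.
bool wrap_walkthrough(scanner* s, std::string d) { return s->walkthrough(d); }

// Runs three scans concurrently, each on its own object, using three call styles.
bool scan_all(scanner* a, scanner* b, scanner* c) {
    auto f1 = std::async(std::launch::async, wrap_walkthrough, a, std::string("/etc"));
    // Option 2: pointer to member function followed by the object pointer.
    auto f2 = std::async(std::launch::async, &scanner::walkthrough, b, std::string("/usr"));
    // Option 3: a lambda that captures the object pointer.
    auto f3 = std::async(std::launch::async, [c] { return c->walkthrough("/var"); });
    // Call get() on every future before combining, so no result is skipped.
    const bool r1 = f1.get();
    const bool r2 = f2.get();
    const bool r3 = f3.get();
    return r1 && r2 && r3;
}
```

The pointer-to-member form needs no extra function at all, while the lambda keeps the call site readable when arguments have to be prepared first.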

