Secure Software doesn't develop itself.



When to use Math Library Functions

Most programming languages have libraries with mathematical functions. Usually, these functions provide more complex calculations, such as trigonometric functions, numerical methods, or special mathematical functions. The libraries also provide simpler functions, and the power function is one example. Developers can sometimes implement such calculations directly in the application code. As a rule, the best practice is not to do that and to use the library instead, but there are exceptions. You can calculate integer powers by performing multiplications, whereas the pow() function calculates the result for arbitrary parameters, including fractional powers. This generality has a cost, and the execution times of the two approaches can differ vastly. Consider the following code snippet.


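// Assumed types: x and y are double, dist(gen64) returns uint64_t values.
for( uint32_t j=0; j<hit_max; j++ ) {
    hit = 0;
    x = 0;
    y = 0;
    for( uint32_t i=0; i<rnd_max; i++ ) {
        // Scale the random integers to the interval [0,1]
        x = static_cast<double>( dist(gen64) ) / static_cast<double>( numeric_limits<uint64_t>::max() );
        y = static_cast<double>( dist(gen64) ) / static_cast<double>( numeric_limits<uint64_t>::max() );
        // Count the point if it lies below the quarter circle y = sqrt(1 - x^2)
        if ( y < std::sqrt( 1 - x*x) ) {
            hit++;
        }
    }
}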

The if() condition performs a calculation that needs the square of x. To square a number, you just need to multiply it by itself. If you run the code with hit_max=rnd_max=5000, then the two nested loops take about 5.4 seconds. The test code runs the loops a second time for comparison: replacing the multiplication with pow(x,2) leads to an execution time of 8.6 seconds. That’s roughly 60% more, which adds up quickly for loops running more often.

So when calling a function from the math library, you should first consider if direct numerical operations can replace the call to the function in your code. The compiler cannot always know what you want to calculate, so you should not rely on the optimiser to replace the function call. Deciding this with the pow() function and integer power values is easy. When faced with more complex calculations, it is not so easy. You can still try dividing the calculation into parts and check if those parts are directly computable. For numerical simulations or approximations, this is effort well spent. Focus on the things you do in loops.

You should also get to know the functions of your math library. C++ templates and the overloaded math functions let the compiler select the variant for the chosen data type. The size of the chosen data types also affects the performance, because floating-point values have different precisions and sizes. Some calculations might even be pre-computed with integers alone and then converted to floating-point data types. Inspect these sections in your code. Use integers with fitting sizes wherever you can. Use floating-point data types only if necessary.
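To repeat the comparison on your own machine, a minimal sketch could look like the following. It is not the original test program. The names count_hits(), square_multiply(), square_pow(), and seconds_for() are made up for this illustration, and the generator (std::mt19937_64) and the distribution type are assumptions, because the snippet above does not show their declarations.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Count random points in the unit square that fall below the quarter circle
// y = sqrt(1 - x^2). `square` is the (hypothetical) squaring function under test.
template <typename Square>
std::uint32_t count_hits(std::uint32_t samples, std::uint64_t seed, Square square) {
    assert(samples > 0);
    std::mt19937_64 gen64(seed);                        // assumed generator
    std::uniform_int_distribution<std::uint64_t> dist;  // assumed: full 64-bit range
    const double scale =
        static_cast<double>(std::numeric_limits<std::uint64_t>::max());
    std::uint32_t hit = 0;
    for (std::uint32_t i = 0; i < samples; i++) {
        const double x = static_cast<double>(dist(gen64)) / scale;
        const double y = static_cast<double>(dist(gen64)) / scale;
        if (y < std::sqrt(1.0 - square(x))) {
            hit++;
        }
    }
    return hit;
}

// The two variants to compare: direct multiplication and the library call.
inline double square_multiply(double v) { return v * v; }
inline double square_pow(double v) { return std::pow(v, 2); }

// Run `work` once and return the elapsed wall-clock time in seconds.
template <typename Work>
double seconds_for(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrap each variant in a lambda, for example seconds_for([&] { hits = count_hits(n, 42, square_pow); }), and print both the time and the hit count so that the optimiser cannot discard the work. The absolute numbers depend on the compiler, the optimisation level, and the hardware, and some compilers replace pow(x,2) with a multiplication on their own.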

Parallel Operations on Numerical Values

Everyone knows the vector container of C++’s Standard Template Library (STL). It is useful, versatile, and stores the data of all elements in contiguous memory. There is another, less widely known container for array data named std::valarray. It has been part of the standard library for a long time (i.e. way before C++11). Its use case is to perform operations on all array elements at once, which gives the compiler room to execute them in parallel. You can even multiply two valarray containers element by element without using loops or other special code. While it has no begin() and end() member functions, you can easily create a valarray container from a vector, perform the calculations, and copy the results into a vector again. The C++ reference has example code to show how to do this. Creation from a vector requires access to the memory location of the vector’s data.

std::vector<double> values;
// Put some values into the vector here …
// Convert vector to valarray
std::valarray<double> val_values( values.data(), values.size() );

Now you can perform operations on all elements at once. Calculating cos() of all elements simply looks like this:

std::valarray<double> val_result = std::cos(val_values);
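The full round trip from a vector to a valarray and back can be sketched as follows. The helper names cos_all() and multiply_elementwise() are made up for this illustration. The results are stored in an explicitly typed std::valarray<double>, because the standard library is allowed to return a lazy expression object instead of a valarray from these operations.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <valarray>
#include <vector>

// Copy the vector's contiguous storage into a valarray, apply cos() to
// every element, and copy the results into a new vector.
std::vector<double> cos_all(const std::vector<double>& values) {
    std::valarray<double> val_values(values.data(), values.size());
    std::valarray<double> val_result = std::cos(val_values);
    // The non-member std::begin()/std::end() overloads exist since C++11.
    return std::vector<double>(std::begin(val_result), std::end(val_result));
}

// Multiply two equally sized vectors element by element without a loop.
std::vector<double> multiply_elementwise(const std::vector<double>& a,
                                         const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::valarray<double> va(a.data(), a.size());
    std::valarray<double> vb(b.data(), b.size());
    std::valarray<double> product = va * vb;
    return std::vector<double>(std::begin(product), std::end(product));
}
```

Both helpers copy the data twice, once into the valarray and once back into a vector. Whether that pays off depends on how much work is done on the elements in between.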

If you measure the run time and compare it to a loop through a vector where the function is called for every element, then you will notice that valarray can be much faster. How much depends on your compiler and its standard library. GCC and Clang produce quite fast code. The apply() member function allows you to run arbitrary functions on every element. If you only need a subset of the elements, then you can create slices with the required values.
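Here is a small sketch of apply() and slices. The helper names square_all() and every_second() are hypothetical and only serve as an illustration. A std::slice is described by a start index, the number of elements, and the stride between them.

```cpp
#include <cstddef>
#include <valarray>

// Run a function on every element: here each value is squared.
std::valarray<double> square_all(const std::valarray<double>& v) {
    std::valarray<double> result = v.apply([](double x) { return x * x; });
    return result;
}

// Select a subset with a slice: every second element, starting at index 0.
std::valarray<double> every_second(const std::valarray<double>& v) {
    const std::size_t count = (v.size() + 1) / 2;  // number of selected elements
    std::valarray<double> result = v[std::slice(0, count, 2)];
    return result;
}
```

For a valarray holding 1, 2, 3, 4, 5, square_all() yields 1, 4, 9, 16, 25 and every_second() yields 1, 3, 5. The function passed to apply() must be a plain function or a lambda without captures, because the member function expects a function pointer.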
