Secure Software doesn't develop itself.

The picture shows the top layer of the Linux kernel's API subsystems. Source: https://www.linux.org/attachments/kernel-jpeg.6497/

Month: February 2021

Anatomy of a Buffer Overflow in Python 3.x

The Python bug tracker received a report of a buffer overflow in Python. Affected versions were 3.10, 3.9, 3.8, 3.7, and 3.6. The code in question is part of the PyCArg_repr() function. This function is called when Python has to build the string representation of parameters from the ctypes module (i.e. when you are using C type variables in your Python code). The overflow can be triggered by using extreme values and letting Python expand the content into a buffer:
    case 'd':
        sprintf(buffer, "<cparam '%c' (%f)>",
                self->tag, self->value.d);
        break;
The %f placeholder is the interesting part. It prints the complete decimal expansion and never switches to exponent notation, so a very large value expands to more than 300 characters. Using 1.79769e+308 (maximum for the double data type) or 1.18973e+4932 (maximum for the long double data type) will trigger a buffer overflow. The runtime can detect this and abort the interpreter with an error message. In any case, it is a good reminder to always validate input data before processing it. Some applications use components from different programming languages. Whenever data is handed around between functions implemented in different run-time environments, you have to be extra careful about the data types. Sometimes implicit conversions occur. If conversions between numerical data and strings are performed, always check the limits on both ends.
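The effect of the %f placeholder can be reproduced without touching any C code. The following is a minimal sketch in Python 3, not the actual fix: the helper names cparam_repr and fits_in_buffer are made up for illustration, and the buffer size of 256 bytes is an assumption, not a value taken from the text above.

```python
import sys

# Assumed size of the C-side buffer (illustrative value only).
BUFFER_SIZE = 256


def cparam_repr(tag: str, value: float) -> str:
    """Mimic the C format string "<cparam '%c' (%f)>" in Python."""
    return "<cparam '%c' (%f)>" % (tag, value)


def fits_in_buffer(text: str, size: int = BUFFER_SIZE) -> bool:
    """True if the ASCII text plus the terminating NUL byte fits into size bytes."""
    return len(text.encode("ascii")) + 1 <= size


small = cparam_repr("d", 1.5)
large = cparam_repr("d", sys.float_info.max)

# Print the length of each result and whether it would fit.
print(len(small), fits_in_buffer(small))
print(len(large), fits_in_buffer(large))
```

A check like fits_in_buffer() belongs in front of every conversion that writes into a fixed-size buffer. On the C side, a size-limited function such as snprintf() serves the same purpose.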

You can do the bounds checks even with arbitrary-precision arithmetic (also called bignum, multiple-precision, or infinite-precision arithmetic). Conversions can target common fixed-size data types with less precision, which means cutting off the values and losing precision. Arbitrary-precision libraries can often export the data to string representations or other serialisation formats. In that case you have to estimate the size of the result. Java™ offers the BigDecimal and BigInteger classes. The buffer estimate looks like this:
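    byte[] storedUnscaledBytes = bigDecimal.unscaledValue().toByteArray(); // exported value
    int storedSize = storedUnscaledBytes.length; // size of the export in bytes
    int storedScale = bigDecimal.scale(); // position of the decimal point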
This gives you the exported value and its size (the length of the byte array). The size must be passed along whenever the export is handed to functions that work with size-limited buffers. The object and its size need to be processed together and must not be separated in any further processing step. Check your code for conversions near APIs to external libraries or other components. There might be potential for overflows or conversion errors.
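The same idea can be sketched with Python's built-in arbitrary-precision integers. This is an illustration under assumptions, not a reference implementation: the names ExportedInt, max_decimal_digits, and export_int are hypothetical, and the limit is whatever the receiving side can hold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportedInt:
    """Exported bytes and their size, kept together as one object."""
    data: bytes
    size: int


def max_decimal_digits(n: int) -> int:
    """Upper bound for the number of decimal digits of n (sign excluded)."""
    return math.floor(n.bit_length() * math.log10(2)) + 1


def export_int(n: int, limit: int) -> ExportedInt:
    """Serialise n as signed big-endian bytes, refusing results larger than limit."""
    size = (n.bit_length() + 8) // 8  # safe upper bound, leaves room for the sign bit
    if size > limit:
        raise OverflowError(f"export needs {size} bytes, limit is {limit}")
    return ExportedInt(n.to_bytes(size, "big", signed=True), size)


exported = export_int(2**64, limit=16)
print(exported.size, max_decimal_digits(2**64))
```

The size is estimated from bit_length() before any conversion happens, so an oversized value is rejected before any bytes are produced. Keeping data and size in one frozen object makes it harder to separate them by accident later.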

Implementing basic Tests during Software Development

The recent GnuPG bugs have sparked a discussion about standard tests during software development. The case was a buffer in the code which could be overwritten by a decryption operation. Overflow bugs can be easily avoided by defensive programming, but also by standard tests during the development phase. Modern compilers have features to test for stack/heap overflows, memory leaks, undefined behaviour, and many more cases you don’t want in your code. Clang offers the Clang Static Analyzer tools. GCC 10 offers similar features in the form of its static code analysis options (-fanalyzer). Valgrind celebrated its 20th anniversary last year, so there are no excuses. Since every project written in C/C++ needs a set of build options anyway, why not add some scripts or configurations to your tests?
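As a rough sketch of such a script, the following Python 3 helper only assembles the command lines for a sanitizer build, a GCC static analysis pass, and a Valgrind run. The file names are hypothetical, and the selection of flags is one possible starting point, not a recommendation from the GnuPG discussion.

```python
import shlex
from typing import List

# Hypothetical file names, used only for illustration.
SOURCE = "decrypt_test.c"
ASAN_BINARY = "decrypt_test_asan"
PLAIN_BINARY = "decrypt_test_plain"


def sanitizer_build(compiler: str = "clang") -> List[str]:
    """Build with AddressSanitizer and UndefinedBehaviorSanitizer enabled."""
    return [compiler, "-g", "-O1", "-fno-omit-frame-pointer",
            "-fsanitize=address,undefined", SOURCE, "-o", ASAN_BINARY]


def analyzer_check() -> List[str]:
    """Compile-only pass with GCC's static analysis (GCC 10 or later)."""
    return ["gcc", "-fanalyzer", "-Wall", "-Wextra", "-c", SOURCE, "-o", "/dev/null"]


def plain_debug_build(compiler: str = "gcc") -> List[str]:
    """Debug build without sanitizers, intended for the Valgrind run."""
    return [compiler, "-g", "-O0", SOURCE, "-o", PLAIN_BINARY]


def valgrind_run() -> List[str]:
    """Run the plain binary under Valgrind and fail on any reported error."""
    return ["valgrind", "--leak-check=full", "--error-exitcode=1", "./" + PLAIN_BINARY]


for command in (sanitizer_build(), analyzer_check(), plain_debug_build(), valgrind_run()):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) once the compilers and Valgrind are installed. The Valgrind run uses a separate binary because Valgrind is meant for builds without AddressSanitizer.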

First of all, adding something to your tests requires that you already do systematic testing of your code. Most projects have a collection of regression tests to make sure the code behaves as expected after changes. Additionally, there might be test cases stemming from bug reports to check for errors which should have been fixed and should never return. Furthermore, some projects have stress tests, load tests, and even fuzzing tests which can be easily activated. All of this requires a testing platform and processes to define, develop, test, and deploy tests. Not having a test infrastructure is no excuse for not testing code. This is especially true for code bases like libgcrypt or other widely used libraries/tools. The lack of continuous integration (CI) pipelines is also no excuse. Ideally tests are automated, but they don’t have to be. They just need to be runnable with few changes to the code and the build instructions. Often code has debug flags or other parameters which influence the run-time behaviour or the generated code. That’s the way to start. Once your configuration (and maybe scripts) are in place, you can go forth and automate everything else.

Collecting test cases should be your first step. Harvest the bug tracker and the change history. Try to extract cases and data that triggered a bug. Build a library of tests, then start extending your build system with a test mode that utilises this library and performs the tests. Don’t forget the benefits of your toolchain! Use the static analysers when testing or running code. You can even do what you usually do with your code before shipping, just make sure the analysers are in place (i.e. compiled in or supervising the code execution). Using different compilers in the pre-shipping phase is a very good idea, too. All in all, this should not add an enormous amount of time to your development cycle. You have to test your code anyway, so why not let computers do this?
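A library of harvested cases can start as a plain table that a test runner walks through. The sketch below uses Python's unittest module. The function under test (copy_into_buffer) and the bug identifiers are hypothetical stand-ins for your own code and your own bug tracker entries.

```python
import unittest


def copy_into_buffer(data: bytes, size: int) -> bytes:
    """Hypothetical function under test: refuse input that exceeds the buffer."""
    if len(data) > size:
        raise ValueError("input larger than buffer")
    return data.ljust(size, b"\0")


# Hypothetical regression library: (bug id, input, buffer size, error expected)
REGRESSION_CASES = [
    ("BUG-0001", b"", 8, False),
    ("BUG-0002", b"A" * 8, 8, False),
    ("BUG-0003", b"A" * 9, 8, True),
]


class RegressionTests(unittest.TestCase):
    def test_harvested_cases(self):
        for bug_id, data, size, expect_error in REGRESSION_CASES:
            with self.subTest(bug=bug_id):
                if expect_error:
                    with self.assertRaises(ValueError):
                        copy_into_buffer(data, size)
                else:
                    self.assertEqual(len(copy_into_buffer(data, size)), size)


def run_regressions() -> bool:
    """Run all harvested cases and report whether every one of them passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


if __name__ == "__main__":
    run_regressions()
```

Every fixed bug adds one row to REGRESSION_CASES, so the library grows with the bug tracker. A build system can call run_regressions() as its test mode and stop the build when it returns False.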

Using the built-in features of your compiler (or your favourite run-time framework) to detect bugs is a basic task for developers. Don’t wait until security researchers or penetration testers do this for you. And if they do, please don’t treat bug reports as yet another rant. If it is a real bug report, then you should fix it and blame your code. The alternative is not to accept bug reports, but that doesn’t help anyone.
