LLM – Secure Design

If all you have is a Large Language Model (LLM), then you will apply it to all of your problems. People are now trying to find 0-days with the might of LLMs. While there is no surprise that this works, there is a better way of pushing your code to the limit. Just use random data! Someone coined the term fuzzing in 1988. People have been using defective punch cards as input for a while longer. With input filtering of data, you want to eliminate as much bias as possible. This is exactly why people create the input data using random data. Human testers think too much, too less, or are too constrained. (Pseudo-)Random number generators rarely have a bias. LLMs do. This means that the publication about finding 0-days by using LLMs should not be good news. Just like human Markov chains, LLMs only „look“ in a specific direction when creating input data. The model is the slave of vectors and the training data. The process might use the source code as an „inspiration“, but so does a compiler with a fuzzing engine. Understanding that LLMs do not possess any cognitive capabilities is the key point here. You cannot ask an LLM what it thinks of the code in combination with certain input data. You are basically using a fancy data generator that uses more energy and is too complex for the task at hand.

Comparing LLMs with fuzzing engines does not work well. Both approaches serve an original purpose. Always remember that the input data in security tests should push your filters to the limit and create a situation that you did not expect. Randomness will do this much more efficiently than a more complex algorithm. If you are fond of complexity or have too much powerful hardware at your hands, there are other things you can do with this.

The trend of large language models (LLMs) continues. Many people are doing experiments and explore how these algorithms can help them when developing software. Most integrated development environments have features that help you while writing code. Access to documentation, function call parameters, static checks, and suggestions are standard tools to help you. LLMs are the new kid on the block. Some articles describe how questions (or prompts) to chat engines were used to create code samples. The output depends a lot on the prompt. Changing words or rephrasing the prompt can lead to different results. This differs from the way other tools work. Getting useful results means to play with the prompt and engage in trial-and-error cycles. Algorithms such as ChatGPT are not sentient. They cannot think. The algorithm just remixes and repeats part of its training data. Asking for code examples is probably most useful for getting templates or single functions. This use case is disappointingly close to browsing tutorials or Stackoverflow.

Designing prompts is a new skill artificially created by LLMs algorithms. This is another problem, because you now need to collect prompts for creating the most useful code. The work shifts to another domain, but you don’t actually save time unless you have a compendium of prompts. Creating useful and well-tested templates is a better use of resources. The correct of of patterns governs code creating with or without LLMs.

The security questions have not been addressed yet. There are studies that analyse how the code quality of tool- and human-generated code looks like. According to the Open Source Security and Risk Analysis (OSSRA) report from 2022, the code created by assistant features contained vulnerabilities 40% of the time. An examination of code created by Github’s Copilot shows that autogenerated code contains bugs that belong to specific software weaknesses. The code created by humans has a distinct pattern of weaknesses. A more detailed analysis can only be done by larger statistical samples, because Copilot’s training data is proprietary and not accessible. There is room for more research, but it is safe to say that LLMs also make mistakes. Output from these algorithms must be included in the quality assurance cycle, with no exceptions. Code generators cannot work magic.

If you are interested in using LLMs for code creation, make sure that you understand the implications. Developing safe and useful templates is a better way than to engineer prompts for the best code output. Furthermore, the output can change whenever the LLM changes its version or training data. Using algorithms to create code is not a novel approach. Your toolchains most probably already contain code generators. In either case, you absolutely have to understand how they work. This is not something an AI algorithm can replace. The best approach is to study the code suggested by the algorithm, transfer it into pseudo-code, and write the actual code yourself. Avoid cut & paste, otherwise your introduce more bugs to fix later.

Secure Software doesn't develop itself.

Tag: LLM

Finding 0-Days with Large Language Models exclusive-or Fuzzing

Using AI Language Models for Code Creation