If you read the news, the articles are full of hype about the new Large Language Models (LLMs). People do amazing and stupid things by chatting with these algorithms. The problem is that LLMs are just algorithms. They have been "trained" with large amounts of data, hence the first "L". There is no personality or cognitive process involved. Keep the following properties in mind when using these algorithms.
- LLMs stem from the field of artificial intelligence, which is a branch of computer science.
- LLMs perform no cognitive work. The algorithms do not think; they just remix, filter, and repeat. They can't create anything new any better than a random generator combined with a Markov chain of sufficiently high order (see the sketch after this list).
- LLMs have no opinion.
- LLMs feature in-built hallucinations. This means the algorithms can produce confident falsehoods. Any of the current models (and possibly future models of the same kind) will generate nonsense at some point.
- LLMs can be manipulated by conversation (by using variations of prompts).
- LLMs are highly biased, because their training data lacks diverse perspectives (given the size of the data, its cultural background, and especially the languages used for training).
- LLMs can't be debugged reliably. Most fixes to problems consist of input or output filters. Essentially, these filters are allow or block lists, which is the weakest form of filtering for protection (see the second sketch below).
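
To illustrate the Markov chain comparison from the list above, here is a minimal sketch of a word-level Markov text generator. The toy corpus and the order parameter are arbitrary choices for this example; the point is that plausible-looking output can be remixed from training data in a few lines of code, with no cognition involved at all.

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, order=2, length=12):
    """Remix the training words by walking the chain from a random state."""
    state = random.choice(list(chain.keys()))
    output = list(state)
    for _ in range(length - order):
        candidates = chain.get(state)
        if not candidates:  # dead end: no continuation seen in the training data
            break
        output.append(random.choice(candidates))
        state = tuple(output[-order:])
    return " ".join(output)

# Toy corpus; any text works. Everything generated is a remix of it.
corpus = "the model does not think it only repeats patterns it has seen".split()
chain = build_chain(corpus, order=2)
print(generate(chain, order=2, length=12))
```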
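And to make the last list item concrete, here is a minimal sketch of the kind of block-list output filter such fixes amount to. The term list and the filter function are made up for this example; real deployments use much larger lists, but the structural weakness is the same: any phrasing missing from the list passes straight through.

```python
# A naive block-list filter, the weakest form of protection:
# it only catches phrasings someone thought of in advance.
BLOCKED_TERMS = {"rm -rf", "drop table", "secret_api_key"}  # example entries

def filter_output(text: str) -> str:
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return "[response withheld by output filter]"
    return text

print(filter_output("Here is how to DROP TABLE users;"))  # caught
print(filter_output("Here is how to DR0P TABLE users;"))  # trivially bypassed
```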
So there are a lot of arguments for not treating LLMs as something with an opinion, personality, or intelligence. These models can mimic language; during training, they learn language patterns. Yes, LLMs belong to the research field of artificial intelligence, but there is no thinking going on. The label "AI" doesn't describe what the algorithm is doing. This is not news. Emily Bender published an article titled "Look behind the curtain: Don't be dazzled by claims of 'artificial intelligence'" in the Seattle Times in May 2022. Furthermore, there is a publication by Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?", published in March 2021. Both articles appeared well before the "new" chat algorithms and LLM-powered "search engines" hit the marketing departments. You have been warned.
All that said, this article is not meant to discourage anyone from exploring LLMs. You just need to be careful. Remember that algorithms are seldom a silver bullet that solves all your problems or makes them go away. This is especially true if you want to use LLMs in your software development cycle.
Update: If you really want to explore the uses of synthetic text generation, please watch the presentation “ChatGP-why: When, if ever, is synthetic text safe, appropriate, and desirable?” by Emily Bender.
Continuous Integration (CI) is a standard in software development. Many companies use it in their development process. It basically means using automation tools to build and test new code more frequently. Instead of continuous, you can also say automated, because CI can't work manually. Modern build systems comprise scripts and declarative configurations that invoke components of the toolchain in order to produce executable code. Applications built with different programming languages can invoke a lot of tools, each with its own configuration. The build system is also a part of the code development process. What does this mean for CI in terms of secure coding?
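As a rough illustration of what such a pipeline does, here is a minimal sketch of a CI step runner in Python. The commands are placeholders (your toolchain will differ); the point is that the pipeline itself is code that runs automatically and with some privileges, which is why it belongs in the secure coding discussion.

```python
import subprocess
import sys

# Placeholder pipeline; substitute your real toolchain commands.
STEPS = [
    ["make", "build"],  # compile the code
    ["make", "test"],   # run the test suite
    ["make", "lint"],   # static checks
]

def run_pipeline(steps):
    for cmd in steps:
        print(f"--> running: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Fail fast: CI stops at the first broken step.
            print(f"step failed with exit code {result.returncode}")
            sys.exit(result.returncode)
    print("pipeline succeeded")

if __name__ == "__main__":
    run_pipeline(STEPS)
```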
The trend of large language models (LLMs) continues. Many people are experimenting and exploring how these algorithms can help them when developing software. Most integrated development environments have features that help you while writing code: access to documentation, function call parameters, static checks, and suggestions are standard tools. LLMs are the new kid on the block. Some articles describe how questions (or prompts) to chat engines were used to create code samples. The output depends a lot on the prompt; changing words or rephrasing the prompt can lead to different results. This differs from the way other tools work. Getting useful results means playing with the prompt and engaging in trial-and-error cycles (sketched below). Algorithms such as ChatGPT are not sentient. They cannot think. The algorithm just remixes and repeats parts of its training data. Asking for code examples is probably most useful for getting templates or single functions. This use case is disappointingly close to browsing tutorials or Stack Overflow.
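That trial-and-error cycle can be sketched as follows. The function `query_llm` is a hypothetical stand-in for whatever chat interface or API you use (not a real library call, and stubbed with a canned reply here so the sketch runs); the sketch only shows how the same request gets rephrased until something usable comes back.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a chat engine or its API.
    Returns a canned reply so this sketch runs without a backend."""
    return f"# reply to: {prompt}\ndef parse_ipv4(addr): ..."

def looks_usable(code: str) -> bool:
    """Crude acceptance check; careful human review still has to happen."""
    return "def " in code and "TODO" not in code

# Rephrasings of the same request; small wording changes shift the output.
prompt_variants = [
    "Write a Python function that parses an IPv4 address.",
    "Show me a Python function validating an IPv4 address string.",
    "Python code: parse and validate dotted-quad IPv4 addresses.",
]

for prompt in prompt_variants:
    answer = query_llm(prompt)
    if looks_usable(answer):
        print(answer)  # still needs careful review before use
        break
```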