ChatGPT was introduced in November 2022 and has since amazed users with its capabilities. It can translate between languages; read, digest and summarize large bodies of text; and even write poems and software code. Many similar models (known as large language models) have been released since then, including ChatGPT’s more powerful successor GPT 4 (which also accepts image inputs) and versions from Google (Bard) and Meta (LLaMA). These models are trained on extremely large datasets (hundreds of terabytes) of web-scraped data, covering articles, books, academic papers, and content of all types. This broad and deep exposure to human writing is part of the reason these complex models appear so intelligent. The capabilities of these models have risen at a dizzying pace, and so have the risks. It’s important that we understand both to figure out how to use these tools properly.
At the core of these technologies is the language model. A language model (LM) is a probability distribution over sequences of ‘words’ in human communication, estimated from text sources like books or websites. For example, the probability of a sentence like “the mouse ate the cheese” occurring is much higher than the probability of “the cheese ate the mouse”.
The most straightforward way to build an LM is to take a collection of text and build the probability distribution by counting how often each sequence of words occurs. Because this can be computationally expensive, it’s common to consider only sequences up to a fixed length, such as five words or fewer; this length is known as the context window. Today, developers typically use neural networks, a complex class of machine learning algorithms, to estimate this probability distribution from a large body of text. This approach allows developers to scale to large datasets while using longer context windows.
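To make the counting approach concrete, here is a minimal Python sketch rather than a production method. It uses a context window of just one previous word (a "bigram" model) and a three-sentence toy corpus invented for illustration. The function names and the small smoothing constant, which keeps unseen word pairs from getting a probability of exactly zero, are assumptions of this sketch.

```python
from collections import Counter


def train_bigram_counts(corpus):
    """Count how often each word follows each other word in the corpus."""
    context_counts = Counter()  # how often a word appears with a successor
    pair_counts = Counter()     # how often (previous word, word) appears
    vocab = set()
    for sentence in corpus:
        # "<s>" is a hypothetical marker for the start of a sentence.
        words = ["<s>"] + sentence.lower().split()
        vocab.update(words)
        context_counts.update(words[:-1])
        pair_counts.update(zip(words, words[1:]))
    return context_counts, pair_counts, vocab


def sentence_probability(sentence, context_counts, pair_counts, vocab, k=0.1):
    """Multiply P(word | previous word) across the sentence (add-k smoothing)."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        numerator = pair_counts[(prev, word)] + k
        denominator = context_counts[prev] + k * len(vocab)
        prob *= numerator / denominator
    return prob


# Toy corpus, made up for this example.
corpus = [
    "the mouse ate the cheese",
    "the cat chased the mouse",
    "the mouse ate the bread",
]
context_counts, pair_counts, vocab = train_bigram_counts(corpus)

likely = sentence_probability(
    "the mouse ate the cheese", context_counts, pair_counts, vocab)
unlikely = sentence_probability(
    "the cheese ate the mouse", context_counts, pair_counts, vocab)
print(likely > unlikely)
```

On this toy corpus the first sentence scores higher than the second, because the word pair "cheese ate" never appears in the training text. A longer context window would mean counting longer sequences, which is where the computational cost grows quickly.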
LMs are part of a class of models known as generative AI, which can generate output like text, images (Midjourney) and even music (Jukebox). Given a prompt, an LM generates text based on the probability distribution it has learned. Learning the right ways to prompt text models is becoming a critical skill in and of itself. Users are starting to understand how to approach different conversations with a chatbot, and some are even sharing task-specific prompts openly.
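Generating text from a learned distribution can be sketched in a few lines of Python. The example below is an illustration under simple assumptions, not how ChatGPT works internally: it learns next-word counts from a made-up toy corpus, then extends a prompt by repeatedly sampling the next word in proportion to how often it followed the previous one. The function names and the "<s>" and "</s>" boundary markers are hypothetical choices for this sketch.

```python
import random
from collections import Counter, defaultdict


def train_next_word_counts(corpus):
    """For each word, count which words follow it in the corpus."""
    next_counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            next_counts[prev][word] += 1
    return next_counts


def generate(next_counts, prompt, max_words=10, seed=0):
    """Extend the prompt one word at a time by sampling from the counts."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    words = prompt.lower().split()
    prev = words[-1] if words else "<s>"
    for _ in range(max_words):
        candidates = next_counts.get(prev)
        if not candidates:
            break  # this word was never seen with a successor
        choices, weights = zip(*candidates.items())
        word = rng.choices(choices, weights=weights, k=1)[0]
        if word == "</s>":
            break  # the model chose to end the sentence
        words.append(word)
        prev = word
    return " ".join(words)


# Toy corpus, made up for this example.
corpus = [
    "the mouse ate the cheese",
    "the cat chased the mouse",
    "the mouse ate the bread",
]
model = train_next_word_counts(corpus)
print(generate(model, "the mouse", max_words=5, seed=1))
```

Changing the prompt or the seed changes the output, which is a small-scale version of why prompt wording matters: the prompt sets the context that every later word is conditioned on.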
Introducing Large Language Models
Large language models (LLMs) such as ChatGPT and GPT 4 are enabled by a combination of (1) more powerful computing hardware, (2) large bodies of publicly available data (and manually labeled data), and (3) algorithmic advancements that made it possible to build models effectively on such large datasets.
The ability to train such large models has been accelerated by more powerful GPUs, or graphics processing units. The training process relies on carrying out many calculations in parallel. This is something GPUs are extremely efficient at, much more so than the CPUs typically used in computers. Additionally, GPUs have become significantly more powerful over the past decade, which helps explain why the LLM boom has taken off recently.
Extremely large text datasets have also powered the development of these models. LLMs are typically trained on public datasets composed of scraped internet data. One of the most common datasets is called Common Crawl; one month’s snapshot of data is roughly 300TB. This includes content posted on websites, news articles, academic journals, and more. The size and variety of this data enable LLMs to learn about an incredible range of topics. A recent trend is feeding human-labeled data into these models so that they learn from explicit human feedback. In addition to internet-scraped data, ChatGPT was trained with data that humans had manually labeled for quality. This is one of the steps that helped make ChatGPT’s responses higher quality and more human-like than those of its predecessor, GPT 3.
There have also been algorithmic advancements that have allowed researchers to create more powerful models and to train them more efficiently. The most significant is the transformer, a neural network architecture introduced by Google Brain researchers in 2017. Transformers allow more efficient training across multiple GPUs in parallel, and with longer context windows.
As a result of these advancements, the size of LLMs has increased at an astounding rate over the past five years, leading to the incredible capabilities of these models. The size of an LLM is often measured by its number of parameters, the internal weights used to map inputs to outputs. The original GPT (2018) had about 117 million parameters, and GPT 3 (2020) had 175 billion. That is more than a thousand-fold increase in just two years! While OpenAI hasn’t disclosed the figure publicly, some sources estimate that GPT 4 (March 2023) has about a trillion parameters.
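The growth claim can be checked with a little arithmetic. The Python sketch below uses 117 million parameters for the original GPT, the figure usually cited for that model (treat it as an assumption if you prefer a different rounding), and 175 billion for GPT 3. The variable names are illustrative only.

```python
import math

# Parameter counts; the GPT (2018) value is the commonly cited figure
# and is an assumption of this sketch.
PARAMS = {
    "GPT (2018)": 117e6,
    "GPT 3 (2020)": 175e9,
}

# How many times larger GPT 3 is than the original GPT.
factor = PARAMS["GPT 3 (2020)"] / PARAMS["GPT (2018)"]

# How many times the parameter count would have to double to get there.
doubling_count = math.log2(factor)

print(f"GPT 3 is roughly {factor:,.0f} times larger than the original GPT")
print(f"That is about {doubling_count:.1f} doublings in two years")
```

The ratio comes out at about 1,500, or a little over ten doublings in two years. The exact starting figure matters little here: anything near 100 million parameters gives an increase of well over a thousand-fold.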
Capabilities of LLMs
Given their powerful capabilities, LLMs are used in many applications. This list is far from exhaustive but gives a sense of what types of tasks LLMs can take on.
- Generating, searching, summarizing and editing text: LLMs can transform jobs that create text, and those that rely heavily on reading and parsing large numbers of documents. Writers are now able to generate multiple paragraphs of text from a single prompt, specify the tone and delivery they want and iteratively add to an article. Copywriters can prompt the AI with previous examples so they can receive output consistent with their company’s character and writing style. They can also ask for feedback and editing on their own writing. This even extends to creative writing: the self-published section of Amazon is filling up with AI-generated novels. Lawyers, researchers, and analysts can rely on an LLM to summarize long documents, or ask specific questions about their contents. For example, a lawyer could parse many court filings to find specific clauses, or a financial analyst could ask for a quick summary of a company’s earnings report. This task is limited by the context window of the model, which caps how much text it can consider at once, but more powerful models should provide a longer window.
- Generating ideas: Users can turn to LLMs for a spark of creativity. Users have reported using ChatGPT to brainstorm activities for a kid’s birthday party, create recipes using a mixed selection of ingredients, create novel games from scratch and even develop an entirely new language. The technology has already assisted scientific research: generative AI is being used to design new proteins by modeling how their amino acids fold together. More powerful AI could help drive new advancements in research.
- Assisting software developers: Software developers have been using ChatGPT to generate code from a simple prompt, like “write code to reverse a string in C”. They can also provide it with a code snippet and ask it to create tests, suggest how to make the code more efficient, or debug it. Replit is an example of this technology built into the editors where developers write code. These capabilities could shift the software engineer’s job from writing code directly to figuring out how to prompt LLMs effectively.
- Simplifying/automating workflows: This works in two directions. First, users can accomplish tasks by writing a text prompt. With an embedded LLM, a manager could ask a chatbot to send email reminders to all her employees who haven’t completed sales training yet. This is powerful because it makes technical, potentially complex tasks accessible to many people through natural language. Second, LLMs can accomplish tasks themselves and communicate the output in natural language, often much faster than a human. Microsoft recently released Microsoft 365 Copilot, which combines LLMs with a user’s data across all Microsoft Office products. With Copilot, users can create a PowerPoint presentation based on a text document, summarize sales data in a spreadsheet, or summarize a meeting based on the transcript. We expect these capabilities to permeate many complex systems and help people use them more effectively.
- Learning: Models like ChatGPT can benefit both teachers and students – or anyone who wants to learn something – by explaining many topics and doing so in a personalized way. Any learner will have access to huge amounts of information (LLMs gain a huge breadth of knowledge from being trained on text from across the internet) and the model can respond to questions in plain English. Learners also benefit from personalized learning; they can approach a new topic by asking a chatbot to explain the topic in simplified terms and analogies, or they can ask for expert- or beginner-level descriptions. Learners can also use practice quizzes to assess their understanding and role play to emulate real conversations. Users have created prompts that instruct ChatGPT to act as a job interviewer so they can prepare for interviews, and even to hold a conversation in a different language and give feedback on mistakes.
Risks of LLMs
As with any new technology, LLMs bring risks as well as benefits. The benefits are substantial, but so are the challenges.
Since LLMs are trained with real-world data, they can exhibit the biases and prejudices present in that data. Guardrails do not eliminate the problem; one researcher found that ChatGPT could still produce biased takes on race and gender. Given the size and complexity of these models and the data they are trained on, it’s important not to assume that LLMs are free of bias.
One of the most noticeable risks is that these models aren’t always reliable or accurate. The underlying methodology of LLMs is based on pattern recognition (correlation analysis) on digital text and data, which is quite different from understanding logic, theory, and causality. This can lead to these models providing flat-out wrong answers. Sometimes these models hallucinate, making up things that aren’t true. Meta’s science-based LLM, Galactica, frequently made up scientific articles. In high-stakes scenarios, a wrong answer could have severe consequences. Furthermore, the authoritative tone of LLMs makes it hard to tell when they are wrong. This is still a problem in more recent LLMs like ChatGPT and GPT 4, though researchers are taking steps to lessen the risks. Nonetheless, users should still validate the information they are given.
LLMs can be used to create toxic content and disinformation. Producing disinformation is currently expensive and slow: Russian operatives, for example, need people fluent in English to create disinformation aimed at Americans. But with ChatGPT-style models, it’s simple and nearly free to create plausible, ideologically consistent disinformation that works towards the bad actor’s objective. Similarly, bad actors now have a tool to frictionlessly create dialogue for social engineering scams, hate speech and more.
Another class of risks has to do with security. One risk that concerns security professionals is the poisoning attack, in which a malicious actor creates training data with a specific goal in mind, such as creating a backdoor into a secure system. Because many LLMs are trained with text scraped from the internet, a malicious actor could feed bad data into LLMs by simply uploading that data to the internet.
Lastly, LLMs can contribute to the concentration of power. State-of-the-art generative AI is extremely expensive; it’s estimated that ChatGPT cost millions of dollars to train based on computation alone. There aren’t many organizations with the budgets to create AI of this size. Cutting-edge generative AI is also hard to access; access to most LLMs is controlled by the publisher, and users are typically given only limited access. Given the high cost and closed nature of these models, it’s possible that the benefits (such as financial gains and political power), along with the potential for military use, will become concentrated among the few companies and governments with the budgets and deep technical skills to create them.
Conclusion
Large language models are a novel class of AI that has been made possible by a convergence of advancements in computing power, large datasets and machine learning methodology. The technology offers incredible benefits but also poses significant risks. Given this potential impact, it’s critical that we find ways to maximize the benefits while managing and curtailing the risks.