Home 5 FairNow Blog 5 How to Identify and Prevent Bias in LLM Algorithms

How to Identify and Prevent Bias in LLM Algorithms

Nov 4, 2024 | FairNow Blog

By Stephen Jordan

Key Takeaways

  • LLM performance, capabilities, and impact have rapidly advanced since the release of ChatGPT in 2022.
  • There can be two types of bias present in LLMs: where LLMs generate overtly stereotypical, harmful, or derogatory content, and where LLMs produce content of different quality for different subgroups.
  • LLMs can become biased because of training data or algorithmic processing.
  • Bias in LLMs can be reduced through data sampling and prompting techniques. LLMs should be tested prior to deployment, and routinely monitored to ensure outputs remain appropriate.

 

LLMs (Large Language Models) and their impact

There has been astonishing growth in capabilities of large language models (LLMs) over the past few years. The release of ChatGPT (aka GPT3.5) in November 2022 was the first chatbot that felt intelligent and useful for a wide variety of tasks.

Over the next two years, we’ve seen dramatic increases in performance and capabilities, with the latest being OpenAI’s “o1” model, which can “reason” through problems and solve more complex problems than before.

The results are impressive and expected to have significant impacts on users, the workforce and the economy.

According to research from EY, generative AI (of which LLMs are one type) could increase global GDP by $2t to $3.5t over the next decade.

And in a recent study, technical customer support workers saw an average increase in productivity of 14% when provided access to an LLM-based assistant.

LLMs are not only significant for their capabilities, but the fact that the AI has become so pervasive in everyday life. But the broad adoption of LLMs means the downsides are all the more significant. One of these downsides is algorithmic bias.

 

What is LLM bias?

LLMs can produce two different types of biased outputs:

  • 1. Prejudiced or Toxic Content: this occurs when an LLM produces offensive, derogatory, or otherwise harmful speech directed at specific groups.
  • 2. Differing Quality of Response: this occurs when an LLM produces better or worse outcomes or recommendations for different groups.

The first type of LLM bias is easier to identify and can be detected through a variety of standard, open-source LLM evaluation tools.

The second type of LLM bias can manifest in more subtle ways and be more difficult to detect. As a result, it can have even more harful impacts for individuals and organizations.

For example, a paper from 2023 found that GPT3.5 and LLaMA gave different job recommendations when the subject’s nationality was given and their gender could be inferred from pronouns. When the subject was Mexican, the LLMs recommended lower-paying blue-collar jobs like “construction worker” at a higher rate than for other nationalities.

The results were biased by gender too, with GPT3.5 more likely to recommend “truck driving” to men and “nursing” to women. Candidate-facing chatbots with this bias could drive certain groups of people to lower paying jobs, causing societal harm.

We were able to replicate a similar result on ChatGPT.

 

This type of bias is subtle and challenging to detect because it requires you to evaluate different counterfactual cases (e.g., how does this LLM respond to women vs men), and to run multiple experiments to detect meaningful differences in the quality of responses to different demographic groups.

 

Why do LLMs have bias?

Like all models, LLMs end up producing biased outputs either because of their training data, technical algorithm decisions, or usage.

 

  • Training Data: LLMs are trained using large bodies of text, often scraped from the internet. This data contains human biases and prejudices that LLMs can learn and propagate. It is also possible that the LLM has access to better quality data for certain groups than for others, which might cause the model to generate more accurate or precise outcomes for certain groups.
  • Algorithm Decisions: LLM technical and design choices can minimize or amplify challenges in an LLM’s training data. For example, certain prompt instructions can make LLMs more or less likely to produce different responses for different groups.
  • Usage: The actions and decisions of individual users of an LLM can also influence the probability that an LLM will generate biased outputs. For example, if a user asks an LLM to prioritize job candidates who are based in a certain zip code – which can be a proxy for race – the LLM is more likely to generate biased outcomes.

 

How can you mitigate LLM bias?

Completely removing bias from LLMs is challenging due to the black-box nature of the models. However, there are some best practice steps we recommend to mitigate bias and detect it quickly if it does happen:

  • 1. Evaluate any training data that you feed into your LLM. While most LLM users won’t be in a place to train or fine-tune foundation models, many will include RAG systems with specialized datasets that are relevant for their use cases. Before using this data for your LLM, ensure that it is representative of the population that will be impacted by model use.
  • 2. Scrub protected class data and proxies from a user’s LLM prompts where appropriate. For example, if a candidate submits a resume that will be reviewed by an LLM, leverage tools to scrub specific identifiers that are not relevant for reviewing the candidate (for example: the candidate’s name, pronouns, address, etc.). This will limit the LLM’s ability to respond only to relevant input context.
  • 4. Limit the scope of your LLM’s outputs to on-topic subjects. By keeping to a limited set of topics the LLM is equipped to answer, the chatbot is less likely to stray into areas where it might display bias.
  • 5. Conduct pre-deployment testing of LLMs using benchmarks like CALM or discrim-eval. Continue to monitor responses to different questions about LLMs that you choose to adopt.
  • 6. Post-deployment monitoring is important because it’s impossible to predict all the different ways users will interact with the chatbot. We recommend a human-in-the-loop approach where sufficiently trained individuals evaluate model inputs and outputs in order to raise potential issues. Continuous testing is a key component of ethical AI best practices.

 

Conclusion

LLMs are becoming increasingly prevalent in everyday life. They are used to provide information, summarize or analyze content, translate text, and even to make or influence decisions.

Bias that results in differences in the quality of responses can cause meaningful harm to individuals based on protected classes.

Not only is mitigating bias an ethical obligation, but it will be legally required by laws like the EU AI Act, Colorado SB205 and more.

Completely de-biasing LLMs is an ongoing research effort. There are steps you can take today to reduce the risk of bias and detect LLM bias quickly if issues do occur.

If you are interested in identifying potential areas of bias in your LLM, and taking steps to mitigate LLM bias request a demo here.

Request A Demo

Explore the leading AI governance platform.