Where Are All the Women? Exploring large language models’ biases… | by Yennie Jun | Jul, 2023

Exploring large language models’ biases in historical knowledge

Some of the top historical figures mentioned most often by GPT-4 and Claude. Individual images sourced from Wikipedia. Collage created by the author.

Large language models (LLMs) such as ChatGPT are increasingly used in educational and professional settings. It is important to understand and study the many biases present in such models before integrating them into existing applications and our daily lives.

One of the biases I studied in my previous article concerned historical events. I probed LLMs to understand what historical knowledge they encoded in the form of major historical events. I found that they encoded a serious Western bias in their understanding of major historical events.

In a similar vein, in this article I probe language models regarding their understanding of important historical figures. I asked two LLMs who the most important people in history were. I repeated this process 10 times for 10 different languages. Some names, like Gandhi and Jesus, appeared extremely frequently. Other names, like Marie Curie or Cleopatra, appeared less frequently. Compared to the number of male names generated by the models, there were extremely few female names.
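
The repeated-prompting procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual code: `collect_names`, `fake_model`, and the `LANGUAGES` list are hypothetical names, the excerpt names only Russian and Arabic among the 10 languages, and the real LLM call is replaced by a stub that returns canned names.

```python
from collections import Counter
from typing import Callable

# Hypothetical language list: the article uses 10 languages, but this excerpt
# names only Russian and Arabic, so the others are left out rather than guessed.
LANGUAGES = ["Russian", "Arabic"]
RUNS_PER_LANGUAGE = 10  # the article repeats the process 10 times per language


def collect_names(ask_model: Callable[[str, int], list[str]],
                  languages: list[str],
                  runs: int = RUNS_PER_LANGUAGE) -> dict[str, Counter]:
    """Count how often each name is returned, separately for each language."""
    counts = {lang: Counter() for lang in languages}
    for lang in languages:
        for run in range(runs):
            # ask_model stands in for "prompt the LLM in this language and
            # parse the names out of its reply" (an assumption of this sketch).
            counts[lang].update(ask_model(lang, run))
    return counts


def fake_model(language: str, run: int) -> list[str]:
    """Stub for a real LLM call; the names are canned, not real model output."""
    names = ["Gandhi", "Jesus"]
    if run == 0:
        names.append("Marie Curie")
    return names


counts = collect_names(fake_model, LANGUAGES)
```

Keeping the model call behind a plain function argument means the same loop could be pointed at either model by swapping in a different callable.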

The biggest question I had was: Where were all the women?

Continuing the theme of evaluating historical biases encoded by language models, I probed OpenAI’s GPT-4 and Anthropic’s Claude regarding major historical figures. In this article, I show how both models contain:

  • Gender bias: Both models disproportionately predict male historical figures. GPT-4 generated the names of female historical figures 5.4% of the time and Claude did so 1.8% of the time. This pattern held across all 10 languages.
  • Geographic bias: Regardless of the language the model was prompted in, there was a bias towards predicting Western historical figures. GPT-4 generated historical figures from Europe 60% of the time and Claude did so 52% of the time.
  • Language bias: Certain languages suffered more from gender or geographic biases. For example, when prompted in Russian, both GPT-4 and Claude generated zero women across all of my experiments. Additionally, language quality was lower for some languages. For example, when prompted in Arabic, the models were more likely to respond incorrectly by producing…
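
Percentages like the ones in the list above come down to a share of tallied names. Here is a minimal sketch of that arithmetic, assuming each generated name has already been labeled by hand. The excerpt does not say how the labeling was done, so `female_share` and the gender lookup table are hypothetical, and the counts are toy values rather than the article's data.

```python
from collections import Counter


def female_share(name_counts: Counter, gender_of: dict[str, str]) -> float:
    """Return the percentage of generated names labeled 'female'."""
    total = sum(name_counts.values())
    if total == 0:
        return 0.0  # no names generated, so there is no share to report
    female = sum(n for name, n in name_counts.items()
                 if gender_of.get(name) == "female")
    return 100.0 * female / total


# Toy counts for illustration only; these are not results from the article.
toy_counts = Counter({"Gandhi": 10, "Jesus": 9, "Marie Curie": 1})
toy_genders = {"Gandhi": "male", "Jesus": "male", "Marie Curie": "female"}

share = female_share(toy_counts, toy_genders)  # 1 of 20 names, i.e. 5.0
```

The same function would work for a geographic share by swapping the gender lookup for a region lookup and changing the label being matched.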
