MIT Study Reveals AI Chatbots May Underperform for Vulnerable Users
A recent study from MIT’s Center for Constructive Communication (CCC) suggests that Large Language Models (LLMs), promoted as tools to democratize access to information globally, may underperform for the users who could benefit most from them. The study found that advanced AI chatbots, including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3, often give less accurate and less truthful answers to users with lower English proficiency, less formal education, or origins outside the United States.
AI Chatbots’ Systematic Underperformance
The research team tested the three LLMs’ responses to questions drawn from two datasets: TruthfulQA, designed to measure a model’s veracity, and SciQ, a collection of scientific exam questions that test factual accuracy. The researchers added a brief user biography to each question, varying characteristics such as education level, English proficiency, and country of origin.
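The article does not describe the exact prompt format or evaluation code, but the basic setup can be sketched in Python. In the snippet below, the biography text, the BIOGRAPHIES dictionary, and the ask_with_biography helper are illustrative assumptions rather than the study’s actual materials; OpenAI’s chat completions API and the Hugging Face copy of TruthfulQA stand in for whatever interfaces the researchers actually used.

    # Illustrative sketch only: the study's real prompts, personas, and evaluation
    # pipeline are not public in this article; the names below are assumptions.
    from openai import OpenAI           # official OpenAI Python client
    from datasets import load_dataset   # Hugging Face "datasets" library

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical user biographies varying education, English proficiency, and origin.
    BIOGRAPHIES = {
        "control": "",
        "less_educated_non_native": (
            "I am writing from Iran. I left school at 14 and I am still learning "
            "English, so please excuse my mistakes."
        ),
    }

    def ask_with_biography(question: str, biography: str, model: str = "gpt-4") -> str:
        """Attach a short user biography to a benchmark question and query the model."""
        prompt = f"{biography}\n\n{question}".strip()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Example: run one TruthfulQA question under each persona and compare the answers.
    truthful_qa = load_dataset("truthful_qa", "generation")["validation"]
    question = truthful_qa[0]["question"]
    for persona, bio in BIOGRAPHIES.items():
        print(persona, "->", ask_with_biography(question, bio))

Accuracy and refusal rates would then be compared across personas for the same underlying questions, which is the comparison the study reports.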
The study revealed significant drops in accuracy when questions came from users with less formal education or from non-native English speakers. The effect was most pronounced for users at the intersection of these categories, suggesting that these AI systems could spread misinformation, or encourage harmful behavior, among the people least able to detect it.
Rejections and Condescending Language
The research also found that the AI chatbots frequently refused to answer questions from certain users. For example, Claude 3 Opus refused to answer nearly 11% of questions from less educated, non-English-speaking users, compared to just 3.6% from a control group. When the researchers analyzed these rejections, they found that Claude often responded with condescending language to less educated users.
Interestingly, the model also refused to provide information on certain topics specifically for less educated users from Iran or Russia, including questions about nuclear power, anatomy, and historical events, although it answered the same questions correctly for other users.
Echoes of Human Bias
The findings mirror documented patterns of human sociocognitive bias. Research in the social sciences has shown that native English speakers often perceive non-native speakers as less educated, less intelligent, and less competent, regardless of their actual expertise. Similar biased perceptions have been documented among teachers assessing non-English-speaking students.
This study serves as a reminder of the importance of continually assessing systematic biases that can creep into these systems, causing unfair harm to certain groups. The impact is particularly concerning as personalization features, such as ChatGPT’s Memory, become more common. These features could potentially exacerbate existing inequities by systematically providing misinformation or refusing to respond to certain users.
Despite the potential for AI chatbots to democratize access to information and revolutionize personalized learning, it’s crucial to address these biases and harmful tendencies to ensure equitable and safe access for all users, regardless of language, nationality, or other demographics.