Ensuring Accuracy and Equity in Vaccination Information From ChatGPT and CDC: Mixed-Methods Cross-Language Evaluation
"ChatGPT holds potential as a health information resource but requires improvements in readability and linguistic equity to be truly effective for diverse populations."
In the digital age, large language models (LLMs) such as ChatGPT have emerged as important sources of healthcare information. Their interactive capabilities offer promise for improving access to health information, particularly for groups facing traditional barriers such as lack of insurance coverage or limited English proficiency. However, the information LLMs provide can be of inconsistent quality and may not account for language barriers. With an emphasis on health equity, this study explores whether LLMs such as ChatGPT provide reliable health information in multiple languages, and it highlights the critical need for cross-language evaluation to ensure equitable access to health information for all linguistic groups.
The study compared responses to frequently asked questions about childhood vaccination from the United States (US) Centers for Disease Control and Prevention (CDC) and from ChatGPT, in both English and Spanish, across three dimensions: accuracy, understandability, and readability, using both quantitative and qualitative approaches. Accuracy was gauged by the perceived level of misinformation; understandability was gauged by items from the Patient Education Materials Assessment Tool (PEMAT), developed by the US Agency for Healthcare Research and Quality; and readability was gauged by the Flesch-Kincaid grade level and reading ease score.
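For readers unfamiliar with these readability metrics, the sketch below shows the standard Flesch-Kincaid grade level and Flesch reading ease formulas. It is an illustrative approximation only, not the study's actual pipeline: the word, sentence, and syllable counters are rough heuristics, and published tools use more careful tokenization.

```python
# Minimal sketch of the standard Flesch-Kincaid grade level and Flesch reading
# ease formulas. The syllable counter is a crude vowel-group heuristic for
# English text, used here only for illustration.
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting groups of consecutive vowels."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid grade level, Flesch reading ease score)."""
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words

    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return grade, ease


if __name__ == "__main__":
    sample = "The vaccine protects children from measles. It is safe and effective."
    grade, ease = readability(sample)
    print(f"Grade level: {grade:.1f}, Reading ease: {ease:.1f}")
```

Lower grade levels and higher reading ease scores indicate easier text; patient-facing materials are commonly targeted at roughly a sixth-grade reading level.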
Notably, the study centres on the natural querying behaviour exhibited by the majority of ChatGPT users, who typically engage with the system in a conversational manner, similar to their interactions with traditional search engines such as Google. This is particularly true for vulnerable populations seeking health information, who may not be aware of or use prompt engineering techniques.
The study found that both ChatGPT and the CDC provided largely accurate and understandable responses (e.g., scores above 95 out of 100). However, Flesch-Kincaid grade levels often exceeded the reading level recommended by the American Medical Association, particularly in English (average grade level for ChatGPT: 12.84 in English and 7.93 in Spanish, against a recommended level of 6). CDC responses outperformed ChatGPT in readability in both languages. Furthermore, some Spanish responses appeared to be translations of the English ones rather than independently generated text, which led to unnatural phrasing and could hinder information access for Spanish speakers.
In conclusion, "the default user experience with ChatGPT, typically encountered by those without advanced language and prompting skills, can significantly shape health perceptions. This is vital from a public health standpoint, as the majority of users will interact with LLMs in their most accessible form. Ensuring that default responses are accurate, understandable, and equitable is imperative for fostering informed health decisions across diverse communities."
JMIR Formative Research 2024;8:e60939. doi: 10.2196/60939.