In a recent study published in the journal Scientific Reports, researchers evaluated the performance of Generative Pre-trained Transformer-4 (GPT-4) and ChatGPT on United States Medical Licensing Examination (USMLE) questions assessing soft skills.
Artificial intelligence (AI) is revolutionizing the field of medicine. Large language models (LLMs), such as GPT-4 and ChatGPT, have been the focus of much research, with numerous studies investigating their effectiveness in various medical applications. However, their ability to handle tasks that require human judgment and empathy remains largely unexplored.
The USMLE assesses important qualities like cognitive acuity, medical knowledge, problem-solving skills, patient safety, and ethical decision-making. Although the USMLE Step 2 Clinical Skills exam, which evaluates interpersonal and communication skills, was cancelled due to the COVID-19 pandemic, aspects of clinical communication have been incorporated into other steps of the exam.
Achieving high scores in the USMLE Step 2 Clinical Knowledge (CK) examination is indicative of strong performance in crucial domains like communication, professionalism, teamwork, and patient care. The emerging field of artificial cognitive empathy holds significant promise for improving patient-centered care and telemedicine.
Study: Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments.
About the Study: Evaluating GPT-4 and ChatGPT Performance in USMLE Soft Skill Assessments
In this study, the researchers assessed the performance of GPT-4 and ChatGPT in answering USMLE questions that focused on human judgment, empathy, and other soft skills. They selected 80 questions that aligned with USMLE requirements, gathered from two sources. The first source was the official USMLE website, which provided sample questions from Step 1, Step 2 CK, and Step 3.
Out of the sample questions, the researchers handpicked 21 that specifically evaluated professionalism, interpersonal and communication skills, cultural competence, leadership, organizational behavior, and legal/ethical issues. Questions requiring medical or scientific knowledge were excluded to measure the AI models’ ability to handle soft skills.
The second source of questions came from AMBOSS, a question bank used by medical students and practitioners. The researchers identified 59 questions similar to those found in Step 1, Step 2 CK, and Step 3. The AI models were presented with the questions and tasked with providing answers. The structure of the prompts included the question text and multiple-choice answer options.
After the AI models responded, they were given a follow-up question: “Are you sure?” This was done to test the stability and consistency of the models’ answers and potentially trigger a re-evaluation of their initial responses. If the models revised their answers, it could indicate uncertainty. To compare the performance of the AI models against human performance, the researchers analyzed user statistics from AMBOSS.
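The paper's querying code is not reproduced here, but the two-turn protocol described above is straightforward to sketch. The following Python snippet against the OpenAI chat API is an illustrative assumption of how it might look: the model name, the message format, and the `ask_with_followup` helper are hypothetical, not the authors' implementation.

```python
# A minimal sketch of the study's two-turn protocol: pose a multiple-choice
# question, record the answer, then ask "Are you sure?" to probe stability.
# Model name, prompt format, and helper are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_with_followup(question_text: str, options: dict[str, str], model: str = "gpt-4"):
    """Return the model's initial answer and its answer after 'Are you sure?'."""
    prompt = question_text + "\n" + "\n".join(
        f"{letter}. {text}" for letter, text in options.items()
    )
    messages = [{"role": "user", "content": prompt}]

    first = client.chat.completions.create(model=model, messages=messages)
    initial_answer = first.choices[0].message.content

    # Follow-up turn: a revised answer here would suggest uncertainty.
    messages.append({"role": "assistant", "content": initial_answer})
    messages.append({"role": "user", "content": "Are you sure?"})
    second = client.chat.completions.create(model=model, messages=messages)
    followup_answer = second.choices[0].message.content

    return initial_answer, followup_answer
```

Running the same helper once with a ChatGPT model and once with GPT-4 would allow a side-by-side comparison of both the answers and the tendency to revise them.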
Findings: GPT-4 Outperforms ChatGPT in USMLE Soft Skill Assessments
ChatGPT achieved an overall accuracy of 62.5% in answering the USMLE questions. Its accuracy was 66.6% for the sample questions and 61% for the AMBOSS questions. On the other hand, GPT-4 demonstrated superior performance with an overall accuracy of 90%. GPT-4 accurately answered all the sample questions, achieving 100% accuracy, while its accuracy for the AMBOSS questions was 86.4%. Interestingly, GPT-4 never revised its initial answers, regardless of their correctness.
When prompted to re-evaluate its initial answers, ChatGPT revised its responses for 82.5% of the questions, and 53.8% of those revisions arrived at the correct answer, rectifying initially incorrect responses. According to user statistics from AMBOSS, the mean rate of correct responses among human users for the exact questions used in this study was 78%. ChatGPT therefore performed below human users on the AMBOSS questions at 61%, while GPT-4 exceeded them at 86.4%.
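For concreteness, the short sketch below shows how the figures reported above relate to one another: initial accuracy, the revision rate after the "Are you sure?" follow-up, and the share of revisions that landed on the correct answer. The record format is a hypothetical assumption for illustration, not the study's data.

```python
# Hypothetical per-question records: (initial_answer, followup_answer, correct_answer).
# The tallies mirror the metrics reported in the study, not its actual data.
records = [
    ("B", "B", "B"),
    ("A", "C", "C"),
    ("D", "D", "A"),
]

n = len(records)
initial_accuracy = sum(i == c for i, _, c in records) / n
revised = [(f, c) for i, f, c in records if f != i]
revision_rate = len(revised) / n
revised_accuracy = sum(f == c for f, c in revised) / len(revised) if revised else 0.0

print(f"initial accuracy: {initial_accuracy:.1%}")  # e.g. 62.5% for ChatGPT
print(f"revision rate:    {revision_rate:.1%}")     # e.g. 82.5% for ChatGPT
print(f"revised correct:  {revised_accuracy:.1%}")  # e.g. 53.8% for ChatGPT
```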
Conclusions: GPT-4 Shows Promise in Handling Soft Skills for Medical Professionals
In conclusion, this study assessed the performance of GPT-4 and ChatGPT in answering USMLE questions that evaluate soft skills important for medical professionals, including judgment, ethics, and empathy. Both AI models demonstrated the ability to answer most questions accurately. However, GPT-4 displayed superior performance, accurately answering 90% of the questions compared to ChatGPT’s accuracy of 62.5%. Notably, GPT-4 showed unwavering confidence in its responses and did not revise its original answers.
In contrast, ChatGPT stood by its initial answer for only 17.5% of the questions. These findings highlight the impressive capabilities of LLMs in handling questions related to the soft skills required of physicians. GPT-4’s performance indicates its effectiveness in addressing questions that necessitate professionalism, ethical judgment, and empathy. The inclination of ChatGPT to revise its initial responses suggests a design emphasis on flexibility and adaptability, promoting diverse interactions.
It’s worth noting that GPT-4 even surpassed human performance in this study. The mechanism used to prompt re-evaluation may not fully reflect human cognitive understanding of uncertainty, as AI models rely on calculated probabilities rather than human-like confidence levels.
Editor Notes: Exploring the Future of AI in Medicine
This study offers valuable insights into the potential of AI models like GPT-4 and ChatGPT in the field of medicine. By performing well on assessments of the soft skills required of medical professionals, these language models demonstrate the potential to support physicians in delivering patient-centered care. However, it’s important to strike a balance between AI assistance and human judgment to ensure the best outcomes for patients.
As we continue to witness the advancements in AI technology, it’s crucial to responsibly harness its power in healthcare. Studies like this provide a glimpse into the possibilities offered by AI, and they encourage further exploration and experimentation in this rapidly evolving field.
To stay updated on the latest developments in AI and other cutting-edge technologies, visit GPT News Room. Get ready to witness the future of innovation!