2024 Winner: Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

Project Information

Title of Project Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

Division Engineering

Course or Program Artificial Intelligence Explainability and Accountability (AIEA) Lab

Description / Abstract In our study, we examined how well Large Language Models (LLMs), such as ChatGPT, can explain their own decisions. LLMs are strong at interpreting and generating human-like text, and they can also produce explanations of their own predictions. For example, when ChatGPT is asked whether a movie review is positive or negative, it can also explain its answer by pointing to specific words such as "fantastic" or "memorable."

We focused on sentiment analysis, a task where the goal is to identify the emotional tone of a text. We also looked at the form these explanations take, concentrating on feature attribution, a method that has been widely used to understand how models that predate ChatGPT work. Our main goal was to see whether the explanations given by ChatGPT measure up to other widely accepted explanation techniques.

To do this, we ran a series of experiments comparing ChatGPT's explanations with older methods such as occlusion (hiding parts of the text to see what matters most) and LIME (a technique that highlights the words most important to the model's decision). We found that ChatGPT's explanations perform as well as these older methods, even though the explanations themselves are quite different. ChatGPT's explanations are also easier and cheaper to obtain because they are generated automatically alongside the model's decision.
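
The occlusion idea described above can be illustrated with a minimal sketch. This is not the code or the model used in the study: the word lists, `toy_sentiment_score`, and `occlusion_attribution` are hypothetical names introduced only for illustration, and the toy scorer stands in for a real sentiment model. Each word's attribution is the change in the positive score when that word is removed.

```python
# Toy stand-in for a sentiment model (an assumption for illustration;
# it is NOT ChatGPT or any model from the study).
POSITIVE = {"fantastic", "memorable", "great"}
NEGATIVE = {"boring", "dull", "awful"}


def toy_sentiment_score(words):
    """Return a score in (0, 1); values above 0.5 lean positive."""
    pos = sum(w.lower() in POSITIVE for w in words)
    neg = sum(w.lower() in NEGATIVE for w in words)
    return (1 + pos) / (2 + pos + neg)


def occlusion_attribution(words, score_fn):
    """Attribute to each word the score drop caused by removing it.

    A positive value means the word pushed the prediction toward
    "positive"; a negative value means it pushed toward "negative".
    """
    base = score_fn(words)
    return [
        (word, base - score_fn(words[:i] + words[i + 1:]))
        for i, word in enumerate(words)
    ]


review = "A fantastic and memorable film with a dull ending".split()
attributions = occlusion_attribution(review, toy_sentiment_score)

for word, delta in attributions:
    print(f"{word:>10s} {delta:+.3f}")
```

In this toy setting, "fantastic" and "memorable" receive positive attributions, "dull" receives a negative one, and neutral words receive zero. With a real model, `score_fn` would be replaced by a call that returns the model's confidence in the positive class.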

Our findings also revealed some unique aspects of ChatGPT's explanations, leading us to question and possibly rethink how we understand and evaluate the way models like ChatGPT "think" about the text they analyze. This could change how we approach the task of making these models more transparent and understandable in the future.

URL https://arxiv.org/abs/2310.11207

PDF

1605.pdf

Students

Siddarth P Mamidanna (Crown)
Shiyuan Huang (Merrill)
Shreedhar Anil Jangam (Ten)

Mentors

Leilani H. Gilpin
