Detecting LLM Hallucinations with Semantic Entropy
A method for identifying unstable model answers by measuring meaning-level variation across generations.
A Message from the CEO
Large language models (LLMs) like ChatGPT, Gemini, and Llama are tools with immense potential to change how we work and live. Imagine AI systems that can draft legal documents, provide accurate medical diagnoses, and even create captivating narratives. However, a critical issue stands in the way of widespread adoption: reliability. LLMs have a tendency to "hallucinate", meaning they fabricate information that may sound plausible but is ultimately untrue. This can have serious consequences, particularly in high-stakes fields.
As a company committed to developing cutting-edge AI solutions, we recognize the importance of addressing this challenge head-on. We believe that trust is paramount in any technological advancement, and unreliable AI is simply not acceptable. That's why we're excited about research from the University of Oxford that presents a significant step forward in detecting and mitigating LLM hallucinations.
This research paves the way for more reliable and trustworthy AI systems that can be confidently deployed in various domains. The ability to flag potentially inaccurate information empowers users to exercise caution and seek verification when necessary. Ultimately, this leads to a more responsible and beneficial use of AI.
We remain dedicated to advancing AI technology in a safe and ethical manner. By investing in research that addresses critical challenges like LLM hallucinations, we are committed to building a future where AI is a trusted partner in our endeavors.
A Deep Dive into Semantic Entropy and Confabulation Detection
A new method, termed "semantic entropy," addresses the challenge of detecting LLM confabulations. Confabulations are a subset of hallucinations where LLMs produce incorrect and arbitrary answers. These answers are particularly problematic as they are sensitive to factors like the random seed used during generation and may change with repeated queries, even if the input remains the same.
This method tackles the difficult task of measuring uncertainty in free-form text generation, a key aspect of detecting confabulations. Previous uncertainty estimation methods are ill-suited for this setting as they focus on simpler tasks like classification or regression, or rely on naive entropy calculations that are confounded by variations in phrasing that don't affect meaning.
How It Works
Semantic entropy focuses on the meaning of generated text, rather than simply the sequence of words produced. It works by:
- Sampling: Generating multiple possible answers to a given question using the LLM.
- Clustering: Grouping answers with similar meanings based on whether they entail each other. This involves using natural language inference (NLI) techniques to assess semantic equivalence.
- Calculating Entropy: Determining the entropy of the distribution over the clusters of meaning. High entropy indicates high uncertainty and a greater likelihood of confabulation.
This approach goes beyond simple lexical comparisons, allowing the model to recognize that answers like "Paris," "It's Paris," and "France's capital Paris" convey the same information despite their syntactic differences.
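The sample-cluster-score loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `entails` stub stands in for a bidirectional natural language inference model (the actual method uses a trained entailment model), and `semantic_entropy` assumes you already have sampled answers with their generation probabilities.

```python
import math

def entails(a: str, b: str) -> bool:
    # Stand-in for an NLI entailment check. A real pipeline would
    # query an entailment model both ways; here we crudely compare
    # normalized strings so the sketch stays self-contained.
    norm = lambda s: s.lower().strip(" .!'\"")
    return norm(b) in norm(a) or norm(a) in norm(b)

def semantic_entropy(answers, probs):
    """Cluster sampled answers by mutual entailment, then compute
    the entropy of the probability mass over meaning clusters."""
    clusters = []  # each cluster: a representative answer + its mass
    for ans, p in zip(answers, probs):
        for c in clusters:
            # Bidirectional entailment => same meaning => same cluster.
            if entails(ans, c["rep"]) and entails(c["rep"], ans):
                c["mass"] += p
                break
        else:
            clusters.append({"rep": ans, "mass": p})
    total = sum(c["mass"] for c in clusters)
    return -sum((c["mass"] / total) * math.log(c["mass"] / total)
                for c in clusters)
```

With this sketch, "Paris", "It's Paris", and "Paris" collapse into one cluster and yield zero entropy, while three semantically distinct answers at equal probability yield the maximum entropy log(3).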
Key Findings
The study published in Nature highlights that:
- Superior Performance: Semantic entropy significantly outperforms existing methods in detecting confabulations across a range of tasks, datasets, and LLMs, including GPT-4 and LLaMA 2.
- Robustness: The method works without prior knowledge of specific tasks or datasets, demonstrating its generalizability and suitability for real-world scenarios.
- Accuracy Improvement: By selectively refusing to answer questions likely to cause confabulations, semantic entropy can substantially enhance the accuracy of LLM-based question answering systems.
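The selective-refusal idea in the last bullet amounts to answering only when semantic entropy falls below a threshold, trading coverage for accuracy. The sketch below assumes each question has already been scored; the threshold value is hypothetical and would be tuned on held-out data.

```python
def selective_qa(items, threshold):
    """Answer only low-entropy items.

    `items` is a list of (semantic_entropy, is_correct) pairs for a
    QA evaluation set. Returns coverage (fraction answered) and
    accuracy on the answered subset (None if nothing is answered).
    """
    answered = [ok for h, ok in items if h < threshold]
    coverage = len(answered) / len(items)
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy
```

Raising the threshold answers more questions at the cost of admitting more likely confabulations; the study reports that refusing high-entropy questions substantially improves accuracy on the remainder.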
Beyond Single Sentences
While initially designed for sentence-length answers, semantic entropy can be extended to handle longer passages of text, such as biographies. This involves:
- Decomposition: Breaking down the generated text into individual factual claims.
- Question Reconstruction: Generating questions that could have been answered by the extracted claims.
- Entropy Calculation: Applying semantic entropy to each question-answer pair and averaging the scores to obtain an overall uncertainty estimate for the passage.
This demonstrates the versatility of semantic entropy in detecting confabulations across different text lengths and complexities.
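The passage-level extension can be outlined as follows. This is only a structural sketch: in the paper, decomposition and question reconstruction are done with LLM prompts, and each claim's score comes from resampling answers to the reconstructed questions. Here the sentence splitter is a naive stand-in, and the per-claim scorer is injected as a function so the sketch stays self-contained.

```python
def decompose(passage):
    # Stand-in for an LLM prompt that breaks a passage into atomic
    # factual claims; here we naively split on sentence boundaries.
    return [s.strip() for s in passage.split(".") if s.strip()]

def passage_uncertainty(passage, claim_entropy):
    """Average per-claim semantic entropy over a passage.

    `claim_entropy` maps a claim to its semantic-entropy score
    (in the full method: reconstruct a question for the claim,
    resample answers, cluster by meaning, compute entropy).
    """
    claims = decompose(passage)
    scores = [claim_entropy(c) for c in claims]
    return sum(scores) / len(scores)
```

A passage where every claim is answered consistently across resamples scores near zero; mixing in unstable claims pushes the average up, flagging the passage for verification.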
Limitations and Future Directions
While semantic entropy is a powerful tool for detecting confabulations, it's crucial to acknowledge that:
- It primarily addresses uncertainty stemming from a lack of LLM knowledge. It's not designed to handle situations where LLMs are confidently wrong due to biases in their training data, systematic reasoning errors, or intentional deception.
- The success of semantic entropy hinges on the effectiveness of the semantic clustering and entailment methods used. These components may require careful tuning and refinement depending on the specific context and task.
Further research is needed to develop methods that address other types of LLM errors and enhance the reliability of these models across a broader range of scenarios. Nonetheless, semantic entropy represents a significant step forward in mitigating the problem of confabulations, enabling the development of more trustworthy AI systems.
Further Reading
This article draws from research published in the following paper: