Meet Patronus AI’s ‘Lynx’: The open-source bullshit detector outsmarting GPT-4

Jul 11, 2024 | Technology

Patronus AI, a New York-based startup, unveiled Lynx today, an open-source model designed to detect and mitigate hallucinations in large language models (LLMs). This breakthrough could reshape enterprise AI adoption as businesses across sectors grapple with the reliability of AI-generated content.

Lynx outperforms industry giants like OpenAI’s GPT-4 and Anthropic’s Claude 3 in hallucination detection tasks, representing a significant leap forward in AI trustworthiness. Patronus AI reports that Lynx achieved 8.3% higher accuracy than GPT-4 in detecting medical inaccuracies and surpassed GPT-3.5 by 29% across all tasks.

A comparison of AI model responses to a botany question, with Patronus AI’s Lynx model (bottom) correctly identifying a flaw in the answer that competing models from OpenAI and Anthropic missed. (Credit: Patronus AI)

Battling AI’s imagination: How Lynx detects and corrects LLM hallucinations

Anand Kannappan, CEO of Patronus AI, explained the significance of this development in an interview with VentureBeat. “Hallucinations in large language models occur when the AI generates information that is false or misleading, making things up as if they were facts,” he said. “For enterprises, this can lead to incorrect decision-making, misinformation, and a loss of trust from clients and customers.”

Patronus AI also released HaluBench, a new benchmark for evaluating AI model faithfulness in real-world scenarios. The benchmark stands out for including domain-specific tasks in finance and medicine, areas where accuracy is crucial.

“Industries that deal with sensitive and precise information, such as finance, healthcare, le …
