Large Reasoning Models: An AI Breakthrough That Reveals Poor Quality

The introduction of large reasoning models (LRMs) is an exciting development in artificial intelligence. These systems, which evaluate their own outputs and select the best response, represent a step forward in improving AI's reliability and reasoning. But their arrival prompts a bigger question: What were we getting before?
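
To make that mechanism concrete, here is a minimal sketch of one way such a self-evaluation loop can work, assuming a best-of-N setup: the model proposes several candidate answers and a separate scoring step keeps the strongest one. The generate and score functions below are hypothetical placeholders, not any vendor’s actual API.

```python
# Illustrative best-of-N self-evaluation: sample several candidate answers,
# score each with a critic, and return the highest-scoring candidate.
# Both callables are hypothetical placeholders supplied by the caller.
from typing import Callable, List


def best_of_n(
    generate: Callable[[str], str],      # hypothetical: produces one candidate answer
    score: Callable[[str, str], float],  # hypothetical: rates an answer for a prompt
    prompt: str,
    n: int = 5,
) -> str:
    """Sample n candidates and keep the one the critic rates highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```

The point of the sketch is simply that the quality check happens after generation; selecting the best of several drafts says nothing about whether the underlying model is sound.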

If AI tools now need mechanisms to check their own outputs, the implication is that earlier systems lacked this basic layer of quality control. It underscores how much AI has been released into the world with fundamental flaws left unaddressed, from data biases and bizarre hallucinations to the opacity of how outputs are actually generated.

Why Is AI Getting a Free Ride?

In most major industries—whether it’s transport, pharmaceuticals, or construction—quality control is non-negotiable. Products and systems are subjected to rigorous safety and reliability standards before they’re released to the public. AI, by contrast, has largely escaped this scrutiny. The tech sector often rolls out tools that are incomplete, imperfect, and sometimes unsafe, with the assumption that users will uncover the problems over time.

Why This Matters

AI’s quality problems aren’t just a technical issue; they have real-world consequences.

  • Jobs and the Economy: AI is already reshaping employment. Poor-quality AI systems risk compounding inequalities or misdirecting economic opportunities.
  • Social Impact: AI influences how people think, act, and make decisions. If flawed systems are accepted uncritically, they can perpetuate biases, spread misinformation, and erode trust in institutions.
  • GIGO (Garbage In, Garbage Out): The tech world loves this phrase to highlight how flawed inputs lead to flawed outputs. But when flawed AI systems are released, the technology itself becomes the garbage. Each bias, hallucination, or failure erodes trust, making it harder for people to believe in AI’s potential—or the promises of its developers.

LRMs, while a step forward, highlight how much work remains to be done. AI’s development needs to shift from prioritizing speed to prioritizing reliability and trustworthiness.

The Way Forward

To address these issues, different approaches are needed for AI’s two primary use cases:

  1. Business-to-Consumer (B2C): AI tools that interact directly with the public—whether through search engines, chatbots, or recommendations—need to be subject to regulation. Standards should ensure these systems are accurate, fair, and transparent before they’re released, just as cars or medications must meet safety benchmarks.
  2. Business-to-Business (B2B): Companies adopting AI must proceed with caution. They need to fully understand what they’re buying, including the system’s limitations, risks, and the work required to integrate it effectively. Vendors should be transparent about their tools’ capabilities and flaws, ensuring businesses can make informed decisions.

The Big Picture

The race to dominate the AI market is driving rapid innovation, but it’s also leading to the release of systems that are, at their core, still of poor quality. Large reasoning models are a genuine advance worth celebrating, but they also point to the underlying reality: AI is far from perfect.

This isn’t the first time we’ve seen this pattern. From oil to opioids, industries have often prioritized speed and market dominance over safety and accountability, leaving society to grapple with the consequences. Let’s not repeat those mistakes.

AI has immense potential, but breakthroughs like LRMs reveal an uncomfortable truth: many systems are still of poor quality. Let’s celebrate progress while addressing the underlying issues before they become unmanageable.

If you’re considering adopting AI tools or exploring their potential, reach out to discuss how to navigate the risks and maximize the benefits. It’s time to make AI work for you—not the other way around.