
Large Reasoning Models (LRMs): The AI That Actually Shows Its Work


Imagine you're in math class, and the teacher has just given everyone a complex problem to solve. After a few minutes, a classmate shouts out an answer. Everyone is stunned: how did they come up with it? Is it correct, or was it just a guess? When the teacher asks them to explain, they refuse, insisting they already did the work.

This scenario captures the frustration with traditional AI systems: fast answers with zero transparency.

Now imagine that same classmate walking up to the board, writing out all the formulas and calculations, and talking you through every step while double-checking the result. You would all believe them, because they showed the proper solution.

This is what Large Reasoning Models (LRMs) bring to artificial intelligence—the ability to show their work. Instead of mysterious predictions, LRMs generate visible chains of thought that you can follow, question, and verify.

Illustration of an AI model thinking that describes Large Reasoning Models.

When AI shows its reasoning, something fundamental changes in our relationship with technology. You can spot errors in its logic. You can understand how it reaches conclusions. Most importantly, you can make informed decisions about when to trust it, and when not to.

So how did we get here? Let’s rewind and explore the developments that made the Large Reasoning Models possible…

The Problem with Traditional AI 

The traditional Large Language Models (LLMs) we're all familiar with work like big probability machines: they ingest millions of tokens of text and are trained on a single task.

Given all previous tokens, predict the next one. 

After taking in tons of text from across the internet, they become really good at continuing any sequence of words in a way that sounds human.  They can talk about almost anything convincingly, but they’re essentially making educated guesses based on patterns they’ve seen.
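To make that concrete, here is a toy sketch of autoregressive next-token prediction. The vocabulary and probabilities are invented for illustration; a real LLM learns them from enormous training corpora rather than a lookup table.

```python
import random

# Toy next-token predictor: given the tokens so far, pick a likely continuation
# from a hand-made probability table. A real LLM learns these probabilities
# from billions of training examples instead of a lookup table.
next_token_probs = {
    ("the", "cat", "sat"): {"on": 0.85, "down": 0.10, "quietly": 0.05},
    ("the", "cat", "sat", "on"): {"the": 0.90, "a": 0.08, "my": 0.02},
}

def predict_next(tokens):
    probs = next_token_probs.get(tuple(tokens))
    if not probs:
        return None  # no statistics for this context
    choices, weights = zip(*probs.items())
    # Sample according to the distribution, mirroring one decoding step of an LLM.
    return random.choices(choices, weights=weights, k=1)[0]

sequence = ["the", "cat", "sat"]
for _ in range(2):
    token = predict_next(sequence)
    if token is None:
        break
    sequence.append(token)

print(" ".join(sequence))  # e.g. "the cat sat on the"
```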

This approach took us surprisingly far, but it has a major flaw: these models can’t distinguish between what they know and what they’re just making up. They’re built to keep the conversation flowing, not to verify facts.

We call this problem "hallucination": the AI confidently serves up completely wrong information. And it happens more than you might think.

A 2024 study revealed something alarming: when ChatGPT 3.5 cited references to back up its claims, a whopping 40% of those references were completely made up. Even the more advanced GPT-4 still fabricated 29% of its citations. Imagine nearly one-third of a research paper’s references being fictional.

The Black-Box Issue with Traditional LLMs

This creates what experts call the “black-box trust gap.” How can you rely on something that can’t explain its reasoning and might be confidently making things up? It’s like taking advice from someone who refuses to explain their thinking and has a documented history of inventing facts.

This is where Large Reasoning Models come in. Instead of just predicting what sounds right, they’re designed to think through problems step by step, showing their work along the way – just like we expect from professionals in high-stakes fields.

Enter Large Reasoning Models

Everything changed on September 12, 2024. That’s when OpenAI released o1-preview to ChatGPT Plus subscribers, and suddenly AI wasn’t just matching patterns anymore; it was thinking.

What makes these new Large Reasoning Models (LRMs) so important? Unlike traditional AI that gives answers in one go, these models take their time, much like you might when solving a difficult problem.

Imagine having a scratch pad where you work out your thoughts before presenting your final answer; that’s essentially what’s happening behind the scenes.

One of the most powerful approaches is the STaR algorithm (Self-Taught Reasoner). Think of it as AI that tries multiple approaches to solve a problem, evaluates which reasoning path looks most promising, and then doubles down on that path. It’s like a chess player considering several moves, evaluating each one, then choosing the best strategy.

In practice, this looks something like:

  • The model generates several potential reasoning paths.
  • It scores each path based on how likely it is to lead to the correct answer.
  • It selects the most promising path and continues reasoning from there.
  • It repeats the process several times before settling on a final answer (see the simplified sketch below).
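Here is a simplified sketch of that sample-and-select loop. The generator and scorer are stand-ins for model calls; this illustrates the idea, not OpenAI’s actual training recipe.

```python
import random

# Simplified sketch of sample-and-select reasoning: propose several candidate
# reasoning paths, score each one, keep the best, and repeat for a few rounds.

def generate_candidates(problem, n=4):
    # Stand-in for the model proposing n different chains of thought.
    return [f"reasoning path {i} for: {problem}" for i in range(n)]

def score(path):
    # Stand-in for a verifier or reward model estimating how likely the
    # path is to end in a correct answer.
    return random.random()

def solve(problem, rounds=3):
    best_path, best_score = None, float("-inf")
    for _ in range(rounds):
        for path in generate_candidates(problem):
            s = score(path)
            if s > best_score:
                best_path, best_score = path, s
    return best_path

print(solve("What is 17 * 24?"))
```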

Another improvement is the chain-of-thought process. Instead of emitting one final stream of text, the LRM produces an internal “thought chain” that developers can inspect to see how the model works through a problem, mimicking the way a human thinks.


The process works in three main steps:

  1. Problem Analysis: The AI first divides a complex problem into smaller, more manageable pieces. For a math problem, it might identify what’s being asked and which formulas might apply.
  2. Step-by-Step Reasoning: Then it works through each piece, showing its logical progression. “If A is true, then B must follow. Given C, we can determine D…” This creates a clear path from question to answer.
  3. Verification: Most impressively, the AI can actually check its own work and fix errors it finds along the way. “Wait, that doesn’t make sense because…” or “Let me double-check this calculation…” This self-correction capability is what truly sets LRMs apart (a toy sketch of the loop follows below).
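Putting the three steps together, here is a toy illustration of the analyze → reason → verify loop. The helper functions are placeholders; a real LRM performs these steps inside its own generated chain of thought.

```python
# Toy illustration of the analyze -> reason -> verify loop described above.
# The helper functions are placeholders for what the model does internally.

def analyze(problem):
    # 1. Problem Analysis: break the task into smaller sub-questions.
    return ["identify what is being asked", "list relevant formulas", "plan the solution steps"]

def reason(subtasks):
    # 2. Step-by-Step Reasoning: work through each piece in order.
    return [f"worked through: {task}" for task in subtasks]

def verify(steps):
    # 3. Verification: re-check the chain and flag anything suspicious.
    issues = [step for step in steps if "unsure" in step]
    return len(issues) == 0, issues

subtasks = analyze("Solve x^2 - 5x + 6 = 0")
steps = reason(subtasks)
ok, issues = verify(steps)
print("answer accepted" if ok else f"re-doing steps: {issues}")
```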

Illustration of how the LRM works, using step-by-step Logic

The Results Speak for Themselves

The numbers show that reasoning models are absolutely crushing it. For example, on the 2024 American Invitational Mathematics Examination (AIME), the standard GPT-4o model barely solved 12% of the problems. But when OpenAI’s o1-preview came along with its reasoning capabilities, that number skyrocketed to 74% – more than six times better.

And it gets even more impressive. The newer o3 model isn’t just good, it’s really good. It scored high enough to earn a gold medal at the 2024 International Olympiad in Informatics and achieved a CodeForces rating that puts it shoulder-to-shoulder with the world’s top programmers. This is an AI that can solve problems at the same level as human experts who’ve spent years improving their skills.

These improvements extend beyond mathematics:

  • Scientific reasoning – These models can now tackle complex scientific concepts and follow multi-step processes that were not possible before.
  • Code generation – Programmers are getting more reliable assistance with significantly fewer bugs, the kind of help that actually saves time rather than creating new problems.
  • Complex planning – Need to break down a complicated project? These models excel at turning overwhelming tasks into manageable steps.
  • Logical reasoning – Puzzles and logic problems that require deep, multi-step thinking are now well within reach for LRMs.

These capabilities are already translating into real applications across a range of industries.

LRM in Action and how it works.

Real-World Applications

As reasoning models get smarter, they’re showing up everywhere. These AI systems really shine when a task requires careful thinking, when the problem is tricky, and when getting the right answer matters more than getting it quickly.

Education and Tutoring

Khan Academy’s Khanmigo is changing how kids learn math. Imagine an eighth-grader struggling with a quadratic equation. Instead of just giving answers, Khanmigo walks through the factorization process in real time, gently points out exactly where the student went wrong, and offers a helpful hint to get them back on track. It’s like having a patient tutor who shows their work right alongside yours. The tool earned a four-star “transparency & learning” rating from Common Sense Media in March 2025, a bit above general-purpose chatbots like Bard.

Research and Analysis

Researchers using Elicit are saving countless hours since it upgraded to reasoning-based AI last year. Now when you’re looking for evidence, it doesn’t just throw papers at you, it explains which databases it searched, why certain studies didn’t make it, and exactly where in the PDF each important claim appears. Scientists can dive into these detailed notes or collapse them for a quick overview, making evidence gathering for peer-reviewed work faster and more efficient.

Business and Finance 

Morgan Stanley has equipped its 15,000 wealth advisors with reasoning AI that transforms financial planning. Advisors can now map out cash flows while seeing every tax assumption and discount factor clearly laid out for compliance checks. It’s like having a financial expert who shows all their math. The firm reports that portfolio research now takes 40% less time – that’s nearly half a workday saved on each analysis.

Healthcare

In China, 42 hospitals are testing DeepSeek-R1 in patient triage. When you’re not feeling well, this system asks questions about your symptoms, then displays its diagnostic thinking right on screen for doctors to review, clearly marking uncertain conclusions in red. Recent data shows it correctly diagnosing 93% of common respiratory issues, with doctors agreeing with its top suggestion nearly three-quarters of the time. The transparency means doctors can quickly spot where the AI’s reasoning aligns with or differs from their own judgment.

Software Engineering

GitHub’s Copilot Coding Agent is revolutionizing how developers debug code. Instead of mysterious fixes, it creates a test environment, methodically tracks down bugs, documents each step in the process, and then suggests a solution. Developers can review the AI’s entire thought process before accepting changes. This transparency is paying off big time – when teams take time to review the reasoning, coding errors after updates drop by 37%.

Challenges and Limitations

Despite all their promise, reasoning models face some serious hurdles that keep them from becoming the go-to AI solution for everyone.

Computational Costs

Let’s talk about the elephant in the room: these things are expensive to run. Thinking takes time and energy, whether you’re human or AI. When ArtificialAnalysis tested o1-mini, they found it takes a noticeable 8.4 seconds just to start responding and costs about 2 dollars per million tokens – roughly double what you’d pay for the speedier but less thoughtful GPT-3.5.

In real-world terms, reasoning models can cost anywhere from 10 to 30 times more to operate than standard AI. That’s like comparing the operating cost of a luxury sports car to a normal van – fine for special occasions but way too pricey for everyday errands.
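To put those prices in perspective, here is a back-of-the-envelope calculation. The per-million-token prices follow the figures above; the request volume and token counts are made-up assumptions, and the gap widens because reasoning models also emit far more tokens per request.

```python
# Back-of-the-envelope cost comparison. Prices follow the figures above
# (o1-mini at ~$2 per million tokens, roughly double a standard model);
# request volume and token counts are made-up assumptions.
reasoning_price_per_million = 2.00
standard_price_per_million = 1.00

requests_per_day = 10_000
reasoning_tokens_per_request = 4_000   # long hidden reasoning traces
standard_tokens_per_request = 400      # short direct answers

def monthly_cost(tokens_per_request, price_per_million):
    monthly_tokens = requests_per_day * tokens_per_request * 30
    return monthly_tokens / 1_000_000 * price_per_million

print(f"reasoning model: ${monthly_cost(reasoning_tokens_per_request, reasoning_price_per_million):,.0f}/month")
print(f"standard model:  ${monthly_cost(standard_tokens_per_request, standard_price_per_million):,.0f}/month")
```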

Energy & Environment

These large reasoning models are also a major environmental risk. All those extra computational cycles add up fast. A recent Frontiers study found something alarming: reasoning models can produce up to 50 times more carbon emissions than their more straightforward cousins when answering the same questions.

It’s a classic case of “better but worse” because we’re getting more reliable answers, but at what cost to our planet? As these systems become more popular, their carbon footprint could become a serious concern.

Verification and Trust

Even more concerning is what researchers at Google and CMU discovered: while reasoning models hallucinate less often, they can actually make some errors harder to catch. The very process that makes them more trustworthy can sometimes create such a convincing argument that obvious red flags get hidden. It’s like a confident student who shows detailed work that looks right at a glance but contains a fundamental mistake that’s harder to spot because of all the surrounding correct steps.

Regulation, Bias & Data Privacy

Starting this August, the EU’s new AI Act is changing the game for tech companies. It’s not just a suggestion – it’s a comprehensive law demanding that AI systems be transparent, properly licensed, tested for bias, and accountable for their environmental impact.

For reasoning models, there’s good news and bad news. The good news? Their step-by-step explanations help tick the transparency box that regulators want. The bad news? Those detailed reasoning trails might accidentally expose things they shouldn’t. Think about it: when an AI walks through its thinking process in a healthcare setting, might it inadvertently reveal sensitive patient information? Or when analyzing financial data, could its reasoning expose proprietary trading strategies or confidential business information?

This creates a new technical challenge. Before these reasoning models can be safely deployed in sensitive areas like finance or healthcare, we need sophisticated tools that can automatically identify and redact sensitive information while preserving the logical flow that makes them valuable in the first place.

The Future of Reasoning AI

Looking ahead, we’re seeing some fascinating trends in how LRMs (Large Reasoning Models) are evolving. 

The Future of Large Reasoning Models (Faster, Cheaper, Everywhere)

Technical Improvements

The research community is laser-focused on making these models faster and cheaper. Google has already shown something pretty clever – using a small “draft model” to sketch out thinking patterns that the bigger model just needs to review and fix. This smart approach cuts costs by up to 75% without sacrificing accuracy. 

Meanwhile, Deepseek’s FlashReasoner is taking a different angle by trimming unnecessary reasoning steps on the fly, boosting speed by about 30%. At this rate, near-instant reasoning AI isn’t a question of if, but when.
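Conceptually, the draft-and-review approach looks something like the sketch below. The function names and drafted steps are hypothetical; the point is that the expensive model only reviews and patches the cheap model’s reasoning instead of generating all of it.

```python
# Conceptual sketch of draft-then-review reasoning. The function names and the
# drafted steps are hypothetical: a small, cheap model drafts the chain of
# thought and the large model only reviews and patches it.

def small_model_draft(problem):
    return [
        f"step 1: restate the problem ({problem})",
        "step 2: try formula X",
        "step 3: compute the result",
    ]

def large_model_review(draft):
    # The expensive model only checks each drafted step and fixes the bad ones.
    return [
        "step 2: formula X does not apply here; use formula Y instead"
        if "formula X" in step else step
        for step in draft
    ]

draft = small_model_draft("integrate x * e^x")
final_chain = large_model_review(draft)
print("\n".join(final_chain))
```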

Hybrid Approaches

The smartest solutions will likely mix and match different AI approaches – quick models for straightforward tasks, deeper reasoning models when things get complex. Think of it as “fast first, deep when needed” depending on what you’re trying to accomplish. Some platforms are getting even more sophisticated with team-based approaches: one model drafts, another critiques, and a third verifies – creating a powerful quality-control pipeline.
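Here is a hedged sketch of what such a hybrid setup might look like: a naive router sends simple tasks to a fast model and complex ones to a reasoning model, with a critique-and-verify pass at the end. The model calls and the complexity heuristic are placeholders, not a real product architecture.

```python
# Hedged sketch of "fast first, deep when needed" routing with a simple
# draft -> critique -> verify pass. All model calls are placeholders.

def looks_complex(task: str) -> bool:
    # Naive stand-in for a real complexity classifier or router model.
    return any(word in task.lower() for word in ("plan", "prove", "why", "step"))

def fast_model(task: str) -> str:
    return f"[quick answer to: {task}]"

def reasoning_model(task: str) -> str:
    return f"[step-by-step answer to: {task}]"

def critique(answer: str) -> str:
    # A second model looks for gaps; here we only check the draft is non-trivial.
    return "ok" if len(answer) > 20 else "needs more detail"

def route(task: str) -> str:
    draft = reasoning_model(task) if looks_complex(task) else fast_model(task)
    # A third pass verifies the draft and escalates to deeper reasoning if needed.
    return draft if critique(draft) == "ok" else reasoning_model(task)

print(route("What is the capital of France?"))
print(route("Plan a zero-downtime database migration, step by step."))
```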

Democratization

Thanks to fierce competition between tech giants, these powerful reasoning tools are becoming more affordable for everyone. This means schools, startups, and independent developers can now access capabilities that were once reserved for big players with deep pockets. 

Soon, we’ll see reasoning AI woven into everyday tools – from word processors that not only fix spelling but explain writing suggestions, to CRM systems that offer genuine insights, to design applications that can explain the principles behind their layout recommendations. The power of reasoning AI is steadily becoming available to all of us.

Illustration: pick an LRM to test.

Getting Started with LRMs

Curious about jumping into the world of reasoning AI? Here’s how you can dive in and start exploring these fascinating tools:


Trying Existing Models

Ready to see these thinking machines in action? You’ve got several great options:

  • OpenAI o1 – If you’re already a ChatGPT Plus subscriber, you can start experimenting right away. Developers can also access it through the API to build their own applications (see the short sketch after this list).
  • DeepSeek-R1 – This powerful reasoning model is available across various platforms and through developer APIs, giving you flexibility in how you engage with it.
  • Open-source alternatives – Not ready to commit to a subscription? Check out Hugging Face and similar platforms where community-developed reasoning models are freely available.
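For example, a minimal API call might look like the sketch below, assuming the OpenAI Python SDK (v1+) and an API key in your environment. The model name is an assumption; swap in whichever reasoning model your account can access.

```python
# Minimal sketch using the OpenAI Python SDK (v1+). Assumes OPENAI_API_KEY is
# set in your environment and that your account has access to a reasoning
# model; the model name below is an assumption.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "A tank fills at 3 L/min and drains at 1 L/min. "
                       "How long does it take to fill 120 L? Show your reasoning.",
        }
    ],
)

print(response.choices[0].message.content)
```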

Best Practices

When working with reasoning models:

  • Challenge them with meaty problems that need step-by-step thinking – that’s where they really shine compared to standard AI.
  • Try the same question with both reasoning and traditional models to see the difference in approach – it’s often eye-opening!
  • Double-check their reasoning paths, especially when the stakes are high. While impressive, they’re not always right. 

Conclusion

Large Reasoning Models are changing the AI game. By teaching machines to solve problems step by step, we’ve created AI that’s not just smarter, but more reliable when tackling real-world challenges.

Sure, the technology is still young: it’s expensive, sometimes slow, and has limitations. But innovation is moving quickly, and these hurdles will likely fall away over time. As these reasoning models become more streamlined and widely available, they’ll reshape problem-solving across countless fields.

The key insight here is transparency. When AI shows its work, we can collaborate with it rather than simply taking its word. We can check its logic, understand its thought process, and build on its insights.

This is bigger than an upgrade to an existing model; it’s a fundamental shift toward AI that truly partners with us on complex problems. The future isn’t just about getting the right answers from AI; it’s about understanding the process and seeing how it actually works.

 




Written by: Roan Manansala

Roan Manansala is a Computer Science Undergraduate at the Polytechnic University of the Philippines. He is passionate about blending technology with creativity, often exploring ideas at the intersection of community building, data science and human-centered design. He has led initiatives through various tech organizations to empower students to embrace emerging technologies through beginner-friendly spaces.
