
The Year of the Agent: Anthropic’s Claude AI Models and Agents


 
Looking back on 2025, a year of coding with the help of artificial intelligence, we can safely say it was the year of agents, pioneered above all by Anthropic with Claude Code. AI products have matured to offer highly reliable agents that can understand, navigate, and work seamlessly on large codebases. The old ways of manually copying and pasting code into web AI applications are over. Agents are now actually part of the codebase, navigating around like a real developer, and they can be left alone with long-running tasks, opening and working on their own pull requests (Claude Code on the web). In this article, we will look back at the progress in AI agents, the evolution of frontier coding models from Anthropic, and the explosion of innovation in tooling.
 
At the recently concluded AWS re:Invent 2025, the session AWS re:Invent 2025 – What Anthropic Learned Building AI Agents in 2025 presented the company’s progress in coding models, tooling, and innovations from the past year: the introduction of the Claude 3 family, growing up to the most current frontier coding model, Claude Opus 4.5; the product innovation pushing Claude Code and agentic tooling into the mainstream; and open-source contributions to the AI field such as the Model Context Protocol (MCP) and SKILLS.md. As a student developer, I was and still am at the forefront of these cutting-edge changes: each release has been an engaging discovery that updates my AI workflow and shows how coding models keep improving year by year.
 
Anthropic’s innovations from the past year

(From AWS re:Invent 2025 – What Anthropic Learned Building AI Agents in 2025)

Artifacts and Claude Sonnet 3.5

The introduction of Artifacts on Claude.ai was my eye-opening moment with artificial intelligence. These are small web applications (iframes embedded in the AI chat application) that render AI-generated frontend code in real time in the browser. Seeing Claude generate frontend code live and render it in the browser as it streamed felt magical. It was the first time I experienced laying out design guidelines and details entirely in plain English, and watching Claude Sonnet turn them into matching frontend code that I could visually check then and there.
 
The value of this built-in preview of frontend code is highlighted by how quickly other frontier AI companies, such as OpenAI and Google, integrated a similar Canvas feature to interactively edit and preview code in real time on the web. The feature provided an early glimpse of today’s agentic workflow: you provide the spec and goals, and the AI tool handles generating the code and checking it by running it.

Claude Code

If Artifacts were the visual glimpses that upgraded the typical AI chatbot web interface, Claude Code was the precursor to agentic workflows. Announced in early 2025 and matured throughout the year, Claude Code represents Anthropic’s answer to the biggest friction point in the early AI era: the context gap.
 
Claude code visuals
 
Previously, using a frontier model meant a tedious sequence of copying and pasting. We spent more time curating file context and pasting it into a chat window than we did architecting solutions. Claude Code eliminated this boundary entirely by bringing the intelligence directly into the developer’s native environment: the command-line interface (CLI). The model literally moves around your actual local directories, reading files, editing and creating them, and running commands on your computer.
 
From my perspective as a developer, the transition to the Claude Code CLI was jarring, in the best way possible. Suddenly, I wasn’t “chatting” with an AI about my code; the AI was in my code. At first, it felt a little scary that this AI tool, or agent, was literally working on actual files on my computer. But the capability is so powerful that going back to traditional chat interfaces seemed nearly impossible.
The magic is found in its autonomy. Instead of manually selecting the relevant files to provide context, Claude Code uses agentic behavior to move through the file system. When I ask it to “refactor the authentication middleware to support the new OAuth provider,” it doesn’t just wait for snippets; it actually refactors the code. It:
  1. Explores the file tree to locate the middleware and relevant config files.
  2. Reads the required context into its memory.
  3. Executes the changes directly in the file system.
  4. Runs the linter or tests (if instructed) to verify its own work.
This was the first time the term “agents” really made sense. It feels less like using a tool and more like working with a real developer.
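That explore-read-edit-verify loop can be sketched in a few lines of Python. This is a toy illustration of the agentic pattern, not Claude Code’s actual implementation (which is not public); the helper names and the keyword-based file search are assumptions made for demonstration:

```python
import subprocess
from pathlib import Path

def find_files(root: str, keyword: str) -> list[Path]:
    """Explore: locate source files whose contents mention the target."""
    return [p for p in sorted(Path(root).rglob("*.py"))
            if p.is_file() and keyword in p.read_text(errors="ignore")]

def agent_refactor(root: str, keyword: str, transform) -> bool:
    """Run one explore -> read -> edit -> verify cycle."""
    targets = find_files(root, keyword)            # 1. Explore the file tree
    context = {p: p.read_text() for p in targets}  # 2. Read context into memory
    for path, source in context.items():           # 3. Execute changes on disk
        path.write_text(transform(source))
    # 4. Verify: run the test suite and report whether it passed
    result = subprocess.run(["python", "-m", "pytest", root, "-q"],
                            capture_output=True)
    return result.returncode == 0
```

In the real tool, a language model decides which files to explore and what edits to make at each step; this skeleton only shows the sequence of filesystem and verification actions the agent chains together on its own.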

From Prompt Engineering to Context Engineering

One of the key takeaways from Anthropic’s session at AWS re:Invent 2025 was the industry-wide pivot from “Prompt Engineering” to “Context Engineering.”
 
With tools like Claude Code, the skill set has shifted. We no longer agonize over the perfect phrasing of a prompt. Instead, we focus on curating the environment in which the agent operates. This is where breakthroughs like SKILLS.md and the Model Context Protocol (MCP) came into play. By standardizing how Claude connects to external tools, whether a GitHub repository, a local database, or a Slack channel, we now provide the AI agent with access and context in far more comprehensive ways. Taken together, these innovations are slowly building the limbs and senses of an autonomous AI agent: it can navigate files, gather information from other environments through MCP servers and SKILLS.md, and implement changes to achieve the initial goal.
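As a concrete example, Claude Code can pick up MCP servers from a project-level .mcp.json file. The minimal sketch below registers Anthropic’s reference GitHub server; the token value is a placeholder you supply yourself:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token-here>"
      }
    }
  }
}
```

With this in place, the agent gains GitHub tools (listing issues, reading pull requests, and so on) alongside its filesystem access, which is exactly the "limbs and senses" idea in practice.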

The Evolution of Reasoning: From Sonnet 3.5 to Opus 4.5

Model improvements of Anthropic Claude from the past years
(From AWS re:Invent 2025 – What Anthropic Learned Building AI Agents in 2025)
 
While tools like Claude Code and MCP provide the limbs for an AI agent to interact with the world, the brain ultimately determines success. If we look at the trajectory of Anthropic’s model releases over the last two years (as shown in the chart above), we see a fascinating story of how we arrived at today’s agentic reliability.
 
For most of 2024, the developer community—myself included—lived in the “Sonnet Era”, where Claude 3.5 Sonnet was the sweet spot; it was fast, cost-effective, and smart enough to handle single-file logic. As the chart illustrates, we saw incremental gains through late 2024 and early 2025 with Sonnet 3.6 and Sonnet 3.7. These models refined the daily workflow, making chat interactions snappier and Artifacts more precise.
 
Claude Opus 4.5

However, the real paradigm shift happened in the latter half of 2025 with the return of the Opus line. Specifically, Claude Opus 4.5 represents the massive spike at the far right of the timeline, pushing benchmark scores significantly past the 80% mark. Crucially, this isn’t just a “smarter chatbot”; it is a heavy-lifting reasoning engine designed specifically for the agentic workflows we just discussed.

Moreover, the jump from Opus 4 to Opus 4.5 is distinct because of the extended thinking capabilities it enables. While Sonnet models are excellent at “doing,” Opus 4.5 is excellent at “planning.” In my personal projects across IDEs like VSCode and Antigravity, what stood out was its reliability: it consistently outputs the code you requested, follows your prompts closely with little to no hallucination, and remains fast throughout.


Ultimately, Opus 4.5 changed the dynamic. When plugged into Claude Code, it doesn’t just rush to write code: it pauses to map out dependencies, anticipates how a change in utils.ts might break a test in a different directory, and formulates a plan before executing a single command. That steep upward curve in the chart marks the moment AI moved from junior coder toward senior software architect.

 
To put this in perspective, just a year ago Sonnet 3.5 scored 49% on the SWE-bench Verified benchmark.
 
Evaluations of Anthropic models on SWE-bench Verified

(From AWS re:Invent 2025 – What Anthropic Learned Building AI Agents in 2025)

Claude in Chrome: The Final Piece of the Puzzle

Now for the most recent release from Anthropic: if Claude Code gave the agent access to our local machines, the newly released Claude in Chrome extension gives it eyes and hands to interact with the web. This clearly felt like the right next step, given that developer testing and exploration often happen within the browser.
 
For a long time, “AI browser extensions” were synonymous with sidebars that summarized articles or drafted emails. Claude in Chrome creates a fundamental shift. It allows the model to use the browser: navigate URLs, click buttons, fill out forms, and manage tabs.

Claude in Chrome browser

As a developer, the combination of this extension and Claude Code is what excites me the most. We can now bridge the gap between “writing code” and “testing code.” Imagine this workflow: I ask Claude Code to build a login form. It writes the React components locally. Then, using the Chrome extension, it opens the localhost server, enters test credentials, clicks the ‘Submit’ button, and verifies that the redirect actually occurs.
 
Beyond development, the automation potential is staggering. The extension can handle what I call “digital chores”:
  • Data Aggregation: Navigating through analytics dashboards to compile a weekly summary without me opening a single tab.
  • Inbox Hygiene: Scanning Gmail to flag newsletters and marketing spam, presenting me with a bulk-delete list.
  • Research: Comparing specs across multiple product pages and normalizing the data into a spreadsheet.
However, Anthropic has been refreshingly transparent about the risks. This is a beta feature that literally takes control of your browser. The documentation heavily emphasizes security, warning users not to use it for financial transactions or password management due to risks such as prompt injection. It’s a powerful tool, but one that compels us to be vigilant supervisors rather than passive users.
 
Once this matures, it could complete the agentic loop for AI workflows: an autonomous AI agent that follows your instructions, navigates your entire codebase to understand and make changes, and then connects to the browser to test it all on its own. Slowly and continuously, more and more developer tasks are being automated and handled by AI agents.

Conclusion

Looking back at 2025, the trajectory is clear. We started with Artifacts (seeing the code), graduated to Claude Code (integrating the code), and arrived at Agents (automating the workflow).
 
The deepest insight from AWS re:Invent and the past year of releases is that the role of the developer is evolving, not vanishing. We are moving away from being the ones who type the syntax to being the ones who design the systems and orchestrate the agents. We have shifted from Prompt Engineering to Context Engineering—building the environment where the strongest frontier coding models like Opus 4.5 can succeed.
 
Finally, as we look toward 2026, the barrier to entry for building software has never been lower, but the ceiling for what a single developer can build has never been higher. With agents now capable of traversing our filesystems and our browsers, the only limit left is the complexity of the problems we are brave enough to tackle.



Written by: Duncan Bandojo

Duncan F. Bandojo is an undergraduate Computer Science student at the Polytechnic University of the Philippines, with an interest in backend development and geospatial data analysis, and is currently diving into frontend development. He is passionate about building applications that leverage geospatial data to provide visual insights that genuinely help people.
