Can AI Decide for Itself? A Look into Reinforcement Learning

Last updated on April 26, 2025

What does it really mean to make a decision? Is it going with what feels right? Is it calculating risks and rewards, moment by moment? Or maybe, it’s remembering what went wrong last time, and choosing better.
But what if a robot starts doing that too? What if it remembers and adapts not because it’s told to but because it chooses to?

That question lands differently when you’ve seen a machine pause, adapt… even protect.

What happens when a robot starts making choices on its own?

In the 2015 film Chappie, directed by Neill Blomkamp, we meet a robot that learns, questions, and begins to understand the world like a child. What begins as a blank machine grows into something emotionally complex, shaped by its surroundings and the people it encounters, much like a human child learning from its environment.

Source: Chappie (2015), GIPHER

Then there’s Hotel Reverie from Black Mirror Season 7. Brandy Friday, an actress, immerses herself in a high-tech remake of the 1949 film Hotel Reverie using a simulation technology called ReDream. Within this world, she meets Clara Ryce-Lechere, a character modeled after a long-dead actress. As the simulation glitches and the outside world pauses, Clara begins to exhibit signs of self-awareness. She draws from the memories of the actress she was built upon, showing how AI can blur the lines between performance and personality, simulation and soul.

Source: Hotel Reverie (2025), Black Mirror Season 7, Netflix

In the end, Brandy reaches out to a reset version of Clara, who no longer remembers her, but answers the phone anyway. It’s subtle, but powerful.

Not out of memory. Not out of code. But out of something learned, something felt: an echo of connection.

And that’s the twist: machines that don’t just mimic us, but start responding because of us.

Do you ever wonder how that happened?

What connects these two? Behind the fictional drama is a real-world question scientists are exploring today.
Can machines truly learn from their experiences and eventually, make their own decisions?

That moment feels human, but what’s wild is that it mirrors how reinforcement learning works.

What is Reinforcement Learning?

Reinforcement learning (RL) is one of the most fascinating areas of artificial intelligence. It works by teaching an AI to interact with an environment, make decisions, and learn from the consequences, just like humans do.

At the center of reinforcement learning is an agent (the AI) that interacts with an environment. It takes actions, sees what happens, and receives rewards (positive or negative feedback). Based on those rewards, the AI updates its behavior and tries again, getting better, smarter, and sharper with each attempt.

But how does it work?

Let’s break it down for you:

Agent → Environment → Action → Reward → Update → Repeat

1. Agent

The agent is the AI itself, the decision-maker, the entity solving a problem. It could be a robot, a virtual assistant, or even a game-playing bot. Its core task is to learn how to make the best decisions.

However, it doesn’t start out smart. Like a newborn, it begins with very little knowledge. What makes it powerful is the process of growth.

2. Environment

This is where the agent resides. The environment encompasses everything the AI can see, touch, hear, or interact with.

3. Action

After observing its environment, the agent makes a decision. It could be moving forward, speaking a sentence, clicking a button, or solving a puzzle. This decision is based on the knowledge it has accumulated so far.

At first, the actions might seem random. But as the agent interacts with the world, it starts identifying patterns, and the actions become more intelligent over time.

4. Reward

The environment reacts, telling the agent: “That was smart.” or “That didn’t go well.”

Positive rewards encourage a behavior, while negative rewards discourage it. This feedback is where the learning starts.

5. Update

The agent uses the reward to update its internal model. It adjusts its understanding of the world, recalculates its strategy, and decides what it might do differently next time.

Every decision and reward is added to the AI’s “memory,” helping it improve.

The learning cycle repeats. The agent takes another action, observes the outcome, receives feedback, learns from it, and repeats the process. Each cycle builds more experience, making the AI sharper, faster, and more confident in its decisions. As the loop progresses, the AI’s actions turn into refined skills.
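To make the loop concrete, here is a minimal Python sketch, not a production implementation. The two-lever slot machine, the names SlotMachineEnv, Agent, and run, the payout odds, and the 10% exploration rate are all assumptions made up for illustration. The update rule shown (a running average of rewards per action) is just one simple way an agent could update its estimates.

```python
import random


class SlotMachineEnv:
    """Hypothetical environment: two levers with different payout odds."""

    def __init__(self, win_probs=(0.2, 0.8), seed=0):
        self.win_probs = win_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward: 1.0 on a win, 0.0 otherwise.
        return 1.0 if self.rng.random() < self.win_probs[action] else 0.0


class Agent:
    """Keeps a running estimate of how rewarding each action has been."""

    def __init__(self, n_actions, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions  # estimated reward per action
        self.counts = [0] * n_actions    # how often each action was tried
        self.epsilon = epsilon           # chance of trying a random action
        self.rng = random.Random(seed)

    def act(self):
        # Action: mostly pick the best-looking lever, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Update: move the estimate toward the reward just observed.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(steps=2000):
    env = SlotMachineEnv()
    agent = Agent(n_actions=2)
    for _ in range(steps):               # Repeat
        action = agent.act()             # Agent chooses an Action
        reward = env.step(action)        # Environment returns a Reward
        agent.update(action, reward)     # Agent performs the Update
    return agent
```

After calling run(), the agent's values list should rate the second lever higher than the first, and counts should show it pulling that lever far more often. Nobody told it which lever was better; it found out from the rewards.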

Reinforcement learning powers:

Autonomous vehicles
Financial market predictions
Industrial automation
Personalized recommendations
Even smart warehouse logistics

And with cloud services like Amazon SageMaker RL, developers can build and train RL models at scale, combining simulation environments with real-time data to create learning agents that keep adapting.

A Glimpse Into the Training Process

Reinforcement learning is complex and powerful.

Behind every seemingly small decision an AI makes, there’s a process of value estimation happening. The AI calculates how “valuable” each action is likely to be, based on all the previous rewards it has seen.

This is often done using:

• Q-Learning

Where the AI estimates how good an action is in a particular state.

The diagram shows how Q-Learning works. First, the agent receives an input (like a picture) and responds, “It’s a cat!” When it gets the feedback “No, it’s a dog!”, it learns from the mistake and updates its knowledge. The next time it encounters a similar input, it gives a reinforced response, now correctly identifying a dog. In Q-Learning itself, that feedback arrives as a numeric reward rather than a corrected label, and the “knowledge” being updated is a table of estimates (Q-values) of how much reward each action tends to earn in each state. Learning from feedback in this way helps the agent improve its decision-making over time.
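As a rough illustration, the sketch below runs tabular Q-Learning in a tiny corridor world. The corridor, the function names step and train, and the settings alpha, gamma, and epsilon are assumptions chosen for the example. The update line itself is the standard Q-Learning rule: nudge Q(state, action) toward reward + gamma × the best Q-value of the next state.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def step(state, action_index):
    """Move along the corridor; reaching the goal pays 1.0 and ends the episode."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action] is the estimated value of taking that action in that state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-Learning update: move the estimate toward the observed target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

After training, every non-goal cell should rate "move right" above "move left", with values shrinking the farther a cell is from the goal. That is the discount factor gamma at work: rewards that take longer to reach count for less.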

• Policy Gradients

Where the AI updates its decision-making policy directly based on outcomes.

The diagram shows the cat learning to meow for food using Policy Gradient. The cat begins by randomly trying actions like meowing or scratching, not knowing which one leads to food. When the cat meows and gets food, it receives a reward. This reward causes the cat to update its policy, meaning it becomes more likely to meow next time because it learned that meowing leads to food. Over time, the cat’s actions improve, and it focuses more on meowing when hungry. This cycle of action, reward, and policy update continues, gradually improving the cat’s behavior until it reliably meows to get food.
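The cat example can be sketched in a few lines of Python. This is a simplified illustration using the REINFORCE rule with a softmax policy over two actions; the reward values (1 for meowing, 0 for scratching), the learning rate, and the function names are assumptions made for the example.

```python
import math
import random

ACTIONS = ("meow", "scratch")


def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reward_for(action):
    # Assumed reward: meowing gets food (1.0), scratching gets nothing (0.0).
    return 1.0 if action == "meow" else 0.0


def train(steps=500, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # policy parameters: one preference per action
    for _ in range(steps):
        probs = softmax(prefs)
        chosen = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = reward_for(ACTIONS[chosen])
        # REINFORCE: the gradient of log pi(chosen) with respect to prefs[i]
        # is (1 if i == chosen else 0) - probs[i]; scale it by the reward.
        for i in range(len(prefs)):
            grad_log_prob = (1.0 if i == chosen else 0.0) - probs[i]
            prefs[i] += learning_rate * reward * grad_log_prob
    return softmax(prefs)
```

The policy starts at 50/50. Each time a meow is rewarded, the preference for meowing rises and the preference for scratching falls, so train() should return a probability of meowing close to 1. Unlike Q-Learning, no table of action values is kept here; the probabilities themselves are what gets adjusted.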

• Deep Reinforcement Learning

Which uses neural networks to handle high-dimensional, complex environments (like vision or language).

The diagram shows the cycle of Deep Reinforcement Learning. The Agent takes an Action in the Environment, which results in a Reward/Penalty. The agent uses a Neural Network to learn from this feedback and update its policy to improve future actions.
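To show where the neural network fits, here is a deliberately tiny sketch in plain Python. Real deep RL systems use frameworks and far larger networks; the class TinyQNetwork, its layer sizes, the one-hot state, and the demo function are all hypothetical. The network replaces the Q-table: it takes a state vector and outputs one Q-value per action, and a single gradient step nudges the chosen action's output toward the reward-based target.

```python
import math
import random


class TinyQNetwork:
    """One hidden layer mapping a state vector to one Q-value per action."""

    def __init__(self, state_dim, n_actions, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(state_dim)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_actions)] for _ in range(hidden)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        hidden = [
            math.tanh(sum(s * self.w1[i][j] for i, s in enumerate(state)) + self.b1[j])
            for j in range(len(self.b1))
        ]
        q = [
            sum(h * self.w2[j][k] for j, h in enumerate(hidden)) + self.b2[k]
            for k in range(len(self.b2))
        ]
        return hidden, q

    def td_update(self, state, action, reward, next_state, done, gamma=0.9, lr=0.05):
        # The target is treated as a fixed number (a semi-gradient update).
        target = reward if done else reward + gamma * max(self.forward(next_state)[1])
        hidden, q = self.forward(state)
        error = q[action] - target
        # Backpropagate 0.5 * error**2 through the chosen action's output.
        for j, h in enumerate(hidden):
            grad_z = error * self.w2[j][action] * (1.0 - h * h)
            self.w2[j][action] -= lr * error * h
            for i, s in enumerate(state):
                self.w1[i][j] -= lr * grad_z * s
            self.b1[j] -= lr * grad_z
        self.b2[action] -= lr * error
        return error


def demo(updates=200):
    net = TinyQNetwork(state_dim=4, n_actions=2)
    state = [0.0, 0.0, 0.0, 1.0]     # one-hot encoding of a hypothetical state
    terminal = [0.0, 0.0, 0.0, 0.0]  # placeholder; unused because done=True
    for _ in range(updates):
        net.td_update(state, action=1, reward=1.0, next_state=terminal, done=True)
    return net.forward(state)[1]
```

Calling demo() repeatedly rewards action 1 in a single state, so the network's Q-value for that action should settle near 1.0. In practice the same update runs over many different states, which is where a network earns its keep: it can generalize to states it has never seen, something a lookup table cannot do.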

The same idea applies to physical tasks. Teaching a robot to walk over rocky terrain doesn’t involve coding every detail. Instead, the robot learns through trial and error, refining its balance and movements over time so it can walk efficiently and adapt to new terrain.

So… Is that Free Will?

Here’s where things get deep. When an AI learns by itself, does that mean it has free will?
Or is it still following a pattern, just a complex one?

When an AI makes a choice, it’s using a learned strategy. It’s selecting from what it’s been trained to understand as “good” or “rewarding.” That process might be mathematical, but the outcome feels incredibly human.

We’re starting to see robots:

Pause before acting.
Take safer paths.
Offer unexpected help.
Choose kindness over logic.

They don’t have feelings, but they are starting to reflect behaviors that seem emotional. Not because they’re alive, but because they’ve learned from enough interactions to respond as if they were.

So… can robots make their own decisions? In some ways, they already do.

Not because they know they’re deciding, but because they’ve learned how to weigh outcomes, act on experience, and refine their behavior.

It’s not free will in the philosophical sense, but it’s absolutely decision-making.

And it’s here.

Written by: Maxine Sofia Llamas

Maxine Sofia “mczeen” is a Computer Engineering student at Pamantasan ng Lungsod ng Maynila (PLM) and an IT intern at Tutorials Dojo. She actively contributes to tech communities and student organizations, blending creativity with technical skills to promote collaboration, innovation, and continuous learning.
