What is reinforcement learning?
Reinforcement learning is a type of machine learning that mainly aims at improving the decision-making ability of a model. It is used to solve problems that require a sequence of decisions in an uncertain, complex environment, such as playing a game of chess.
Consider a multi-stage problem that can be solved in many different ways, for example, controlling a robot's speed and direction based on its current position, or making business decisions about market demand and inventory management. Here you are expected to choose the solution with the maximum reward.
At each step, you have to make the correct decision. The challenge is that these decisions depend on the condition of the environment at that moment, so the problem becomes harder when the environment is uncertain and complex.
In such uncertain environments, the machine learning model should be capable of making the correct decision at every step on the way to the goal. Reinforcement learning is all about taking the best suitable action in each situation.
Difference between supervised, unsupervised and reinforcement learning:
Reinforcement Learning (RL) enables an agent to learn in an interactive environment by trial and error, using feedback from its actions and experiences. The agent takes an action, observes its consequences, and based on those consequences decides the next action.
Difference between supervised and reinforcement learning:
Both supervised and reinforcement learning make decisions based on input and output data. The main difference is:
Supervised learning is fed with the correct set of answers (labels).
Reinforcement learning, on the other hand, isn't fed with any such ready-made answer set. It has to decide its course of action based on feedback: every action of the agent is either rewarded or punished by the feedback mechanism, so the agent is signaled with a reward for each positive move and punished for a wrong one.
Difference between Unsupervised and reinforcement learning:
The basic difference between unsupervised learning and reinforcement learning is the goal they are trying to achieve.
The goal of unsupervised learning is to find patterns, similarities, or odd data points (outliers) in the input data.
Whereas the goal of reinforcement learning is to find the best suitable action at each step that will maximize the cumulative reward of the agent.
Process of reinforcement learning:
The typical reinforcement learning workflow consists of the following steps:
- An agent takes action in an environment.
- The environment interprets the action and returns feedback (state and reward information).
- Based on the current state and the feedback, the agent decides the next move from the available set of actions.
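The loop above can be sketched in a few lines of Python. Everything here (the toy number-line environment, the `step` function, the random agent) is an illustrative assumption, not a real RL library:

```python
import random

# A toy environment: the agent starts at position 0 on a number line and
# must reach position 4. Each step costs a small penalty, reaching the
# goal yields a reward.

def step(state, action):
    """Apply an action (-1 or +1); return (next_state, reward, done)."""
    next_state = max(0, min(4, state + action))
    done = next_state == 4
    reward = 1.0 if done else -0.1  # per-step penalty encourages short paths
    return next_state, reward, done

state, total_reward = 0, 0.0
for _ in range(20):
    action = random.choice([-1, +1])           # 1. the agent takes an action
    state, reward, done = step(state, action)  # 2. environment returns state + reward
    total_reward += reward                     # 3. feedback informs the next move
    if done:
        break
```

A real agent would replace `random.choice` with a learned decision rule; the structure of the loop stays the same.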
The agent's main intent behind every decision is to yield the maximum reward.
Let us consider the example of a driverless car.
- The goal of a driverless car (agent) is to reach the specified destination safely while following all the traffic rules (maximum cumulative reward).
- Every decision of the car is based on the environment (conditions like traffic density, signal color, road conditions, weather, etc.) and its state (engine started, moving, stopped).
- For each move (action), the car interacts with the environment and gets feedback (either a reward or a punishment). The car receives a reward if it follows the correct route and the traffic rules; otherwise, it receives a punishment (jumping a traffic signal, overspeeding, going in the wrong direction, etc.).
So when the car starts (state), it begins interacting with the environment and receiving feedback. Based on its state and the received feedback, it decides the next action, i.e., whether to move or stop.
Maximizing the total reward is the intention behind every decision of the agent.
In the machine learning context, each of these components has a specific term; together they are known as the elements of reinforcement learning.
Elements of reinforcement machine learning:
- Agent: The decision maker, which acts based on feedback. In our example, the decision-making unit fitted in the driverless car is the agent.
- Environment: The surroundings in which the agent operates, such as traffic density, signal color, and road conditions.
- State: The current situation or condition of the agent, such as the car moving or stopped.
- Action: To attain the maximum reward, the agent can decide to change its state by performing an action, such as taking a turn or slowing down before a speed breaker.
- Reward: Once the agent takes any action, the environment gives it feedback (either a reward or a punishment). Suppose the signal turns green and the car starts to move; then it receives positive feedback (a reward). But if the car remains standing still even after a green signal, it gets a punishment.
- Value: Since the goal of the agent is to collect the maximum reward, before taking any action or changing its state the agent estimates its expected future reward.
- Policy: The decision-making strategy of the agent. Based on the policy and the current state, the agent decides the next move, e.g. "if the signal is red, stop" or "prefer the shortest route to the destination".
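To make the terminology concrete, here is a minimal Python sketch mapping some of these elements onto code. The state names, actions, and the `policy` table are invented for illustration, not part of any real driverless-car API:

```python
# policy: a simple deterministic rule mapping each state to an action
policy = {
    "red_signal":    "stop",       # if the signal is red, the car should stop
    "green_signal":  "move",
    "speed_breaker": "slow_down",
}

def act(state):
    """The agent consults its policy to pick an action for the current state."""
    return policy.get(state, "stop")   # unknown state: default to the safe action

# reward: feedback the environment would attach to each (correct) action
reward = {"move": +1, "slow_down": +1, "stop": 0}
```

In a learned system the `policy` table would not be hand-written; it would be improved over time from the reward feedback.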
Reinforcement learning approaches:
In the example above, we followed the policy of selecting the shortest route. However, this policy may not be optimal: the shortest route might take more time to travel because of bad road conditions. The lesson is that we should select an optimal policy, one that balances both the distance and the time to reach the destination.
There are two main approaches in reinforcement learning:
- Policy-Based: In this approach, the space of policies is explored directly and the optimal one is selected. We try to come up with a policy that will eventually yield the maximum future reward.
- Value-Based: Here the focus is on estimating the value (expected cumulative reward) of actions; the agent then chooses its actions based on these value estimates.
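The value-based idea can be sketched very simply: keep an estimated value for each action in each state and pick the action with the highest estimate. The state, actions, and numbers below are made up for illustration:

```python
# Estimated action values Q(state, action); higher means more promising.
q_values = {
    "at_junction": {"turn_left": 0.2, "turn_right": 0.7, "go_straight": 0.5},
}

def greedy_action(state):
    """Pick the action with the maximum estimated value in this state."""
    actions = q_values[state]
    return max(actions, key=actions.get)
```

The hard part of value-based RL is learning good `q_values` from experience; once they are learned, action selection is just this argmax.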
Algorithms and frameworks used in reinforcement learning:
Markov Decision Process:
It is the mathematical framework for defining a solution in reinforcement learning. Unlike other forms of learning, it models a multi-stage decision-making process.
Solving an MDP means searching over policies and selecting the optimal one, the policy that gives the maximum expected reward.
A solution in the form of a policy is powerful because it prescribes an optimal action for every state, so the agent can still act optimally even if something goes wrong along the way. A fixed plan, by contrast, just follows a precomputed sequence of steps once the strategy is found.
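As a concrete sketch, here is value iteration on a tiny hand-made MDP. Everything about this toy (the four-state line, the single action, the discount factor) is an assumption for illustration:

```python
# A minimal value-iteration sketch: states 0..3 on a line, one "move right"
# action, and state 3 is an absorbing goal that yields reward 1 each step.

GAMMA = 0.9                      # discount factor for future rewards
states = [0, 1, 2, 3]

def reward(s):
    return 1.0 if s == 3 else 0.0

def next_state(s):
    return min(s + 1, 3)         # deterministic transition: move right

V = {s: 0.0 for s in states}     # initial value estimates
for _ in range(50):              # repeat the Bellman update until values settle
    V = {s: reward(s) + GAMMA * V[next_state(s)] for s in states}

# States closer to the goal end up with higher values.
```

The resulting values implicitly encode the optimal behavior for every state, which is exactly what makes a policy more robust than a fixed plan.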
Q-learning is a value-based reinforcement learning algorithm. It learns the value of each action in each state and from these values derives an optimal policy, one that maximizes the expected total reward over all successive steps, starting from the current state.
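Here is a minimal tabular Q-learning sketch on a toy problem (states 0..4 on a line, reaching state 4 yields reward 1). The environment and hyperparameters are illustrative assumptions, not from any particular library:

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # learning rate, discount, exploration
ACTIONS = [-1, +1]
Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

def step(state, action):
    """Toy environment: move on a line; state 4 is the goal."""
    nxt = max(0, min(4, state + action))
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

random.seed(0)
for _ in range(200):                     # training episodes
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:    # epsilon-greedy: sometimes explore
            a = random.choice(ACTIONS)
        else:                            # otherwise act greedily on Q
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        # the Q-learning update rule
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = nxt

# After training, moving right at the start should be valued higher
# than moving left, which is exactly the optimal policy here.
```

Note that the update uses the best next action regardless of what the agent actually does next, which is what makes Q-learning an off-policy method.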
Summary:
- Reinforcement learning is a type of machine learning that mainly aims at improving the decision-making skills of the model.
- It is used in solving problems that need a sequence of decisions to be taken in an uncertain environment.
- It is all about taking the best suitable action in a given state to earn the maximum reward.
- Supervised learning is fed with the correct set of answers whereas reinforcement learning isn’t fed with any such ready action set. The agent has to decide the next action based upon feedback and the current state.
- The goal of unsupervised learning is to find the pattern or similarity, odd data points in input data.
- Whereas the goal of reinforcement learning is to find the best suitable action at each step that will maximize the cumulative reward of the agent.
- The policy-based RL approach tries to explore all the policies and choose the optimum one.
- Whereas the value-based approach tries to select the action with maximum reward.