What is the difference between model-based and model-free reinforcement learning?

To answer this question, let's revisit the components of an MDP, the most typical decision-making framework for RL. An MDP is typically defined by a 4-tuple (S, A, R, T), where S is the state/observation space of an environment; A is the set of actions the agent can choose between; R(s, a) is a function that returns the reward received for taking action a in state s; and T(s′|s, a) is a…
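
As a rough illustration, the 4-tuple can be written down as a small data structure. This is only a sketch: the text above does not finish defining T, so treating T(s′|s, a) as a probability distribution over next states is an assumption here, and the `MDP` class, its `step` method, and the two-state toy environment are all hypothetical names made up for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str


@dataclass
class MDP:
    states: List[State]                      # S
    actions: List[Action]                    # A
    reward: Callable[[State, Action], float] # R(s, a)
    # T(s' | s, a), ASSUMED to be a distribution over next states:
    # maps (s, a) to {s': probability}.
    transition: Dict[Tuple[State, Action], Dict[State, float]]

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[State, float]:
        """Sample a next state from T(. | s, a) and return it with R(s, a)."""
        dist = self.transition[(s, a)]
        next_states = list(dist)
        weights = [dist[n] for n in next_states]
        s_next = rng.choices(next_states, weights=weights, k=1)[0]
        return s_next, self.reward(s, a)


# Hypothetical two-state toy environment, for illustration only.
toy = MDP(
    states=["cold", "warm"],
    actions=["wait", "heat"],
    reward=lambda s, a: 1.0 if s == "warm" else 0.0,
    transition={
        ("cold", "wait"): {"cold": 1.0},
        ("cold", "heat"): {"warm": 0.9, "cold": 0.1},
        ("warm", "wait"): {"warm": 0.8, "cold": 0.2},
        ("warm", "heat"): {"warm": 1.0},
    },
)
```

Calling `toy.step("cold", "heat", random.Random(0))` returns a sampled next state together with the reward for the current state-action pair, which is the only view of the environment an agent gets if it cannot read `reward` and `transition` directly.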
