This series of articles introduces how to program the PPO (Proximal Policy Optimization) algorithm with PyTorch. It follows a PPO tutorial found on the Internet and was written while learning, in the hope of working through the whole process from start to finish.
This first article gives a general overview of the workflow for writing the PPO algorithm and the files it uses.
Prerequisites for programming PPO: Python, PyTorch, reinforcement learning basics, an introduction to policy gradient algorithms, and the theory behind PPO. Here are some learning references:
An intuitive understanding of the PPO algorithm
The PPO algorithm [theory]
The PPO algorithm explained simply
The policy gradient algorithm
A summary of reinforcement learning knowledge
Following the online tutorial, the training code is divided into 4 files: main.py, ppo.py, network.py and arguments.py.
arguments.py: parses the command line arguments; called by main.py.
main.py: the executable file. It uses arguments.py to parse the command line arguments, then initializes the environment and the PPO model.
ppo.py: holds the PPO model.
network.py: defines the neural network module used for the actor and critic networks in the PPO model. It contains a feedforward neural network.
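As a rough illustration of the role of arguments.py described above, here is a minimal sketch built on the standard argparse module. The flag names (--mode, --actor_model, --critic_model) and the function name get_args are assumptions made for this example and may differ from the tutorial's actual code.

```python
# arguments.py (sketch): flag names and defaults are illustrative assumptions.
import argparse


def get_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="PPO training and testing")
    # 'train' would run the learning loop, 'test' would evaluate a saved policy.
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    # Optional paths to previously saved weights (empty string = start fresh).
    parser.add_argument("--actor_model", default="")
    parser.add_argument("--critic_model", default="")
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
args = get_args(["--mode", "test", "--actor_model", "ppo_actor.pth"])
print(args.mode, args.actor_model)
```

In main.py, a call such as get_args() with no arguments would read the real command line, and the resulting mode could decide whether to train or to test.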
The actor and critic models are periodically saved to the binary files ppo_actor.pth and ppo_critic.pth, which can be loaded later for testing or to continue training.
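To make the network module and the save/load step more concrete, here is a minimal sketch. The class name FeedForwardNN, the hidden size of 64, and the observation and action dimensions are assumptions chosen for illustration. Only the file names ppo_actor.pth and ppo_critic.pth come from the text above, and the files are written to a temporary directory here so the example does not touch the working directory.

```python
import os
import tempfile

import torch
import torch.nn as nn


class FeedForwardNN(nn.Module):
    """Small feedforward network; the hidden size of 64 is an assumption."""

    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, obs):
        return self.layers(obs)


obs_dim, act_dim = 3, 1  # illustrative sizes, not taken from the article
actor = FeedForwardNN(obs_dim, act_dim)  # observation -> action output
critic = FeedForwardNN(obs_dim, 1)       # observation -> state value estimate

# Save only the learned weights (the state_dict) of each network.
save_dir = tempfile.mkdtemp()
actor_path = os.path.join(save_dir, "ppo_actor.pth")
critic_path = os.path.join(save_dir, "ppo_critic.pth")
torch.save(actor.state_dict(), actor_path)
torch.save(critic.state_dict(), critic_path)

# Later (testing or resuming training): rebuild the networks, then load weights.
restored_actor = FeedForwardNN(obs_dim, act_dim)
restored_actor.load_state_dict(torch.load(actor_path))
restored_critic = FeedForwardNN(obs_dim, 1)
restored_critic.load_state_dict(torch.load(critic_path))
```

Saving the state_dict rather than the whole module keeps the files independent of how the class is pickled, at the cost of having to rebuild the network with the same dimensions before loading.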
The test code lives mainly in eval_policy.py and is called from main.py.
eval_policy.py: tests the trained policy in the specified environment. This module does not depend on any of the other files.
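The idea of testing a policy without depending on the other files can be sketched as follows. This is not the tutorial's eval_policy.py. The function names, the simplified environment interface (reset() returns an observation, step(action) returns observation, reward, done) and the ToyEnv class are all hypothetical, chosen so the example runs with nothing but the standard library.

```python
def rollout(policy, env, max_steps=1000):
    """Run one episode and return (episode_length, episode_return)."""
    obs = env.reset()
    ep_len, ep_ret = 0, 0.0
    for _ in range(max_steps):
        action = policy(obs)                   # policy is any callable obs -> action
        obs, reward, done = env.step(action)   # simplified, assumed interface
        ep_len += 1
        ep_ret += reward
        if done:
            break
    return ep_len, ep_ret


def eval_policy(policy, env, episodes=3):
    """Evaluate the policy for several episodes and report each result."""
    results = []
    for ep in range(episodes):
        ep_len, ep_ret = rollout(policy, env)
        print(f"episode {ep}: length={ep_len}, return={ep_ret:.2f}")
        results.append((ep_len, ep_ret))
    return results


class ToyEnv:
    """Hypothetical stand-in environment: reward 1 when the action equals t % 2."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        reward = 1.0 if action == self.t % 2 else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


# A hand-written policy that always picks the rewarded action.
results = eval_policy(lambda obs: obs % 2, ToyEnv())
```

Because eval_policy only needs a callable policy and an object with reset and step, a trained actor network could be wrapped in a small function and passed in the same way.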
Reference:
Coding PPO from Scratch with PyTorch