prl.agents package
Submodules
prl.agents.agents module
-
class A2CAdvantage
Bases: prl.agents.agents.Advantage
Advantage function from "Asynchronous Methods for Deep Reinforcement Learning".
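As a rough illustration of the advantage estimate described in the cited paper (discounted return bootstrapped from the value function, minus the value baseline), the sketch below computes advantages for one trajectory segment. The function name `n_step_advantages` and its arguments are hypothetical and are not part of prl; the library's own implementation may differ.

```python
def n_step_advantages(rewards, values, bootstrap_value, discount_factor=1.0):
    """Return A_t = R_t - V(s_t) for every step of one trajectory segment.

    Hypothetical helper, not prl's API.
    rewards[t]      -- reward received after acting in state t
    values[t]       -- critic estimate V(s_t)
    bootstrap_value -- critic estimate for the state after the last step
                       (use 0.0 if the segment ended in a terminal state)
    """
    advantages = [0.0] * len(rewards)
    running_return = float(bootstrap_value)
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running_return = rewards[t] + discount_factor * running_return
        advantages[t] = running_return - values[t]
    return advantages


if __name__ == "__main__":
    print(n_step_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], bootstrap_value=0.0))
```

The sketch assumes the segment contains no episode boundary in the middle; a segment spanning several episodes would need the running return reset at each terminal step.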
-
class A2CAgent(policy_network, value_network, agent_id='A2C_agent')
Bases: prl.agents.agents.ActorCriticAgent
Advantage Actor Critic agent.
-
class ActorCriticAgent(policy_network, value_network, advantage, agent_id='ActorCritic_agent')
Bases: prl.agents.agents.Agent
Basic actor-critic agent.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
-
id
Agent UUID
-
train_iteration(env, n_steps=32, discount_factor=1.0)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
-
class Advantage
Bases: prl.typing.AdvantageABC, abc.ABC
Base class for advantage functions.
-
class Agent
Bases: prl.typing.AgentABC, abc.ABC
Base class for all agents.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
-
id
Agent UUID
Return type: str
-
play_episodes(env, episodes)
Plays full episodes; usually used to train agents.
Parameters:
- env (Environment) – Environment
- episodes (int) – Number of episodes to play
Returns: History object representing the history of the played episodes.
-
play_steps(env, n_steps, storage)
Performs a given number of steps in the environment and appends the new states to the existing storage.
Parameters:
- env (Environment) – Environment
- n_steps (int) – Number of steps to play
- storage (Storage) – Storage (Memory, History) of the earlier games (used to perform the first action)
Return type: Storage
Returns: History with appended states, actions, rewards, etc.
-
post_train_cleanup(env, **kwargs)
Cleans up fields that are no longer needed after training, to keep the agent lightweight.
Parameters:
- env (Environment) – Environment
- **kwargs – Kwargs passed from the train() method
-
pre_train_setup(env, **kwargs)
Performs pre-training setup. This method should handle the non-repeatable part of training an agent.
Parameters:
- env (Environment) – Environment
- **kwargs – Kwargs passed from the train() method
-
test(env)
Plays a full episode to test the agent. The reward in the returned history is the true reward from the environment.
Parameters: env – Environment
Return type: History
Returns: History object representing the episode history.
-
train(env, n_iterations, callback_list=None, **kwargs)
Trains the agent on the environment and handles callbacks during training.
Parameters:
- env (Environment) – Environment to train on
- n_iterations (int) – Maximum number of iterations to train
- callback_list (Optional[list]) – List of callbacks
- kwargs – Other arguments passed to train_iteration, pre_train_setup and post_train_cleanup
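The division of labour between train(), pre_train_setup(), train_iteration() and post_train_cleanup() can be pictured with the stand-in class below. It is a minimal sketch, not prl's Agent: the class name `LifecycleSketch` is hypothetical, and treating callbacks as plain callables whose truthy return value stops training early is an assumption made only for illustration.

```python
class LifecycleSketch:
    """Hypothetical stand-in for an agent; it only records the call order."""

    def __init__(self):
        self.calls = []

    def pre_train_setup(self, env, **kwargs):
        # Non-repeatable part: runs once before the first iteration.
        self.calls.append("pre_train_setup")

    def train_iteration(self, env, **kwargs):
        # Repeatable part: runs once per iteration.
        self.calls.append("train_iteration")

    def post_train_cleanup(self, env, **kwargs):
        # Runs once after the last iteration.
        self.calls.append("post_train_cleanup")

    def train(self, env, n_iterations, callback_list=None, **kwargs):
        callbacks = callback_list or []
        self.pre_train_setup(env, **kwargs)
        for _ in range(n_iterations):
            self.train_iteration(env, **kwargs)
            # Assumption: a callback returning True requests early stopping.
            if any([callback(self) for callback in callbacks]):
                break
        self.post_train_cleanup(env, **kwargs)


if __name__ == "__main__":
    agent = LifecycleSketch()
    agent.train(env=None, n_iterations=3)
    print(agent.calls)
```

Under this reading, n_iterations is an upper bound, which matches its description as the maximum number of iterations.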
-
train_iteration(env, **kwargs)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (Environment) – Environment
- **kwargs – Kwargs passed from the train() method
-
class CrossEntropyAgent(policy_network, agent_id='crossentropy_agent')
Bases: prl.agents.agents.Agent
Agent using the cross-entropy algorithm.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
-
id
Agent UUID
-
train_iteration(env, n_episodes=32, percentile=75)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
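In the cross-entropy method, the policy is typically fitted only to the best-scoring ("elite") episodes of each batch. The sketch below shows one plausible reading of the percentile argument, namely that episodes whose total reward reaches the given percentile of the batch are kept. The helper `select_elite_episodes` is hypothetical and is not taken from prl.

```python
import numpy as np


def select_elite_episodes(episode_returns, percentile=75):
    """Return (threshold, indices of episodes at or above the threshold).

    Hypothetical helper: assumes `percentile` is a value in [0, 100]
    applied to the total reward of each played episode.
    """
    returns = np.asarray(episode_returns, dtype=float)
    threshold = np.percentile(returns, percentile)
    elite_indices = np.flatnonzero(returns >= threshold)
    return threshold, elite_indices


if __name__ == "__main__":
    threshold, elite = select_elite_episodes([1.0, 2.0, 3.0, 4.0, 5.0])
    print(threshold, elite)
```

The states and actions of the selected episodes would then serve as a supervised training set for the policy network.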
-
class DQNAgent(q_network, replay_buffer_size=10000, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000, training_set_size=64, target_network_copy_iter=100, steps_between_training=10, agent_id='DQN_agent')
Bases: prl.agents.agents.Agent
Agent using the DQN algorithm.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
-
id
Agent UUID
-
pre_train_setup(env, discount_factor=1.0, **kwargs)
Performs pre-training setup. This method should handle the non-repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
-
train_iteration(env, discount_factor=1.0)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
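The DQNAgent constructor takes start_epsilon, end_epsilon and epsilon_decay, which suggests an epsilon-greedy exploration schedule. The exact decay formula is not stated here, so the sketch below assumes a linear decay over epsilon_decay steps; both `linear_epsilon` and `epsilon_greedy` are hypothetical helpers, not prl functions.

```python
import random


def linear_epsilon(step, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000):
    """Assumed schedule: move linearly from start to end over `epsilon_decay` steps."""
    fraction = min(max(step / epsilon_decay, 0.0), 1.0)
    return start_epsilon + fraction * (end_epsilon - start_epsilon)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda action: q_values[action])


if __name__ == "__main__":
    for step in (0, 500, 1000, 2000):
        print(step, linear_epsilon(step))
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

An exponential decay would be an equally reasonable interpretation of epsilon_decay; the source of the library is the authority on which one applies.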
-
class GAEAdvantage(lambda_)
Bases: prl.agents.agents.Advantage
Advantage function from "High-Dimensional Continuous Control Using Generalized Advantage Estimation".
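Generalized Advantage Estimation, as defined in the cited paper, sums exponentially weighted temporal-difference residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = sum over l of (gamma * lambda)^l * delta_{t+l}. The sketch below computes this for one trajectory segment. The function `gae_advantages` and its arguments are hypothetical and only mirror the lambda_ constructor parameter by name; prl's implementation may differ.

```python
def gae_advantages(rewards, values, bootstrap_value, discount_factor=1.0, lambda_=0.95):
    """Return GAE advantages for one trajectory segment (hypothetical helper).

    values[t] is the critic estimate V(s_t); bootstrap_value is the estimate
    for the state after the last step (0.0 if that state is terminal).
    """
    advantages = [0.0] * len(rewards)
    next_value = float(bootstrap_value)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lambda * A_{t+1}.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + discount_factor * next_value - values[t]
        running = delta + discount_factor * lambda_ * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    print(gae_advantages([1.0, 1.0], [0.5, 0.5], bootstrap_value=0.0, lambda_=1.0))
```

With lambda_ = 0 the result reduces to the one-step TD residual, and with lambda_ = 1 it equals the discounted return minus the value baseline, which is the usual way to sanity-check an implementation.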
-
class REINFORCEAgent(policy_network, agent_id='REINFORCE_agent')
Bases: prl.agents.agents.Agent
Agent using the REINFORCE algorithm.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
-
id
Agent UUID
-
pre_train_setup(env, discount_factor=1.0, **kwargs)
Performs pre-training setup. This method should handle the non-repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
-
train_iteration(env, n_episodes=32, discount_factor=1.0)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (EnvironmentABC) – Environment
- **kwargs – Kwargs passed from the train() method
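REINFORCE weights the log-probability of each action by the discounted return that followed it, which is where a discount_factor argument would normally enter. The sketch below computes those returns for a single episode; `discounted_returns` is a hypothetical helper and does not reproduce prl's code.

```python
def discounted_returns(rewards, discount_factor=1.0):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the end of the episode towards its start.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount_factor * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], discount_factor=0.5))
```

With the default discount_factor of 1.0, each entry is simply the sum of the remaining rewards in the episode.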
-
class RandomAgent(agent_id='random_agent', replay_buffer_size=100)
Bases: prl.agents.agents.Agent
Agent performing random actions.
-
act(state)
Makes a step based on the current environment state.
Parameters: state (ndarray) – State from the environment.
Returns: Action to execute on the environment.
-
id
Agent UUID
-
pre_train_setup(env, **kwargs)
Performs pre-training setup. This method should handle the non-repeatable part of training an agent.
Parameters:
- env (Environment) – Environment
- **kwargs – Kwargs passed from the train() method
-
train_iteration(env, discount_factor=1.0)
Performs a single training iteration. This method should contain the repeatable part of training an agent.
Parameters:
- env (Environment) – Environment
- **kwargs – Kwargs passed from the train() method