prl.agents package

Submodules

prl.agents.agents module

class A2CAdvantage[source]

Bases: prl.agents.agents.Advantage

Advantage function from Asynchronous Methods for Deep Reinforcement Learning.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
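
For illustration, the advantage here is roughly the discounted return minus the value baseline. The numpy sketch below is a simplified, self-contained version of that computation and is not the library's exact implementation (it bootstraps the value after the last step with zero):

    import numpy as np

    def a2c_advantages(rewards, baselines, dones, discount_factor):
        """Simplified n-step advantage: discounted return minus baseline (sketch only)."""
        returns = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running * (1.0 - dones[t])
            returns[t] = running
        return returns - baselines

    advantages = a2c_advantages(
        rewards=np.array([1.0, 1.0, 1.0]),
        baselines=np.array([0.5, 0.5, 0.5]),
        dones=np.array([0.0, 0.0, 1.0]),
        discount_factor=0.99,
    )
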
class A2CAgent(policy_network, value_network, agent_id='A2C_agent')[source]

Bases: prl.agents.agents.ActorCriticAgent

Advantage Actor Critic agent.
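
A minimal usage sketch, assuming policy_net, value_net and env already exist and satisfy the interfaces prl expects (they are hypothetical placeholders, not part of this module):

    from prl.agents.agents import A2CAgent

    # policy_net, value_net and env are hypothetical placeholders assumed to
    # satisfy the network/environment interfaces expected by prl.
    agent = A2CAgent(policy_network=policy_net, value_network=value_net)
    agent.train(env, n_iterations=1000)   # train for at most 1000 iterations
    history = agent.test(env)             # play one full test episode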

class ActorCriticAgent(policy_network, value_network, advantage, agent_id='ActorCritic_agent')[source]

Bases: prl.agents.agents.Agent

Basic actor-critic agent.

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

train_iteration(env, n_steps=32, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class Advantage[source]

Bases: prl.typing.AdvantageABC, abc.ABC

Base class for advantage functions.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
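
Subclasses implement calculate_advantages. A minimal sketch of a custom advantage, assuming only the interface shown here (rewards, baselines, dones and discount_factor in, ndarray out):

    import numpy as np
    from prl.agents.agents import Advantage

    class BaselineSubtraction(Advantage):
        """Toy advantage for illustration: immediate reward minus baseline."""

        def calculate_advantages(self, rewards, baselines, dones, discount_factor):
            # Deliberately ignores dones and discount_factor; a real advantage
            # would propagate future rewards as A2CAdvantage or GAEAdvantage do.
            return np.asarray(rewards) - np.asarray(baselines)
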
class Agent[source]

Bases: prl.typing.AgentABC, abc.ABC

Base class for all agents.

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

Return type: str
play_episodes(env, episodes)[source]

Method for playing full episodes, usually used to train agents.

Parameters:
  • env (Environment) – Environment
  • episodes (int) – Number of episodes to play.
Return type: History
Returns: History object representing the episodes' history
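
For example, to gather a batch of complete episodes (env and agent are placeholders assumed to already exist):

    # env and agent are assumed to exist; this only shows the call shape.
    history = agent.play_episodes(env, episodes=16)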

play_steps(env, n_steps, storage)[source]

Method for performing a number of steps in the environment. Appends new states to the existing storage.

Parameters:
  • env (Environment) – Environment
  • n_steps (int) – Number of steps to play
  • storage (Storage) – Storage (Memory, History) of the earlier games (used to perform the first action)
Return type: Storage
Returns: History with appended states, actions, rewards, etc.
post_train_cleanup(env, **kwargs)[source]

Cleans up fields that are no longer needed after training, so the agent stays lightweight.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
pre_train_setup(env, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
test(env)[source]

Method for playing a full episode, used to test agents. The reward in the returned history is the true reward from the environment.

Parameters: env – Environment
Return type: History
Returns: History object representing the episode history
train(env, n_iterations, callback_list=None, **kwargs)[source]

Trains the agent using the environment. Also handles callbacks during training.

Parameters:
  • env (Environment) – Environment to train on
  • n_iterations (int) – Maximum number of iterations to train
  • callback_list (Optional[list]) – List of callbacks
  • **kwargs – Other arguments passed to train_iteration, pre_train_setup and post_train_cleanup
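
A typical call, assuming agent and env already exist; the extra keyword argument is only an example of what gets forwarded:

    # agent and env are placeholders assumed to exist for this sketch.
    agent.train(
        env,
        n_iterations=500,      # upper bound on training iterations
        callback_list=None,    # or a list of prl callbacks
        discount_factor=0.99,  # example kwarg forwarded to train_iteration etc.
    )
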
train_iteration(env, **kwargs)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
class CrossEntropyAgent(policy_network, agent_id='crossentropy_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the cross-entropy algorithm.
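
At its core, the cross-entropy method plays a batch of episodes, keeps only those whose total reward reaches a percentile threshold, and fits the policy to the actions taken in the kept episodes. The self-contained numpy sketch below shows only that filtering step; it is not the agent's actual code:

    import numpy as np

    def filter_elite_episodes(episode_rewards, episodes, percentile=75):
        """Keep episodes whose total reward reaches the given percentile."""
        threshold = np.percentile(episode_rewards, percentile)
        return [ep for ep, r in zip(episodes, episode_rewards) if r >= threshold]

    # Toy usage: with percentile=75 the threshold is 7.5, so only "ep2" is kept.
    elite = filter_elite_episodes([1.0, 5.0, 9.0, 7.0], ["ep0", "ep1", "ep2", "ep3"])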

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

train_iteration(env, n_episodes=32, percentile=75)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class DQNAgent(q_network, replay_buffer_size=10000, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000, training_set_size=64, target_network_copy_iter=100, steps_between_training=10, agent_id='DQN_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the DQN algorithm.
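
The constructor arguments suggest the usual epsilon-greedy exploration schedule: epsilon starts at start_epsilon, decays toward end_epsilon over roughly epsilon_decay steps, and otherwise the action with the highest Q-value is taken. The exact decay curve is not documented here; the exponential form below is a common choice and only an assumption:

    import numpy as np

    def epsilon_at(step, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000):
        """Assumed exponential epsilon schedule (illustration, not the agent's code)."""
        return end_epsilon + (start_epsilon - end_epsilon) * np.exp(-step / epsilon_decay)

    def epsilon_greedy(q_values, step, rng=np.random.default_rng()):
        """Random action with probability epsilon, otherwise the greedy one."""
        if rng.random() < epsilon_at(step):
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))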

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, discount_factor=1.0, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class GAEAdvantage(lambda_)[source]

Bases: prl.agents.agents.Advantage

Advantage function from High-Dimensional Continuous Control Using Generalized Advantage Estimation.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
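
Generalized Advantage Estimation blends n-step advantages with an exponential weight lambda_. Below is a self-contained numpy sketch of the standard recursion (delta_t = r_t + gamma * V_{t+1} - V_t, accumulated backwards with weight gamma * lambda); it assumes a zero value after the last step, which may differ from the library's handling of bootstrap values:

    import numpy as np

    def gae_advantages(rewards, baselines, dones, discount_factor, lambda_):
        """Standard GAE recursion; a sketch, not GAEAdvantage's exact implementation."""
        advantages = np.zeros_like(rewards, dtype=np.float64)
        next_value = 0.0
        running = 0.0
        for t in reversed(range(len(rewards))):
            not_done = 1.0 - dones[t]
            delta = rewards[t] + discount_factor * next_value * not_done - baselines[t]
            running = delta + discount_factor * lambda_ * not_done * running
            advantages[t] = running
            next_value = baselines[t]
        return advantages
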
class REINFORCEAgent(policy_network, agent_id='REINFORCE_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the REINFORCE algorithm.
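
REINFORCE weights the log-probability of each action by the discounted return that followed it. The discounted-return computation is sketched below as self-contained numpy code; the actual gradient step happens inside the policy network and is not reproduced here:

    import numpy as np

    def discounted_returns(rewards, discount_factor=1.0):
        """G_t = sum_k gamma**k * r_(t+k) for a single finished episode."""
        returns = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running
            returns[t] = running
        return returns

    # Example: with discount_factor=0.9 the returns are 2.71, 1.9 and 1.0.
    print(discounted_returns(np.array([1.0, 1.0, 1.0]), discount_factor=0.9))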

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, discount_factor=1.0, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, n_episodes=32, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class RandomAgent(agent_id='random_agent', replay_buffer_size=100)[source]

Bases: prl.agents.agents.Agent

Agent performing random actions.
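
A random agent is useful as a baseline or for pre-filling a replay buffer. A minimal sketch, assuming env is a prl-compatible environment that already exists:

    from prl.agents.agents import RandomAgent

    # env is a hypothetical placeholder; RandomAgent itself needs no networks.
    baseline = RandomAgent()
    history = baseline.play_episodes(env, episodes=10)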

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method

Module contents