prl.agents package

Submodules

prl.agents.agents module

class A2CAdvantage[source]

Bases: prl.agents.agents.Advantage

Advantage function from Asynchronous Methods for Deep Reinforcement Learning.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
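
For illustration, the advantage here is roughly the discounted return minus the value baseline. The numpy sketch below is a simplified, self-contained version of that computation and is not the library's exact implementation (it bootstraps the value after the last step with zero):

    import numpy as np

    def a2c_advantages(rewards, baselines, dones, discount_factor):
        """Simplified n-step advantage: discounted return minus baseline (sketch only)."""
        returns = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running * (1.0 - dones[t])
            returns[t] = running
        return returns - baselines

    advantages = a2c_advantages(
        rewards=np.array([1.0, 1.0, 1.0]),
        baselines=np.array([0.5, 0.5, 0.5]),
        dones=np.array([0.0, 0.0, 1.0]),
        discount_factor=0.99,
    )
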
class A2CAgent(policy_network, value_network, agent_id='A2C_agent')[source]

Bases: prl.agents.agents.ActorCriticAgent

Advantage Actor Critic agent.
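
A minimal usage sketch, assuming policy_net, value_net and env already exist and satisfy the interfaces prl expects (they are hypothetical placeholders, not part of this module):

    from prl.agents.agents import A2CAgent

    # policy_net, value_net and env are hypothetical placeholders assumed to
    # satisfy the network/environment interfaces expected by prl.
    agent = A2CAgent(policy_network=policy_net, value_network=value_net)
    agent.train(env, n_iterations=1000)   # train for at most 1000 iterations
    history = agent.test(env)             # play one full test episode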

class ActorCriticAgent(policy_network, value_network, advantage, agent_id='ActorCritic_agent')[source]

Bases: prl.agents.agents.Agent

Basic actor-critic agent.

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

train_iteration(env, n_steps=32, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class Advantage[source]

Bases: prl.typing.AdvantageABC, abc.ABC

Base class for advantage functions.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
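
Subclasses implement calculate_advantages. A minimal sketch of a custom advantage, assuming only the interface shown here (rewards, baselines, dones and discount_factor in, ndarray out):

    import numpy as np
    from prl.agents.agents import Advantage

    class BaselineSubtraction(Advantage):
        """Toy advantage for illustration: immediate reward minus baseline."""

        def calculate_advantages(self, rewards, baselines, dones, discount_factor):
            # Deliberately ignores dones and discount_factor; a real advantage
            # would propagate future rewards as A2CAdvantage or GAEAdvantage do.
            return np.asarray(rewards) - np.asarray(baselines)
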
class Agent[source]

Bases: prl.typing.AgentABC, abc.ABC

Base class for all agents.

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

Return type: str
play_episodes(env, episodes)[source]

Method for playing full episodes, usually used to train agents.

Parameters:
  • env (Environment) – Environment
  • episodes (int) – Number of episodes to play.
Return type: History
Returns: History object representing the episodes' history
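
For example, to gather a batch of complete episodes (env and agent are placeholders assumed to already exist):

    # env and agent are assumed to exist; this only shows the call shape.
    history = agent.play_episodes(env, episodes=16)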

play_steps(env, n_steps, storage)[source]

Method for performing a number of steps in the environment. Appends new states to the existing storage.

Parameters:
  • env (Environment) – Environment
  • n_steps (int) – Number of steps to play
  • storage (Storage) – Storage (Memory, History) of the earlier games (used to perform the first action)
Return type: Storage
Returns: History with appended states, actions, rewards, etc.
post_train_cleanup(env, **kwargs)[source]

Cleans up fields that are no longer needed after training, so the agent stays lightweight.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
pre_train_setup(env, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
test(env)[source]

Method for playing a full episode, used to test agents. The reward in the returned history is the true reward from the environment.

Parameters: env – Environment
Return type: History
Returns: History object representing the episode history
train(env, n_iterations, callback_list=None, **kwargs)[source]

Trains the agent using the environment. Also handles callbacks during training.

Parameters:
  • env (Environment) – Environment to train on
  • n_iterations (int) – Maximum number of iterations to train
  • callback_list (Optional[list]) – List of callbacks
  • **kwargs – Other arguments passed to train_iteration, pre_train_setup and post_train_cleanup
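
A typical call, assuming agent and env already exist; the extra keyword argument is only an example of what gets forwarded:

    # agent and env are placeholders assumed to exist for this sketch.
    agent.train(
        env,
        n_iterations=500,      # upper bound on training iterations
        callback_list=None,    # or a list of prl callbacks
        discount_factor=0.99,  # example kwarg forwarded to train_iteration etc.
    )
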
train_iteration(env, **kwargs)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
class CrossEntropyAgent(policy_network, agent_id='crossentropy_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the cross-entropy algorithm.
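
At its core, the cross-entropy method plays a batch of episodes, keeps only those whose total reward reaches a percentile threshold, and fits the policy to the actions taken in the kept episodes. The self-contained numpy sketch below shows only that filtering step; it is not the agent's actual code:

    import numpy as np

    def filter_elite_episodes(episode_rewards, episodes, percentile=75):
        """Keep episodes whose total reward reaches the given percentile."""
        threshold = np.percentile(episode_rewards, percentile)
        return [ep for ep, r in zip(episodes, episode_rewards) if r >= threshold]

    # Toy usage: with percentile=75 the threshold is 7.5, so only "ep2" is kept.
    elite = filter_elite_episodes([1.0, 5.0, 9.0, 7.0], ["ep0", "ep1", "ep2", "ep3"])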

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

train_iteration(env, n_episodes=32, percentile=75)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class DQNAgent(q_network, replay_buffer_size=10000, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000, training_set_size=64, target_network_copy_iter=100, steps_between_training=10, agent_id='DQN_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the DQN algorithm.
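
The constructor arguments suggest the usual epsilon-greedy exploration schedule: epsilon starts at start_epsilon, decays toward end_epsilon over roughly epsilon_decay steps, and otherwise the action with the highest Q-value is taken. The exact decay curve is not documented here; the exponential form below is a common choice and only an assumption:

    import numpy as np

    def epsilon_at(step, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000):
        """Assumed exponential epsilon schedule (illustration, not the agent's code)."""
        return end_epsilon + (start_epsilon - end_epsilon) * np.exp(-step / epsilon_decay)

    def epsilon_greedy(q_values, step, rng=np.random.default_rng()):
        """Random action with probability epsilon, otherwise the greedy one."""
        if rng.random() < epsilon_at(step):
            return int(rng.integers(len(q_values)))
        return int(np.argmax(q_values))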

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, discount_factor=1.0, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class GAEAdvantage(lambda_)[source]

Bases: prl.agents.agents.Advantage

Advantage function from High-Dimensional Continuous Control Using Generalized Advantage Estimation.

calculate_advantages(rewards, baselines, dones, discount_factor)[source]
Return type: ndarray
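
Generalized Advantage Estimation blends n-step advantages with an exponential weight lambda_. Below is a self-contained numpy sketch of the standard recursion (delta_t = r_t + gamma * V_{t+1} - V_t, accumulated backwards with weight gamma * lambda); it assumes a zero value after the last step, which may differ from the library's handling of bootstrap values:

    import numpy as np

    def gae_advantages(rewards, baselines, dones, discount_factor, lambda_):
        """Standard GAE recursion; a sketch, not GAEAdvantage's exact implementation."""
        advantages = np.zeros_like(rewards, dtype=np.float64)
        next_value = 0.0
        running = 0.0
        for t in reversed(range(len(rewards))):
            not_done = 1.0 - dones[t]
            delta = rewards[t] + discount_factor * next_value * not_done - baselines[t]
            running = delta + discount_factor * lambda_ * not_done * running
            advantages[t] = running
            next_value = baselines[t]
        return advantages
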
class REINFORCEAgent(policy_network, agent_id='REINFORCE_agent')[source]

Bases: prl.agents.agents.Agent

Agent using the REINFORCE algorithm.
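
REINFORCE weights the log-probability of each action by the discounted return that followed it. The discounted-return computation is sketched below as self-contained numpy code; the actual gradient step happens inside the policy network and is not reproduced here:

    import numpy as np

    def discounted_returns(rewards, discount_factor=1.0):
        """G_t = sum_k gamma**k * r_(t+k) for a single finished episode."""
        returns = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running
            returns[t] = running
        return returns

    # Example: with discount_factor=0.9 the returns are 2.71, 1.9 and 1.0.
    print(discounted_returns(np.array([1.0, 1.0, 1.0]), discount_factor=0.9))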

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Return type: ndarray
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, discount_factor=1.0, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, n_episodes=32, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (EnvironmentABC) – Environment
  • **kwargs – Kwargs passed from train() method
class RandomAgent(agent_id='random_agent', replay_buffer_size=100)[source]

Bases: prl.agents.agents.Agent

Agent performing random actions.
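
A random agent is useful as a baseline or for pre-filling a replay buffer. A minimal sketch, assuming env is a prl-compatible environment that already exists:

    from prl.agents.agents import RandomAgent

    # env is a hypothetical placeholder; RandomAgent itself needs no networks.
    baseline = RandomAgent()
    history = baseline.play_episodes(env, episodes=10)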

act(state)[source]

Makes a step based on the current state of the environment.

Parameters: state (ndarray) – state from the environment.
Returns: Action to execute on the environment.
id

Agent UUID

pre_train_setup(env, **kwargs)[source]

Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method
train_iteration(env, discount_factor=1.0)[source]

Performs a single training iteration. This method should contain the repeatable part of training an agent.

Parameters:
  • env (Environment) – Environment
  • **kwargs – Kwargs passed from train() method

Module contents