prl.agents package
Submodules
prl.agents.agents module

class A2CAdvantage
    Bases: prl.agents.agents.Advantage

    Advantage function from "Asynchronous Methods for Deep Reinforcement Learning" (Mnih et al., 2016).
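
    A minimal sketch of the n-step advantage this class computes, with the critic's value estimates as the baseline. The standalone function and all names here are illustrative, not part of prl's API; rewards and values are per-step arrays from one rollout and last_value is the value estimate for the state following it:

        import numpy as np

        def a2c_advantage(rewards, values, last_value, discount_factor=1.0):
            # A(s_t, a_t) = r_t + g*r_{t+1} + ... + g^n * V(s_{t+n}) - V(s_t)
            returns = np.empty(len(rewards))
            running = last_value            # bootstrap from the post-rollout state
            for t in reversed(range(len(rewards))):
                running = rewards[t] + discount_factor * running
                returns[t] = running
            return returns - values         # subtract the critic baseline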

class A2CAgent(policy_network, value_network, agent_id='A2C_agent')
    Bases: prl.agents.agents.ActorCriticAgent

    Advantage Actor-Critic agent.
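
    A minimal end-to-end sketch, assuming policy_net and value_net are network objects accepted by prl agents and env is a prl Environment (all three are placeholders):

        agent = A2CAgent(policy_network=policy_net, value_network=value_net)
        agent.train(env, n_iterations=1000, n_steps=32, discount_factor=0.99)  # kwargs reach train_iteration()
        history = agent.test(env)  # one evaluation episode with true rewards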

class ActorCriticAgent(policy_network, value_network, advantage, agent_id='ActorCritic_agent')
    Bases: prl.agents.agents.Agent

    Basic actor-critic agent.

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Return type: ndarray
        Returns: Action to execute on the environment.

    id
        Agent UUID.

    train_iteration(env, n_steps=32, discount_factor=1.0)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method
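
    A construction sketch for the generic actor-critic agent, wiring in the GAEAdvantage class documented further down this page; policy_net, value_net, and env are placeholders and the hyperparameter values are illustrative:

        agent = ActorCriticAgent(
            policy_network=policy_net,
            value_network=value_net,
            advantage=GAEAdvantage(lambda_=0.95),  # illustrative lambda
        )
        agent.train(env, n_iterations=1000, n_steps=32, discount_factor=0.99)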

class Advantage
    Bases: prl.typing.AdvantageABC, abc.ABC

    Base class for advantage functions.

class Agent
    Bases: prl.typing.AgentABC, abc.ABC

    Base class for all agents.

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Return type: ndarray
        Returns: Action to execute on the environment.
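
    A sketch of the per-step contract, for illustration only: env.reset() and env.step() are assumed Gym-style names here, since prl's own interaction loops live in play_steps() and play_episodes():

        state = env.reset()                           # assumed API, not documented on this page
        action = agent.act(state)                     # ndarray in, ndarray out
        state, reward, done, info = env.step(action)  # assumed API, not documented on this page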

    id
        Agent UUID.

        Return type: str

    play_episodes(env, episodes)
        Method for playing full episodes, usually used to train agents.

        Parameters:
            - env (Environment) – Environment
            - episodes (int) – Number of episodes to play
        Return type: History
        Returns: History object representing the episodes' history

    play_steps(env, n_steps, storage)
        Method for performing a number of steps in the environment. Appends new states to the existing storage.

        Parameters:
            - env (Environment) – Environment
            - n_steps (int) – Number of steps to play
            - storage (Storage) – Storage (Memory, History) of the earlier games (used to perform the first action)
        Return type: Storage
        Returns: History with appended states, actions, rewards, etc.
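
    A minimal usage sketch for the two rollout helpers, assuming agent and env are an already-constructed Agent and Environment (names are placeholders):

        history = agent.play_episodes(env, episodes=4)                # full episodes -> History
        history = agent.play_steps(env, n_steps=32, storage=history)  # extend the same storage by 32 steps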

    post_train_cleanup(env, **kwargs)
        Cleans up fields that are no longer needed after training, to keep the agent lightweight.

        Parameters:
            - env (Environment) – Environment
            - **kwargs – Kwargs passed from the train() method

    pre_train_setup(env, **kwargs)
        Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

        Parameters:
            - env (Environment) – Environment
            - **kwargs – Kwargs passed from the train() method

    test(env)
        Method for playing a full episode, used to test the agent. The reward in the returned history is the true reward from the environment.

        Parameters: env – Environment
        Return type: History
        Returns: History object representing the episode history

    train(env, n_iterations, callback_list=None, **kwargs)
        Trains the agent on the given environment. Also handles callbacks during training.

        Parameters:
            - env (Environment) – Environment to train on
            - n_iterations (int) – Maximum number of iterations to train
            - callback_list (Optional[list]) – List of callbacks
            - **kwargs – Other arguments passed to train_iteration(), pre_train_setup(), and post_train_cleanup()
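
    Conceptually, train() ties together the three hook methods documented in this class; a simplified sketch of that control flow (the real method also runs the callbacks from callback_list, which this sketch omits):

        agent.pre_train_setup(env, **kwargs)        # one-time, non-repeatable setup
        for _ in range(n_iterations):
            agent.train_iteration(env, **kwargs)    # the repeatable part of training
        agent.post_train_cleanup(env, **kwargs)     # drop fields not needed anymore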

    train_iteration(env, **kwargs)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (Environment) – Environment
            - **kwargs – Kwargs passed from the train() method

class CrossEntropyAgent(policy_network, agent_id='crossentropy_agent')
    Bases: prl.agents.agents.Agent

    Agent using the cross-entropy algorithm.
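
    A minimal training sketch; policy_net and env are placeholders, and the keyword values are the documented train_iteration() defaults. In the cross-entropy method, each iteration plays a batch of episodes and fits the policy to the episodes whose return is above the given percentile:

        agent = CrossEntropyAgent(policy_network=policy_net)
        agent.train(env, n_iterations=100, n_episodes=32, percentile=75)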

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Return type: ndarray
        Returns: Action to execute on the environment.

    id
        Agent UUID.

    train_iteration(env, n_episodes=32, percentile=75)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method

class DQNAgent(q_network, replay_buffer_size=10000, start_epsilon=1.0, end_epsilon=0.05, epsilon_decay=1000, training_set_size=64, target_network_copy_iter=100, steps_between_training=10, agent_id='DQN_agent')
    Bases: prl.agents.agents.Agent

    Agent using the DQN algorithm.
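
    A construction sketch using the documented defaults; q_net and env are placeholders, and the inline comments are inferences from the parameter names (e.g. an epsilon-greedy schedule decaying from start_epsilon to end_epsilon), not statements from this page:

        agent = DQNAgent(
            q_network=q_net,               # placeholder Q-network object
            replay_buffer_size=10000,      # transitions kept for experience replay
            start_epsilon=1.0,             # initial exploration rate
            end_epsilon=0.05,              # exploration-rate floor
            epsilon_decay=1000,            # decay horizon for epsilon
            training_set_size=64,          # minibatch size sampled from the buffer
            target_network_copy_iter=100,  # iterations between target-network syncs
            steps_between_training=10,     # environment steps between updates
        )
        agent.train(env, n_iterations=10000, discount_factor=0.99)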

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Return type: ndarray
        Returns: Action to execute on the environment.

    id
        Agent UUID.

    pre_train_setup(env, discount_factor=1.0, **kwargs)
        Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method

    train_iteration(env, discount_factor=1.0)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method

class GAEAdvantage(lambda_)
    Bases: prl.agents.agents.Advantage

    Advantage function from "High-Dimensional Continuous Control Using Generalized Advantage Estimation" (Schulman et al., 2015).
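
    A minimal sketch of the GAE(lambda) recurrence, under the same illustrative conventions as the A2CAdvantage sketch above (a standalone function, not prl's API; episode boundaries ignored):

        import numpy as np

        def gae_advantage(rewards, values, last_value, lambda_, discount_factor=1.0):
            # TD residuals: delta_t = r_t + g*V(s_{t+1}) - V(s_t)
            next_values = np.append(values[1:], last_value)
            deltas = rewards + discount_factor * next_values - values
            advantages = np.empty(len(deltas))
            running = 0.0
            for t in reversed(range(len(deltas))):
                # A_t = delta_t + (g * lambda) * A_{t+1}
                running = deltas[t] + discount_factor * lambda_ * running
                advantages[t] = running
            return advantages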

class REINFORCEAgent(policy_network, agent_id='REINFORCE_agent')
    Bases: prl.agents.agents.Agent

    Agent using the REINFORCE algorithm.
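
    A minimal training sketch mirroring the signatures documented below; policy_net and env are placeholders and the hyperparameter values are illustrative:

        agent = REINFORCEAgent(policy_network=policy_net)
        agent.train(env, n_iterations=500, n_episodes=32, discount_factor=0.99)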

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Return type: ndarray
        Returns: Action to execute on the environment.

    id
        Agent UUID.

    pre_train_setup(env, discount_factor=1.0, **kwargs)
        Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method

    train_iteration(env, n_episodes=32, discount_factor=1.0)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (EnvironmentABC) – Environment
            - **kwargs – Kwargs passed from the train() method

class RandomAgent(agent_id='random_agent', replay_buffer_size=100)
    Bases: prl.agents.agents.Agent

    Agent performing random actions.
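
    A minimal sketch; using a random agent as a cheap baseline for comparison is a common pattern, though not something this page states (env is a placeholder):

        agent = RandomAgent()
        baseline_history = agent.test(env)  # one episode of random actions, true rewards recorded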

    act(state)
        Makes a step based on the current environment state.

        Parameters: state (ndarray) – state from the environment.
        Returns: Action to execute on the environment.

    id
        Agent UUID.

    pre_train_setup(env, **kwargs)
        Performs pre-training setup. This method should handle the non-repeatable part of training an agent.

        Parameters:
            - env (Environment) – Environment
            - **kwargs – Kwargs passed from the train() method

    train_iteration(env, discount_factor=1.0)
        Performs a single training iteration. This method should contain the repeatable part of training an agent.

        Parameters:
            - env (Environment) – Environment
            - **kwargs – Kwargs passed from the train() method