prl.storage package

Submodules

prl.storage.storage module

class History(initial_state, action_type, initial_length=512)[source]

Bases: prl.storage.storage.Storage, prl.typing.HistoryABC

An object used to keep the episode history (used within the Environment class and by some agents). An agent can use this object to keep the history of past episodes, calculate returns and total rewards, and sample batches from it.

The object also supports indexing and slicing because it implements the Python Sequence protocol, so functions that operate on sequences, such as random.choice, can be used on a history as well (a usage sketch follows the parameter list below).

Parameters:
  • initial_state (ndarray) – initial state from the environment
  • action_type (type) – numpy type of action (e.g. np.int32)
  • initial_length (int) – initial length of a history
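
A minimal usage sketch, assuming a classic Gym-style environment with the 4-tuple step API; only History and its documented methods come from this module, the environment and the random policy are illustrative:

    import random

    import gym
    import numpy as np

    from prl.storage.storage import History

    env = gym.make("CartPole-v1")           # assumed Gym-style environment (4-tuple step API)
    state = env.reset()
    history = History(initial_state=state, action_type=np.int32)

    done = False
    while not done:
        action = env.action_space.sample()  # random policy, for illustration only
        state, reward, done, _ = env.step(action)
        history.update(action, reward, done, state)

    rewards = history.get_rewards()         # per-step rewards as an ndarray
    sample = random.choice(history)         # Sequence protocol: works like on any sequence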
get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_number_of_episodes()[source]

Returns the number of full episodes in the history.

Return type:int
Returns:number of full episodes in history
get_returns(discount_factor=1.0, horizon=inf)[source]

Calculates the discounted return for each step.

Return type:ndarray
Returns:array of discounted returns for each step
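
For intuition, the return for step t is r_t + discount_factor * r_{t+1} + discount_factor^2 * r_{t+2} + ..., truncated at the horizon and at episode boundaries. A small self-contained sketch of that recurrence (not the library's internal implementation, which also handles the horizon argument):

    import numpy as np

    def discounted_returns(rewards, dones, discount_factor=1.0):
        # Backward recurrence G_t = r_t + discount_factor * G_{t+1},
        # restarted whenever a done flag marks the end of an episode.
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            if dones[t]:
                running = 0.0
            running = rewards[t] + discount_factor * running
            returns[t] = running
        return returns

    # Three steps of reward 1.0 in one episode, discount_factor = 0.9:
    discounted_returns(np.array([1.0, 1.0, 1.0]), np.array([False, False, True]), 0.9)
    # -> array([2.71, 1.9 , 1.  ])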
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states()[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
get_summary()[source]
Return type:(float, float, int)
get_total_rewards()[source]

Calculates the sum of all rewards for each episode and reports it for each state, so every state within one episode has the same total reward value. This can be useful for filtering states from the best episodes (e.g. in the Cross-Entropy method); see the sketch below.

Return type:ndarray
Returns:total reward for each state
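
A sketch of the filtering pattern mentioned above (the 70th-percentile threshold is an arbitrary illustrative choice, and the state and total-reward arrays are assumed to be aligned one-to-one):

    import numpy as np

    states = history.get_states()
    total_rewards = history.get_total_rewards()   # constant within each episode

    # Keep only states that belong to the top 30% of episodes by total reward.
    threshold = np.percentile(total_rewards, 70)
    elite_states = states[total_rewards >= threshold]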
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state (ndarray) – state array.
sample_batch(replay_buffer_size, batch_size=64, returns=False, next_states=False)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size (int) – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)
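
For example, a consumer that trains on per-step returns rather than raw rewards might do the following (the sizes are illustrative; with returns=True the third element of the tuple holds returns):

    # Sample 64 examples from the most recent 10000 steps of the history.
    states, actions, returns_, dones = history.sample_batch(
        10000, batch_size=64, returns=True
    )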

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action (ndarray) – action executed by the agent
  • reward (Real) – reward from the environment
  • done (bool) – done flag from the environment
  • state (ndarray) – new state returned by the wrapped environment after executing the action
class Memory(initial_state, action_type, maximum_length=1000)[source]

Bases: prl.storage.storage.Storage, prl.typing.StorageABC

An object to be used as a replay buffer. It doesn't contain full episodes and acts as a bounded FIFO queue. It is implemented as double-size numpy arrays with duplicated data, which supports very fast slicing and sampling at the cost of higher memory usage (the idea is illustrated below).

Parameters:
  • initial_state (ndarray) – initial state from the environment
  • action_type – numpy type of action (e.g. np.int32)
  • maximum_length (int) – maximum number of examples to keep in queue
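
The duplicated double-size layout mentioned above can be illustrated with a toy ring buffer; this is a conceptual sketch of the general technique, not Memory's actual internals. Writing each element both at index i and at i + maxlen keeps the newest maxlen elements contiguous, so the latest window is a plain slice with no wrap-around logic:

    import numpy as np

    class ToyRingBuffer:
        """Conceptual sketch: a FIFO of scalars backed by a 2 * maxlen array."""

        def __init__(self, maxlen):
            self.maxlen = maxlen
            self.data = np.zeros(2 * maxlen)
            self.count = 0

        def push(self, value):
            i = self.count % self.maxlen
            # Duplicate the write so the last `maxlen` items are always contiguous.
            self.data[i] = value
            self.data[i + self.maxlen] = value
            self.count += 1

        def latest(self, n):
            # The newest items end right after the current write position
            # in the upper half of the buffer, so one slice is enough.
            end = self.maxlen + self.count % self.maxlen
            n = min(n, self.count, self.maxlen)
            return self.data[end - n:end]

    buf = ToyRingBuffer(maxlen=4)
    for v in [1, 2, 3, 4, 5, 6]:
        buf.push(v)
    buf.latest(4)   # -> array([3., 4., 5., 6.])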
clear(initial_state)[source]
get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states(include_last=False)[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state – state array.
sample_batch(replay_buffor_size, batch_size=64, returns=False, next_states=False)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)
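
A sketch of DQN-style consumption of a Memory instance (the memory object, sizes, and target formula are illustrative; the replay-buffer length is passed positionally):

    # Sample 32 transitions, including next states, for a DQN-style update.
    states, actions, rewards, dones, next_states = memory.sample_batch(
        10000, batch_size=32, next_states=True
    )
    # A typical bootstrap target would then be
    #   target = reward + gamma * max_a Q(next_state, a) * (1 - done)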

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action – action executed by the agent
  • reward – reward from the environment
  • done – done flag from the environment
  • state – new state returned by the wrapped environment after executing the action
class Storage[source]

Bases: prl.typing.StorageABC, abc.ABC

get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states()[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state – state array.
sample_batch(replay_buffor_size, batch_size, returns, next_states)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action – action executed by the agent
  • reward – reward from the environment
  • done – done flag from the environment
  • state – new state returned by the wrapped environment after executing the action
calculate_returns(all_rewards, dones, horizon, discount_factor, _index)[source]
calculate_total_rewards(all_rewards, dones, _index)[source]
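
These module-level helpers back the corresponding methods above. A conceptual numpy sketch of the per-episode total-reward broadcast (ignoring the internal _index argument, and not necessarily matching the library's vectorized implementation):

    import numpy as np

    def total_rewards_per_step(rewards, dones):
        # Sum the rewards of each episode (episodes end where done is True)
        # and broadcast that sum back to every step of the episode.
        totals = np.zeros(len(rewards))
        start = 0
        for t, done in enumerate(dones):
            if done:
                totals[start:t + 1] = rewards[start:t + 1].sum()
                start = t + 1
        return totals   # steps of an unfinished trailing episode stay at 0 in this sketch

    total_rewards_per_step(np.array([1.0, 2.0, 3.0, 4.0]),
                           np.array([False, True, False, True]))
    # -> array([3., 3., 7., 7.])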

Module contents