prl.storage package

Submodules

prl.storage.storage module

class History(initial_state, action_type, initial_length=512)[source]

Bases: prl.storage.storage.Storage, prl.typing.HistoryABC

An object used to keep the episode history (used within the Environment class and by some agents). An agent can use this object to keep the history of past episodes, calculate returns and total rewards, and sample batches from it.

The object also supports indexing and slicing because it implements the Python Sequence protocol, so functions that operate on sequences, such as random.choice, can be used on a history as well (a usage sketch follows the parameter list below).

Parameters:
  • initial_state (ndarray) – initial state from the environment
  • action_type (type) – numpy type of action (e.g. np.int32)
  • initial_length (int) – initial length of a history
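
A minimal usage sketch, assuming a classic Gym-style environment with the 4-tuple step API; only History and its documented methods come from this module, the environment and the random policy are illustrative:

    import random

    import gym
    import numpy as np

    from prl.storage.storage import History

    env = gym.make("CartPole-v1")           # assumed Gym-style environment (4-tuple step API)
    state = env.reset()
    history = History(initial_state=state, action_type=np.int32)

    done = False
    while not done:
        action = env.action_space.sample()  # random policy, for illustration only
        state, reward, done, _ = env.step(action)
        history.update(action, reward, done, state)

    rewards = history.get_rewards()         # per-step rewards as an ndarray
    sample = random.choice(history)         # Sequence protocol: works like on any sequence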
get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_number_of_episodes()[source]

Returns the number of full episodes in the history.

Return type:int
Returns:number of full episodes in history
get_returns(discount_factor=1.0, horizon=inf)[source]

Calculates the discounted return for each step.

Return type:ndarray
Returns:array of discounted returns for each step
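
For intuition, the return for step t is r_t + discount_factor * r_{t+1} + discount_factor^2 * r_{t+2} + ..., truncated at the horizon and at episode boundaries. A small self-contained sketch of that recurrence (not the library's internal implementation, which also handles the horizon argument):

    import numpy as np

    def discounted_returns(rewards, dones, discount_factor=1.0):
        # Backward recurrence G_t = r_t + discount_factor * G_{t+1},
        # restarted whenever a done flag marks the end of an episode.
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            if dones[t]:
                running = 0.0
            running = rewards[t] + discount_factor * running
            returns[t] = running
        return returns

    # Three steps of reward 1.0 in one episode, discount_factor = 0.9:
    discounted_returns(np.array([1.0, 1.0, 1.0]), np.array([False, False, True]), 0.9)
    # -> array([2.71, 1.9 , 1.  ])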
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states()[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
get_summary()[source]
Return type:(float, float, int)
get_total_rewards()[source]

Calculates the sum of all rewards for each episode and reports it for each state, so every state within one episode has the same total reward value. This can be useful for filtering states from the best episodes (e.g. in the Cross-Entropy method); see the sketch below.

Return type:ndarray
Returns:total reward for each state
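
A sketch of the filtering pattern mentioned above (the 70th-percentile threshold is an arbitrary illustrative choice, and the state and total-reward arrays are assumed to be aligned one-to-one):

    import numpy as np

    states = history.get_states()
    total_rewards = history.get_total_rewards()   # constant within each episode

    # Keep only states that belong to the top 30% of episodes by total reward.
    threshold = np.percentile(total_rewards, 70)
    elite_states = states[total_rewards >= threshold]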
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state (ndarray) – state array.
sample_batch(replay_buffer_size, batch_size=64, returns=False, next_states=False)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size (int) – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)
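
For example, a consumer that trains on per-step returns rather than raw rewards might do the following (the sizes are illustrative; with returns=True the third element of the tuple holds returns):

    # Sample 64 examples from the most recent 10000 steps of the history.
    states, actions, returns_, dones = history.sample_batch(
        10000, batch_size=64, returns=True
    )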

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action (ndarray) – action executed by the agent
  • reward (Real) – reward from the environment
  • done (bool) – done flag from the environment
  • state (ndarray) – new state returned by the wrapped environment after executing the action
class Memory(initial_state, action_type, maximum_length=1000)[source]

Bases: prl.storage.storage.Storage, prl.typing.StorageABC

An object to be used as a replay buffer. It doesn't contain full episodes and acts as a bounded FIFO queue. It is implemented as double-size numpy arrays with duplicated data, which supports very fast slicing and sampling at the cost of higher memory usage (the idea is illustrated below).

Parameters:
  • initial_state (ndarray) – initial state from the environment
  • action_type – numpy type of action (e.g. np.int32)
  • maximum_length (int) – maximum number of examples to keep in queue
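
The duplicated double-size layout mentioned above can be illustrated with a toy ring buffer; this is a conceptual sketch of the general technique, not Memory's actual internals. Writing each element both at index i and at i + maxlen keeps the newest maxlen elements contiguous, so the latest window is a plain slice with no wrap-around logic:

    import numpy as np

    class ToyRingBuffer:
        """Conceptual sketch: a FIFO of scalars backed by a 2 * maxlen array."""

        def __init__(self, maxlen):
            self.maxlen = maxlen
            self.data = np.zeros(2 * maxlen)
            self.count = 0

        def push(self, value):
            i = self.count % self.maxlen
            # Duplicate the write so the last `maxlen` items are always contiguous.
            self.data[i] = value
            self.data[i + self.maxlen] = value
            self.count += 1

        def latest(self, n):
            # The newest items end right after the current write position
            # in the upper half of the buffer, so one slice is enough.
            end = self.maxlen + self.count % self.maxlen
            n = min(n, self.count, self.maxlen)
            return self.data[end - n:end]

    buf = ToyRingBuffer(maxlen=4)
    for v in [1, 2, 3, 4, 5, 6]:
        buf.push(v)
    buf.latest(4)   # -> array([3., 4., 5., 6.])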
clear(initial_state)[source]
get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states(include_last=False)[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state – state array.
sample_batch(replay_buffor_size, batch_size=64, returns=False, next_states=False)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)
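
A sketch of DQN-style consumption of a Memory instance (the memory object, sizes, and target formula are illustrative; the replay-buffer length is passed positionally):

    # Sample 32 transitions, including next states, for a DQN-style update.
    states, actions, rewards, dones, next_states = memory.sample_batch(
        10000, batch_size=32, next_states=True
    )
    # A typical bootstrap target would then be
    #   target = reward + gamma * max_a Q(next_state, a) * (1 - done)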

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action – action executed by the agent
  • reward – reward from the environment
  • done – done flag from the environment
  • state – new state returned by the wrapped environment after executing the action
class Storage[source]

Bases: prl.typing.StorageABC, abc.ABC

get_actions()[source]

Returns an array of all actions.

Return type:ndarray
Returns:array of all actions
get_dones()[source]

Returns an array of all done flags.

Return type:ndarray
Returns:array of all done flags
get_last_state()[source]

Returns only the last state.

Return type:ndarray
Returns:last state
get_rewards()[source]

Returns an array of all rewards.

Return type:ndarray
Returns:array of all rewards
get_states()[source]

Returns an array of all states.

Return type:ndarray
Returns:array of all states
new_state_update(state)[source]

Overwrites the newest state in the History.

Parameters:state – state array.
sample_batch(replay_buffor_size, batch_size, returns, next_states)[source]

Samples a batch of examples from the Storage.

Parameters:
  • replay_buffer_size – length of the replay buffer to sample examples from
  • batch_size (int) – number of returned examples
  • returns (bool) – if True, the method will return the returns from each step instead of the rewards
  • next_states (bool) – if True, the method will also return the next states (e.g. for the DQN algorithm)
Returns:batch of samples from the history, in the form of a tuple of np.ndarrays, in the order: states, actions, rewards, dones, (new_states)

update(action, reward, done, state)[source]

Updates the object with the latest state, reward, action, and done flag.

Parameters:
  • action – action executed by the agent
  • reward – reward from the environment
  • done – done flag from the environment
  • state – new state returned by the wrapped environment after executing the action
calculate_returns(all_rewards, dones, horizon, discount_factor, _index)[source]
calculate_total_rewards(all_rewards, dones, _index)[source]
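
These module-level helpers back the corresponding methods above. A conceptual numpy sketch of the per-episode total-reward broadcast (ignoring the internal _index argument, and not necessarily matching the library's vectorized implementation):

    import numpy as np

    def total_rewards_per_step(rewards, dones):
        # Sum the rewards of each episode (episodes end where done is True)
        # and broadcast that sum back to every step of the episode.
        totals = np.zeros(len(rewards))
        start = 0
        for t, done in enumerate(dones):
            if done:
                totals[start:t + 1] = rewards[start:t + 1].sum()
                start = t + 1
        return totals   # steps of an unfinished trailing episode stay at 0 in this sketch

    total_rewards_per_step(np.array([1.0, 2.0, 3.0, 4.0]),
                           np.array([False, True, False, True]))
    # -> array([3., 3., 7., 7.])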

Module contents