prl.storage package¶
Submodules¶
prl.storage.storage module¶
class History(initial_state, action_type, initial_length=512)[source]¶
Bases: prl.storage.storage.Storage, prl.typing.HistoryABC

An object used to keep the episode history (used within the Environment class and by some agents). An agent can use this object to keep the history of past episodes, calculate returns and total rewards, and sample batches from it. The object also supports indexing and slicing, since it implements the Python Sequence protocol, so functions that operate on sequences, such as random.choice, can be used on a history as well.
Parameters:
- initial_state (ndarray) – initial state from the environment
- action_type (type) – numpy type of the action (e.g. np.int32)
- initial_length (int) – initial length of the history
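As a quick orientation, here is a minimal construction sketch. The observation shape and dtype are assumptions for illustration, not part of the API:

```python
import numpy as np
from prl.storage.storage import History

# Assumed: a 4-dimensional float observation, as in a CartPole-like environment.
initial_state = np.zeros(4, dtype=np.float32)
history = History(initial_state, action_type=np.int32)

# History implements the Sequence protocol, so once it contains data,
# standard sequence tools are expected to work on it, e.g.:
#   import random
#   step = random.choice(history)
#   window = history[2:10]
```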
get_actions()[source]¶
Returns an array of all actions.
Return type: ndarray
Returns: array of all actions
get_dones()[source]¶
Returns an array of all done flags.
Return type: ndarray
Returns: array of all done flags
get_number_of_episodes()[source]¶
Returns the number of full episodes in the history.
Return type: int
Returns: number of full episodes in history
get_returns(discount_factor=1.0, horizon=inf)[source]¶
Calculates the discounted return for each step.
Return type: ndarray
Returns: array of discounted returns for each step
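The return for step t with discount factor γ is G_t = r_t + γ·G_{t+1}, accumulated within episode boundaries. A minimal NumPy sketch of that computation follows; it is illustrative only, ignores the horizon parameter, and is independent of the actual History internals:

```python
import numpy as np

def discounted_returns(rewards, dones, discount_factor=1.0):
    # Backward pass: G_t = r_t + gamma * G_{t+1}, resetting at episode ends.
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # do not accumulate across episode boundaries
        running = rewards[t] + discount_factor * running
        returns[t] = running
    return returns

rewards = np.array([1.0, 1.0, 1.0, 1.0])
dones = np.array([False, False, False, True])
print(discounted_returns(rewards, dones, 0.9))  # [3.439 2.71  1.9   1.   ]
```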
get_rewards()[source]¶
Returns an array of all rewards.
Return type: ndarray
Returns: array of all rewards
get_states()[source]¶
Returns an array of all states.
Return type: ndarray
Returns: array of all states
get_total_rewards()[source]¶
Calculates the sum of all rewards for each episode and reports it for each state, so every state within one episode has the same total-reward value. This can be useful for filtering the states of the best episodes (e.g. in the Cross-Entropy Method).
Return type: ndarray
Returns: total reward for each state
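For instance, total rewards can be used to keep only elite transitions, in the spirit of the Cross-Entropy Method. A sketch, where `history` is assumed to be a populated History whose per-state arrays align with each other:

```python
import numpy as np

total_rewards = history.get_total_rewards()    # one value per stored state
threshold = np.percentile(total_rewards, 70)   # keep roughly the top 30% of episodes
elite = total_rewards >= threshold             # boolean mask over stored states

# Assumed to align index-by-index with get_total_rewards().
elite_states = history.get_states()[elite]
elite_actions = history.get_actions()[elite]
```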
new_state_update(state)[source]¶
Overwrites the newest state in the History.

Parameters: state (ndarray) – state array.
sample_batch(replay_buffer_size, batch_size=64, returns=False, next_states=False)[source]¶
Samples a batch of examples from the Storage.
Parameters:
- replay_buffer_size (int) – length of the replay buffer to sample examples from
- batch_size (int) – number of returned examples
- returns (bool) – if True, the method returns the return from each step instead of the reward
- next_states (bool) – if True, the method also returns the next states (e.g. for the DQN algorithm)
Return type: tuple of ndarray
Returns: batch of samples from the history, as a tuple of np.ndarrays in the order: states, actions, rewards, dones, (next_states)
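For example, a DQN-style learner might draw minibatches like this (a sketch; `history` is assumed to be a populated History, and the tuple unpacks in the order given above):

```python
states, actions, rewards, dones, next_states = history.sample_batch(
    replay_buffer_size=10_000,  # sample only from the newest 10k steps
    batch_size=64,
    next_states=True,
)
```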
update(action, reward, done, state)[source]¶
Updates the object with the latest action, reward, done flag, and state.
Parameters:
- action (ndarray) – action executed by the agent
- reward (Real) – reward from the environment
- done (bool) – done flag from the environment
- state (ndarray) – new state returned by the wrapped environment after executing the action
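Putting the History API together, a typical data-collection loop might look like this. This is a sketch only: `env` and `agent` are hypothetical Gym-style objects, not part of prl.storage:

```python
import numpy as np
from prl.storage.storage import History

state = env.reset()                        # hypothetical Gym-style environment
history = History(state, action_type=np.int32)

done = False
while not done:
    action = agent.act(state)              # hypothetical agent
    state, reward, done, _ = env.step(action)
    history.update(action, reward, done, state)

print(history.get_number_of_episodes())   # -> 1 full episode recorded
```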
class Memory(initial_state, action_type, maximum_length=1000)[source]¶
Bases: prl.storage.storage.Storage, prl.typing.StorageABC

An object to be used as a replay buffer. It does not keep full episodes and acts as a bounded FIFO queue. It is implemented as double-size numpy arrays with duplicated data, which supports very fast slicing and sampling at the cost of higher memory usage.
Parameters:
- initial_state (ndarray) – initial state from the environment
- action_type – numpy type of the action (e.g. np.int32)
- maximum_length (int) – maximum number of examples to keep in the queue
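The double-array trick mentioned above can be illustrated with a toy ring buffer (a sketch, not the library's actual code): every item is written at two positions, index i and i + capacity, so the newest items always form one contiguous slice and can be read without copying or rolling the array.

```python
import numpy as np

class DoubledRing:
    """Illustrative FIFO ring that duplicates each write so the newest
    `capacity` items are always a single contiguous slice."""

    def __init__(self, capacity, shape, dtype):
        self.capacity = capacity
        self.buf = np.zeros((2 * capacity, *shape), dtype=dtype)
        self.count = 0

    def push(self, item):
        i = self.count % self.capacity
        self.buf[i] = item                   # first copy
        self.buf[i + self.capacity] = item   # duplicated copy
        self.count += 1

    def latest(self, n):
        # Newest n items, oldest first, as one contiguous view (n <= count).
        end = (self.count - 1) % self.capacity + 1 + self.capacity
        return self.buf[end - n:end]
```

For example, with capacity 3 and pushes a, b, c, d, `latest(3)` returns [b, c, d] as one slice even though the write position has wrapped around.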
get_actions()[source]¶
Returns an array of all actions.
Return type: ndarray
Returns: array of all actions
get_dones()[source]¶
Returns an array of all done flags.
Return type: ndarray
Returns: array of all done flags
get_rewards()[source]¶
Returns an array of all rewards.
Return type: ndarray
Returns: array of all rewards
get_states(include_last=False)[source]¶
Returns an array of all states.
Return type: ndarray
Returns: array of all states
new_state_update(state)[source]¶
Overwrites the newest state in the History.

Parameters: state – state array.
sample_batch(replay_buffor_size, batch_size=64, returns=False, next_states=False)[source]¶
Samples a batch of examples from the Storage.
Parameters:
- replay_buffor_size – length of the replay buffer to sample examples from
- batch_size (int) – number of returned examples
- returns (bool) – if True, the method returns the return from each step instead of the reward
- next_states (bool) – if True, the method also returns the next states (e.g. for the DQN algorithm)
Return type: tuple of ndarray
Returns: batch of samples from the history, as a tuple of np.ndarrays in the order: states, actions, rewards, dones, (next_states)
update(action, reward, done, state)[source]¶
Updates the object with the latest action, reward, done flag, and state.
Parameters:
- action – action executed by the agent
- reward – reward from the environment
- done – done flag from the environment
- state – new state returned by the wrapped environment after executing the action
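A replay-buffer usage sketch, again with hypothetical Gym-style `env` and `agent` objects; the unpacking order follows the tuple described above:

```python
import numpy as np
from prl.storage.storage import Memory

state = env.reset()                                   # hypothetical environment
memory = Memory(state, action_type=np.int32, maximum_length=10_000)

for _ in range(1000):
    action = agent.act(state)                         # hypothetical agent
    state, reward, done, _ = env.step(action)
    memory.update(action, reward, done, state)
    if done:
        state = env.reset()
        # Assumed pattern: overwrite the stored terminal state with the first
        # state of the new episode; terminal transitions stay marked by `done`.
        memory.new_state_update(state)

# Sample a training batch including next states, e.g. for DQN.
states, actions, rewards, dones, next_states = memory.sample_batch(
    1000, batch_size=64, next_states=True
)
```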
class Storage[source]¶
Bases: prl.typing.StorageABC, abc.ABC
get_actions()[source]¶
Returns an array of all actions.
Return type: ndarray
Returns: array of all actions
get_dones()[source]¶
Returns an array of all done flags.
Return type: ndarray
Returns: array of all done flags
get_rewards()[source]¶
Returns an array of all rewards.
Return type: ndarray
Returns: array of all rewards
get_states()[source]¶
Returns an array of all states.
Return type: ndarray
Returns: array of all states
new_state_update(state)[source]¶
Overwrites the newest state in the History.

Parameters: state – state array.
sample_batch(replay_buffor_size, batch_size, returns, next_states)[source]¶
Samples a batch of examples from the Storage.
Parameters:
- replay_buffor_size – length of the replay buffer to sample examples from
- batch_size (int) – number of returned examples
- returns (bool) – if True, the method returns the return from each step instead of the reward
- next_states (bool) – if True, the method also returns the next states (e.g. for the DQN algorithm)
Return type: tuple of ndarray
Returns: batch of samples from the history, as a tuple of np.ndarrays in the order: states, actions, rewards, dones, (next_states)
update(action, reward, done, state)[source]¶
Updates the object with the latest action, reward, done flag, and state.
Parameters:
- action – action executed by the agent
- reward – reward from the environment
- done – done flag from the environment
- state – new state returned by the wrapped environment after executing the action
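Since Storage is an abstract base class, custom backends can subclass it. The sketch below is a toy list-backed implementation; it assumes the abstract interface is exactly the set of methods documented above and ignores the `returns` flag for brevity:

```python
import numpy as np
from prl.storage.storage import Storage

class ListStorage(Storage):
    """Toy Storage backed by Python lists (illustrative only)."""

    def __init__(self, initial_state):
        self._states = [np.asarray(initial_state)]
        self._actions, self._rewards, self._dones = [], [], []

    def update(self, action, reward, done, state):
        self._actions.append(action)
        self._rewards.append(reward)
        self._dones.append(done)
        self._states.append(np.asarray(state))

    def new_state_update(self, state):
        self._states[-1] = np.asarray(state)  # overwrite the newest state

    def get_actions(self):
        return np.asarray(self._actions)

    def get_rewards(self):
        return np.asarray(self._rewards)

    def get_dones(self):
        return np.asarray(self._dones)

    def get_states(self):
        return np.asarray(self._states[:-1])  # states that have a recorded action

    def sample_batch(self, replay_buffor_size, batch_size, returns, next_states):
        # Uniform sampling from the newest `replay_buffor_size` transitions.
        n = len(self._actions)
        idx = np.random.randint(max(n - replay_buffor_size, 0), n, size=batch_size)
        batch = (self.get_states()[idx], self.get_actions()[idx],
                 self.get_rewards()[idx], self.get_dones()[idx])
        if next_states:
            batch += (np.asarray(self._states)[idx + 1],)
        return batch
```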