prl.environments package
Submodules
prl.environments.environments module
class Environment(env, environment_id='Environment_wrapper', state_transformer=<prl.transformers.state_transformers.NoOpStateTransformer object>, reward_transformer=<prl.transformers.reward_transformers.NoOpRewardTransformer object>, action_transformer=<prl.transformers.action_transformers.NoOpActionTransformer object>, expected_episode_length=512, dump_history=False)

Bases: prl.typing.EnvironmentABC, abc.ABC

Interface for wrappers of gym-like environments. It can use a StateTransformer and a RewardTransformer to shape states and rewards into a form convenient for the agent, and an ActionTransformer to change the action representation from the one suitable for the agent to the one required by the wrapped environment. The Environment also keeps the history of the current episode, so this does not have to be implemented on the agent side. All transformers can use this history to transform states, actions and rewards.
Parameters:
- env (Env) – Environment with a gym-like API
- environment_id (str) – ID of the env
- state_transformer (StateTransformerABC) – Object of the class StateTransformer
- reward_transformer (RewardTransformerABC) – Object of the class RewardTransformer
- action_transformer (ActionTransformerABC) – Object of the class ActionTransformer
action_space

The action_space object from the action_transformer.

Return type: Space
action_transformer

Action transformers can be used to change the representation of actions, for example changing the coordinate system, or feeding only the difference from the last action in a continuous action space. The ActionTransformer changes the representation from the one suitable for the agent to the one required by the wrapped environment.

Return type: ActionTransformerABC
Returns: ActionTransformer object
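As an illustration of the idea above, here is a minimal, hypothetical action transformer (a sketch, not prl's actual API; the class and method names are invented) that feeds the environment only the difference from the last action:

```python
import numpy as np

class DeltaActionTransformer:
    """Hypothetical sketch: the agent outputs absolute actions, while the
    wrapped environment is assumed to expect per-step deltas."""

    def __init__(self):
        self.last_action = None

    def transform(self, action, history=None):
        action = np.asarray(action, dtype=float)
        if self.last_action is None:
            delta = action          # first action passes through unchanged
        else:
            delta = action - self.last_action
        self.last_action = action
        return delta

transformer = DeltaActionTransformer()
first = transformer.transform([1.0, 2.0])   # -> [1.0, 2.0]
second = transformer.transform([1.5, 1.0])  # -> [0.5, -1.0]
```
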
id

Environment UUID.
observation_space

The observation_space object from the state_transformer.

Return type: Space
reset()

Resets the environment to its initial state and returns this initial state.

Return type: ndarray
Returns: New state
reward_transformer

Reward transformer object for reward shaping, for example taking the sign of the original reward, or adding a reward for staying on track in a car racing game.

Return type: RewardTransformerABC
Returns: RewardTransformer object
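A minimal sketch of the sign-based reward shaping mentioned above (hypothetical class, not prl's actual API):

```python
import numpy as np

class SignRewardTransformer:
    """Hypothetical sketch: clip the reward to its sign, a common trick
    for stabilizing training across environments with different reward
    scales."""

    def transform(self, reward, history=None):
        return float(np.sign(reward))

transformer = SignRewardTransformer()
transformer.transform(3.7)   # -> 1.0
transformer.transform(-0.2)  # -> -1.0
```
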
state_history

Current episode history.

Return type: HistoryABC
state_transformer

StateTransformer object for state transformations. It can be used to change the representation of the state: for example, subtracting a constant vector from the state, stacking the last N states, or transforming an image into a compressed representation using an autoencoder.

Return type: StateTransformer
Returns: StateTransformer object
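The "stacking the last N states" idea above can be sketched as follows (hypothetical class and method names, not prl's actual API); at episode start the buffer is padded with copies of the first state so the output shape stays constant:

```python
from collections import deque

import numpy as np

class StackedStatesTransformer:
    """Hypothetical sketch: return the last n states stacked along a new
    leading axis."""

    def __init__(self, n=4):
        self.n = n
        self.buffer = deque(maxlen=n)

    def reset(self):
        self.buffer.clear()

    def transform(self, state, history=None):
        state = np.asarray(state)
        if not self.buffer:
            # Pad with the initial state so the shape is constant.
            for _ in range(self.n):
                self.buffer.append(state)
        else:
            self.buffer.append(state)
        return np.stack(self.buffer)

transformer = StackedStatesTransformer(n=4)
stacked = transformer.transform(np.zeros(2))  # shape (4, 2)
```
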
step(action)

Transforms and performs a given action in the wrapped environment. Returns the transformed state and reward from the wrapped environment.

Parameters: action (ndarray) – Action executed by the agent.
Returns:
- observation – New state
- reward – Reward we get from performing the action
- is_done – Is the simulation finished
- info – Additional diagnostic information

Note

When the true_reward flag is set to True, the non-transformed reward is returned, for testing purposes.
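To make the transformation pipeline concrete, here is a self-contained sketch of how such a step() could be wired together (toy environment and function names are invented for illustration; the order of transformations is an assumption, not prl's confirmed implementation):

```python
import numpy as np

class ToyEnv:
    """Tiny stand-in for a gym-like environment (illustration only)."""

    def reset(self):
        self.t = 0
        return np.array([0.0])

    def step(self, action):
        self.t += 1
        state = np.array([float(self.t)])
        reward = -2.5               # raw reward from the wrapped env
        done = self.t >= 3
        return state, reward, done, {}

def wrapped_step(env, action, state_fn, reward_fn, action_fn,
                 true_reward=False):
    """Sketch of the pipeline: transform the action, step the wrapped
    environment, then transform the state and (unless true_reward) the
    reward."""
    raw_state, raw_reward, done, info = env.step(action_fn(action))
    reward = raw_reward if true_reward else reward_fn(raw_reward)
    return state_fn(raw_state), reward, done, info

env = ToyEnv()
env.reset()
obs, reward, done, info = wrapped_step(
    env, np.array([1.0]),
    state_fn=lambda s: s / 10.0,             # scale states
    reward_fn=lambda r: float(np.sign(r)),   # clip reward to its sign
    action_fn=lambda a: a,                   # no-op action transform
)
```
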
class FrameSkipEnvironment(env, environment_id='frameskip_gym_environment_wrapper', state_transformer=<prl.transformers.state_transformers.NoOpStateTransformer object>, reward_transformer=<prl.transformers.reward_transformers.NoOpRewardTransformer object>, action_transformer=<prl.transformers.action_transformers.NoOpActionTransformer object>, expected_episode_length=512, n_skip_frames=0, cumulative_reward=False)

Bases: prl.environments.environments.Environment

Environment wrapper that skips frames from the original environment. The action executed by the agent is repeated on the skipped frames.
Parameters:
- env (Env) – Environment with a gym-like API
- environment_id (str) – ID of the env
- state_transformer (StateTransformer) – Object of the class StateTransformer
- reward_transformer (RewardTransformer) – Object of the class RewardTransformer
- action_transformer (ActionTransformer) – Object of the class ActionTransformer
- n_skip_frames (int) – Number of frames to skip on each step
- cumulative_reward – If True, the reward returned from the step() method is the cumulative reward from the skipped steps
step(action)

Transforms and performs a given action in the wrapped environment. Returns the transformed state and reward from the wrapped environment.

Parameters: action (ndarray) – Action executed by the agent.
Returns:
- observation – New state
- reward – Reward we get from performing the action
- is_done – Is the simulation finished
- info – Additional diagnostic information

Note

When the true_reward flag is set to True, the non-transformed reward is returned, for testing purposes.
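The frame-skipping logic described above can be sketched in a few lines (toy environment and function names are invented; this is not prl's actual implementation). With n_skip_frames=3, each agent action drives 4 underlying steps, and cumulative_reward sums the rewards collected on the skipped frames:

```python
class CountingEnv:
    """Toy gym-like environment: reward 1.0 per step (illustration only)."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 100, {}

def frameskip_step(env, action, n_skip_frames=0, cumulative_reward=False):
    """Sketch: repeat the agent's action on the skipped frames, stopping
    early if the episode ends."""
    total = 0.0
    for _ in range(n_skip_frames + 1):
        state, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    return state, (total if cumulative_reward else reward), done, info

env = CountingEnv()
state, reward, done, info = frameskip_step(
    env, 0, n_skip_frames=3, cumulative_reward=True)
# 4 underlying steps executed, cumulative reward 4.0
```
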
class TimeShiftEnvironment(env, environment_id='timeshift_gym_environment_wrapper', state_transformer=<prl.transformers.state_transformers.NoOpStateTransformer object>, reward_transformer=<prl.transformers.reward_transformers.NoOpRewardTransformer object>, action_transformer=<prl.transformers.action_transformers.NoOpActionTransformer object>, expected_episode_length=512, lag=1)

Bases: prl.environments.environments.Environment

Environment wrapper creating a lag between the action passed to the step() method by the agent and that action's execution in the environment. The first 'lag' actions are sampled from the action_space.
Parameters:
- env (Env) – Environment with a gym-like API
- environment_id (str) – ID of the env
- state_transformer (StateTransformer) – Object of the class StateTransformer
- reward_transformer (RewardTransformer) – Object of the class RewardTransformer
- action_transformer (ActionTransformer) – Object of the class ActionTransformer (do not use – action transformation is not implemented)
Note

This class does not implement action transformation.
reset()

Resets the environment to its initial state and returns this initial state.

Return type: ndarray
Returns: New state
step(action)

Transforms and performs a given action in the wrapped environment. Returns the transformed state and reward from the wrapped environment.

Parameters: action (ndarray) – Action executed by the agent.
Returns:
- observation – New state
- reward – Reward we get from performing the action
- is_done – Is the simulation finished
- info – Additional diagnostic information

Note

When the true_reward flag is set to True, the non-transformed reward is returned, for testing purposes.
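The time-shift mechanism described above amounts to a FIFO queue of actions: the queue is pre-filled with 'lag' sampled actions, and on each step the agent's new action is enqueued while the oldest one is executed. A minimal sketch (invented names, not prl's actual implementation):

```python
from collections import deque

def make_lagged_step(env_step, lag, sample_action):
    """Sketch: pre-fill a queue with `lag` actions from `sample_action`
    (standing in for action_space.sample()); each agent action is then
    executed `lag` steps later."""
    queue = deque(sample_action() for _ in range(lag))

    def lagged_step(action):
        queue.append(action)              # agent's action enters the queue...
        return env_step(queue.popleft())  # ...the oldest action is executed
    return lagged_step

executed = []
step = make_lagged_step(env_step=lambda a: executed.append(a),
                        lag=2, sample_action=lambda: 'random')
for a in ['a0', 'a1', 'a2']:
    step(a)
# executed == ['random', 'random', 'a0']: 'a0' ran two steps late
```
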
class TransformedSpace(shape=None, dtype=None, transformed_state=None)

Bases: gym.core.Space

Class created to handle Environments using StateTransformers, as the observation space is not directly specified in such a setup.
contains(state)

This method is not available, because a TransformedSpace object cannot determine whether a given state is contained in the state representation: the TransformedSpace object only infers the state's properties.
sample()

Returns a sample state. An object of this class always returns the same object, so a new object needs to be created for every sample. When used inside an Environment with a StateTransformer, every call to the observation_space property initializes a new object, so another sample is returned.

Returns: Transformed state
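The inference idea behind TransformedSpace can be sketched as follows (hypothetical class, not prl's actual implementation): instead of declaring shape and dtype up front, the space derives them from one transformed state, and sample() keeps handing back that same object:

```python
import numpy as np

class TransformedSpaceSketch:
    """Hypothetical sketch: infer the space's properties from a single
    transformed state rather than declaring them explicitly."""

    def __init__(self, transformed_state):
        self._state = np.asarray(transformed_state)
        self.shape = self._state.shape
        self.dtype = self._state.dtype

    def sample(self):
        # Always the same object; create a new space to get a new sample.
        return self._state

space = TransformedSpaceSketch(np.zeros((4, 2), dtype=np.float32))
# space.shape == (4, 2); repeated sample() calls return the same object
```
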