- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space: torso and arm joint positions (if `use_IK=0`)
    - You can choose which arm to control with the flag `control_arm` when instantiating the iCub environment (see the example below)
- Observation space:
    - hand 6D pose
    - hand linear velocity
    - arm and torso joint positions
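A minimal sketch of how these flags might be passed when the environment is created; only `use_IK` and `control_arm` come from the description above, while the import path, class name and argument values are hypothetical:

```python
# hypothetical import path and class name, shown only to illustrate the flags
from pybullet_robot_envs.envs.icub_envs.icub_env import iCubEnv

# use_IK=1 -> Cartesian action space (hand 6D pose)
# use_IK=0 -> joints action space (torso and arm joint positions)
robot = iCubEnv(use_IK=1, control_arm='l')  # 'l' (left arm) is an assumed value
```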
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space (if `use_IK=0`)
- Observation space:
    - gripper 6D pose
    - gripper linear velocity
    - joint positions
The task environments built on the same robot differ mostly in:
- the extended observations provided;
- the reward function.
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space: torso and arm joint positions (if `use_IK=0`)
- Observation space:
    - iCub's observation space
    - object absolute 6D pose
    - object relative 6D pose wrt hand
The reward function is given by:
- the negative distance between the hand position and the object (target) position;
- plus a bonus when the hand is close to the object.
Here is the code used to compute the reward function:
robot_obs, _ = self._robot.get_observation()
world_obs, _ = self._world.get_observation()

# distance between the hand position and the object (target) position
d = goal_distance(np.array(robot_obs[:3]), np.array(world_obs[:3]))

reward = -d
if d <= self._target_dist_min:
    # bonus when the hand is within the minimum target distance
    reward += np.float32(1000.0) + (100 - d*80)
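The helper `goal_distance` used above is not shown in the snippet; it returns the Euclidean distance between two positions. A minimal sketch of what it presumably looks like (the actual helper in the code base may differ):

```python
import numpy as np

def goal_distance(goal_a, goal_b):
    # Euclidean distance between two positions of the same shape
    assert goal_a.shape == goal_b.shape
    return np.linalg.norm(goal_a - goal_b, axis=-1)
```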
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space: torso and arm joint positions (if `use_IK=0`)
- Observation space:
    - iCub's observation space
    - object absolute 6D pose
    - object linear velocity
    - object relative 6D pose wrt hand
    - target absolute pose
The reward function is given by:
- the distance `d1` between the hand position and the object position;
- the distance `d2` between the object position and the target position;
- plus a bonus when the object is close to the target.
The two distances are combined into the reward function in two different ways, shown in the implementations below.
Note: the current implementation of the pushing task works only when the initial positions of the object and the target are fixed. The implementation may change.
Here is the code used to compute the reward function #0:
handPos = self._icub.getObservation()[0:3]
objPos, _ = p.getBasePositionAndOrientation(self._objID)

# d1: hand-object distance, d2: object-target distance
d1 = goal_distance(np.array(handPos), np.array(objPos))
d2 = goal_distance(np.array(objPos), np.array(self._tg_pose))

reward = -d1 - d2
if d2 <= self._target_dist_min:
    # bonus when the object reaches the target
    reward += np.float32(1000.0)
Here is the code used to compute the reward function #1:
handPos = self._icub.getObservation()[0:3]
objPos, _ = p.getBasePositionAndOrientation(self._objID)

d1 = goal_distance(np.array(handPos), np.array(objPos))
d2 = goal_distance(np.array(objPos), np.array(self._tg_pose))

# weights of the two normalized distance terms
rew1 = 0.125
rew2 = 0.25

if d1 > 0.1:
    # hand still far from the object: reward only the approach phase
    reward = rew1*(1 - d1/self._init_dist_hand_obj)
else:
    # hand close to the object: also reward pushing the object towards the target
    reward = rew1*(1 - d1/self._init_dist_hand_obj) + rew2*(1 - d2/self._max_dist_obj_tg)

if d2 <= self._target_dist_min:
    # bonus when the object reaches the target
    reward += np.float32(1000.0)
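To compare how the two formulations behave, here is a small standalone sketch that re-implements both rewards outside the environment class. The threshold `target_dist_min` and the normalization distances `init_dist_hand_obj` and `max_dist_obj_tg` are illustrative values, not the ones used by the environment:

```python
import numpy as np

def reward_0(d1, d2, target_dist_min=0.03):
    # reward #0: dense negative distances plus a terminal bonus
    r = -d1 - d2
    if d2 <= target_dist_min:
        r += 1000.0
    return r

def reward_1(d1, d2, init_dist_hand_obj=0.5, max_dist_obj_tg=0.4, target_dist_min=0.03):
    # reward #1: shaped approach term, plus a push term once the hand is near the object
    rew1, rew2 = 0.125, 0.25
    r = rew1 * (1 - d1 / init_dist_hand_obj)
    if d1 <= 0.1:
        r += rew2 * (1 - d2 / max_dist_obj_tg)
    if d2 <= target_dist_min:
        r += 1000.0
    return r

# hand approaching the object, then pushing it towards the target
for d1, d2 in [(0.40, 0.30), (0.08, 0.20), (0.05, 0.02)]:
    print(f"d1={d1:.2f} d2={d2:.2f} -> #0: {reward_0(d1, d2):8.2f}  #1: {reward_1(d1, d2):8.2f}")
```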
Hindsight Experience Replay (HER) implementation of the iCub Push Env.
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space: torso and arm joint positions (if `use_IK=0`)
- Observation space:
    - iCub's observation space
    - object 6D pose
    - object linear velocity
    - object relative 6D pose wrt hand
    - target 3D pose
- Achieved goal:
    - object 3D pose
- Desired goal:
    - target 3D pose
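With this goal-based interface, the reward can be recomputed from an achieved goal and a desired goal, which is what HER relies on when replaying transitions with substituted goals. A minimal sketch of a sparse `compute_reward` of this kind, assuming the goals are the object and target 3D positions and reusing the `goal_distance` helper and `self._target_dist_min` threshold from above (the actual method in the environment may differ, e.g. it may use a dense reward):

```python
def compute_reward(self, achieved_goal, desired_goal, info):
    # sparse goal-conditioned reward: 0 when the object (achieved goal) is within
    # the distance threshold of the target (desired goal), -1 otherwise
    d = goal_distance(np.array(achieved_goal), np.array(desired_goal))
    return np.float32(0.0) if d <= self._target_dist_min else np.float32(-1.0)
```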
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space (if `use_IK=0`)
- Observation space:
    - Panda's observation space
    - object absolute 6D pose
    - object relative 6D pose wrt hand
In this environment the reward function is given by:
- the negative distance between the end-effector and the desired position;
- plus a bonus when the end-effector is close to the desired position.
Here is the code used to compute the reward function:
robot_obs, _ = self._robot.get_observation()
world_obs, _ = self._world.get_observation()

# distance between the end-effector position and the desired position
d = goal_distance(np.array(robot_obs[:3]), np.array(world_obs[:3]))

reward = -d
if d <= self._target_dist_min:
    # bonus when the end-effector is within the minimum target distance
    reward = np.float32(1000.0) + (100 - d*80)
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space (if `use_IK=0`)
- Observation space:
    - Panda's observation space
    - object absolute 6D pose
    - object relative 6D pose wrt hand
    - target 3D pose
The reward function is given by:
- the distance `d1` between the end-effector position and the object position;
- the distance `d2` between the object position and the target position;
- plus a bonus when the object is close to the target.
Here is the code used to compute the reward function:
robot_obs, _ = self._robot.get_observation()
world_obs, _ = self._world.get_observation()

# d1: end-effector-object distance, d2: object-target distance
d1 = goal_distance(np.array(robot_obs[:3]), np.array(world_obs[:3]))
d2 = goal_distance(np.array(world_obs[:3]), np.array(self._target_pose))

reward = -d1 - d2
if d2 <= self._target_dist_min:
    # bonus when the object reaches the target
    reward = np.float32(1000.0) + (100 - d2*80)
Hindsight Experience Replay (HER) implementation of the Panda Push Env.
- Action space:
    - Cartesian space: hand 6D pose (x,y,z), (roll,pitch,yaw) (if `use_IK=1`)
    - Joints space (if `use_IK=0`)
- Observation space:
    - Panda's observation space
    - object absolute 6D pose
    - object relative 6D pose wrt hand
    - target 3D pose
- Achieved goal:
    - object 3D pose
- Desired goal:
    - target 3D pose
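As for the iCub HER environment, a goal-based environment like this one typically exposes a dictionary observation with `observation`, `achieved_goal` and `desired_goal` entries. A minimal sketch of how such an observation space can be declared with `gym.spaces`; the vector sizes are hypothetical, not taken from the environment:

```python
import numpy as np
from gym import spaces

obs_dim, goal_dim = 28, 3  # hypothetical sizes: full observation vector and 3D goals

observation_space = spaces.Dict({
    "observation":   spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32),
    "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(goal_dim,), dtype=np.float32),  # object 3D pose
    "desired_goal":  spaces.Box(-np.inf, np.inf, shape=(goal_dim,), dtype=np.float32),  # target 3D pose
})
```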