Actuator Gain PRM causing issues at scale

Discussion in 'Simulation' started by Felix Su, Feb 5, 2020.

  1. I am trying to modify the P-term of the actuator force gain (gainprm) for the ShadowHand environment (actuator XML here: https://github.com/openai/gym/blob/...gym/envs/robotics/assets/hand/shared.xml#L232).

    I am trying to replicate the domain randomization listed in Table 1 of OpenAI's Learning Dexterous In-Hand Manipulation paper (https://arxiv.org/pdf/1808.00177.pdf).

    To do this I wrote the following code:

    Code:
    import numpy as np

    # Draw a log-uniform scale in [0.75, 1.5] per Table 1 of the paper
    for actuator_name in sim.model.actuator_names:
        scale = np.exp(np.random.uniform(np.log(0.75), np.log(1.5)))
        sim.model.actuator_gainprm[sim.model.actuator_name2id(actuator_name), 0] *= scale
    
    This causes no problems when I run a few rollouts locally. However, when I run domain randomization at scale, training a machine learning model on thousands of rollouts, training stalls partway through: the CPUs stop doing work, but no error is ever thrown.
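
    In case it's relevant, below is a sketch of a variant I'm considering to rule out compounding: caching the original gains once and rescaling from that cached baseline on every episode, instead of multiplying the live values in place on each rollout. (base_gainprm and randomize_gains are placeholder names of mine; this assumes the usual mujoco_py sim.model arrays.)

    Code:
    import numpy as np

    # Cache the unmodified gains once, right after the model is loaded
    base_gainprm = sim.model.actuator_gainprm.copy()

    def randomize_gains(sim):
        # Fresh log-uniform scales in [0.75, 1.5], one per actuator
        n = len(sim.model.actuator_names)
        scales = np.exp(np.random.uniform(np.log(0.75), np.log(1.5), size=n))
        # Rescale from the cached baseline so the scales never compound
        sim.model.actuator_gainprm[:, 0] = base_gainprm[:, 0] * scales
    
    Calling randomize_gains(sim) at every env reset would then give an independent draw each episode rather than repeatedly rescaling already-scaled gains.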

    Is there something incorrect in how I am modifying the actuator gain that could cause instability in the environment? Also, is there any situation where the environment can fail silently, without raising an error?