Solver instead of RL - the Reacher model

Discussion in 'Simulation' started by Henryk, Feb 16, 2017.

  1. Regarding the model

    https://github.com/openai/gym/blob/master/gym/envs/mujoco/assets/reacher.xml

    Using the code at https://github.com/joschu/modular_rl, I trained an RL algorithm (TRPO) that effectively learns the task

    https://gym.openai.com/envs/Reacher-v1

    (the task consists of moving the arm tip to the marked point while minimizing the torque; rewards are given per step, and the precise definition of the reward is in https://github.com/openai/gym/blob/master/gym/envs/mujoco/reacher.py#L14)
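
    For reference, the per-step reward defined there is essentially a distance penalty plus a control penalty; a rough sketch (paraphrasing the gym source, not a verbatim quote):

        import numpy as np

        def reacher_reward(fingertip_xyz, target_xyz, action):
            # distance penalty: negative Euclidean distance from arm tip to target
            reward_dist = -np.linalg.norm(fingertip_xyz - target_xyz)
            # control penalty: negative squared torque, discourages large actions
            reward_ctrl = -np.square(action).sum()
            return reward_dist + reward_ctrl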

    Is it a realistic idea to phrase the task as a constraint for the solver inside MuJoCo and obtain a similar solution not via the RL algorithm but directly from the solver? What would be the right starting point? Overall, I would be glad to see a non-RL solution to this simple task (if one exists). Sorry for the naive question.
     
  2. Emo Todorov (Administrator, Staff Member)

    Yes, this is a trivial control problem, normally solved by Jacobian methods in robotics. You can also trick MuJoCo into solving the problem for you: add a soft equality constraint between the hand and the target location. This will generate constraint forces (mjData.qfrc_constraint) that propel the arm to the target and make it stop there. You can then take these forces and use them as controls in the original model, which doesn't have the equality constraint.
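
    Roughly, that would look like the following. In a copy of reacher.xml, add a soft connect constraint between the fingertip and the target (the solref/solimp values below are just an illustration of "soft", and you may want to stiffen or remove the target's slide joints so the constraint pulls the arm rather than the target):

        <equality>
            <connect body1="fingertip" body2="target" anchor="0 0 0"
                     solref="0.1 1" solimp="0.1 0.5 0.01"/>
        </equality>

    Then run the constrained model, record qfrc_constraint on the arm joints, and replay it as controls in the original model. A minimal sketch with the Python bindings (the file name reacher_connect.xml, the dof indices, and the ctrl = torque / gear mapping are assumptions about the standard reacher model, not something from this thread):

        import mujoco  # mujoco_py exposes similar fields through sim.model / sim.data

        # hypothetical copy of reacher.xml with the <connect> constraint added
        m_con = mujoco.MjModel.from_xml_path("reacher_connect.xml")
        d_con = mujoco.MjData(m_con)
        # original, unconstrained model
        m_orig = mujoco.MjModel.from_xml_path("reacher.xml")
        d_orig = mujoco.MjData(m_orig)

        # start both simulations from the same state
        d_orig.qpos[:] = d_con.qpos
        d_orig.qvel[:] = d_con.qvel

        arm_dofs = [0, 1]                   # joint0 and joint1 come first in qvel
        gear = m_orig.actuator_gear[:, 0]   # gear of the two motors

        for _ in range(500):
            mujoco.mj_step(m_con, d_con)
            # force the soft connect exerted on the arm joints during this step
            tau = d_con.qfrc_constraint[arm_dofs]
            # reuse it as controls (torque = ctrl * gear for a plain motor);
            # ctrlrange clamping may still saturate large forces
            d_orig.ctrl[:] = tau / gear
            mujoco.mj_step(m_orig, d_orig)

    The two trajectories should track each other as long as the constraint isn't also dragging the target around. The Jacobian route mentioned above would skip the auxiliary model entirely: compute the fingertip Jacobian with mj_jacBody (or mj_jacSite) and command tau = J^T * k * (target - fingertip) plus some damping.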