Regarding the model https://github.com/openai/gym/blob/master/gym/envs/mujoco/assets/reacher.xml: using the code from https://github.com/joschu/modular_rl, I trained an RL algorithm (TRPO) that effectively learns the task https://gym.openai.com/envs/Reacher-v1. The task consists of moving the arm tip to the marked point while minimizing torque; rewards are given per step, and the precise reward definition is in https://github.com/openai/gym/blob/master/gym/envs/mujoco/reacher.py#L14.

Is it realistic to phrase the task as a constraint for the solver inside MuJoCo and obtain a similar solution not via the RL algorithm, but directly from the solver? What would be the right starting point? Overall, I would be glad to see a non-RL solution to this simple task (if one exists). Sorry for the naive question.
Yes, this is a trivial control problem, normally solved with Jacobian methods in robotics. You can also trick MuJoCo into solving the problem for you: add a soft equality constraint between the hand and the target location. This generates constraint forces (mjData.qfrc_constraint) that propel the arm to the target and make it stop there. You can then take these forces and use them as controls in the original model, which does not have the equality constraint.
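A minimal sketch of that soft equality constraint, assuming the body names `fingertip` and `target` from the reacher.xml linked above (the solref values are illustrative; a larger time constant makes the constraint softer):

```xml
<!-- Add inside the <mujoco> element of a copy of reacher.xml -->
<equality>
  <!-- Soft "connect" constraint pulling the fingertip to the target body.
       solref = (timeconst, dampratio): softer settings keep the resulting
       qfrc_constraint forces within realistic actuator limits. -->
  <connect body1="fingertip" body2="target" anchor="0 0 0" solref="0.05 1"/>
</equality>
```

Stepping this modified model and recording mjData.qfrc_constraint at each step gives a force trajectory you can replay as controls in the unmodified model.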
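To illustrate the Jacobian approach mentioned above, here is a minimal NumPy sketch of Jacobian-transpose control for a planar 2-link arm (the reacher has two 0.1-length links; the gain, iteration count, and kinematic-iteration setup are illustrative assumptions, not the gym environment's dynamics):

```python
import numpy as np

# Planar 2-link arm; the gym reacher's two links are each 0.1 long.
l1, l2 = 0.1, 0.1

def fk(q):
    """Forward kinematics: fingertip (x, y) for joint angles q."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """2x2 Jacobian of fingertip position w.r.t. joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

# Jacobian-transpose rule: tau = J^T * k * (target - fingertip).
# Here the update is applied kinematically (gradient descent on the
# squared distance) just to show convergence; in a dynamic simulation
# tau would be applied as joint torques instead.
q = np.array([0.3, 0.5])            # initial joint angles (arbitrary)
target = np.array([0.12, 0.08])     # a reachable target (|target| < l1+l2)
k = 5.0                             # illustrative gain
for _ in range(2000):
    err = target - fk(q)
    q = q + k * jacobian(q).T @ err  # step along J^T * error

print(np.linalg.norm(target - fk(q)))  # remaining distance to target
```

The same J^T * error signal is what MuJoCo's soft equality constraint effectively computes for you through the solver.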