Simulation timestep doubt

Discussion in 'Simulation' started by Michael J Mathew, Aug 10, 2016.

  1. Hi,

    I generated an optimal control sequence for a cartpole swingup using direct collocation method with a time horizon of 10 seconds and having a control point at every 0.005 second. The dynamics constraints uses the values of mass and inertia from the xml file given for the cartpole(downloaded from resources). The control sequence works well and good for a toy simulation done on Matlab. I am trying to apply the same values to the model inside Mujoco in a feed forward fashion. For this, I have made the "time-step" as 0.005 inside the xml file before giving out the control sequence. But the cart pole does not stabilize like in the Matlab. Apart from that, the simulation takes more than 10 seconds. I guess the problem could be the mismatch between the control trajectory and time step in Mujoco. I tried another control sequence at 0.01 seconds, in this case the simulation runs faster but does not stabilize either. The Mujoco simulation just slows down as I reduce the timestep. I understand that if the time step of the control sequence and that of Mujoco does not match, everything can go wrong. Is there any additional settings to taken care to match the results in Matlab and Mujoco?

    Best,

    Mike
     
  2. Emo Todorov

    Emo Todorov Administrator Staff Member

    I am not quire sure what you are doing. Do you have your own simulation in MATLAB or are you somehow calling MuJoCo from MATLAB? Assuming it is the former, have you verified that the two dynamical systems are identical? You could do that by setting the position, velocity and control to identical values and checking if the resulting accelerations are identical up to numerical precision (you should check for multiple states and controls). MuJoCo has a very rich modeling language, and you could have a setting somewhere in the XML that affects the dynamics and makes it different from a generic cart-pole.

    MuJoCo should run many times faster than realtime on a cart-pole. So there must be something wrong with the way you are using it to integrate the dynamics. Simulations with ~50 degrees of freedom, ~10 contacts and 0.001 timestep run faster-than-realtime.

    As for interpolation of control sequences, assuming you do it carefully and the control sequence is reasonably smooth, you should be able to change the timestep to some extent and get similar behavior; but how much you can change it depends on the system. The cart-pole is an unstable under-actuated system, where even a small deviation will grow over time unless you have a feedback controller to correct for it. It sounds like you are doing open-loop control.
     
    Kyokushin likes this.
  3. Thanks for the reply.
    I am using the derivative.cpp file to get local linearization of a system. But when I try to obtain the next step from this linearized model, there is quite some deviation for the velocity term. This is for the same cart-pole system: it has four states (2 position, 2 velocities) and one control

    Current state: [qpos_old, qvel_old],
    Control: u_old

    Next state obtained by using model.step() from q_old with control u_old => [qpos_nxt, qvel_nxt]

    Local linearization obtained by using derivative.cpp in the order
    dinv/dpos, dinv/dvel, dinv/dacc, dacc/dpos, dacc/dvel, dacc/dfrc, I convert each of them into square matrices of size model->nv

    my linearized system matrices: A = [zeros(nv), eye(nv); dacc/dpos, dacc/dvel]
    B = [dacc/dfrc] , (only second column is taken)

    Now I get the next state by:
    [tmp_qpos_nxt, tmp_qvel_nxt] = [qpos_old, qvel_old] + (A*[qpos_old, qvel_old] + B*u_old)*model->time_step

    my tmp_qpos_nxt and qpos_nxt matches to almost 2 decimal places but qvel_nxt does not match tmp_qvel_nxt

    Could you please tell me if there is anything wrong with this A and B or is it expected?

    an example:

    values from derivative.cpp
    [[ 9.73080624e+05 -5.78418872e+02 0.00000000e+00 -1.24536790e+01]
    [ 3.89232750e+04 -3.66134991e+01 -3.55271368e-09 4.99999964e-02]
    [ 3.92877384e+02 1.26545638e+00 1.26545285e+00 6.43646334e-01]
    [ -2.49258969e+03 1.41987016e+00 4.90060231e+03 1.65522174e+01]
    [ -9.97037157e+01 9.40387963e-02 1.96024344e+02 -2.62569301e-01]
    [ 2.56136445e-03 -5.03632691e-03 -5.03587216e-03 1.56355009e+00]]

    uold = 250

    A =
    [[ 0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00]
    [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]
    [ -2.49258969e+03 1.41987016e+00 -9.97037157e+01 9.40387963e-02]
    [ 4.90060231e+03 1.65522174e+01 1.96024344e+02 -2.62569301e-01]]

    B =
    [[ 0. ]
    [ 0. ]
    [-0.00503633]
    [ 1.56355009]]

    State current [qpos_old, qvel_old] = [ 0.4 , 3.14159265, 100, -5.]

    result from model.step() => [qpos_nxt, qvel_nxt] = [ 0.50128427, 3.13964866, 101.28426817, -1.9439917 ]

    state from A and B : [tmp_qpos_nxt, tmp_qvel_nxt] = [ 0.5, 3.13659265, 89.03657042, 16.61989736]

    I know such doubts are quite simple are arise due to lack of examples, but it would indeed great if you could provide some simple examples to work in Mujoco.
     
  4. Emo Todorov

    Emo Todorov Administrator Staff Member

    Hard to tell what is going on by looking at those numbers. Several sanity checks you can do:

    -- derivative.cpp outputs diagnostics that indicate how accurate the numerical derivatives are for a given system and epsilon; what numbers do you get?

    -- derivative.cpp runs a passive dynamics simulation internally and computes derivatives for many states along the trajectory; I am assuming you rearranged the code so as to compute the A and B matrices for the state you are currently in (qpos_old, qvel_old)?

    -- you could write your own finite-difference code for this system, or even better, derive these derivatives analytically (the cart-pole is simple enough) and compare to the output of derivative.cpp;

    -- what happens if you set the control to 0? This will tell you if the errors are coming from A or B;

    -- what numerical integrator are you using? If it is Rk4, it does 4 smaller steps internally, so the result will be more accurate than the linear approximation. In general the accelerations are more sensitive than the velocities, so the linearization errors there could be bigger. In other words, what you are seeing could be a genuine property of the nonlinear dynamics combined with large timestep. If I had to guess, my guess would be that the problem is elsewhere, but I don't really know without working through it.


    Re examples, it would be great if more advanced users on the forum started posting tutorial examples. My priority for now is to finish the programming guide and reference documentation, and unfortunately I always end up working on the next release instead of properly documenting the current one :)
     
  5. --diagnostic values given by derivative.cpp for above state and control.
    G2*F2 - I : -6.9
    G2 - G2' : -inf
    G1 - G1' : 8.1e+80
    F2 - F2' : -inf
    G1 + G2*F1 : -6.6
    G0 + G2*F0 : -1.6
    F1 + F2*G1 : -5.5
    F0 + F2*G0 : -6.7

    -- I have written my own finite difference function and I was comparing that with the output of derivative.cpp. Since derivative.cpp averages over several steps and takes into account normalization of quaternions and all, I was using it as a reference code and then these doubts arose.

    --Using Rk4 didn't show much improvement rather than for a 3 or 4rth decimal place.

    --When I put 0 control, the states does match with almost no error, which means there are no errors in A or B matrix when I give zero control

    On a separate note: What does the data pointer mjtNum* body_invweight0 indicate? It was not clear from the comment given (mean inv inert in qpos0). Can I expect Matlab hooks for the upcoming mujoco release?
     
  6. Emo Todorov

    Emo Todorov Administrator Staff Member

    If you set the control to 0 and the state is ok, this means that the A matrix is fine but the B matrix is not.

    Looking at your diagnostics, the value 8.1e+80 means that something is wrong.

    Are you computing the derivatives for the state you are currently in, or somehow averaging, like derivative.cpp does? In order to get the correct A and B matrices, you should *not* average. This is a nonlinear system, therefore A and B are state-dependent and must be computed for the state you are currently in.

    I just tested it with a simple cartpole, and the diagnostics are fine on my end. My model is attached.

    The code sample on the website does not deal properly with 0/0 and log(0). Please replace the last line in the function relnorm() with:

    return (double) mju_log10(mju_max(mjMINVAL,L1res/mju_max(mjMINVAL,L1base)));

    I will also make the change on the website. This is not the reason for your problems though. With my cartpole, and with relnorm modified as above, I am getting:

    accuracy: log10(residual L1 relnorm)
    ------------------------------------
    G2*F2 - I : -14
    G2 - G2' : -14
    G1 - G1' : -14
    F2 - F2' : -14
    G1 + G2*F1 : -14
    G0 + G2*F0 : -12
    F1 + F2*G1 : -14
    F0 + F2*G0 : -12

    -14 = log10(mjMINVAL) is used when the residual gets too small; it means that the result is exact for all intents and purposes.

    Re MATLAB, it will be left out of 1.40 because that release is already too late (happening this weekend) but it will be added soon afterwards as a new code sample, along with some VR demos.
     

    Attached Files:

  7. There is one other line that causes problem in derivative.cpp for the diagnostics.
    So if you run derivative.cpp for a couple of more times without any change in code, you will get this strange results:
    accuracy: log10(residual L1 relnorm)
    ------------------------------------
    G2*F2 - I : -14
    G2 - G2' : -14
    G1 - G1' : -14
    F2 - F2' : -14
    G1 + G2*F1 : -14
    G0 + G2*F0 : -12
    F1 + F2*G1 : -14
    F0 + F2*G0 : -14

    accuracy: log10(residual L1 relnorm)
    ------------------------------------
    G2*F2 - I : -14
    G2 - G2' : -14
    G1 - G1' : -1.1e+182
    F2 - F2' : -14
    G1 + G2*F1 : -14
    G0 + G2*F0 : -12
    F1 + F2*G1 : -14
    F0 + F2*G0 : -14

    accuracy: log10(residual L1 relnorm)
    ------------------------------------
    G2*F2 - I : -14
    G2 - G2' : -14
    G1 - G1' : 1.5e+100
    F2 - F2' : -14
    G1 + G2*F1 : -14
    G0 + G2*F0 : -12
    F1 + F2*G1 : -14
    F0 + F2*G0 : -14

    Line 251 for G-G'
    originally it is : error[2] += relnorm(mat, G1, nv*nv); => this causes some junk value to get added I guess. Hence the diagnostic was wrong.
    changed it to : error[2] = relnorm(mat, G1, nv*nv); => with this the problem is fixed, so it will be good if you change in the website as well.
    regarding A and B, I am still looking. But your suggestion of not averaging it makes sense, let me try that way.
     
  8. I made my finite difference work, thought it predicts positions with a maximum deviation of +-0.0025 the velocities does deviate to a maximum of +-8 or even more occasionally. Can we get any better than this? This is a plot for the same cart pole system with a series of 1000 random inputs within actuator limits.[​IMG]
     
  9. Emo Todorov

    Emo Todorov Administrator Staff Member

    Thanks for catching that bug, it is not fixed on the website.

    The accuracy of the finite difference approximation depends on epsilon. It is normally a U-shaped curve: big error for big epsilon, small error for the right epsilon, and bigger error again for very small epsilon (the latter is due to round-off errors). So it helps to experiment with epsilon for a given system. You can also do centered finite differences instead of one-sided finite differences; that improves accuracy, at the expense of doing twice the computation.

    For your system though, the diagnostics from derivative.cpp look great, so as long as you don't average (and get rid of any bugs) you should be fine.

    As I mentioned earlier, the time-step you use for integration also matters when the system is non-linear. So try a smaller time step.