Add trajectory-based dynamics model #158

natolambert · 2022-07-26T20:34:17Z

TODO for this WIP PR:

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Motivation and Context / Related issue

I'm collaborating with some folks at Berkeley looking to apply the trajectory-based model to real-world robotics, so I wanted to integrate it into this library to give it more longevity.

The paper is here. The core of the paper is proposing a long-term prediction focused dynamics model. The parametrization is:

$$ s_{t+1} = f_\theta(s_0, t, \phi),$$

where $\phi$ are closed form control parameters (e.g. PID)

Potentially related to #66: I think we will need to modify the replay buffer to

store control parameter vector
store time indices (which may be close with the trajectory formulation)

How Has This Been Tested (if it applies)

I am going to build a notebook to validate and demonstrate it; currently it is a fork of the PETS example. I will iterate.

Checklist

The documentation is up-to-date with the changes I made.
I have read the CONTRIBUTING document and completed the CLA (see CONTRIBUTING).
All tests passed, and additional code has been covered with new tests.

natolambert · 2022-07-26T21:10:40Z

@luisenp do you know whether, with the current replay buffer trajectory storage, it will be easy at training time to get the "trajectory time index" corresponding to the step of the episode?

The trajectory-based model is trained on the full trajectory and each sub-trajectory of it, so at a minimum I will need a tool to get all sub-trajectories. Happy to discuss on a call if it helps!

natolambert · 2022-07-26T21:32:44Z

Also, I'm not sure what to do with pre-commits, they're failing because I am using a different version of python I think?

[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command: ('/Users/nato/miniconda3/envs/mbrl/bin/python', '-mvirtualenv', '/Users/nato/.cache/pre-commit/repor4ja1kkq/py_env-python3.7', '-p', 'python3.7')
return code: 1
expected return code: 0
stdout:
    RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.7'

Though if I change it, it wants me to stage them 😥

[ERROR] Your pre-commit configuration is unstaged.
git add .pre-commit-config.yaml to fix this.

natolambert · 2022-07-26T23:39:00Z

@robertocalandra @albertwilcox may be interested.

luisenp · 2022-07-27T13:26:19Z

Hi @natolambert, thanks for the PR! I have a few questions:

I'm wondering if having a new replay buffer class for this is overkill, couldn't we just store the control parameter and time index as part of the state in a regular replay buffer?
By getting "all subtrajectories", are you referring to all N*(N+1) / 2 subtrajectories of a trajectory of length N? My first thought is that the cleanest approach would be to create a new iterator class that gets the full trajectories from the replay buffer, and then serves the subtrajectories in its loop.
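
A minimal sketch of the iterator idea above, assuming a trajectory is available as a plain Python sequence of states. `iter_subtrajectories` is a hypothetical helper, not an mbrl-lib API; it only illustrates serving all N*(N+1)/2 contiguous sub-trajectories of a length-N trajectory together with their start index.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_subtrajectories(
    trajectory: Sequence[T],
) -> Iterator[Tuple[int, List[T]]]:
    """Yield (start_index, sub_trajectory) for every contiguous, non-empty slice."""
    n = len(trajectory)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield start, list(trajectory[start:end])


# Example: a length-4 trajectory has 4 * 5 / 2 = 10 sub-trajectories.
states = ["s0", "s1", "s2", "s3"]
subtrajectories = list(iter_subtrajectories(states))
```

The start index is returned alongside each slice so that a time index relative to the start of the sub-trajectory can be recovered at training time.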

Also, can you try checking out branch lep.change_precommit_yaml and let me know if this solves the precommit issues you are having?

natolambert · 2022-07-27T22:13:57Z

I really like the sound of that, let me try and make those additions. It removes complexity in a way that I think is fitting.

natolambert · 2022-08-02T23:14:07Z

This seems like a good direction, I'm realizing I will need to add another class similar to OneDTransitionRewardModel because the prediction formulation is as follows (rather than one step):

$$ s_{t+h} = f_\theta(s_t, h, \psi)$$.

Not sure if I need a new MLP function, if I do it will just be for different trajectory propagation options (because it's no longer recursive, but rather all elements in one forward pass).

Will push more changes shortly. Relatedly, why is it called OneDTransitionRewardModel? That has always confused me.

I guess maybe we can spoof it as follows:
Inputs: state = state + time index, action = control parameters
Outputs: state at future time index

With the right propagation function, nothing needs to change, including the replay buffer. We'll see if I can get it to work.
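
The "spoofing" described above could look roughly like the sketch below. It assumes one trajectory stored as a NumPy array and a control parameter vector that stays constant over the trajectory; `make_traj_pairs` is a hypothetical helper and not part of the library. The horizon is appended to the state, and the control parameters go in the slot normally used for actions.

```python
import numpy as np


def make_traj_pairs(states: np.ndarray, control_params: np.ndarray):
    """Build (obs, act, target) arrays for every pair (t, h) with h >= 1.

    states: (T, state_dim) array holding one trajectory.
    control_params: (param_dim,) vector, constant over the trajectory.
    """
    obs, act, target = [], [], []
    num_steps = states.shape[0]
    for t in range(num_steps):
        for h in range(1, num_steps - t):
            obs.append(np.concatenate([states[t], [float(h)]]))  # state + time index
            act.append(control_params)  # control parameters in the "action" slot
            target.append(states[t + h])  # state h steps ahead
    return np.stack(obs), np.stack(act), np.stack(target)


states = np.arange(8, dtype=float).reshape(4, 2)  # toy trajectory: 4 steps, 2-D state
params = np.array([1.0, 0.5, 0.1])  # made-up control parameters
obs, act, target = make_traj_pairs(states, params)
```

With arrays in this layout, a model that expects `[states, actions]` as input would see `[state, horizon, control parameters]` without any change to its interface.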

luisenp · 2022-08-03T14:53:06Z

TBH, I don't love the name OneDTransitionRewardModel either. A previous version was called ProprioceptiveModel, but an earlier user pointed out that in some cases the same type of model could be used for inputs that wouldn't be considered proprioceptive. We brainstormed a bit on what a general name for this model would be, and this was the winning option. It essentially refers to the fact that the input to this model is a 1D array with [states, actions] and the output is a 1D array with [next_states, rewards].

natolambert · 2022-08-10T02:37:08Z

Your feedback is making me think I may not actually need any of the new code I proposed other than some agent definitions for the environment I want to use.

Hacking all the input-output pairs to be correct has gotten pretty far. Let me finish this, then we can see if we want to add more (which will make it clearer for people who are interested). The clear difference is trajectory propagation when compared with the GaussianMLP class.

natolambert · 2022-08-10T02:47:33Z

Got "loss going down" which is always an exciting point. I will add more validation soon.

natolambert · 2022-08-10T21:55:49Z

@luisenp I got the initial notebook done. I've actually set it up so I can reset the rest of the files, and just merge the notebook when we are happy with it. I think the difference in the MBRL-Lib reacher env vs the one used in the paper is causing some numerical instability, but the general principle is close to being validated!

Here's a long-term prediction accuracy comparison between the one-step model and the traj-based model.
I'm not 100% sure I'm predicting with the traj-based model right, because I couldn't use the wrappers. It predicts by passing an initial state, control parameters, and a vector of time indices.
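
For reference, the prediction scheme described here (one initial state, one control parameter vector, a vector of time indices) can be sketched as a single batched call. This is only an illustration under assumptions: `predict_trajectory` and `toy_model` are hypothetical names, and `model_fn` stands in for whatever trained model is used in the notebook.

```python
import numpy as np


def predict_trajectory(model_fn, init_state, control_params, time_indices):
    """Predict the state at each time index in one batched call.

    model_fn is a stand-in for a trained model: it maps an array of shape
    (batch, state_dim + 1 + param_dim) to an array of shape (batch, state_dim).
    """
    time_indices = np.asarray(time_indices, dtype=float)
    batch = len(time_indices)
    inputs = np.concatenate(
        [
            np.tile(init_state, (batch, 1)),  # same initial state in every row
            time_indices[:, None],  # one time index per row
            np.tile(control_params, (batch, 1)),  # same control parameters in every row
        ],
        axis=1,
    )
    return model_fn(inputs)


# Toy stand-in model: the state drifts linearly with the time index.
def toy_model(x):
    return x[:, :2] + 0.1 * x[:, 2:3]


preds = predict_trajectory(toy_model, np.zeros(2), np.ones(3), [1, 2, 3])
```

Because every row is independent, the whole trajectory comes out of one forward pass rather than a recursive rollout.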

Let me know if you look at the structure of the notebook!

Some things I want to get to:

set up batched inference so I can use ensembles (any tips on this?),
try a different environment,
see if more data helps numerical stability.

Here's the plot comparing long-term prediction accuracy.

luisenp · 2022-08-11T13:32:28Z

That's a nice plot! I'll take a closer look at this soon (hopefully tomorrow, Friday, if I have time). Thanks!

luisenp

Thanks a lot for working on this @natolambert. Left a bunch of comments, let me know if you have any questions :)

.pre-commit-config.yaml

luisenp · 2022-08-15T14:36:24Z

mbrl/planning/linear_feedback.py

+        self.state_mapping = None   # can set to run PID on specific variables
+
+
+        # TODO: fix dimensionality with P


Is this TODO no longer relevant? What about the commented out code?

If I remove all of this code from the library for now and leave it in the notebook, I think that is fine.

luisenp · 2022-08-15T14:36:39Z

mbrl/planning/linear_feedback.py

+        self.target = target
+        self.prev_error = 0
+        self.error = 0
+        # self.cum_error = 0


Lingering commented out code.

luisenp · 2022-08-15T14:39:17Z

mbrl/planning/linear_feedback.py

+        # self.cum_error = 0
+        # self.I_count = 0
+
+    def act(self, obs: np.array) -> np.ndarray:


Can we add docstrings for this method (example)? Also, it looks like this method does not accept a batch of observations, can this be supported?
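
A rough sketch of what a documented, batch-friendly `act` could look like. `BatchedPD` is an illustrative class written for this comment, not the controller in the PR, and the `(dim, batch_size)` layout is an assumption based on the shapes used in the tests.

```python
import numpy as np


class BatchedPD:
    """Illustrative PD controller (hypothetical; not the class in this PR)."""

    def __init__(self, kp: np.ndarray, kd: np.ndarray, target: np.ndarray):
        self.kp = kp
        self.kd = kd
        self.target = target
        self.prev_error = 0.0  # broadcasts against the first error array

    def act(self, obs: np.ndarray) -> np.ndarray:
        """Return the PD action for one observation or a batch of them.

        Args:
            obs: array of shape ``(dim,)`` or ``(dim, batch_size)``.

        Returns:
            Array of actions with the same shape as ``obs``.
        """
        batched = obs.ndim == 2
        obs_2d = obs if batched else obs[:, None]
        error = self.target[:, None] - obs_2d
        action = self.kp[:, None] * error + self.kd[:, None] * (error - self.prev_error)
        self.prev_error = error
        return action if batched else action[:, 0]


gains = np.ones(2)
single = BatchedPD(gains, 0.5 * gains, np.zeros(2)).act(np.array([1.0, -2.0]))
batch = BatchedPD(gains, 0.5 * gains, np.zeros(2)).act(np.array([[1.0, 3.0], [-2.0, 4.0]]))
```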

luisenp · 2022-08-15T14:39:46Z

mbrl/planning/linear_feedback.py

+
+        self.error = q_des - q
+        P_value = self.Kp * self.error
+        I_value = 0  # TODO: implement I and D part


Is this TODO still relevant? What about the commented out code?

luisenp · 2022-08-15T14:42:51Z

mbrl/planning/linear_feedback.py

+        target: np.ndarray,
+    ):
+        """
+        :param dim: dimensionality of state and control signal


Looks like this docstring is outdated. Also, can we rename the variables so they use lower_case naming?

mbrl/planning/linear_feedback.py

mbrl/types.py

luisenp · 2022-08-15T15:12:05Z

notebooks/traj_based_model.ipynb

@@ -0,0 +1,726 @@
+{


Could you write a minimal version of this notebook that mostly imports stuff to train your model? Don't worry about adding any text or explanations, I can help with that later. I just want to have a clear picture of how your contributions in this PR work.

Do you still want this? I can also do this. I think now that I removed a lot of changes it should be simpler to follow.

mbrl/util/replay_buffer.py

natolambert · 2022-08-15T19:28:24Z

@luisenp I guess the most relevant question is whether any of the in-progress code here is of interest for the library, or if we should just go notebook-only.

I ended up removing most of them. I can add a commit that resets everything except for the notebook. I re-used all the vanilla components in a way that could be slightly confusing. If people are interested in this, we could add official code that makes it much simpler to use. Some of the things I'm using in a potentially unintended manner:

trajectory storing in an extra replay buffer because trajectory-based model trains on sub-trajectories,
putting control parameters + time horizon into actions (the naming discrepancy could be confusing for people),
had to deconstruct util.rollout_agent_trajectories to store the data I needed (even if I just wanted one replay buffer).

I think that until the model gets more traction, notebook-only is probably okay (I will reset all the other changes in the PR). It shows a good research use case for how the library can be used flexibly.

luisenp · 2022-08-15T20:21:13Z

If everything you need to run this example is in the notebook, then that's definitely a good starting point! In that case I can focus on reviewing the notebook more carefully. We don't need to remove the other files yet, maybe some of them will be useful later (thinking of the PID agent here).

That said, it would be useful if you can remove any superfluous stuff from the notebook, and maybe add some short text highlighting the parts I should focus more on when reviewing?

natolambert · 2022-08-15T20:32:25Z

Yeah will do, in that case I will do a full pass to clean things up (including addressing the things in the PID agent and making it more compatible with the style guidelines / precommits).

natolambert · 2022-08-16T22:49:54Z

@luisenp added tests, made batch friendly, and cleaned everything.

luisenp

Left some small comments. I got interrupted by some meetings, but I'll go back to the notebook and leave feedback later today. Thanks a lot for the changes!

luisenp · 2022-08-19T14:55:39Z

mbrl/planning/linear_feedback.py

-        self.error = 0
-        # self.cum_error = 0
-        # self.I_count = 0
+        This method optimizes a full sequence of length ``self.planning_horizon`` and returns


I think this might be a bit of copy paste? :)

To be clear, I'm referring to some text in the docstring that doesn't seem applicable to this method.

Ah! Yeah I will fix this. Missed this earlier!

luisenp · 2022-08-19T15:05:23Z

tests/core/test_planning.py

+    act = pid.act(init_obs)
+
+    # check action computation
+    assert act == pytest.approx(-7.043, 0.1)


Is this true regardless of the value of init_obs? Or is it hard coded for this seed?

luisenp · 2022-08-19T15:06:12Z

tests/core/test_planning.py

+    act1 = pid.act(init_obs)
+    next_obs = np.random.randn(4)
+    act2 = pid.act(next_obs)
+    assert act1 + act2 == pytest.approx([-6.141, -2.207], 0.1)


Is this true regardless of the value of init_obs? Or is it hard coded for this seed?

luisenp · 2022-08-19T15:07:40Z

tests/core/test_planning.py

+    next_obs = np.random.randn(4, batch_dim)
+    act2 = pid.act(next_obs)
+
+    assert (act1 + act2)[0] == pytest.approx([-7.155, 1.260, 8.679, -0.047, -1.962], 0.1)


Is this true regardless of the value of init_obs? Or is it hard coded for this seed?

Should be hardcoded from seed. Should verify on your machine / another machine before merging.

(we use tests of this style at huggingface)

I'm always a bit scared that things like this will break with version changes. I'm OK with hard coding the input also, if you don't mind.

I can do that, for something 4 dimensional or less that's super manageable.

If it's useful, you can also add some sort of file with a set of known input/output pairs. We do this for some tests in theseus.

Yeah that's possible.

I'll push a change now. I am also changing the PID parameter generation to use ones rather than random values, to make it deterministic.
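
A small sketch of the hard-coded style discussed in this thread, assuming a plain proportional controller as a stand-in for the PID agent. `p_action` and the test name are hypothetical; the point is only that both the input and the expected output are fixed by hand, so no seed or RNG version is involved.

```python
import numpy as np


def p_action(kp: np.ndarray, target: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """Stand-in proportional controller: action = kp * (target - obs)."""
    return kp * (target - obs)


def test_p_action_hard_coded():
    # Fixed input instead of np.random.randn, so the result does not depend on a seed.
    obs = np.array([0.5, -1.0, 2.0, 0.0])
    kp = np.ones(4)  # gains of ones, as proposed in the thread
    target = np.zeros(4)
    expected = np.array([-0.5, 1.0, -2.0, 0.0])  # computed by hand
    np.testing.assert_allclose(p_action(kp, target, obs), expected)


test_p_action_hard_coded()
```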

natolambert · 2022-08-24T01:00:11Z

After our discussion today, I think I am happy with this as a compromise:

transfer the notebook to colab, and link to colab in readme + link to source in my mbrl-lib fork (after minor improvements).
make this PR just for PID control.

If it gets solid uptake, we can polish it further. Thoughts?

luisenp · 2022-08-24T13:21:43Z

Hi Nathan, that sounds good, I'm OK with this plan.

natolambert · 2022-08-24T21:31:11Z

Colab is here: https://colab.research.google.com/drive/15lodC9KyzzQCv9hQY3wtAe-yYOdk9vZB?usp=sharing
My dev repo is here: https://github.com/natolambert/mbrl-lib-dev/tree/traj-model (will merge into my main branch when we're done here)

natolambert · 2022-08-24T22:09:31Z

Any thoughts on adding a vanilla reward function, like no_termination (for unrolling states where you don't care about reward, or for reacher, where there is no reward function right now)?

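def no_reward(act: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
    # torch.Tensor(0) would be an empty tensor; return one zero reward per batch row instead.
    return torch.zeros(next_obs.shape[0], 1, device=next_obs.device)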

init PR for trajectory based model

77e34b2

facebook-github-bot added the CLA Signed label Jul 26, 2022

natolambert changed the title ~~Add trajectory-based dynamics model~~ [WIP] Add trajectory-based dynamics model Jul 26, 2022

clean model file, add replay buffer class

9c6a241

natolambert added 2 commits August 9, 2022 14:55

setup data collection in notebook, will do precommits soon

64e5245

Merge remote-tracking branch 'origin' into traj-model

18f7b34

natolambert added 2 commits August 9, 2022 19:48

notebook loss goes down

bd9aa62

initial notebook added

4ead491

luisenp suggested changes Aug 15, 2022

natolambert mentioned this pull request Aug 15, 2022

[Feature Request] Output Normalization / Scaling #164

Open

natolambert added 6 commits August 15, 2022 13:57

add ensemble support to notebook

2818f46

substantially clean notebook, add text

85f3a16

remove unused changes

28308ab

clean PID implementation

196431c

minor text changes

2cb94ac

make batch friendly, add tests

4b25138

lint tests

ebd5751

reset precommits to main

55fdec4

luisenp reviewed Aug 19, 2022

natolambert changed the title ~~[WIP] Add trajectory-based dynamics model~~ Add trajectory-based dynamics model Aug 19, 2022

natolambert added 3 commits August 19, 2022 10:52

make tests deterministic

8a28b01

make tests deterministic

96876e2

fix docstring

e708945

natolambert added 2 commits August 24, 2022 15:12

fix notebook (some content wasn't saved on final save)

baa128c

add colab to readme

8f1682a

natolambert mentioned this pull request Aug 31, 2022

PID code and Update Readme #165

Merged

7 tasks

natolambert closed this Aug 31, 2022
