-
Notifications
You must be signed in to change notification settings - Fork 157
Conversation
@luisenp do you know with the current replay buffer trajectory storing, if at training time it will be easy to get the "trajectory time index" corresponding to the step of the episode? The trajectory-based model is trained on the full trajectory and each sub-trajectory of it, so I will need a tool to get all sub trajectories at the minimum. Happy to discuss on a call if it helps! |
Also, I'm not sure what to do with
Though if I change it, it wants me to stage them 😥
|
@robertocalandra @albertwilcox may be interested. |
Hi @natolambert, thanks for the PR! I have a few questions:
Also, can you try checking out branch |
I really like the sounds of that, let me try and make those additions. It removes complexity in a way that I think is fitting. |
This seems like a good direction, I'm realizing I will need to add another class similar to $$ s_{t+h} = f_\theta(s_t, h, \psi)$$. Not sure if I need a new MLP function, if I do it will just be for different trajectory propagation options (because it's no longer recursive, but rather all elements in one forward pass). Will push more changes shortly. Relatedly, why is it called I guess maybe we can spoof it as follows: With the right propagation function, nothing needs to change, including the replay buffer. We'll see if I can get it to work. |
TBH, I don't love the name |
Your feedback is making me think I may not actually need any of the new code I proposed other than some agent definitions for the environment I want to use. Hacking all the input-output pairs to be correct has gotten pretty far. Let me finish this, then we can see if we want to add more (which will make it clearer for people who are interested). The clear difference is trajectory propagation when compared with the |
@luisenp I got the initial notebook done. I've actually set it up so I can reset the rest of the files, and just merge the notebook when we are happy with it. I think the difference in the MBRL-Lib reacher env vs the one used in the paper is causing some numerical instability, but the general principle is close to being validated! Here's a long-term prediction accuracy comparing the one-step model to the traj-based model. Let me know if you look at the structure of the notebook! Some things I want to get to:
|
That's a nice plot! I'll take a closer look at this soon (hopefully tomorrow Friday, if I have time). Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this @natolambert. Left a bunch of comments, let me know if you have any questions :)
mbrl/planning/linear_feedback.py
Outdated
self.state_mapping = None # can set to run PID on specific variables | ||
|
||
|
||
# TODO: fix dimensionality with P |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO no longer relevant? What about the commented out code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that I remove all of this code from the library for now, and leave it in the notebook I think is fine.
mbrl/planning/linear_feedback.py
Outdated
self.target = target | ||
self.prev_error = 0 | ||
self.error = 0 | ||
# self.cum_error = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lingering commented out code.
mbrl/planning/linear_feedback.py
Outdated
# self.cum_error = 0 | ||
# self.I_count = 0 | ||
|
||
def act(self, obs: np.array) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add docstrings for this method (example)? Also, it looks like this method does not accept a batch of observations, can this be supported?
mbrl/planning/linear_feedback.py
Outdated
|
||
self.error = q_des - q | ||
P_value = self.Kp * self.error | ||
I_value = 0 # TODO: implement I and D part |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still relevant? What about the commented out code?
mbrl/planning/linear_feedback.py
Outdated
target: np.ndarray, | ||
): | ||
""" | ||
:param dim: dimensionality of state and control signal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this docstring is outdated. Also can we rename variables so they are using lower_case
capitalization?
@@ -0,0 +1,726 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you write a minimal version of this notebook that mostly imports stuff to train your model? Don't worry about adding any text or explanations, I can help with that later. I just want to have a clear picture of how your contributions in this PR work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still want this? I can also do this. I think now that I removed a lot of changes it should be simpler to follow.
@luisenp I guess the most relevant question is if any of the code in-the-works here interests you in the library or if we should just go notebook-only. I ended up removing most of them. I can add a commit that reset's everything except for the notebook. I re-used all vanilla stuff in a way that could be slightly confusing. In the case when people are interested in this, we could add official code that makes it much simpler to use. Some of the things I'm using in potentially non-intended manner:
I think that until the model gets more traction, notebook only is probably okay (I will reset all the other changes in the PR). It shows a good research use-case for how the library can be used flexibility. |
If everything you need to run this example is in the notebook, then that's definitely a good starting point! In that case I can focus on reviewing the notebook more carefully. We don't need to remove the other files yet, maybe some of them will be useful later (thinking of the PID agent here). That said, it would be useful if you can remove any superfluous stuff from the notebook, and maybe add some short text highlighting the parts I should focus more on when reviewing? |
Yeah will do, in that case I will do a full pass to clean things up (including addressing the things in the PID agent and making it more compatible with the style guidelines / precommits). |
@luisenp added tests, made batch friendly, and cleaned everything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some small comments. I got interrupted by some meetings, but I'll go back to the notebook and leave feedback later today. Thanks a lot for the changes!
mbrl/planning/linear_feedback.py
Outdated
self.error = 0 | ||
# self.cum_error = 0 | ||
# self.I_count = 0 | ||
This method optimizes a full sequence of length ``self.planning_horizon`` and returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be a bit of copy paste? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm referring to some text in the docstring that doesn't seem applicable to this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Yeah I will fix this. Missed this earlier!
tests/core/test_planning.py
Outdated
act = pid.act(init_obs) | ||
|
||
# check action computation | ||
assert act == pytest.approx(-7.043, 0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true regardless of the value of init_obs
? Or is it hard coded for this seed?
tests/core/test_planning.py
Outdated
act1 = pid.act(init_obs) | ||
next_obs = np.random.randn(4) | ||
act2 = pid.act(next_obs) | ||
assert act1 + act2 == pytest.approx([-6.141, -2.207], 0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true regardless of the value of init_obs? Or is it hard coded for this seed?
tests/core/test_planning.py
Outdated
next_obs = np.random.randn(4, batch_dim) | ||
act2 = pid.act(next_obs) | ||
|
||
assert (act1 + act2)[0] == pytest.approx([-7.155, 1.260, 8.679, -0.047, -1.962], 0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true regardless of the value of init_obs? Or is it hard coded for this seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be hardcoded from seed. Should verify on your machine / another machine before merging.
(we use tests of this style at huggingface)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always a bit scared that things like this will break with version changes. I'm OK with hard coding the input also, if you don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, for something 4 dimensional or less that's super manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's useful, you can also add some sort of file with a set of known input/output pairs. We do this for some tests in theseus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's possible.
I'll push a change now. I am also changing the PID parameter generation to be ones rather than random to make deterministic.
After our discussion today, I think I am happy with this as a compromise:
If it gets solid uptake, we can polish it further. Thoughts? |
Hi Nathan, that sounds good, I'm OK with this plan. |
Colab is here: https://colab.research.google.com/drive/15lodC9KyzzQCv9hQY3wtAe-yYOdk9vZB?usp=sharing |
Any thoughts on adding a vanilla reward function, like
|
TODO for this WIP PR:
Types of changes
Motivation and Context / Related issue
I'm collaborating with some folks on Berkeley looking to apply the trajectory-based model to real world robotics, so I wanted to integrate it into this library to give it more longevity.
The paper is here. The core of the paper is proposing a long-term prediction focused dynamics model. The parametrization is:
where$\phi$ are closed form control parameters (e.g. PID)
Potentially this #66 , I think we will need to modify the replay buffer to
How Has This Been Tested (if it applies)
I am going to build a notebook to validate and demonstrate it, currently it is a fork of the PETS example. I will iterate
Checklist