
[Performance] Prioritised TensorDict replay buffers use for loops over the batch dimension #1574

Closed
matteobettini opened this issue Sep 25, 2023 · 6 comments
Labels: bug (Something isn't working)

@matteobettini (Contributor)

In prioritised TensorDict replay buffers, the `update_tensordict_priority` method performs a for loop over the batch dimension:

```python
priority = torch.tensor(
    [self._get_priority(td) for td in data],
    dtype=torch.float,
    device=data.device,
)
```
This causes significant slowdowns, since this is the vectorised dimension used in the training pipelines and it can grow very large.

This method is called every time the buffer is extended or the priorities are updated.
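To illustrate the cost, here is a minimal, self-contained sketch (not TorchRL's actual code) contrasting the per-element Python loop described above with a single batched reduction. The `td_error` tensor and the max-reduction are illustrative assumptions standing in for whatever `_get_priority` computes per sample.

```python
import torch

def priorities_looped(td_error: torch.Tensor) -> torch.Tensor:
    # One Python iteration per batch element, mirroring the reported code path.
    return torch.tensor([float(e.abs().max()) for e in td_error], dtype=torch.float)

def priorities_vectorized(td_error: torch.Tensor) -> torch.Tensor:
    # Same result computed in a single batched op over the leading (batch) dim.
    return td_error.abs().flatten(1).max(dim=1).values.to(torch.float)

errs = torch.randn(1024, 8)
assert torch.allclose(priorities_looped(errs), priorities_vectorized(errs))
```

The batched version does constant Python work regardless of batch size, which is the gap this issue is about.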

@matteobettini added the bug (Something isn't working) label on Sep 25, 2023
@vmoens (Contributor) commented Oct 1, 2023

Any update on this? Can I help?

@matteobettini (Contributor, Author)

I am not sure about all the cases that were envisioned when the component was designed this way, or why it was done. If you can give me some context, I can try to fix it; or if you already know an easy fix, feel free to do it.

@vmoens (Contributor) commented Oct 1, 2023

We just wanted to be able to handle lists of tensordicts, I guess; it should work with a list storage and data that you can't necessarily stack well. No more insight than that, I'm afraid.
But in general I think that if all tests pass we should be covered.

@matteobettini (Contributor, Author)

I don't think this is currently the case: `update_tensordict_priority` expects a `TensorDictBase`, since its code calls methods such as `ndim`, `get`, and so on.

While `extend` is compatible with lists, if `extend` is called with a list that is not stackable as a lazy stack, it will fail when the data is passed to `update_tensordict_priority`.

Furthermore, if `extend` is called with a list, the `_data` and `index` fields are not set, so it is not clear where `update_tensordict_priority` is supposed to find them.

I can try to do some patch work to make `update_tensordict_priority` and `_get_priority` work vectorized, but I am quite confused about how this class is supposed to work and what its contracts are.
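For reference, a vectorized `_get_priority` replacement could look like the hedged sketch below. It assumes the input behaves like a tensordict holding a per-sample `"td_error"` entry (modelled here as a plain dict of tensors), and the `alpha`/`eps` parameters and mean-reduction are hypothetical stand-ins for the actual priority formula.

```python
import torch

def get_priority_vectorized(
    td: dict, key: str = "td_error", alpha: float = 0.6, eps: float = 1e-8
) -> torch.Tensor:
    # Hypothetical batched replacement for a per-element _get_priority loop:
    # reduce every non-batch dimension in one shot.
    err = td[key]
    # Collapse all dims except the leading batch dim, then reduce per sample.
    per_sample = err.abs().reshape(err.shape[0], -1).mean(dim=1)
    return (per_sample + eps) ** alpha

td = {"td_error": torch.randn(512, 4, 3)}
priorities = get_priority_vectorized(td)
assert priorities.shape == (512,)
```

The point is only that the whole batch can be handled with tensor ops once the input is guaranteed to be a (stacked) tensordict rather than an arbitrary list.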

@vmoens (Contributor) commented Oct 3, 2023

If the method is broken with lists, it won't be BC-breaking if we stop supporting lists, and any fix that enables passing lists to `extend` will be a "new feature" (since right now it isn't a feature).
In other words: I'm OK with assuming that everything passed to that method is a tensordict of some sort.

@matteobettini (Contributor, Author)

OK, I'll do this in #1598.

@vmoens vmoens closed this as completed Oct 4, 2023