
Lazy resampling executes non-lazy transforms without executing pending transforms #6450

Closed
atbenmurray opened this issue Apr 28, 2023 · 34 comments
Assignees
Labels
Design discussions related to the generic API designs

Comments

@atbenmurray
Contributor

Describe the bug
Lazy resampling will execute the following pipeline incorrectly:

import torch
import monai.transforms as mt

d = torch.zeros((1, 40, 40, 40))

xform = mt.Compose(
    [
        mt.Flip(spatial_axis=0),
        mt.Rotate90(k=1),
        mt.Rand2DElastic(3, (5, 10)),
        mt.Zoom(zoom=0.8),
        mt.Spacing(pixdim=1.2),
    ],
    lazy_evaluation=True,
    verbose=True
)
r = xform(d)

My understanding is that this was implemented deliberately to provide improved performance on some of the existing pipelines that are being benchmarked, but it will break a lot of our users' pipelines if the non-lazy transforms rely on the pending transforms having been run:
• Noise will undergo spatial distortion that was not intended
• Non-lazy spatial transforms will output the wrong result

While we want the best performance on our pipelines, we shouldn't expect our users to have to modify their pipelines for them to execute correctly.

The most obvious solution is a new flag for Compose. We've discussed it as a 1.3 feature but now we need it for this release.

class Compose(Randomizable, InvertibleTransform):
    def __init__(
        self,
        transforms: Sequence[Callable] | Callable | None = None,
        map_items: bool = True,
        unpack_items: bool = False,
        lazy: bool | None = False,
        options: ??? | None = None,
        overrides: dict | None = None,
        logger_name: str | None = None,

Options can be thought of like compile flags for a C++ compiler.
We should pick a way of specifying them that is extensible enough for our purpose and easy for the user:

"optimize: lazy_to_front, another_option: 3" # option string
["optimize: lazy_to_front", another_option: 3"] # option string array
{"optimize": "lazy_to_front", another_option: 3"} #json-like

*** Flags for 1.2 ***
We only need one or two flags for 1.2:

• "lazy_to_front", "lazy_to_back"

These should physically reorder the transforms for execution. They can reshuffle the transforms in a way similar to RandomOrder.

@myron This should cover the benchmark performance that you need, yes?
@wyli This seems implementable in #6257 fairly simply. Given that RandomOrder works fine during inverse AFAIK, we also don't need to lose invert.

@atbenmurray atbenmurray mentioned this issue Apr 28, 2023
@atbenmurray atbenmurray self-assigned this Apr 28, 2023
@wyli
Contributor

wyli commented Apr 28, 2023

Thanks, and I agree that adding an option will allow for different use cases; one of the use cases is to achieve fast but less precise results compared with the non-lazy...

{"optimize": "lazy_to_front", another_option: 3"} #json-like

Looks highly extensible

@myron
Collaborator

myron commented Apr 28, 2023

Hi @atbenmurray, I don't understand the issue well.

• Noise will undergo spatial distortion that was not intended
• Non-lazy spatial transforms will output the wrong result

Which Noise? There is no intensity Noise transform here. Or do you mean something else?

--
Also

 "lazy_to_front", "lazy_to_back"
These should physically reorder the transforms for execution.

Will it move all lazy transforms to the front? (Then it won't be the right sequence of transforms.) If I understand it correctly, moving ALL lazy transforms to the front/back will be applicable only to some special cases (and not in general), since we can't simply re-arrange transforms.

A common transform list I use is the following (with Identities for the current "lazy_evaluation" version):

  • LoadImageD
  • CropForeground based on image
  • Identity(image, label)
  • SpacingD (resample)
  • NormalizeIntensityd # not sure if I need Identity before
  • SpatialPadd (image, label)
  • Identity(label)
  • RandCropByLabelClassesd (based on label classes)
  • RandomAffine
  • RandomFlip
  • Identity(image)
  • RandomSmooth
  • RandomIntensityNoise

I don't think we can simply re-arrange lazy (spatial) transforms to the front.


I personally think we should make lazy transforms explicit. Anyone who wants to use them must create a separate list of spatial transforms with Compose(spatial_transforms, lazy=True), and then use that new composed transform as a step between the main transforms.
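
For illustration only, a minimal sketch of that explicit style, assuming the lazy kwarg discussed in this thread; the specific transforms, keys and parameter values here are placeholders, not a recommendation:

import monai.transforms as mt

# explicitly lazy spatial block, composed separately
spatial = mt.Compose(
    [
        mt.Spacingd(keys=["image", "label"], pixdim=(1.0, 1.0, 1.0)),
        mt.RandAffined(keys=["image", "label"], prob=0.5),
        mt.RandFlipd(keys=["image", "label"], prob=0.5),
    ],
    lazy=True,  # pending spatial operations are fused and applied at the end of this block
)

# the lazy block is then one step inside the otherwise non-lazy pipeline
xform = mt.Compose(
    [
        mt.LoadImaged(keys=["image", "label"]),
        mt.EnsureChannelFirstd(keys=["image", "label"]),
        spatial,
        mt.NormalizeIntensityd(keys=["image"]),
        mt.RandGaussianNoised(keys=["image"], prob=0.2),
    ]
)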

@atbenmurray
Contributor Author

atbenmurray commented Apr 29, 2023

@myron thanks for the detailed response! In general, people shouldn't have to use this compose option. The primary purpose for it is to work around the situation we have now for the upcoming release. As I understand it, we have benchmarked pipelines that rely on the dev branch behaviour.

The dev branch produces incorrect output in the general case of having lazy and non-lazy transforms mixed together because, unless you use Identity/Identityd (becoming ApplyPending/ApplyPendingd) to enforce ordering, all non-lazy transforms will be executed first.

This decision was taken to improve performance on a benchmark that had a Lambda transform in it (as far as I am aware from the other conversations), but it will break our users' pipelines in the general case. move_lazy_first can't work for the above pipeline, but move_lazy_last might.

I'm doing some experiments with your pipeline so that I fully understand the implications

@wyli I guess I really need to get a complete list of the pipelines this behaviour was done for, so we can be sure that this fix would sort those pipelines for benchmarking while allowing the default behaviour not to break our users' pipelines.

@wyli
Copy link
Contributor

wyli commented Apr 29, 2023

I think we can just use the option as you suggested to allow for flexibility in general; the user can choose what to use for their own pipelines. It'll be impossible to have an exhaustive list of use cases.

by the way I disagree with "The dev branch behaviour has a major issue in that unless you use Identity/Identityd (becoming ApplyPending/ApplyPendingd) to enforce ordering", could you rephrase to avoid misleading info?

@atbenmurray
Contributor Author

atbenmurray commented Apr 29, 2023

I personally think we should make lazy transforms explicit. Anyone who wants to use them must create a separate list of spatial transforms with Compose(spatial_transforms, lazy=True), and then use that new composed transform as a step between the main transforms.

The goal has always been to allow it to work without breaking anything in the general case. That way it is accessible just as a switch that can be flipped at the policy level rather than having to rewrite pipelines. The user is always then free to get very fancy with how they organise their transforms to get the extra benefit, but move_lazy_last, for example, would have that effect.

@atbenmurray
Contributor Author

by the way I disagree with "The dev branch behaviour has a major issue in that unless you use Identity/Identityd (becoming ApplyPending/ApplyPendingd) to enforce ordering", could you rephrase to avoid misleading info?

I've modified it to say 'will produce incorrect output'.

@atbenmurray
Contributor Author

atbenmurray commented Apr 29, 2023

@wyli I really need to see the pipelines that the dev branch behaviour was intended for. Surely you can point me at them. The flag isn't a fix unless we know it fixes those specific cases.

@atbenmurray
Contributor Author

Great! So if we eyeball these and verify that moving all the lazy transforms last is equivalent to the dev branch behaviour, then this is the right modification for the PR and we should be able to avoid needing to re-benchmark.

@wyli
Contributor

wyli commented Apr 29, 2023

I've modified it to say 'will produce incorrect output'.

I think I understand your point, but my point is that there are 200+ transforms in MONAI; even in non-lazy mode, you can easily compose certain sequences of them that may not work as you think (especially if you don't read the documentation)... we can't say that the MONAI codebase is incorrect or has a major issue, right?

@wyli
Contributor

wyli commented Apr 29, 2023

Also, we'll never achieve exactly the same result by design, e.g. the padding modes, output data types (quantisations), and interpolation modes. In most use cases there's a trade-off between speed, interpolation quality and memory footprint; in my opinion, we just need more documentation about the details and let the user decide what transform sequence to use.

@atbenmurray
Contributor Author

atbenmurray commented Apr 29, 2023

I've modified it to say 'will produce incorrect output'.

I think I understand your point, but my point is that there are 200+ transforms in MONAI; even in non-lazy mode, you can easily compose certain sequences of them that may not work as you think (especially if you don't read the documentation)... we can't say that the MONAI codebase is incorrect or has a major issue, right?

That's why the default behaviour should be as has always been stated in the design and presentations and discussions, i.e. that the ordering of transform execution is unmodified, and that any changes to transform ordering should come from option flags.
That way, the majority of people can always just switch the lazy flag on and have the best chance of it working as expected.

Anyway, our plan is to add the option flag which should allow the benchmarked pipelines to run as discussed, and have the default behaviour be to not modify transform order. There won't be any need to document anything as an issue in docstrings or release notes, because it won't be an issue, just an option that allows you to move all the lazy transforms last if you want to do so.

draft of a docstring fragment for compose options:

option 'move_lazy_last': This option reorders the lazy transforms in a given pipeline so that they are executed after all non-lazy transforms. This can be helpful for performance and image quality, as it results in fewer resampling operations. Note that this option should not be used with pipelines that require ordering to be preserved, such as a mixed sequence of lazy and non-lazy spatial transforms that must be performed in the specified order. If your pipeline uses ApplyPending transforms, they effectively split the list into sub-lists for the purposes of reordering, and reordering only happens within each sub-list.

@atbenmurray
Contributor Author

atbenmurray commented Apr 29, 2023

Ok, so after trying a couple of different interpretations of the dev lazy mode, I know what move_lazy_last means:

So:

'LX' means lazy transform with id X (id doesn't mean anything here, just helps with clarity in the examples)
'NY' means non-lazy transform with id Y
'AZ' means ApplyPending/ApplyPendingD with id Z

Given a list of transforms:

['LA', 'NB', 'LC', 'AD', 'LE', 'NF']

The presence of an ApplyPending splits the transform list into multiple sublists

['LA', 'NB', 'LC'], ['AD'], ['LE', 'NF']

Each sublist is then sorted so non-lazy transforms go first, lazy transforms go next, and then the ApplyPending goes last:

['NB', 'LA', 'LC'], ['AD'], ['NF', 'LE']

Which then concatenates back to:

['NB', 'LA', 'LC', 'AD', 'NF', 'LE']

This precisely mimics the behaviour in dev and makes sense from a user perspective; it is easy to explain that an ApplyPending effectively splits the list of transforms into sublists and schedules each separately.
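
For illustration only, a minimal sketch of the reordering described above; the is_lazy / is_apply_pending predicates and the string labels are placeholders rather than real MONAI API:

def reorder_lazy_last(transforms, is_lazy, is_apply_pending):
    """Within each sub-list delimited by an ApplyPending, run non-lazy transforms
    first and lazy transforms afterwards; the ApplyPending itself stays in place."""
    reordered, non_lazy, lazy = [], [], []
    for t in transforms:
        if is_apply_pending(t):
            reordered += non_lazy + lazy + [t]  # close the current sub-list
            non_lazy, lazy = [], []
        elif is_lazy(t):
            lazy.append(t)
        else:
            non_lazy.append(t)
    return reordered + non_lazy + lazy


# ['LA', 'NB', 'LC', 'AD', 'LE', 'NF'] -> ['NB', 'LA', 'LC', 'AD', 'NF', 'LE']
print(reorder_lazy_last(
    ["LA", "NB", "LC", "AD", "LE", "NF"],
    is_lazy=lambda t: t.startswith("L"),
    is_apply_pending=lambda t: t.startswith("A"),
))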

I've implemented this and I am running a wide selection of tests on it.

@wyli
Contributor

wyli commented Apr 30, 2023

On the dev branch there is no reordering of the user-defined transforms with or without lazy resampling, you can turn on the verbose=True to confirm this.
So now I'm not sure why you need reordering to mimic the dev branch behaviour.

If you are talking about the 'applied_operations' #6371 #6439, these are about inversing of a compose using compose.inverse(data). The inverse is not supported and a warning message will be raised on the dev branch. Please see the test case for this use case

assert inverted is None, "invert LambdaD + lazy is not supported"

It has nothing to do with the transform call, that is compose.__call__

@atbenmurray
Contributor Author

atbenmurray commented Apr 30, 2023

There is effective reordering on dev. Think about the interaction between pending transforms and transforms executed immediately.

Given the following pipeline on dev:

[Flip, Spacing, Rand3DElastic, Rotate]

Flip is executed lazily

data.pending_operations = [Flip]
executed = []

Spacing is executed lazily

data.pending_operations = [Flip, Spacing]
executed = []

Rand3DElastic is executed non-lazily, without the pending transforms being executed first. This means that Rand3DElastic is the first to change the actual data (non-lazy execution), i.e. out of order

data.pending_operations = [Flip, Spacing]
executed = [Rand3DElastic]

Rotate is executed lazily

data.pending_operations = [Flip, Spacing, Rotate]
executed = [Rand3DElastic]

Finally, the pending transforms are executed

data.pending_operations = []
executed = [Rand3DElastic, Flip, Spacing, Rotate]

Rand3DElastic is the first transform to be executed on the data; it should have been the 3rd

It doesn't matter that the logging shows that Rand3DElastic was visited in order. It is the first transform that is executed on the data. That is what I mean by effective reordering.

This should not be the default behaviour, because we want users to switch on lazy without it breaking their pipelines.

@wyli
Contributor

wyli commented Apr 30, 2023

The default behaviour is no lazy resampling, and the user should read the tutorials and examples before considering enabling lazy resampling.

The definition of 'breaking their pipelines' is unclear here. As I mentioned earlier, the padding mode, interpolation mode and output data type may be different compared with the non-lazy path in many use cases, but they are not wrong. In v1.2, to fully leverage lazy resampling and get the optimal outcome, the existing preprocessing transforms should be revisited, instead of simply setting lazy_evaluation=True (that will either not provide any speedup or not provide exactly the same preprocessing result).

And for your reordering example, the behaviour will be different when it's used with a cache dataset. Again, I don't think your reordering is broken; as long as the program's behaviour is consistent with its documentation, it's a feature, not a bug. The user should have all the freedom to implement their preprocessing logic.

So what I'm suggesting is that one of the options for the options kwarg keeps exactly the current dev behaviour.

@atbenmurray
Contributor Author

atbenmurray commented Apr 30, 2023

On the dev branch there is no reordering of the user-defined transforms with or without lazy resampling, you can turn on the verbose=True to confirm this. So now I'm not sure why you need reordering to mimic the dev branch behaviour.

If you are talking about the 'applied_operations' #6371 #6439, these are about inversing of a compose using compose.inverse(data). The inverse is not supported and a warning message will be raised on the dev branch. Please see the test case for this use case

assert inverted is None, "invert LambdaD + lazy is not supported"

It has nothing to do with the transform call, that is compose.__call__

The example that I showed you above shows that this is a breaking change. If I put Rand3DElastic 3rd, I want it to execute 3rd.

#6371 makes my point for me

To quote:
"for example, the following LambdaD pushes to the applied_operations while pending_operations is trying to combine multiple lazy operations:"

Disabling inverse is a sticking plaster solution that reduces the viable feature set. The actual issue is what is mentioned in the quote, which I'll paraphrase as 'LambdaD is being executed out of order'.

This highlights the importance of discussing changes to the proposed design. #6371 was opened, patched, PRed and merged without you requesting a review from me. If you had, I could have pointed out that it isn't just a problem for inversion, but it will break existing pipelines where the user mixes lazy and non-lazy spatial/cropping and other transforms, and that it shouldn't be necessary to disable the invert in these circumstances. Essentially, all we have to do is ensure that the forward execution order is respected for inverse. We already do this quite neatly for Compose-based subclasses such as RandomOrder.

The full solution is what I have described above, but I'll repeat it here briefly:

  1. Perform the transform reordering as described above
  2. Log the reordered transforms in metadata, in the same way that RandomOrder does
  3. Invert then pops the reordered transform indices and applies them in that order

The following should work fine in the upcoming commit:

xform = mt.Compose(
    [
        mt.Spacing(pixdim=1.2),
        mt.RandFlip(),
        mt.Lambda(lambda x: x+1, lambda x: x-1),
        mt.RandAffine(),
    ],
    lazy=True,
    options={'reorder': 'lazy_after_non_lazy'}
)
r = xform(d)
o = xform.inverse(r)

@wyli
Contributor

wyli commented Apr 30, 2023

Perform the transform reordering as described above

This reordering is already ambiguous for cache vs non-cache use cases; it'll be 'breaking the pipeline' by your definition.

But I don't think it's a bug, it's a feature...the cache + lazy feature interactions are documented.

@atbenmurray
Contributor Author

atbenmurray commented Apr 30, 2023

Perform the transform reordering as described above

This reordering is already ambiguous for cache vs non-cache use cases; it'll be 'breaking the pipeline' by your definition.

But I don't think it's a bug, it's a feature...the cache + lazy feature interactions are documented.

It isn't ambiguous for cache vs non-cache; there are clear rules. Caching will only execute transforms up to the first random transform, so it calls the compose pipeline with those transforms only, and the reordering rules can be applied to them. The loaded data is then run through the remaining part of the pipeline, and the reordering rules can be applied to those transforms as well.
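
As an illustration of that rule, a minimal sketch of splitting a transform list at the first random transform, in the way a caching dataset caches only the deterministic prefix; the split helper itself is hypothetical, while monai.transforms.Randomizable is the real base class for random transforms:

from monai.transforms import Randomizable


def split_at_first_random(transforms):
    """Hypothetical helper: return (cacheable_prefix, runtime_suffix).
    The prefix ends just before the first random transform, mirroring how
    cache datasets only cache deterministic preprocessing results."""
    for i, t in enumerate(transforms):
        if isinstance(t, Randomizable):
            return list(transforms[:i]), list(transforms[i:])
    return list(transforms), []


# the reordering rules would then be applied to each part independently:
# cacheable, runtime = split_at_first_random(xform.transforms)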

Even if this were the case, there are plenty of users who won't be using the caching / persistent mechanisms, so this is an aside from the core points I made above.

It is a bug that

[Flip, Spacing, Rand3DElastic, Rotate]

gets effectively executed as

[Rand3DElastic, Flip, Spacing, Rotate]

This is not what users expect and won't produce the results they intended for their pipeline.

It is also not necessary to disable inversion in the case of

    [
        mt.Spacing(pixdim=1.2),
        mt.RandFlip(),
        mt.Lambda(lambda x: x+1, lambda x: x-1),
        mt.RandAffine(),
    ],

this can be made to work fine.

@wyli
Contributor

wyli commented Apr 30, 2023

It isn't ambiguous for cache vs non-cache. There are clear rules.

Agreed, that's what we have on the Dev branch, that's what we have for lazy resampling on the Dev branch.

@atbenmurray
Contributor Author

atbenmurray commented Apr 30, 2023

It is a bug that

[Flip, Spacing, Rand3DElastic, Rotate]

gets effectively executed as

[Rand3DElastic, Flip, Spacing, Rotate]

The default behaviour should be that no reordering occurs, because reordering breaks what the user has defined for their pipeline. Reordering must be opt-in.

The PR will allow the original ordering, reordering, and inversion of reordered pipelines, for both cache and non-cache use.

@wyli
Contributor

wyli commented Apr 30, 2023

there are plenty of users who won't be using the caching / persistent mechanisms

I don't think that's a valid/good assumption for MONAI now or in the future.

@atbenmurray
Contributor Author

atbenmurray commented Apr 30, 2023

there are plenty of users who won't be using the caching / persistent mechanisms

I don't think that's a valid/good assumption for monai now and future

I only point that out because you raised it rather than addressing the core point; in the previous paragraph I explained that the behaviour is clearly defined for caching / non-caching.

I've already shown in previous discussions, presentations and so forth that we can integrate caching into the same mechanism that handles things like specifying out-of-order performance optimisations.

@wyli
Contributor

wyli commented Apr 30, 2023

ok, in the interests of time for releasing v1.2, just to reiterate my understanding:

  1. to me, the changes that you propose are not fundamentally different from the current dev branch; however, they should be well-documented with docstrings and tutorials in order to be released in v1.2. (The core reviewers decide whether the proposed changes are 'well-documented' or not.)

  2. the current dev branch's lazy resampling has been successfully evaluated with the major use cases that we have, and we observed good speedup with equivalent model training quality. The new change should be able to preserve the same performance. (The core reviewers decide whether the proposed changes are 'preserving the performance'.)

  3. because of points 1 and 2, we don't describe the current dev branch's API as a major issue or bug in the discussions/documentation. The proposed changes are usability enhancements for the current dev branch.

If you don't agree with any of the above, let's have an offline discussion with the wider team...

@atbenmurray
Contributor Author

I'd phrase as the following:

  1. The PR default behaviour is no reordering, as this allows users' pipelines to work as intended in the general case
  2. There is an option flag that moves lazy transforms after non-lazy ones, constrained by the next ApplyPending (this should duplicate dev behaviour so we can avoid having to re-benchmark everything)
  3. The release notes / documentation describe the effect of applying the lazy_to_last flag and the circumstances under which it should be used

But yes, in essence, it should be fine

@wyli
Contributor

wyli commented May 1, 2023

Sure, that's good. Implementation-wise, do you plan to include all the code changes in PR #6257 and what's the timeline?

If needed I can help run some benchmarks and integration tests with all the testing environments we currently have.

@atbenmurray
Contributor Author

Yes, PR #6257 should come across in its entirety. It has the default lazy mode, the lazy_to_last flag mode, and the restored ability to run invert. I hope to get the commit done by afternoon / late afternoon today.

@atbenmurray
Contributor Author

I note that in the pipeline

[Flip, Spacing, Rand3DElastic, Rotate]

Rand3DElastic doesn't appear in the applied_operations list. Is this intentional?

If not, I can resolve that while I am at it.

@wyli
Contributor

wyli commented May 1, 2023

There's work in progress in #1793 for that, but it needs a major rework to use the latest API.

@atbenmurray
Contributor Author

Ok, so it looks like at least some of the pipelines result in moving execution order around on a per-key basis. As a result, it won't be possible to have inversion work for that mode without largely rewriting inverse.

Example:

# NX = Non-lazy transform X
# LY = Lazy transform Y
# AZ = ApplyPending Z
# A key suffix (e.g. the 'x' in LCx) means the transform only operates on that key
[NA, LB, LCx, ADx, NE, LF]

# NA
pending_x: [], applied_x: [NA]
pending_y: [], applied_y: [NA]

# LB
pending_x: [LB], applied_x: [NA]
pending_y: [LB], applied_y: [NA]

# LCx
pending_x: [LB, LCx], applied_x: [NA]
pending_y: [LB], applied_y: [NA]

# ADx
pending_x: [], applied_x: [NA, LB, LCx]
pending_y: [LB], applied_y: [NA]

# NE
pending_x: [], applied_x: [NA, LB, LCx, NE]
pending_y: [LB], applied_y: [NA, NE]

# LF
pending_x: [LF], applied_x: [NA, LB, LCx, NE]
pending_y: [LB, LF], applied_y: [NA, NE]

# final
applied_x: [NA, LB, LCx, NE, LF]  # executes LB before NE
applied_y: [NA, NE, LB, LF]  # executes NE before LB

So I intend to do the following:

  1. Have options=None preserve all lazy ordering
  2. Have options={'reorder': 'lazy_to_end'} that allows for reordering, but done in a way that dictionary pipelines keep a consistent order for all keys. This can be inverted.
  3. Have options={'reorder': 'lazy_to_end_no_invert'} that provides the dev version that breaks the assumption that keys stay in a consistent order. We'll raise a ValueError if you attempt to run inverse on a compose instance with this flag set.

Then for 1.3, I'll have to look at reimplementing the invert mechanism to work with inconsistent ordering on different dictionary keys. It might be that the full lazy implementation already handles this but I'll have to check.
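
For illustration, a minimal usage sketch of the three modes described above, assuming the options kwarg and flag names land as proposed in this thread (they are proposals at this point, not released API), with placeholder transforms:

import monai.transforms as mt

transforms = [mt.Spacing(pixdim=1.2), mt.RandFlip(prob=0.5), mt.RandGaussianNoise(prob=0.2)]

# 1. default: preserve the user-defined ordering exactly
xform = mt.Compose(transforms, lazy=True, options=None)

# 2. reorder lazy transforms to the end, consistently across dictionary keys (invertible)
xform = mt.Compose(transforms, lazy=True, options={"reorder": "lazy_to_end"})

# 3. mimic the current dev behaviour; attempting inverse would raise a ValueError
xform = mt.Compose(transforms, lazy=True, options={"reorder": "lazy_to_end_no_invert"})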

@atbenmurray
Contributor Author

The commits for this are now on #6257

@atbenmurray
Contributor Author

atbenmurray commented May 3, 2023

@wyli Ok so there is still an issue.
At the moment, on dev, RandCropByPosNegLabel{d} requires that you use ApplyPending{d} on the labels before you call it, but then calls lazily, so it is added onto pending transforms.

Vanilla lazy mode requires that the user does not have to modify their pipeline in order to run lazy=True, so RandCropByPosNegLabel{d} and similar transforms don't execute lazily, forcing all pending transforms to be applied before it executes.

The only way that I can think to handle this is to have transforms such as RandCropByPosNegLabel{d} perform apply_pending on the label_key themselves before executing in lazy fashion.

@wyli
Contributor

wyli commented May 3, 2023

@wyli Ok so there is still an issue. At the moment, on dev, RandCropByPosNegLabel{d} requires that you use ApplyPending{d} on the labels before you call it, but then calls lazily, so it is added onto pending transforms.

Vanilla lazy mode requires that the user does not have to modify their pipeline in order to run lazy=True, so RandCropByPosNegLabel{d} and similar transforms don't execute lazily, forcing all pending transforms to be applied before it executes.

The only way that I can think to handle this is to have transforms such as RandCropByPosNegLabel{d} perform apply_pending on the label_key themselves before executing in lazy fashion.

all those cases are warned via this util:

def check_non_lazy_pending_ops(
    input_array: NdarrayOrTensor, name: None | str = None, raise_error: bool = False
) -> None:
    """
    Check whether the input array has pending operations, raise an error or warn when it has.
    """
although it's easy to change the warning into an automatic apply_pending, I don't think the cropping transforms should have the logic of calling apply_pending internally.

@atbenmurray
Contributor Author

For me, warning people isn't enough. Lazy with options=None should never break a pipeline.

I'm also not keen on the idea of cropping transforms calling apply_pending internally. It can't happen anyway because the existence of overrides makes that impossible.

Transforms that require it should be able to indicate that they do through the LazyTrait interface. I'm testing something out that solves the issue in the in_order pipeline

@wyli
Contributor

wyli commented May 3, 2023

The root cause is that the crop sampling locations are defined in the image space instead of the physical space. In the end, the coordinates should be defined in the original physical space and should carry the geometric information during preprocessing; then there's no ambiguity.

@KumoLiu KumoLiu added the Design discussions related to the generic API designs label Dec 7, 2023
@Project-MONAI Project-MONAI locked and limited conversation to collaborators Jan 5, 2024
@vikashg vikashg converted this issue into discussion #7370 Jan 5, 2024
