
How to normalize data if I have dataset not for everyday? #14

Open
manapshymyr-OB opened this issue Dec 20, 2022 · 14 comments

Comments

@manapshymyr-OB

Can you please have a look at the last two comments on issue #12?

I have a dataset with shapes:

(142, 8, 1048576)
(159, 8, 1048576)
(151, 8, 1048576)

How should I normalize such a dataset?

@VSainteuf
Owner

Hi @manapshymyr-OB,
If your dataset has time series of varying lengths, you can normalise with statistics of shape (C,): compute the mean and standard deviation of each channel over all dates and pixels.
Cheers
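A minimal sketch of this, with made-up small shapes (the real arrays in this thread are (T, 8, 1048576) with varying T):

```python
import numpy as np

# Hypothetical samples of shape (T_i, C, S) with varying T_i, C=8 channels.
samples = [
    np.random.rand(5, 8, 16),
    np.random.rand(7, 8, 16),
    np.random.rand(6, 8, 16),
]

# Flatten every sample to (T_i * S, C) and concatenate, so each row is
# one (date, pixel) observation of the C channels.
obs = np.concatenate(
    [s.transpose(0, 2, 1).reshape(-1, s.shape[1]) for s in samples]
)

mean = obs.mean(axis=0)  # shape (C,)
std = obs.std(axis=0)    # shape (C,)
print(mean.shape, std.shape)  # (8,) (8,)
```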

@manapshymyr-OB
Author

@VSainteuf in that case I will have an array with shape 1xC, right?

@manapshymyr-OB
Author

manapshymyr-OB commented Dec 22, 2022

@VSainteuf I think I figured it out (I got 8x1048576 because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std.
Is it okay if my data has shape TxCxpixel_size if I am not going to use dates.json?

@VSainteuf
Owner

> @VSainteuf in that case I will have an array with shape 1xC, right?

Yes

> @VSainteuf I think I figured it out (I got 8x1048576 because my image is 1024 by 1024). So for each pixel of each channel, I have a mean and std.
> Is it okay if my data has shape TxCxpixel_size if I am not going to use dates.json?

Yes, but you are not processing the complete image at once, right? You are supposed to crop the image with the polygons of each agricultural parcel in your AOI. Then your dataset will have shape NxTxCxS, with N the number of parcels and S the varying number of pixels in each parcel.
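A hedged sketch of that cropping step, assuming each parcel polygon has already been rasterised into a boolean mask of the image grid (e.g. with rasterio's geometry-mask utilities; the masks below are made up):

```python
import numpy as np

# Full image time series of shape (T, C, H, W).
T, C, H, W = 4, 8, 32, 32
image_series = np.random.rand(T, C, H, W)

# Two hypothetical parcel masks with different pixel counts.
mask_a = np.zeros((H, W), dtype=bool)
mask_a[2:6, 3:9] = True       # 4 x 6 = 24 pixels
mask_b = np.zeros((H, W), dtype=bool)
mask_b[10:15, 10:14] = True   # 5 x 4 = 20 pixels

# Boolean indexing over the last two axes turns each parcel into a
# (T, C, S_parcel) array, where S varies per parcel.
dataset = [image_series[:, :, m] for m in (mask_a, mask_b)]
print([p.shape for p in dataset])  # [(4, 8, 24), (4, 8, 20)]
```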

@manapshymyr-OB
Author

> Yes, but you are not processing the complete image at once right ? You are supposed to crop the image with the polygons of each agricultural parcel in your AOI.

I have the Planet dataset, which is already cropped by geometry

@VSainteuf
Owner

OK, I'm not sure what the question is anymore. Let me know if you need clarification on any point!

@manapshymyr-OB
Author

I am still confused regarding the normalization shapes. Now I have samples of varying temporal length but with the same channel and pixel size (they are the same because I translated them into 1024x1024), so Tx10x1048576. I am trying to normalize channel-wise and am not sure about the shape of the mean array. I am concatenating all the npy-s into one (Tx10x1048576) and calculating the mean for each channel. Would it be of size (10,)?

@VSainteuf
Owner

Yes, if you have time series of varying length, the best option is to compute the channel-wise mean across all samples and dates. So you end up with mean and std of shape (C,).

@manapshymyr-OB
Author

manapshymyr-OB commented May 31, 2023

> yes if you have time series of varying length the best option is to compute the channelwise mean across all samples and dates. So you end up with mean and std of shape (C,)

I am getting this error:
stack expects each tensor to be equal size, but got [34, 10, 64] at entry 0 and [46, 10, 64] at entry 1
Is the reason for this error that the dataset does not have the same temporal resolution? 1.npy may be 39x10x1048576, while 2.npy is 50x10x1048576. Is this fine? If not, how should I process these? Should I try https://github.com/VSainteuf/utae-paps instead?

@manapshymyr-OB
Author

I am encountering this error during the validation step (I would not ask this if I got it during training too...).
Error:
File "/home/adminko/PycharmProjects/pytorch-psetae/models/pse.py", line 134, in masked_mean
    out = out * mask
RuntimeError: The size of tensor a (29) must match the size of tensor b (10) at non-singleton dimension 1
So then I printed the shapes and got this during training:
Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])
Out shape torch.Size([64, 87, 64])
Mask shape torch.Size([87, 64])

and during the validation:
Validation . . .
Out shape torch.Size([64, 29, 64])
Mask shape torch.Size([29, 10, 64])
What can be the reason?
So my sample size is 29x10x1048576 and the mean size is (10,).

@VSainteuf
Owner

Are you giving different arguments to the train and val data loaders? Any idea why the behaviour is different between train and val?

@manapshymyr-OB
Author

> are you giving different arguments to the train and val data loaders ? Any idea why the behaviour is different between train and val ?

They are exactly the same. I found out that during training the length of `a` is 5, while on val it equals 2. IDK why.... For now, I made a strange workaround:

    if len(a) == 2:
        out, mask = a
        extra = b
        if len(extra) == 2:
            extra, bm = extra
    else:
        out, mask = a, b

Here I am applying `out, mask = a, b` for both cases... it is executing...

@manapshymyr-OB
Author

Any idea how to work with varying sizes of unordered temporal data?

@manapshymyr-OB
Author

> Any idea how to work with varying sizes of unordered temporal data?

@VSainteuf any suggestions?
