Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible bug with zero padding feature and C2R transforms of large size #180

Open
nschaeff opened this issue Jul 4, 2024 · 2 comments
Open

Comments

@nschaeff
Copy link

nschaeff commented Jul 4, 2024

In some cases, for large sizes (more below), using a R2C and C2R transform, I obtain wrong FFT results with the zeroPadding feature in frequency domain. If I remove the zeroPadding feature, everything works well.
Maybe I'm not doing things correctly, the padding for R2C format is not clearly described.

On an AMD MI210, it fails for nfft > 4096 (not all values, though). nfft=6144 fails, for instance.
On an nvidia A30, it fails for larger values, nfft >= 12288 (some smaller values fail too, like 11264).

Steps to reproduce:

nfft = 12288;  // other values fail too, and some smaller values also, like 4094 on MI210, but 4095 works on MI210.
kmax = 1;  // maximum frequency represented. can be almost anything, as long as zero padding is performed.
config.FFTdim = 1; //FFT dimension: 1D, but we use a second dimension to get non-unit strides.
config.size[0] = nfft;
config.isInputFormatted = 1;		// out-of-place: separate buffer for input and output
config.inverseReturnToInputBuffer = 1;
config.inputBufferStride[0] = nfft;		// spatial data layout
config.bufferStride[0] = nfft/2 + 1;	// spectral data layout
config.doublePrecision = 1;    // fails also in single precision
if (1) {   // disable this block for the FFT to work correctly again
  config.performZeropadding[0] = 1;
  config.frequencyZeroPadding = 1;
  config.fft_zeropad_left[0] = kmax + 1;			// first zero element
  config.fft_zeropad_right[0] = nfft - kmax;		// first non-zero element (I tried also with nfft/2+1, does not seem to matter)
}
config.numberBatches = 16;   // this value does not seem to matter
config.performR2C = 1;

Performing FFT in both direction gives wrong results.

@nschaeff
Copy link
Author

nschaeff commented Jul 5, 2024

I suspect it happens in the situation where all the following three conditions are realized together: padding + multiple uploads + R2C

DTolm added a commit that referenced this issue Jul 5, 2024
-Force the callback version of R2C if zeropadding is enabled. The even decomposition version is hard to make work for proper zeropadding, as it is read as complex numbers and won't support odd zeropadding lengths trivially.
@DTolm
Copy link
Owner

DTolm commented Jul 5, 2024

Hello,

Yes, there is an issue with multiple uploads even R2C behaving incorrectly with zeropadding when using an even decomposition algorithm that represents R2C/C2R as a C2C of half-length with some post-processing. It has issues with odd zeropadding sizes as it reads data as complex values.
There is another algorithm implemented that does real transforms as internal callbacks that I enabled to substitute the aforementioned even R2C/C2R algorithm if zeropadding is requested. It uses 2x more temporary memory in multiple uploads, but the size of memory transferred and performance should be similar to the even R2C algorithm.

As for the fft_zeropad_right[0] - the complex output of R2C or input of C2R are assumed to be conjugate, so it should be <= nfft/2+1, all values above will have the same result as nfft/2+1. If chosen < nfft/2+1, the system will have this format: [1,1, 0 ... 0, 1,1] for [0, nfft/2+1) elements, and then the complex conjugate will be read in reverse as: [1, 0 ... 0, 1,1].

Best regards,
Dmitrii

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants